PP-2009-50: A syllable frequency list for Dutch

PP-2009-50: Zuidema, Willem (2009) A syllable frequency list for Dutch. [Report]

The Corpus Gesproken Nederlands (CGN) is a large corpus of spoken
Dutch, partly annotated with syntactic and phonological information
(see http://lands.let.kun.nl/cgn/). Although it contains files with
syllabified words and files with word frequency counts, it offers no
direct way to extract a list of syllable frequencies. This document
describes some simple scripts that combine the relevant information from
various CGN files (using version 6 and the Linux utilities grep, sed,
sort, uniq, awk, cut and paste), and gives the complete list of syllable
frequencies obtained by running them. The list is made available in the hope
that it might be helpful, for instance for experimental studies where
one must control for syllable frequency. Depending on the intended use
or required level of accuracy, the scripts might have to be adapted and
the frequency counts changed accordingly.
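
As a rough illustration of the kind of combination described above, the following Python sketch sums word frequency counts over the syllables of each word. It is not the report's own scripts, which use the Linux utilities listed above. The function name syllable_frequencies, the hyphen used as syllable separator, and the toy data are all assumptions made for this example and do not reflect the actual CGN file formats.

```python
from collections import Counter


def syllable_frequencies(syllabified, word_counts, sep="-"):
    """Sum word token counts over every syllable each word contains.

    syllabified: dict mapping a word to its syllabified form, e.g. "wa-ter"
                 (the hyphen separator is an assumption of this sketch).
    word_counts: dict mapping a word to its token frequency.
    """
    totals = Counter()
    for word, count in word_counts.items():
        form = syllabified.get(word)
        if form is None:
            # No syllabification available: this sketch simply skips the word.
            continue
        for syllable in form.split(sep):
            if syllable:
                totals[syllable] += count
    return totals


# Toy data, invented for illustration (not taken from CGN).
syllabified = {"water": "wa-ter", "later": "la-ter", "la": "la"}
word_counts = {"water": 5, "later": 3, "la": 2, "onbekend": 7}

freqs = syllable_frequencies(syllabified, word_counts)

# Print syllables by descending frequency, ties broken alphabetically.
for syllable, count in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{count}\t{syllable}")
```

Choices such as skipping words without a syllabification, or counting tokens rather than word types, change the resulting counts, which is one reason the counts may need to be adapted to the intended use.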

Item Type: Report
Report Nr: PP-2009-50
Series Name: Prepublication (PP) Series
Year: 2009
Uncontrolled Keywords: Dutch; phonology; computational linguistics
Subjects: Computation
Depositing User: Jelle Zuidema
Date Deposited: 12 Oct 2016 14:37
Last Modified: 12 Oct 2016 14:37
URI: https://eprints.illc.uva.nl/id/eprint/379
