PP-2009-50: Zuidema, Willem (2009) A syllable frequency list for Dutch. [Report]
Abstract
The Corpus Gesproken Nederlands (CGN) is a large corpus of spoken
Dutch, partly annotated with syntactic and phonological information
(see http://lands.let.kun.nl/cgn/). Although it contains files with
syllabified words and word frequency counts, there is no direct way to
extract a list of syllable frequencies from it. This document describes
some simple scripts to combine the relevant information from various CGN
files (using version 6 and the Linux utilities grep, sed, sort, uniq,
awk, cut and paste), and gives a complete list of syllable frequencies
obtained by running the scripts. The list is made available in the hope
that it might be helpful, for instance for experimental studies where
one must control for syllable frequency. Depending on the intended use
or required level of accuracy, the scripts might have to be adapted and
the frequency counts changed accordingly.
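
The combination step described in the abstract can be illustrated with a minimal sketch. This is not the report's own code (the report uses shell utilities), and it does not reflect the actual CGN file formats. It assumes two hypothetical inputs: a mapping from each word to a syllabified form with syllables separated by "-", and a mapping from each word to its token count. The function name, the separator, and the toy data are all made up for illustration.

```python
from collections import Counter


def syllable_frequencies(syllabified, word_counts, sep="-"):
    """Sum word token counts over the syllables each word contains.

    Both arguments are hypothetical stand-ins for the CGN files:
    syllabified maps word -> syllabified form, word_counts maps word -> count.
    """
    totals = Counter()
    for word, count in word_counts.items():
        form = syllabified.get(word)
        if form is None:
            # No syllabification available; this sketch simply skips the word.
            continue
        for syllable in form.split(sep):
            # A syllable occurring twice in one word is counted twice.
            totals[syllable] += count
    return totals


# Made-up toy data, not taken from CGN.
syllabified = {"water": "wa-ter", "later": "la-ter", "wat": "wat"}
word_counts = {"water": 10, "later": 5, "wat": 20}

freqs = syllable_frequencies(syllabified, word_counts)

# Print syllables by descending frequency, ties broken alphabetically.
for syllable, count in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{count}\t{syllable}")
```

Whether to skip words without a syllabification, and whether to weight by token counts or count each word type once, are exactly the kind of choices that may need adapting to the intended use.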
Item Type: | Report |
---|---|
Report Nr: | PP-2009-50 |
Series Name: | Prepublication (PP) Series |
Year: | 2009 |
Uncontrolled Keywords: | Dutch; phonology; computational linguistics |
Subjects: | Computation |
Depositing User: | Jelle Zuidema |
Date Deposited: | 12 Oct 2016 14:37 |
Last Modified: | 12 Oct 2016 14:37 |
URI: | https://eprints.illc.uva.nl/id/eprint/379 |