MoL-2017-01: Dongen, Nina (2017) Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages. [Report]
Abstract
Multilingual phenomena such as code-switching disrupt widely used language interpretation tools, while demand for such tools is rising with the expanding worldwide popularity of online applications. This study explores code-switching between Dutch and English, two lexically closely related languages, in Twitter messages. In contrast to similar studies on code-switching, the focus is on the occurrence of English words in everyday Dutch rather than on a specific bilingual community. The research covers five main stages. First, a new Twitter corpus is collected, a subset of which is manually annotated. Second, a linguistic analysis of Dutch-English code-switching is performed. Third, several models are explored for word-level language identification. Fourth, several models are explored for automatic word-level prediction of code-switching. Finally, the best models for both tasks are combined and tested. The results show that multi-language data remain a challenge for computational approaches.
Item Type: | Report |
---|---|
Report Nr: | MoL-2017-01 |
Series Name: | Master of Logic Thesis (MoL) Series |
Year: | 2017 |
Subjects: | Language, Logic |
Depositing User: | Dr Marco Vervoort |
Date Deposited: | 02 Mar 2017 14:43 |
Last Modified: | 02 Mar 2017 14:43 |
URI: | https://eprints.illc.uva.nl/id/eprint/1527 |