MoL-2017-01: Dongen, Nina (2017) Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages. [Report]
Full text: MoL-2017-01.text.pdf (459kB)
Abstract
Multilingual phenomena such as code-switching disrupt widely used language interpretation tools, while the demand for such tools is rising due to the expanding worldwide popularity of online applications. This study explores code-switching between the lexically closely related languages Dutch and English in Twitter messages. Unlike similar studies on code-switching, which focus on a specific bilingual community, this study centres on the occurrence of English words in everyday Dutch. The research covers five main stages. First, a new Twitter corpus is collected, a subset of which is manually annotated. Second, a linguistic analysis of Dutch-English code-switching is performed. Third, several models are explored for language identification at the word level. Fourth, several models are explored for automatic prediction of code-switching at the word level. Finally, the best models for both tasks are combined and tested. Results show that multilingual data remain a challenge for computational approaches.
| Item Type: | Report | 
|---|---|
| Report Nr: | MoL-2017-01 | 
| Series Name: | Master of Logic Thesis (MoL) Series | 
| Year: | 2017 | 
| Subjects: | Language Logic | 
| Depositing User: | Dr Marco Vervoort | 
| Date Deposited: | 02 Mar 2017 14:43 | 
| Last Modified: | 02 Mar 2017 14:43 | 
| URI: | https://eprints.illc.uva.nl/id/eprint/1527 | 