MoL-2017-01: Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages

MoL-2017-01: Dongen, Nina (2017) Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages. [Report]

Abstract

Multilingual phenomena such as code-switching disrupt widely used language interpretation tools, while the demand for such tools is rising due to the expanding worldwide popularity of online applications. This study explores code-switching between Dutch and English, two lexically closely related languages, in Twitter messages. Unlike similar studies on code-switching, it focuses on the occurrence of English words in everyday Dutch rather than on a specific bilingual community. The research comprises five main stages. First, a new Twitter corpus is collected, of which a subset is manually annotated. Second, a linguistic analysis of Dutch-English code-switching is performed. Third, several models are explored for language identification at word level. Fourth, several models are explored for automatic prediction of code-switching at word level. Finally, the best models for both tasks are combined and tested. Results show that multi-language data remain a challenge for computational approaches.

Item Type: Report
Report Nr: MoL-2017-01
Series Name: Master of Logic Thesis (MoL) Series
Year: 2017
Subjects: Language, Logic
Depositing User: Dr Marco Vervoort
Date Deposited: 02 Mar 2017 14:43
Last Modified: 02 Mar 2017 14:43
URI: https://eprints.illc.uva.nl/id/eprint/1527
