MoL-2017-01: Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages

MoL-2017-01: Dongen, Nina (2017) Analysis and Prediction of Dutch-English Code-switching in Dutch Social Media Messages. [Report]

Abstract

Multilingual phenomena such as code-switching disrupt widely used language interpretation tools, while demand for such tools is rising with the growing worldwide popularity of online applications. This study explores code-switching between Dutch and English, two lexically closely related languages, in Twitter messages. Unlike similar studies on code-switching, it focuses on the occurrence of English words in everyday Dutch rather than on a specific bilingual community. The research covers five main stages. First, a new Twitter corpus is collected, a subset of which is manually annotated. Second, a linguistic analysis of Dutch-English code-switching is performed. Third, several models are explored for language identification at the word level. Fourth, several models are explored for automatic prediction of code-switching at the word level. Finally, the best models for both tasks are combined and tested. The results show that multi-language data remain a challenge for computational approaches.

Item Type:	Report
Report Nr:	MoL-2017-01
Series Name:	Master of Logic Thesis (MoL) Series
Year:	2017
Subjects:	Language Logic
Depositing User:	Dr Marco Vervoort
Date Deposited:	02 Mar 2017 14:43
Last Modified:	02 Mar 2017 14:43
URI:	https://eprints.illc.uva.nl/id/eprint/1527
