DS-2020-11: Linguistic Variation in Online Communities: A Computational Perspective

DS-2020-11: Tredici, Marco Del (2020) Linguistic Variation in Online Communities: A Computational Perspective. Doctoral thesis, University of Amsterdam.


Abstract


The same word can be used by different people to mean different things. This meaning variation is not random: it is determined by the social characteristics of the speakers who use the word. In particular, a crucial factor is the community to which speakers belong. This thesis investigates meaning variation in online communities of speakers with a twofold goal: to provide an empirical account of the phenomenon in online settings, and to leverage it to improve the performance of NLP models.
I build on theoretical frameworks from Linguistics and Sociolinguistics that describe meaning variation in offline communities. To investigate variation using digital data from online communities, I draw on the tools and methodologies developed in Natural Language Processing and Computational Linguistics.
The thesis consists of two parts. The first part addresses the general research question: how can meaning variation in online communities of speakers be identified and represented? The second part takes a task-oriented approach and addresses the question: how can social information be used to improve the performance of NLP models?
Overall, this dissertation presents an extensive study of meaning variation in online communities of speakers and makes two main contributions. First, it provides empirical confirmation of the findings of traditional sociolinguistic studies and offers new theoretical insights into meaning variation in online communities. Second, it introduces new methodologies that leverage information about the social context in which language is produced to improve the performance of NLP systems for text classification.

Item Type:	Thesis (Doctoral)
Report Nr:	DS-2020-11
Series Name:	ILLC Dissertation (DS) Series
Year:	2020
Subjects:	Computation, Language
Depositing User:	Dr Marco Vervoort
Date Deposited:	14 Jun 2022 15:17
Last Modified:	14 Jun 2022 15:17
URI:	https://eprints.illc.uva.nl/id/eprint/2180
