DS-2025-06: Towards Language Models that benefit us all: Studies on stereotypes, robustness, and values

DS-2025-06: Leidinger, Alina (2025) Towards Language Models that benefit us all: Studies on stereotypes, robustness, and values. Doctoral thesis, Universiteit van Amsterdam.

Text: DS-2025-06.text.pdf - Published Version (8MB)
Text (Samenvatting): DS-2025-06.samenvatting.txt - Other (1kB)

Abstract

As Large Language Models (LLMs) have evolved from single-task solvers into general-purpose chat engines, demarcating their capabilities and harms has become a significant challenge. Systematic investigation of both is needed as the cornerstone of well-informed policy and technological advancement. In this dissertation, we study stereotypes, robustness, and values in LLMs, drawing on insights from search engine studies, linguistics, formal semantics, logic, and philosophy. In Part One, we investigate stereotyping harms in Natural Language Processing systems, namely search autocomplete engines and LLMs, and find uneven safety behaviour across a diverse set of social groups in both cases. These findings lead us, in Part Two, to investigate variability in LLM behaviour more broadly: we study the robustness of LLM capabilities across tasks, and for reasoning in particular. Based on our findings, we chart a path towards more holistic evaluation practices for the field of Natural Language Processing. In Part Three, we take steps towards aligning LLMs so that they represent a variety of social groups and speakers of different languages. First, we collect and annotate a multilingual dataset to assess LLM agreement with values across languages. Second, we develop a direct alignment approach for LLMs to improve the robustness of alignment across demographics and languages.

Item Type: Thesis (Doctoral)
Report Nr: DS-2025-06
Series Name: ILLC Dissertation (DS) Series
Year: 2025
Subjects: Computation, Language
Depositing User: Dr Marco Vervoort
Date Deposited: 07 Jul 2025 12:16
Last Modified: 28 Aug 2025 13:18
URI: https://eprints.illc.uva.nl/id/eprint/2367
