MoL-2023-21: Investigations into Semantic Underspecification in Language Models

MoL-2023-21: Wildenburg, Franciscus Cornelis Lambertus (2023) Investigations into Semantic Underspecification in Language Models. [Report]

MoL-2023-21.text.pdf - Published Version


Several (position) papers have drawn attention to the challenges that semantic underspecification may pose for modern language models, yet relatively little research has been done on this topic. We contribute to this area of research by presenting DUST, a dataset of underspecified sentences annotated with their domain of underspecification. With this dataset, we run three experiments based on prompting, language model perplexity, and diagnostic classifiers to study how modern language models process sentences containing semantic underspecification. We find that the ability of language models to recognize underspecification does not correlate with some commonly used metrics for language models, and that a fine-grained approach to underspecification could greatly benefit the research community.

Item Type: Report
Report Nr: MoL-2023-21
Series Name: Master of Logic Thesis (MoL) Series
Year: 2023
Subjects: Language
Depositing User: Dr Marco Vervoort
Date Deposited: 26 Sep 2023 13:09
Last Modified: 26 Sep 2023 13:09
