DS-2026-09: Dissecting Incongruity: Metaphor and Humor Understanding of Large Language Models

DS-2026-09: Tong, Xiaoyu (2026) Dissecting Incongruity: Metaphor and Humor Understanding of Large Language Models. Doctoral thesis, University of Amsterdam.

Abstract

This thesis investigates the capabilities of large language models (LLMs) with regard to the processing of metaphor and humor. Metaphor and humor are indispensable parts of human cognition and communication, yet they can pose challenges to LLMs. As LLMs enter the lives of people around the world, it is important to know how well the models understand metaphor and humor, and how they can be improved. This thesis studies LLMs' metaphor and humor processing capabilities in the following respects:

**Paraphrasing of linguistic metaphors.** Following prior research on automatic metaphor interpretation, I frame metaphor understanding as a paraphrasing task. I sample sentences containing metaphor use from the VU Amsterdam Metaphor Corpus (VUA) and create a dataset that includes over 10,000 manually created apt paraphrases for these metaphorical sentences. I also manually construct ~1,500 <reference sentence, paraphrase 1, paraphrase 2> instances that involve inapt paraphrases; the apt-inapt paraphrase pairs capture differences between a contextual, metaphorical interpretation and a literal interpretation of the vehicle terms. I evaluate LLMs on two tasks: paraphrase generation (using all apt paraphrases in the dataset) and paraphrase judgement (a multiple-choice task based on the apt-inapt pairs). The experiments show that LLMs face challenges in correctly paraphrasing linguistic metaphors.

**Intentions behind metaphor use.** I co-develop a taxonomy that contains nine categories of possible intentions behind metaphor use. Based on the taxonomy, I co-annotate a dataset that provides intention annotations for ~1,000 metaphorical sentences sampled from VUA. I then use the dataset to examine LLMs' capabilities to predict the intentions behind linguistic metaphors. Our zero- and few-shot experiments show that inferring the intentions behind linguistic metaphors is a challenging task for current LLMs.

**Humorous multimodal metaphor use.** With regard to multimodal metaphor use, I focus on the interplay between metaphor and humor in multimodal communication: the two phenomena share common ground, and metaphor is one of the most common humorous mechanisms. Taking inspiration from the Incongruity Theory of humor, Conceptual Metaphor Theory and the annotation scheme behind VUA, I develop a novel annotation scheme for humorous multimodal metaphor use in image-caption pairs. I annotate 1,000 image-caption pairs sampled from the New Yorker Caption Contest corpus. Based on the dataset, I design a set of tasks to test multimodal LLMs' ability to detect and understand humorous multimodal metaphor use. The experiments show that current LLMs still struggle with processing humorous multimodal metaphors, particularly with regard to integrating visual and textual information.

**Cultural differences in humor appreciation.** Humor exhibits both universality and cultural variance. The ability to align with the "sense of humor" of individual cultures is important in human-AI interaction. As a first step towards a framework for evaluating LLMs' cultural alignment in humor processing, this study aims to establish human baselines representing cultural differences in humor appreciation. Specifically, I consider the association between humor, metaphor, and emotion, and how it differs across cultures. To this end, I recruit participants from Chinese, Mexican, Polish, and U.S. cultures, and collect 25,600 funniness ratings and annotations of emotional reactions for 800 captioned New Yorker cartoons, including 482 with detailed annotation of humorous multimodal metaphor use. My quantitative and qualitative analyses reveal both general patterns and intricacies of what is considered humorous in different cultures, how humor appreciation is associated with emotional reactions, and how metaphor may affect humor appreciation depending on the culture.

Item Type: Thesis (Doctoral)
Report Nr: DS-2026-09
Series Name: ILLC Dissertation (DS) Series
Year: 2026
Subjects: Computation, Language
Depositing User: Dr Marco Vervoort
Date Deposited: 07 May 2026 21:50
Last Modified: 28 May 2026 13:45
URI: https://eprints.illc.uva.nl/id/eprint/2420
