DS-2024-04: Visual and Linguistic Processes in Deep Neural Networks: A Cognitive Perspective

DS-2024-04: Takmaz, Ece (2024) Visual and Linguistic Processes in Deep Neural Networks: A Cognitive Perspective. Doctoral thesis, Universiteit van Amsterdam.

Abstract

When people describe an image, complex visual and linguistic processes are at work. For instance, speakers tend to look at an object right before mentioning it, but not always. Similarly, during a conversation, speakers may refer to the same entity multiple times, using expressions that evolve as their common ground develops. In this thesis, I develop computational models of such visual and linguistic processes, drawing inspiration from theories and findings in cognitive science and psycholinguistics. This work, which aims to capture the intricate relationship between non-linguistic modalities and language within deep artificial neural networks, contributes to research on multimodal Natural Language Processing.

The thesis consists of two parts: (1) modeling human gaze in language use (production and comprehension), and (2) modeling communication strategies in referential tasks in visually grounded dialogue. In the first part, I enhance image description generation models with eye-tracking data, evaluate the variation in human signals produced while describing images, and predict human reading behavior in the form of eye movements. In the second part, I build models that quantify, generate, resolve, and adapt utterances in referential tasks situated within visual and conversational contexts.

The outcomes advance our understanding of human visuo-linguistic processes by revealing the intricate strategies at play in them, and they point to the importance of accounting for these strategies when developing and using multimodal models. The findings also shed light on how advances in artificial intelligence could contribute to research on crossmodal processes in humans, and vice versa.

Item Type: Thesis (Doctoral)
Report Nr: DS-2024-04
Series Name: ILLC Dissertation (DS) Series
Year: 2024
Subjects: Computation; Language
Depositing User: Dr Marco Vervoort
Date Deposited: 07 Apr 2024 23:46
Last Modified: 11 Apr 2024 13:52
URI: https://eprints.illc.uva.nl/id/eprint/2308