DS-2023-04: Inductive Biases for Learning Natural Language

DS-2023-04: Abnar, Samira (2023) Inductive Biases for Learning Natural Language. Doctoral thesis, University of Amsterdam.

Text
DS-2023-04.text.pdf - Published Version
Download (30MB)

Text (Samenvatting; Dutch summary)
DS-2023-04.samenvatting.txt - Other
Download (2kB)

Abstract

A classic question in the study of human cognition is: what learning biases make it possible for humans to learn and process language? A similar question can now be asked in the study of machine intelligence: to build machine learning models for language, what inductive biases are necessary to enable learning in an efficient and generalisable manner? We need to identify the learning biases that enable the acquisition of natural language, and find ways to incorporate them into machine learning models.
Taking a step toward this goal, this thesis explores techniques for illustrating the impact of inductive biases on the solutions that neural language models converge to. We study how sensitive the models' representational spaces are to different factors. Furthermore, we propose new techniques for studying the attention patterns of models with attention mechanisms.
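One simple example of such a technique is attention rollout, which aggregates raw attention maps across layers by treating attention as a flow of information from the input tokens upward. The sketch below (plain NumPy; the toy shapes and random inputs are illustrative assumptions, not the thesis's exact implementation) shows the core idea.

    # A minimal sketch of attention rollout: multiply per-layer attention
    # matrices from the bottom layer up, mixing in the identity to account
    # for residual connections. Shapes and inputs here are toy assumptions.
    import numpy as np

    def attention_rollout(attentions):
        """attentions: list of (seq_len, seq_len) row-stochastic attention
        matrices, one per layer, already averaged over heads."""
        seq_len = attentions[0].shape[0]
        rollout = np.eye(seq_len)
        for layer_attn in attentions:
            # Mix in the identity for the residual connection, then
            # renormalize rows so each row stays a valid distribution.
            attn = 0.5 * layer_attn + 0.5 * np.eye(seq_len)
            attn = attn / attn.sum(axis=-1, keepdims=True)
            rollout = attn @ rollout
        return rollout  # rollout[i, j]: how much input token j feeds position i

    # Toy usage: three layers of random attention over a 4-token input.
    rng = np.random.default_rng(0)
    raw = [rng.random((4, 4)) for _ in range(3)]
    raw = [a / a.sum(axis=-1, keepdims=True) for a in raw]  # row-stochastic
    print(attention_rollout(raw))
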
Using these techniques, we study the effect of context, context length, architectural factors, and training objective on the solutions learned by different types of neural language models. We find that different design choices in neural networks lead to solutions with different characteristics. While some factors, such as the training objective and connectivity patterns, lead to more divergent solutions, the final solutions are sometimes less sensitive to other factors, such as scaling model size.
We build on prior work to study the connection between the inductive biases of language models and the underlying mechanisms of the human brain. We find that, among existing neural network architectures, recurrence plays a significant role in facilitating the learning of the structures needed for language, yielding solutions more similar to those of the human brain. Looking further into the impact of recurrence, we identify and empirically evaluate three sources of inductive bias in recurrent neural networks: (1) sequentiality, (2) a memory bottleneck, and (3) parameter sharing in time.
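As a hedged illustration of where these three sources live in a standard architecture, the sketch below steps a vanilla PyTorch RNN cell through a sequence; the sizes are made up, and this is not the thesis's experimental setup.

    # A minimal sketch locating the three sources of inductive bias named
    # above in a vanilla recurrent network. All sizes are toy assumptions.
    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=16, hidden_size=8)  # hidden_size < input_size:
                                                     # (2) the memory bottleneck
    x = torch.randn(5, 3, 16)                        # (seq_len=5, batch=3, features=16)
    h = torch.zeros(3, 8)

    for t in range(x.size(0)):   # (1) sequentiality: tokens are consumed in
        h = cell(x[t], h)        #     order; each state depends on the last
                                 # (3) parameter sharing in time: the same
                                 #     `cell` weights are applied at every step
    print(h.shape)  # the final state summarizes the sequence via the bottleneck
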
We demonstrate that the process of distilling knowledge from one model to another can shed light on differences in the inductive biases and expressivity of the teacher and the student model. Moreover, we find that some effects of inductive biases can potentially transfer through knowledge distillation.
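The abstract does not spell out the distillation objective, but a common generic formulation (temperature-softened cross-entropy between teacher and student outputs, after Hinton et al., 2015) gives a concrete sense of the mechanism; everything in the sketch below, including the sizes and temperature, is an illustrative assumption.

    # A minimal sketch of a standard knowledge-distillation loss: the student
    # matches the teacher's softened output distribution.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions; the temperature exposes the teacher's
        # relative probabilities over incorrect classes to the student.
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(log_p_student, p_teacher,
                        reduction="batchmean") * temperature ** 2

    # Toy usage: a batch of 4 examples over a 10-way output.
    student = torch.randn(4, 10, requires_grad=True)
    teacher = torch.randn(4, 10)
    loss = distillation_loss(student, teacher)
    loss.backward()
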
Finally, considering the recent impressive progress in deep learning, and the contribution of scale to this progress, we believe it is important to have evaluation frameworks that allow us to understand the different ways in which models generalise in different settings and under different conditions. In this thesis, we take a small step toward building such frameworks.

Item Type: Thesis (Doctoral)
Report Nr: DS-2023-04
Series Name: ILLC Dissertation (DS) Series
Year: 2023
Subjects: Computation; Language
Divisions: Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Depositing User: Dr Marco Vervoort
Date Deposited: 13 Mar 2023 12:40
Last Modified: 13 Apr 2023 13:28
URI: https://eprints.illc.uva.nl/id/eprint/2238
