DS-2020-09: A Tale of Two Sequences: Interpretable and Linguistically-Informed Deep Learning for Natural Language Processing

DS-2020-09: Bastings, Jasmijn (2020) A Tale of Two Sequences: Interpretable and Linguistically-Informed Deep Learning for Natural Language Processing. Doctoral thesis, University of Amsterdam.

Full Text: DS-2020-09.text.pdf (1MB)
Samenvatting (Dutch summary): DS-2020-09.samenvatting.txt (2kB)

Abstract

Deep Learning (DL) has swiftly taken over our field of NLP. It has caused a shift from exploiting linguistic features and structures, such as POS tags and dependency and constituency trees, to relying solely on the input words and treating a sentence as a mere sequence of words. As performance records on NLP benchmarks keep being broken, we can ask ourselves: are linguistic structures now obsolete? Is there still a way to make use of them?
In the first part of this thesis, we try to answer these questions in the context of machine translation. We find that we can exploit a Graph Convolutional Network (GCN) to condition a neural machine translation model on linguistic input structures, and we show empirically that we gain performance improvements when conditioning on syntactic dependency structures, on semantic role labeling structures, and on both. In addition to conditioning on explicit linguistic structure, we also investigate whether we can induce structure in a machine translation setting. We find that it is possible to learn useful structure on top of word embeddings and CNN representations, whereas we obtain only trivial (mostly diagonal) structure on top of LSTM representations. This latent structure is related to the now popular Transformer model, which can be seen as performing graph convolution over dense, fully-connected graphs.
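To make the idea of conditioning on a graph concrete, the following is a minimal sketch in plain Python/NumPy (the function name gcn_layer, the normalisation, and the toy shapes are illustrative assumptions, not the thesis's actual implementation) of a single graph-convolutional layer: each word representation is updated with averaged messages from its neighbours in the dependency graph, and with a fully-connected adjacency matrix the same update touches every word pair, which is the sense in which a Transformer layer performs graph convolution over a dense graph.

import numpy as np

def gcn_layer(H, A, W, b):
    # H: (n, d) word representations; A: (n, n) adjacency matrix of the
    # dependency graph (including self-loops); W: (d, d) weights; b: (d,) bias.
    deg = A.sum(axis=1, keepdims=True)           # number of neighbours per word
    msgs = (A / np.maximum(deg, 1)) @ H @ W + b  # averaged neighbour messages
    return np.maximum(msgs, 0.0)                 # ReLU non-linearity

# Toy example: 4 words, 8-dimensional states, chain-shaped dependency graph.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
A = np.eye(4) + np.eye(4, k=1) + np.eye(4, k=-1)  # self-loops plus chain edges
W, b = rng.normal(size=(8, 8)), np.zeros(8)
print(gcn_layer(H, A, W, b).shape)                # -> (4, 8)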
In the second part of the thesis, we look at two common criticisms of neural networks: (1) their lack of interpretability, and (2) their hunger for labeled data in order to generalize well. We first study neural text classifiers and make them interpretable by having them provide an explanation, a rationale, for their predictions. This is done by showing exactly which part of the input text is used for classification, rendering the model more transparent than one that does not provide a rationale. We show that our method is more aligned with human rationales than previous work. Finally, we investigate the generalization of neural networks. In particular, we look at the SCAN benchmark and find that obtaining a high score does not necessarily imply strong generalization behavior, due to the simple nature of the data set. We propose a remedy for this problem in the form of the NACS data set.
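As a rough illustration of what providing a rationale means in practice, the sketch below (plain Python/NumPy; predict_with_rationale, the linear scorer, and the fixed keep-fraction are hypothetical choices for illustration, not the model from the thesis) scores each token, keeps only the highest-scoring tokens as a hard 0/1 mask, and classifies using the masked input alone, so the prediction can be traced back to exactly the highlighted words.

import numpy as np

def predict_with_rationale(E, w_score, w_clf, keep=0.3):
    # E: (n, d) token embeddings; w_score: (d,) token-scoring weights;
    # w_clf: (d,) classifier weights; keep: fraction of tokens to retain.
    scores = E @ w_score                             # relevance score per token
    k = max(1, int(round(keep * len(E))))
    mask = np.zeros(len(E))
    mask[np.argsort(-scores)[:k]] = 1.0              # hard 0/1 rationale mask
    pooled = (mask[:, None] * E).sum(0) / mask.sum() # mean over kept tokens only
    prob = 1.0 / (1.0 + np.exp(-(pooled @ w_clf)))   # binary prediction
    return prob, mask                                # mask is the rationale

rng = np.random.default_rng(1)
E = rng.normal(size=(10, 16))                        # 10 tokens, 16-dim embeddings
prob, mask = predict_with_rationale(E, rng.normal(size=16), rng.normal(size=16))
print(round(float(prob), 3), mask)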

Item Type: Thesis (Doctoral)
Report Nr: DS-2020-09
Series Name: ILLC Dissertation (DS) Series
Year: 2020
Subjects: Language
Depositing User: Dr Marco Vervoort
Date Deposited: 14 Jun 2022 15:17
Last Modified: 14 Jun 2022 15:17
URI: https://eprints.illc.uva.nl/id/eprint/2178
