LP-1996-13: Data-Oriented Language Processing: An Overview

LP-1996-13: Bod, Rens and Scha, Remko (1996) Data-Oriented Language Processing: An Overview. [Report]


Data-oriented models of language processing embody the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
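
The scoring idea in the abstract (a parse is ranked by the sum of the probabilities of all its derivations) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a fragment's probability is its corpus count divided by the total count of fragments with the same root label, and that a derivation's probability is the product of its fragment probabilities. The fragment inventory, the counts, the parse names, and all function names are hypothetical, and the derivations are listed by hand rather than produced by a parser.

```python
from collections import defaultdict
from math import prod

# Hypothetical fragment inventory: fragment id -> (root label, corpus count).
# In a real system these would be subtrees extracted from an annotated corpus.
FRAGMENTS = {
    "s1": ("S", 2),
    "s2": ("S", 1),
    "s3": ("S", 1),
    "np1": ("NP", 3),
    "np2": ("NP", 1),
    "vp1": ("VP", 1),
    "vp2": ("VP", 1),
}

# Hypothetical candidate parses for one input, each with the derivations
# (sequences of fragment ids) that yield it. Listed by hand for illustration.
PARSES = {
    "parse_A": [["s1", "np1"], ["s2", "np1", "vp1"]],
    "parse_B": [["s3", "np2", "vp2"]],
}


def fragment_probabilities(fragments):
    """Assumed estimator: count divided by the total count for the same root."""
    totals = defaultdict(int)
    for root, count in fragments.values():
        totals[root] += count
    return {fid: count / totals[root] for fid, (root, count) in fragments.items()}


def derivation_probability(derivation, probs):
    """Product of the probabilities of the fragments used in one derivation."""
    return prod(probs[fid] for fid in derivation)


def parse_probability(derivations, probs):
    """Sum of the probabilities of all derivations that yield the same parse."""
    return sum(derivation_probability(d, probs) for d in derivations)


def most_probable_parse(parses, probs):
    """Name of the parse with the highest summed derivation probability."""
    return max(parses, key=lambda name: parse_probability(parses[name], probs))


if __name__ == "__main__":
    probs = fragment_probabilities(FRAGMENTS)
    for name, derivations in PARSES.items():
        print(name, parse_probability(derivations, probs))
    print("best:", most_probable_parse(PARSES, probs))
```

In this toy setting parse_A wins because two derivations contribute to its score, which is the reason for summing over derivations instead of scoring only the single best one.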

Item Type: Report
Report Nr: LP-1996-13
Series Name: Logic, Philosophy and Linguistics (LP)
Year: 1996
Date Deposited: 12 Oct 2016 14:39
Last Modified: 12 Oct 2016 14:40
URI: https://eprints.illc.uva.nl/id/eprint/1249
