MoL-2003-02: A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia

MoL-2003-02: Tala, Fadillah (2003) A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia. [Report]

Abstract

Stemming is a process that maps the different morphological variants of a word to a common base form (the stem). This process is also known as conflation. Because terms that share a stem usually have similar meanings, stemming is widely used in Information Retrieval as a way to improve retrieval performance. In addition, when stemming is done at indexing time, it reduces the size of the index file.
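As a rough illustration of how conflation at indexing time shrinks an index, the following minimal sketch builds a small inverted index with and without stemming. The lookup table `TOY_STEMS`, the function names, and the three sample documents are assumptions made for this illustration only; the table is a stand-in for a real stemmer and is not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical conflation table standing in for a real stemmer.
# All four variants derive from the Indonesian root "baca" (read).
TOY_STEMS = {
    "membaca": "baca",
    "dibaca": "baca",
    "bacaan": "baca",
    "pembaca": "baca",
}


def toy_stem(token):
    """Map a token to its stem if the table knows it, else return it unchanged."""
    return TOY_STEMS.get(token, token)


def build_index(docs, stem=None):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term = stem(token) if stem else token
            index[term].add(doc_id)
    return dict(index)


# Illustrative documents, not data from the thesis.
DOCS = {
    1: "membaca buku",
    2: "buku dibaca",
    3: "bacaan pembaca",
}

raw_index = build_index(DOCS)
stemmed_index = build_index(DOCS, stem=toy_stem)

print(len(raw_index), "terms without stemming")
print(len(stemmed_index), "terms with stemming")
print(sorted(stemmed_index["baca"]))
```

In this toy setting the stemmed index holds fewer distinct terms, and a query for the stem `baca` reaches every document that contains any of its variants. Whether this improves retrieval on real collections is the empirical question the thesis addresses.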

This thesis studies stemming algorithms for Bahasa Indonesia, in particular their effect on information retrieval. We evaluate the existing stemmer for Bahasa Indonesia and compare it with a purely rule-based stemmer, which we created for this purpose. This rule-based stemmer was developed from a study of the morphological structure of Bahasa Indonesia words.
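To give a feel for what a purely rule-based approach looks like, here is a minimal sketch of affix stripping for a handful of common Indonesian affixes. It is not the stemmer developed in the thesis: the rule lists, their order, the `MIN_STEM` length guard, and the name `strip_affixes` are all assumptions chosen for illustration. Nasalizing prefixes such as meN- and peN- change the first letter of the root and need extra rules, which this sketch omits.

```python
# Toy affix lists (assumed for illustration; far from complete).
PARTICLES = ("lah", "kah", "pun")      # e.g. baca + lah
POSSESSIVES = ("nya", "ku", "mu")      # e.g. buku + nya
SUFFIXES = ("kan", "an", "i")          # derivational suffixes
PREFIXES = ("ber", "ter", "di")        # simple, non-nasalizing prefixes

# Assumed guard: never strip an affix if fewer than this many letters remain.
MIN_STEM = 4


def strip_suffix(word, suffixes):
    """Remove the first matching suffix, if enough of the word remains."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def strip_prefix(word, prefixes):
    """Remove the first matching prefix, if enough of the word remains."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def strip_affixes(word):
    """Apply the toy rules in a fixed order: particle, possessive, prefix, suffix."""
    word = word.lower()
    word = strip_suffix(word, PARTICLES)
    word = strip_suffix(word, POSSESSIVES)
    word = strip_prefix(word, PREFIXES)
    word = strip_suffix(word, SUFFIXES)
    return word


for w in ("bukunya", "dibacakan", "bacalah", "berjalan", "minuman", "buku"):
    print(w, "->", strip_affixes(w))
```

The length guard is a crude device to limit over-stemming: without it, `buku` would lose its final `ku` as if it were a possessive. A stemmer without a dictionary has to rely on heuristics of this kind, which is one reason comparing it against an existing stemmer is informative.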

Item Type: Report
Report Nr: MoL-2003-02
Series Name: Master of Logic Thesis (MoL) Series
Year: 2003
Uncontrolled Keywords: Stemming, Conflation, Information Retrieval, Bahasa Indonesia
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
URI: https://eprints.illc.uva.nl/id/eprint/740
