MoL-2003-04: Source Code Retrieval using Conceptual Graphs

MoL-2003-04: Mishne, Gilad (2003) Source Code Retrieval using Conceptual Graphs. [Report]

Abstract

The sharp increase in the amount of easily accessible information over the last decade has led to a growing body of research on information extraction and retrieval. Within the general information retrieval framework, specialized methods have emerged for specific domains, exploiting features of the information in those domains to improve its accessibility. Source code, that is, documents written in a computer programming language, is one such domain. Source code is a form of structured data, in which information is stored both in the structure and in the content; it differs, however, from other structured document domains in the nature of both its structure and its content. Retrieval of information from source code is crucial for large-scale software development and maintenance, and is recognized as a problem by both software developers and information retrieval researchers; it is a vast research area with multiple interests and various subtasks. This thesis focuses on one aspect of this area: using the structure of the code to improve retrieval. Our approach uses conceptual modeling of the code. We employ conceptual graphs, a knowledge representation formalism, to design a retrieval model for code that uses both its structure and its content. The model comprises a formal definition of the problem, a method for representing code as conceptual graphs, and procedures for ranking the similarity of these graphs to a query. We evaluate our model and show that, for the code retrieval task, it performs better than standard information retrieval approaches. The associated complexity implications are analyzed and discussed, with the aim of balancing computational cost against retrieval performance. Our main conclusion is that using structure for code retrieval improves retrieval results substantially, and that further work in this direction can improve the results even more.
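
To make the general idea in the abstract concrete, the following is a minimal illustrative sketch, not the model defined in the thesis. It assumes Python source as input, reduces each function to a small graph of (concept, relation, concept) triples, and scores a query graph against each document with a Dice overlap. The relation names `param` and `calls`, the `alpha` weight, and all function names are hypothetical choices made for this example only.

```python
import ast


def code_to_graph(source):
    """Build a toy graph: a set of (concept, relation, concept) triples.

    Assumption for illustration: only function parameters and direct
    calls to plain names are recorded.
    """
    triples = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for arg in node.args.args:
                triples.add((node.name, "param", arg.arg))
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    triples.add((node.name, "calls", inner.func.id))
    return triples


def dice(a, b):
    """Dice coefficient of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def concepts(graph):
    """All concept labels that appear in a graph (content only)."""
    return {c for (subj, _, obj) in graph for c in (subj, obj)}


def similarity(query, doc, alpha=0.5):
    """Mix content overlap (concepts) with structure overlap (triples)."""
    return alpha * dice(concepts(query), concepts(doc)) + (1 - alpha) * dice(query, doc)


def rank(query, corpus):
    """Return (name, score) pairs, best match first."""
    scored = [(name, similarity(query, code_to_graph(src))) for name, src in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-file corpus and a one-triple query graph.
corpus = {
    "a.py": "def read_file(path):\n    return open(path).read()\n",
    "b.py": "def add(x, y):\n    return x + y\n",
}
query = {("read_file", "calls", "open")}
ranking = rank(query, corpus)
```

In this sketch a document that shares only labels with the query scores lower than one that also shares relations, which is the sense in which structure contributes to the ranking here.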

Item Type: Report
Report Nr: MoL-2003-04
Series Name: Master of Logic Thesis (MoL) Series
Year: 2003
Uncontrolled Keywords: Information retrieval, source code
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
URI: https://eprints.illc.uva.nl/id/eprint/742
