MoL-2003-04: Source Code Retrieval using Conceptual Graphs

MoL-2003-04: Mishne, Gilad (2003) Source Code Retrieval using Conceptual Graphs. [Report]


The sharp increase in the amount of easily accessible information over
the last decade has resulted in a growing body of research on
information extraction and retrieval. Inside the general information
retrieval framework, specialized methods emerged for specific domains,
trying to exploit features of the information in these domains to
improve its accessibility.

Source code (documents written in a computer programming language) is
one of these domains. Source code is a form of structured data, data
in which information is stored both in the structure and in the
content; it is, however, different from other structured document
domains both in the nature of the structure and the nature of the
content. Retrieval of information from source code is crucial for
large-scale software development and maintenance, and is recognized as
a problem both by software developers and information retrieval
researchers; it is a vast research area with multiple interests and
various subtasks. This thesis focuses on one aspect of the problem:
using the structure of the code to improve retrieval.

Our approach to improving retrieval from source code uses
conceptual modeling of the code. We employ conceptual graphs, a
knowledge representation formalism, to design a retrieval model for
code that uses both its structure and its content. The model contains
a formal definition of the problem, a method for representing the code
as conceptual graphs, and procedures for ranking their similarity to a
query.

We evaluate our model and show that for the code retrieval task it
performs better than standard information retrieval approaches. We
analyze and discuss the associated complexity implications, seeking a
balance between computational cost and retrieval performance. Our main
conclusion is that using the structure for
retrieval of code improves retrieval results substantially, and
further work in this direction can improve the results even more.
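
As a rough illustration of the idea described in the abstract (not the thesis's actual model), the sketch below represents code fragments as small graphs of concepts and labelled relations, then ranks them against a query graph by combining content overlap with structural overlap. All names here (`ConceptualGraph`, `Relation`, `similarity`, `rank`, `structure_weight`), the Dice-coefficient scoring, and the toy data are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A labelled edge between two concepts (hypothetical toy encoding)."""
    source: str
    label: str
    target: str


@dataclass
class ConceptualGraph:
    """A code fragment modelled as a set of concepts plus relations."""
    name: str
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def add(self, source, label, target):
        self.concepts.update((source, target))
        self.relations.add(Relation(source, label, target))
        return self


def dice(a, b):
    """Dice coefficient of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity(query, doc, structure_weight=0.5):
    """Blend content overlap (concepts) with structural overlap (relations)."""
    content = dice(query.concepts, doc.concepts)
    structure = dice(query.relations, doc.relations)
    return (1 - structure_weight) * content + structure_weight * structure


def rank(query, docs, structure_weight=0.5):
    """Return documents sorted from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda d: similarity(query, d, structure_weight),
        reverse=True,
    )


# Toy query: a method `parse` that calls `open` and returns a string.
query = (ConceptualGraph("query")
         .add("Method:parse", "calls", "Method:open")
         .add("Method:parse", "returns", "Type:str"))

# a.py matches the query structure and has one extra relation.
doc_a = (ConceptualGraph("a.py")
         .add("Method:parse", "calls", "Method:open")
         .add("Method:parse", "returns", "Type:str")
         .add("Method:parse", "uses", "Type:int"))

# b.py mentions exactly the same concepts, but wired differently.
doc_b = (ConceptualGraph("b.py")
         .add("Method:open", "calls", "Method:parse")
         .add("Method:open", "returns", "Type:str"))

# c.py is unrelated.
doc_c = ConceptualGraph("c.py").add("Method:connect", "calls", "Method:send")

docs = [doc_c, doc_b, doc_a]
for d in rank(query, docs):
    print(d.name, round(similarity(query, d), 3))
```

With these toy graphs, the combined score ranks `a.py` first, whereas a content-only score (`structure_weight=0`) would rank `b.py` first because it shares every concept with the query. This is the kind of difference that a structure-aware retrieval model is meant to capture.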

Item Type: Report
Report Nr: MoL-2003-04
Series Name: Master of Logic Thesis (MoL) Series
Year: 2003
Uncontrolled Keywords: Information retrieval, source code
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
