LANGUAGE-AGNOSTIC SOURCE CODE RETRIEVAL USING KEYWORD & IDENTIFIER LEXICAL PATTERN


Oscar Karnalim
Faculty of Information Technology
Maranatha Christian University
Prof. Drg. Surya Sumantri Street No.65, Bandung, West Java, 40164, Indonesia

ABSTRACT
Although source code retrieval is a promising mechanism for supporting software reuse, it faces a growing problem as new programming languages are developed. Most retrieval systems rely on programming-language-dependent features to extract source code lexicons. Thus, each time a new programming language is developed, such a system must be updated manually to handle that language. This update may take a considerable amount of time, especially when the parsing mechanism of the language is uncommon (e.g., the Python parsing mechanism). To address this issue, this paper proposes a source code retrieval approach that does not rely on programming-language-dependent features. Instead, it relies on the Keyword & Identifier lexical pattern, which is typically similar across programming languages. This pattern is adapted to four components, namely tokenization, retrieval model, query expansion, and document enrichment. According to our evaluation, these components are effective for retrieving relevant source code in a language-agnostic manner, even though the improvement contributed by each component varies.

Keywords: source code retrieval, language-agnostic approach, lexical pattern, domain-specific ranking
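
The tokenization component mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact lexical pattern, so the regular expression below is an assumption: it treats any word that starts with a letter or underscore and continues with letters, digits, or underscores as a keyword or identifier, a shape shared by many programming languages. The names WORD_PATTERN and tokenize are hypothetical and are not taken from the paper.

```python
import re

# Assumption (not from the paper): keywords and identifiers share the shape
# "letter or underscore, then letters, digits, or underscores".
# The word boundaries keep fragments of numeric literals such as 0x1F out.
WORD_PATTERN = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def tokenize(source: str) -> list[str]:
    """Return lowercase keyword/identifier-like tokens from raw source text.

    No language-specific parser is used, so the same function can be applied
    to source code written in different programming languages.
    """
    return [match.group(0).lower() for match in WORD_PATTERN.finditer(source)]


if __name__ == "__main__":
    # A C-like statement and a Python statement go through the same tokenizer.
    print(tokenize("int total = count + 1;"))
    print(tokenize("def add(x, y): return x + y"))
```

Because the pattern ignores operators, punctuation, and numeric literals, the resulting tokens could feed a conventional term-based index without any per-language configuration. How the paper actually weights or expands these tokens is not stated in the abstract.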

Contact Us

Managing Editor of IJSECS
Faculty of Computer Systems & Software Engineering (FSKKP)

Universiti Malaysia Pahang
Lebuhraya Tun Razak
26300 Gambang,
Kuantan, Pahang Darul Makmur.

Tel: +609 549 2133
Fax: +609 549 2144
Email: ijsecsfskkp@ump.edu.my
