Code Similarity via Natural Language Descriptions
Abstract
Code similarity is a central challenge in many programming-related applications, such as code search, automatic translation, and plagiarism detection. In this work, we reduce the problem of semantic relatedness between code fragments to a problem of semantic relatedness between their textual descriptions. Our main idea is to exploit the relationship between code and its textual descriptions as established on question-answering sites such as StackOverflow. Consequently, we can determine the semantic relatedness and similarity of code fragments across different programming languages, a task considered extremely difficult using traditional approaches. We have implemented our approach and evaluated it over 1500 pairs of code fragments using crowd-sourced similarity labels. Results show that we achieve around 80% precision and 75% recall, demonstrating the promise of this approach.
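The core reduction described above (two code fragments are judged similar when their textual descriptions are similar) can be illustrated with a minimal sketch. It assumes each fragment already comes with a set of descriptions, for example titles of StackOverflow questions whose answers contain it. The `DESCRIPTIONS` table, the stopword list, and the bag-of-words cosine measure are hypothetical stand-ins chosen for brevity, not the paper's actual text-similarity model. The `precision_recall` helper shows how thresholded predictions could be scored against crowd-sourced similarity labels.

```python
import math
import re
from collections import Counter

# Hypothetical mapping from code fragments to natural-language descriptions
# attached to them (e.g. titles of Q&A posts whose answers contain the fragment).
DESCRIPTIONS = {
    "py_reverse": ["how to reverse a string in python", "reverse string"],
    "java_reverse": ["reverse a string in java", "how to reverse string"],
    "py_sort": ["sort a list of dictionaries by value", "sort list python"],
}

TOKEN_RE = re.compile(r"[a-z0-9]+")
# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "to", "in", "of", "by", "how"}


def bag_of_words(texts):
    """Pool all descriptions of one fragment into a single term-count vector."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def code_similarity(frag_a, frag_b):
    """Similarity of two fragments, measured only through their descriptions."""
    return cosine(bag_of_words(DESCRIPTIONS[frag_a]), bag_of_words(DESCRIPTIONS[frag_b]))


def precision_recall(pairs, labels, threshold):
    """Score thresholded predictions against boolean (crowd-sourced) labels."""
    tp = fp = fn = 0
    for (a, b), similar in zip(pairs, labels):
        predicted = code_similarity(a, b) >= threshold
        tp += int(predicted and similar)
        fp += int(predicted and not similar)
        fn += int(not predicted and similar)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # A Python and a Java fragment that share no code tokens still score high,
    # because their descriptions overlap.
    print(round(code_similarity("py_reverse", "java_reverse"), 3))  # 0.889
    pairs = [("py_reverse", "java_reverse"), ("py_reverse", "py_sort"),
             ("java_reverse", "py_sort")]
    print(precision_recall(pairs, [True, False, False], threshold=0.5))  # (1.0, 1.0)
```

The point of the design is that the fragments themselves are never parsed: the Python and Java string-reversal snippets are compared purely through their descriptions, which is what makes cross-language comparison possible in this framing. The toy labels and threshold above are invented for illustration and say nothing about the reported 80% precision and 75% recall.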
Similar resources
Semantic approaches to software component retrieval with English queries
Enabling code reuse is an important goal in software engineering, and it depends crucially on effective code search interfaces. We propose to ground word meanings in source code and use such language-code mappings in order to enable a search engine for programming library code where users can pose queries in English. We exploit the fact that there are large programming language libraries which ...
Using English to retrieve software
This paper describes ROSA, a software reuse system based on the processing of the natural language descriptions of software artifacts. Lexical, syntactic and semantic analysis of software descriptions is performed to automatically extract both verbal and nominal phrases from descriptions and use this information to create frame-based indexing units for software components. Retrieval similarity ...
A similarity measure for retrieving software artifacts
presents the mechanism for query processing and retrieval with the measures used for the similarity analysis of the indexing structures. Section 6 describes an experiment conducted to evaluate the effectiveness of the proposed approach. Section 7 summarizes related work in the area of re-use systems. Section 8 concludes the paper with some remarks on planned experiments with the system and furt...
A Syntactic Neural Model for General-Purpose Code Generation
We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python. Existing data-driven methods treat this problem as a language generation task without considering the underlying syntax of the target programming language. Informed by previous work in semantic parsing, in this paper we propose a novel neural architectu...
Publication date: 2014