A similarity measure for retrieving software artifacts

Authors

  • M. R. Girardi
  • Bertrand Ibrahim
Abstract

This paper introduces the main features and the retrieval mechanism of ROSA, a software reuse system based on the processing of the natural language descriptions of software artifacts. The system supports the automatic indexing of components by acquiring lexical, syntactic and semantic knowledge from software descriptions. The retrieval mechanism is based on a similarity analysis that provides good retrieval effectiveness through partial matching of descriptions, processing of synonyms, generalizations and specializations of terms and considering the syntactic and semantic information available …

… presents the mechanism for query processing and retrieval, together with the measures used for the similarity analysis of the indexing structures. Section 6 describes an experiment conducted to evaluate the effectiveness of the proposed approach. Section 7 summarizes related work in the area of re-use systems. Section 8 concludes the paper with some remarks on planned experiments with the system and further research.

Figure 1 shows an overview of the current version of the re-use system. The system consists of a classification mechanism and a retrieval mechanism. The classification mechanism catalogues the software components in a software base through their descriptions in natural language. An acquisition mechanism automatically extracts from software descriptions the knowledge needed to catalogue them in the software base. The system extracts lexical, syntactic and semantic information, and this knowledge is used to create a frame-like internal representation for each software component. The interpretation mechanism used to analyse a description does not attempt to understand its meaning; it aims only to acquire automatically enough information to construct useful indexing units for software components.

Semantic analysis of descriptions follows the rules of a semantic formalism. The formalism consists of a case system, constraints and heuristics that translate the description into an internal representation. Both syntactic and semantic rules are implemented in a grammar that parses descriptions into a set of frames. The semantic formalism is based on semantic relationships between the noun phrases and the verb in a sentence. These relationships ensure that similar software descriptions have similar internal representations. A classification scheme for software components derives from the semantic formalism through a set of generic frames. The internal representation of a description constitutes the indexing unit for the software component, constructed as an instance of these generic frames. The WordNet [8] lexicon is used to obtain morphological information, grammatical categories of terms and lexical relationships between terms.
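To make the idea of comparing frame-like indexing units more concrete, the following is a minimal Python sketch. It is not the similarity measure defined in the paper, whose details are not given in this excerpt. The role names, the toy synonym and generalization tables (standing in for a lexicon such as WordNet), the scores 1.0, 0.8 and 0.5, and the function names are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon, standing in for lookups in a lexicon like WordNet.
SYNONYMS = {frozenset({"find", "search"}), frozenset({"remove", "delete"})}
GENERALIZATIONS = {"file": "object", "directory": "object"}  # term -> broader term

# A frame maps a case role to the head term that fills it (assumed layout).
Frame = Dict[str, str]


def term_similarity(a: str, b: str) -> float:
    """Score two terms; the numeric weights are illustrative assumptions."""
    if a == b:
        return 1.0
    if frozenset({a, b}) in SYNONYMS:
        return 0.8
    # One term is a direct generalization or specialization of the other.
    if GENERALIZATIONS.get(a) == b or GENERALIZATIONS.get(b) == a:
        return 0.5
    return 0.0


def frame_similarity(query: Frame, candidate: Frame) -> float:
    """Average term similarity over the roles of the query.

    Roles missing from the candidate contribute 0, so a candidate that
    matches only part of the query still gets a non-zero score.
    """
    if not query:
        return 0.0
    total = sum(
        term_similarity(term, candidate[role])
        for role, term in query.items()
        if role in candidate
    )
    return total / len(query)


def rank(query: Frame, software_base: Dict[str, Frame]) -> List[str]:
    """Return component names ordered from most to least similar."""
    return sorted(
        software_base,
        key=lambda name: frame_similarity(query, software_base[name]),
        reverse=True,
    )
```

For instance, a query frame `{"action": "find", "object": "file"}` would rank a component indexed as `{"action": "search", "object": "file"}` above one indexed as `{"action": "delete", "object": "file"}`, because the synonym pair find/search still contributes to the score while find/delete does not.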

Similar resources

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

Similarity for Analogical Software Reuse: A Conceptual Modelling Approach

We present our approach to defining similarity between software artifacts and discuss its potential exploitation in software reuse by analogy. We first establish properties of similarity which support its role in retrieving and mapping software descriptions. Then we develop a systematic basis for comparison within a fairly general conceptual modelling framework, whereby comparable elements of t...

Calibrating a Metric for Similarity of Stories against Human Judgement

The identification of similarity is crucial for reusing experience, where it provides the criterion for which elements to reuse in a given context, and for creativity, where generation of artifacts that are similar to those that already existed is not considered creative. Yet similarity is difficult to compute between complex artifacts such as stories. The present paper compares the judgment on...

Compatible Service Retrieval Using Improved Similarity Measure

Nowadays, retrieving suitable services has become a prominent need for users. However, available service retrieval mechanisms use the compatible similarity between services so that the user can get likely homogeneous services. This paper proposes a document-based search which uses the cosine measure for comparing WSDL files in order to retrieve similar services. The development of Web Services and W...

Applying Concept Formation Methods to Software Reuse

This paper describes an approach to software reuse that involves generating and retrieving abstractions from existing software systems using concept formation methods. The potential of the approach is illustrated through two important activities of the reuse process. First, the concept hierarchy generated by the concept formation methods is used for organizing and retrieving the artifacts insid...

Publication date: 1994