Detecting asynchrony and dephase change patterns by mining software repositories
Authors
Abstract
Software maintenance accounts for the largest part of the costs of any program. During maintenance activities, developers implement changes (sometimes simultaneously) on artefacts in order to fix bugs and implement new requirements. To reduce this part of the costs, previous work proposed approaches to identify the artefacts of programs that change together. These approaches analyse historical data, mined from version control systems, and report change patterns, which hint at the causes, consequences, and actors of the changes to source code files. They also introduce so-called change patterns that describe some typical change dependencies among files. In this paper, we introduce two novel change patterns: the Asynchrony change pattern, corresponding to macro co-changes (MC), i.e., files that co-change within a large time interval (change period), and the Dephase change pattern, corresponding to dephase macro co-changes (DC), i.e., macro co-changes that always happen with the same shift in time. We present our approach, named Macocha, to identify these two change patterns in large programs. We use the k-nearest neighbor algorithm to group changes into change periods. We also use the Hamming distance to detect approximate occurrences of macro co-changes and dephase macro co-changes. We apply Macocha and compare its performance in terms of precision and recall with UMLDiff (file stability) and Association Rules (co-changing files) on seven systems: ArgoUML, FreeBSD, JFreeChart, Openser, SIP, XalanC, and XercesC, developed in three different languages (C, C++, and Java). These systems range in size from 532 to 1,693 files, and during the study period they underwent 1,555 to 23,944 change commits. We use external information and static analysis to validate the (approximate) macro co-changes and dephase macro co-changes found by Macocha. Through our case study, we show the existence and usefulness of these novel change patterns to ease software maintenance and, potentially, reduce related costs.
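The role of the Hamming distance in the abstract can be illustrated with a minimal Python sketch. It assumes that each file's history is encoded as a tuple of 0/1 flags, one per change period; this encoding, the file names, and the helper names `hamming` and `shifted_distance` are illustrative assumptions and are not taken from the abstract. The k-nearest neighbor grouping step is not shown; the change periods are taken as given.

```python
from typing import Tuple

# Assumed encoding: one flag per change period, 1 = the file changed in it.
Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Count the change periods in which two equally long profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same change periods")
    return sum(x != y for x, y in zip(a, b))


def shifted_distance(leader: Profile, follower: Profile, shift: int) -> int:
    """Hamming distance when `follower` is assumed to lag `leader` by `shift` periods.

    Period t of the leader is compared with period t + shift of the follower;
    only the overlapping periods are counted.
    """
    if len(leader) != len(follower):
        raise ValueError("profiles must cover the same change periods")
    if not 0 <= shift < len(leader):
        raise ValueError("shift must be between 0 and len(profile) - 1")
    if shift == 0:
        return hamming(leader, follower)
    return hamming(leader[:-shift], follower[shift:])


# Hypothetical profiles over six change periods.
a = (1, 0, 1, 1, 0, 0)  # A.java
b = (1, 0, 1, 0, 0, 0)  # B.java: differs from A.java in one period
c = (0, 1, 0, 1, 1, 0)  # C.java: same pattern as A.java, one period later

dist_ab = hamming(a, b)              # small distance: approximate macro co-change
dist_ac = hamming(a, c)              # large distance when the periods are aligned
dist_ac_shifted = shifted_distance(a, c, 1)  # zero once the lag is accounted for
```

Under these assumptions, a small unshifted distance suggests an approximate macro co-change, while a distance that only becomes small at a constant non-zero shift suggests a dephase macro co-change. The distance threshold that counts as "approximate" is a tuning choice that the abstract does not state.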
Similar resources
Analysing Artefacts Dependencies to Evolving Software Systems
Program maintenance accounts for the largest part of the costs of any program. During maintenance activities, developers implement changes (sometimes simultaneously) on artefacts to fix bugs and to implement new requirements. Thus, developers need knowledge to identify hidden dependencies among program artefacts and detect correlated artefacts. As programs evolve, their designs become more co...
Mining evolutionary dependencies from web-localization repositories
An approach to mining repositories of web-based user documentation for patterns of evolutionary change in the context of internationalization and localization is presented. Localized web documents that are frequently co-changed (i.e., an evolutionary dependency) during the natural language translation process are uncovered to support the future evolution of the system. A sequential-pattern mini...
Mining Software Repositories for Software Change Impact Analysis: A Case Study
Data mining algorithms have recently been applied to software repositories to help with the maintenance of evolving software systems. In the past, information about which classes changed together, obtained by mining software repositories, was used to guide future changes. We use this information to measure the possible impacts of a proposed change. In this paper we propose and compare two approac...
Mining API Usage Patterns by Applying Method Categorization to Improve Code Completion
Developers often face difficulties while using APIs. API usage patterns, which are extracted from source code stored in software repositories, can help them use APIs efficiently. Previous approaches have mined repositories to extract API usage patterns by simply applying data mining techniques to the collection of method invocations of API objects. In these approaches, respective functional ...
A Proposed Data Mining Methodology and its Application to Industrial Procedures
Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...
Journal title:
- Journal of Software: Evolution and Process
Volume 26, Issue
Pages -
Publication date: 2014