BioPig: a Hadoop-based analytic toolkit for large-scale sequence data
Similar resources
BioPig: a Hadoop-based analytic toolkit for large-scale sequence data
MOTIVATION: The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most current bioinformatics tools become obsolete because they fail to scale with the data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. RESULTS: We built BioPig on the Apach...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
A Web-based Toolkit for Large-scale Ontologies
There is no doubt that large-scale domain ontologies play a critical role in building a wide variety of semantic-based systems. It is important and urgent to share and reuse large-scale ontologies to support semantic-based applications more efficiently. In this paper, we propose a web-based toolkit for building and reusing large-scale ontologies. The toolkit consists of a we...
Scripting for large-scale sequencing based on Hadoop
Motivation and Objectives: The large volumes of data generated by modern sequencing experiments present significant challenges in their manipulation and analysis. Traditional approaches, such as scripting and relational database queries, are often found to be inadequate, frustratingly slow, or complicated to scale. These problems have already been faced by the “big data revolution” in data-based...
An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
Data grid technology, which uses the scale of the Internet to overcome storage limitations for huge amounts of data, has become a hot research topic. Recently, data replication strategies have been widely employed in distributed environments to copy frequently accessed data to suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...
Journal
Journal title: Bioinformatics
Year: 2013
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btt528