On patterns and re-use in bioinformatics databases
Authors
Abstract
Motivation: As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which raises particular concerns for data correctness: if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design.
Results: We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors.
Availability and implementation: Analytical software is available on request.
Contact: [email protected].
Similar resources
Protein Databases
Proteins are sources of many peptides with diverse biological activity. Some of them are considered as valuable components of foods and drug targets with desired and designed biological activity. We are now entering an era rich in biological data in which the field of bioinformatics is poised to exploit this information in increasingly powerful ways. There are currently many databases all over ...
An Agent-oriented Notification System for Sequence (Re) Annotation in Genomic Databases
Most bioinformatics research projects have adopted database management systems. Each project builds its own database schema to store sequence (re) annotations. However, as the number of bioinformatics projects grows, new database management issues emerge, such as, project management, project collaboration, schema integration, data distribution, data provenance, etc. This work investigates some ...
The use of design patterns in the design and production of electronic content in e-learning environment
Introduction: The development of electronic content based on one of the main challenges facing e-learning instructional design patterns and the education system is the key to success. This study examines the position of design patterns in the design and production of electronic content in their e-learning environment. Methods: This article is a review article, and a library. In its edition...
Bioinformatics-Based Prediction of FUT8 as a Therapeutic Target in Estrogen Receptor-Positive Breast Cancer
Abstract Introduction: Estrogen receptor-positive (ER-positive) breast cancer is a subgroup of breast tumors that is more likely to respond to hormone therapy. ER-positive and ER-negative breast cancers tend to show different patterns of metastasis because of the different signaling cascades and genes that are activated by estrogen response. Genetic factors can contribute to high rates of metastas...
Journal title:
Volume 33, Issue
Pages -
Publication date 2017