Graph Mining Methods for Predictive Toxicology

Author

Andreas Maunz

Abstract

The graph structures of molecules can be a rich source of information about their biological activity or chemical reactivity; however, very efficient methods are required for analyzing them. Due to its complexity, any representation of a chemical database can only convey some characteristics of the whole graph corpus. Additionally, the interesting patterns emerge only from the whole set of graphs that constitute the database, not from individual ones, which creates a demand for time- and memory-efficient algorithms. A primary goal of graph mining is to find subgraphs that occur with a certain frequency in a given dataset. The number of such patterns is usually enormous for chemical structure graphs, even when additional filters are employed, such as restricting the result set to subgraphs that primarily occur in either the toxic or the non-toxic compounds. Therefore, the patterns often cannot be used directly for predictive modeling, since they would overfit and/or place a high load on learning algorithms, while at the same time providing far too fine-grained information to experts. More concise representations would be of significant value to the user, even if more time were needed to calculate them. Concise representations may be obtained, for example, by compression of the pattern set, or by lifted representations of molecular fragments. This work shows that such representations may be obtained efficiently in practice, and that they can be of considerable utility for predictive models. It presents a set of algorithmic tools for the extraction of interesting subgraphs and subgraph patterns from molecular databases, and reports on experiments that assess their utility in the context of predictive models. For discovering the most expressive patterns, a combination of structural and statistical constraints is employed. The structural constraints make use of the partial order in which subgraphs can be arranged, and on which a refinement operator can be defined. The statistical constraints have the convexity property, allowing for efficient search in combination with the structural constraints. While the approaches are not restricted to chemical structures and toxicological databases, I find the problem of graph mining particularly compelling in this domain, because there has been a rapidly increasing need for efficient and precise computational models in chemical risk assessment during the last decade.
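To illustrate how a convex statistical constraint can support pruning, here is a minimal sketch. It assumes the chi-square statistic as the constraint; the abstract does not name a particular statistic, so this choice, and the function names `chi_square` and `chi_square_upper_bound`, are assumptions made for illustration only. A pattern is summarized by the number of toxic (`x`) and non-toxic (`y`) compounds it occurs in. Any refinement (a larger subgraph) can only occur in a subset of those compounds, and because chi-square is convex over that region, its value for any refinement is bounded by the larger of the values at the two extreme points `(x, 0)` and `(0, y)`.

```python
def chi_square(x, y, n_pos, n_neg):
    """Chi-square statistic of a 2x2 table: the pattern occurs in
    x of n_pos positive and y of n_neg negative compounds."""
    n = n_pos + n_neg
    occ = x + y
    if occ == 0 or occ == n:
        # Degenerate table: the pattern occurs nowhere or everywhere.
        return 0.0
    num = n * (x * (n_neg - y) - y * (n_pos - x)) ** 2
    den = n_pos * n_neg * occ * (n - occ)
    return num / den


def chi_square_upper_bound(x, y, n_pos, n_neg):
    """Upper bound on chi-square for every refinement of a pattern with
    support (x, y). Refinements have support (x', y') with x' <= x and
    y' <= y; by convexity the maximum lies at (x, 0) or (0, y)."""
    return max(chi_square(x, 0, n_pos, n_neg),
               chi_square(0, y, n_pos, n_neg))


def can_prune(x, y, n_pos, n_neg, threshold):
    """True if no refinement of the pattern can reach the threshold,
    so the search branch below it may be skipped."""
    return chi_square_upper_bound(x, y, n_pos, n_neg) < threshold
```

In a search that walks the partial order of subgraphs through a refinement operator, a check like `can_prune(x, y, n_pos, n_neg, threshold)` would be evaluated before expanding a pattern; if it returns `True`, none of the pattern's refinements need to be generated.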

Similar resources

Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach

Heart disease is one of the major causes of morbidity in the world. Currently, large proportions of healthcare data are not processed properly, thus, failing to be effectively used for decision making purposes. The risk of heart disease may be predicted via investigation of heart disease risk factors coupled with data mining knowledge. This paper presents a model developed using combined descri...

First order models for the Predictive Toxicology Challenge 2001

This paper discusses the "Leuven" submission to the Predictive Toxicology Challenge 2001. A brief account of some preparatory work is given, followed by a more detailed description of the approach that in the end led to the submitted model, and of the model itself. Both the approach and the model are evaluated from a data mining point of view, and a number of conclusions are drawn.

Knowledge Specification for Versatile Hybrid Intelligent Systems

The increasing amount and complexity of data used in predictive data mining call for new and flexible approaches based on hybrid intelligent methods to mine the data. This paper proposes a formal description for integrated data structures of Hybrid Intelligent Systems and the specification language HISML based on the open XML standard, introduced to fill the gap between simple soft computing mo...

CPM: A Graph Pattern Matching Kernel with Diffusion for Accurate Graph Classification

Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. I...

Predictive Graph Mining

Graph mining approaches are extremely popular and effective in molecular databases. The vast majority of these approaches first derive interesting, i.e. frequent, patterns and then use these as features to build predictive models. Rather than building these models in a two-step, indirect way, the SMIREP system introduced in this paper derives predictive rule models from molecular data directly....

Publication date: 2013
