Optimizing a Cost Matrix to Solve Rare-Class Biological Problems

Authors

  • Mark J. Lawson
  • Lenwood S. Heath
  • Hai Zhao
  • Liqing Zhang
Abstract

In a binary dataset, a rare-class problem occurs when one class (typically the class of interest) is far outnumbered by the other. Such problems are difficult to learn and classify, yet they are common, especially in biology; one example is the identification of gene conversions. Many solutions to this problem exist, with varying levels of success. In this paper we present our solution, which uses the MetaCost algorithm, a cost-sensitive “meta-classifier” that requires a cost matrix to adjust the learning of an underlying classifier. Our method finds this cost matrix for a given dataset and classification algorithm and produces a final classification model. Through a detailed description, a basic evaluation, and an application to the problem of identifying gene conversions, we show the effectiveness of this approach. Our novel approach to generating a cost matrix has proven quite effective in identifying gene conversions and offers a robust way to tackle the rare-class data problem.
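The abstract does not say how the cost matrix is found. The sketch below is a rough illustration only, and it is not the authors' search procedure. It implements the published MetaCost relabelling scheme in pure Python: bag the training data, estimate class probabilities from votes, relabel each example with the class of lowest expected cost, and retrain. It then wraps that scheme in a simple grid search over the false-negative cost.

Several pieces are assumptions made for illustration:

- the nearest-centroid base learner;
- the F1-on-the-rare-class validation criterion;
- the candidate cost values;
- all function names (`train_centroid`, `metacost`, `f1_rare`, `search_cost`, `make_data`).

```python
import random
from statistics import fmean


def train_centroid(X, y):
    """Hypothetical base learner: assign each point to the nearest class mean."""
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [fmean(col) for col in zip(*pts)]

    def predict(x):
        # Squared Euclidean distance to each class centroid; the closest wins.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def metacost(X, y, cost, train=train_centroid, n_bags=20, seed=0):
    """MetaCost-style relabelling, following Domingos' published description.

    cost[i][j] is the cost of predicting class i when the true class is j.
    Classes are assumed to be the integers 0..k-1 (here 0 = majority, 1 = rare).
    """
    rng = random.Random(seed)
    n, k = len(X), len(cost)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        models.append((set(idx), train([X[i] for i in idx], [y[i] for i in idx])))
    relabelled = []
    for i, x in enumerate(X):
        # Prefer out-of-bag models (resamples that did not contain x).
        voters = [m for bag, m in models if i not in bag] or [m for _, m in models]
        votes = [m(x) for m in voters]
        probs = [votes.count(j) / len(votes) for j in range(k)]
        # Relabel with the class that minimises expected cost.
        relabelled.append(min(range(k),
                              key=lambda c: sum(probs[j] * cost[c][j] for j in range(k))))
    return train(X, relabelled)  # final model learned on the relabelled data


def f1_rare(predict, X, y, rare=1):
    """F1 score for the rare class."""
    preds = [predict(x) for x in X]
    tp = sum(1 for p, t in zip(preds, y) if t == rare and p == rare)
    fp = sum(1 for p, t in zip(preds, y) if t != rare and p == rare)
    fn = sum(1 for p, t in zip(preds, y) if t == rare and p != rare)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def search_cost(X_tr, y_tr, X_val, y_val, candidates=(1, 2, 5, 10, 20, 50)):
    """Hypothetical search: sweep the false-negative cost, keep the best validation F1."""
    best = None
    for fn_cost in candidates:
        cost = [[0, fn_cost], [1, 0]]  # rows = predicted class, columns = true class
        model = metacost(X_tr, y_tr, cost)
        score = f1_rare(model, X_val, y_val)
        if best is None or score > best[0]:
            best = (score, cost, model)
    return best


def make_data(rng, n_major, n_rare):
    """Synthetic 2-D data: majority around (0, 0), rare class around (1.5, 1.5)."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(n_rare)]
    return X, [0] * n_major + [1] * n_rare


if __name__ == "__main__":
    rng = random.Random(1)
    X_tr, y_tr = make_data(rng, 190, 10)  # about 5% rare class
    X_val, y_val = make_data(rng, 95, 5)
    score, cost, _ = search_cost(X_tr, y_tr, X_val, y_val)
    print("best validation F1 on the rare class:", round(score, 3))
    print("chosen cost matrix (rows = predicted, cols = true):", cost)
```

Only the false-negative cost is varied, and the false-positive cost is fixed at 1. With a zero diagonal, the expected-cost argmin in a binary problem depends only on the ratio of the two misclassification costs, so nothing is lost by fixing one of them. In practice you would substitute the classifier under study for the toy learner, and the application's own evaluation metric for F1.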

Similar resources

The Search for a Cost Matrix to Solve Rare-Class Biological Problems

The rare-class data classification problem is a common one. It occurs when, in a dataset, the class of interest is far outweighed by other classes, thus making it difficult to classify using typical classification algorithms. These types of problems are found quite often in biological datasets, where data can be sparse and the class of interest has few representatives. A variety of solutions to...

A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a common method in data mining that has been used in different applications as a dimension-reduction, classification, or clustering method. Methods based on the alternating least squares (ALS) approach are usually used to solve this non-convex minimization problem. At each step of ALS algorithms, two convex least squares problems must be solved, which causes high com...

Solution of Fractional Optimal Control Problems with Noise Function Using the Bernstein Functions

This paper presents a numerical solution of a class of fractional optimal control problems (FOCPs) in a bounded domain having a noise function by the spectral Ritz method. The Bernstein polynomials with the fractional operational matrix are applied to approximate the unknown functions. By substituting these estimated functions into the cost functional, an unconstrained nonlinear optimizat...

A differential evolution algorithm to solve new green VRP model by optimizing fuel consumption considering traffic limitations for collection of expired products

The purpose of this research is to present a new mathematical model for a vehicle routing problem that simultaneously considers criteria such as distance, weight, traffic considerations, time-window limitations, and heterogeneous vehicles in the reverse logistics network for the collection of expired products. In addition, we aim to present an efficient solution approach according to differential ...

Moving Towards Accountability for Reasonableness – A Systematic Exploration of the Features of Legitimate Healthcare Coverage Decision-Making Processes Using Rare Diseases and Regenerative Therapies as a Case Study

Background: The accountability for reasonableness (A4R) framework defines 4 conditions for legitimate healthcare coverage decision processes: Relevance, Publicity, Appeals, and Enforcement. The aim of this study was to reflect on how the diverse features of decision-making processes can be aligned with A4R conditions to guide decisio...

Publication date: 2011