Gene selection using support vector machines with non-convex penalty

Authors

  • Hao Helen Zhang
  • Jeongyoun Ahn
  • Xiaodong Lin
  • Cheolwoo Park
Abstract

MOTIVATION: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their inherently 'high-dimensional, low sample size' nature. Robust and accurate gene selection methods are therefore required to identify groups of genes that are differentially expressed across different samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification, achieving high accuracy in both aspects.

RESULTS: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special non-convex penalty, called the smoothly clipped absolute deviation (SCAD) penalty, is imposed on the hinge loss function in the SVM. By systematically thresholding small estimates to zero, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into easily solved systems of linear equations. The method is applied to two real datasets and produces very promising results.

AVAILABILITY: MATLAB code is available upon request from the authors.
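As a rough illustration of the penalized objective described in the abstract, the sketch below evaluates the smoothly clipped absolute deviation (SCAD) penalty in its standard piecewise form and adds it to the average hinge loss of a linear classifier. It is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the function names `scad_penalty` and `scad_svm_objective` are hypothetical, the shape parameter `a = 3.7` is the commonly used default rather than a value given in the abstract, and the successive quadratic algorithm used for the actual optimization is not reproduced here.

```python
import numpy as np


def scad_penalty(w, lam, a=3.7):
    """Elementwise SCAD penalty in its standard piecewise form.

    Assumes lam > 0 and a > 2. The default a = 3.7 is a conventional
    choice, not a value taken from the abstract.
    """
    abs_w = np.abs(np.asarray(w, dtype=float))
    # |w| <= lam: behaves like the L1 penalty
    linear = lam * abs_w
    # lam < |w| <= a*lam: quadratic transition region
    quadratic = -(abs_w**2 - 2.0 * a * lam * abs_w + lam**2) / (2.0 * (a - 1.0))
    # |w| > a*lam: constant, so large coefficients are not shrunk further
    constant = (a + 1.0) * lam**2 / 2.0
    return np.where(abs_w <= lam, linear,
                    np.where(abs_w <= a * lam, quadratic, constant))


def scad_svm_objective(w, b, X, y, lam, a=3.7):
    """Average hinge loss plus the summed SCAD penalty on the weights.

    X is an (n_samples, n_genes) array, y holds labels in {-1, +1},
    w is the weight vector and b the intercept (hypothetical layout).
    """
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return hinge.mean() + scad_penalty(w, lam, a).sum()
```

Because the penalty is flat beyond `a * lam`, large coefficients are penalized by a constant amount, while coefficients below `lam` receive an L1-like penalty that can drive them to exactly zero. This is the mechanism the abstract refers to as thresholding small estimates to zero.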

Similar resources

Primal-Dual Framework for Feature Selection using Least Squares Support Vector Machines

Least Squares Support Vector Machines (LSSVM) perform classification using an L2-norm on the weight vector and a squared loss function with linear constraints. The major advantage over the classical L2-norm support vector machine (SVM) is that it solves a system of linear equations rather than a quadratic programming problem. The L2-norm penalty on the weight vectors is known to robustly select...

A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels

The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single Layer Feed-forward Neural Network (SLFNN) with a high level of training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...

Robust Support Vector Machine Using Least Median Loss Penalty

It is found that data points used for training may contain outliers that can generate unpredictable disturbance for some Support Vector Machines (SVMs) classification problems. No theoretical limit for such bad influence is held in traditional convex SVM methods. We present a novel robust misclassification penalty function for SVM which is inspired by the concept of “Least Median Regression”. I...

Class-specific Variable Selection for Multicategory Support Vector Machines

This paper proposes a class-specific variable selection method for multicategory support vector machines (MSVMs). Unlike existing variable selection methods for MSVMs, the proposed method not only captures the important variables for classification, but also identifies the discriminable and non-discriminable classes so as to enhance the interpretation for multicategory classification pr...

Asynchronous Parallel Evolutionary Model Selection for Support Vector Machines

The application of a parallel evolutionary strategy (ES) to model selection for support vector machines is examined. The problem of model selection is a computationally intense non-convex optimization problem. For this reason a parallel search strategy is desirable. A new non-blocking asynchronous ES is developed for this task. The algorithm is tested on five standard test sets optimizing a num...

Journal:
  • Bioinformatics

Volume 22, Issue 1

Pages: -

Publication date: 2006