Classification of Anti-learnable Biological and Synthetic Data

Author

  • Adam Kowalczyk
Abstract

We demonstrate a binary classification problem in which standard supervised learning algorithms such as linear and kernel SVM, naive Bayes, ridge regression, k-nearest neighbors, shrunken centroid, multilayer perceptron and decision trees behave in an unusual way. On certain data sets they classify a randomly sampled training subset nearly perfectly, but perform systematically worse than random guessing on cases unseen in training. We demonstrate this phenomenon in the classification of a natural data set of cancer genomics microarrays using a cross-validation test. Additionally, we generate a range of synthetic datasets, the outcomes of zero-sum games, for which we analyse this phenomenon in the i.i.d. setting. Furthermore, we propose and evaluate a remedy that yields promising results for classifying such data as well as normal datasets. The remedy simply applies an additional one-dimensional linear transformation to the classifier scores, chosen, for instance, to maximize the classification accuracy of the outputs of an internal cross-validation on the training set. We also discuss the relevance to other fields such as learning theory, boosting, regularization, sample bias and the application of kernels.
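The score-transformation remedy described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names `fit_linear_score_transform` and `accuracy` are hypothetical, the slope is restricted to +1 or -1 as a simplifying assumption, and the held-out scores below are invented purely for illustration.

```python
from typing import Sequence, Tuple


def accuracy(scores: Sequence[float], labels: Sequence[int],
             a: float, b: float) -> float:
    """Fraction of labels in {-1, +1} matched by the sign of a * s + b."""
    hits = 0
    for s, y in zip(scores, labels):
        pred = 1 if a * s + b > 0 else -1
        hits += int(pred == y)
    return hits / len(labels)


def fit_linear_score_transform(cv_scores: Sequence[float],
                               labels: Sequence[int]) -> Tuple[float, float]:
    """Choose (a, b) maximizing accuracy of sign(a * s + b) on held-out scores.

    Assumptions of this sketch (not taken from the paper): a is limited to
    {+1, -1}, and thresholds are searched over midpoints of sorted scores.
    """
    if not cv_scores or len(cv_scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    s = sorted(set(cv_scores))
    # Candidate thresholds: below all scores, between neighbours, above all.
    thresholds = [s[0] - 1.0]
    thresholds += [(lo + hi) / 2 for lo, hi in zip(s, s[1:])]
    thresholds += [s[-1] + 1.0]
    best, best_acc = (1.0, 0.0), -1.0
    for a in (1.0, -1.0):
        for t in thresholds:
            b = -a * t  # so that a * s + b == a * (s - t)
            acc = accuracy(cv_scores, labels, a, b)
            if acc > best_acc:
                best, best_acc = (a, b), acc
    return best


# Invented held-out scores from an internal cross-validation on the training
# set; the signs are systematically wrong, mimicking an anti-learnable case.
cv_scores = [0.9, 0.7, 0.6, -0.5, -0.8, -0.4]
labels = [-1, -1, -1, 1, 1, 1]

a, b = fit_linear_score_transform(cv_scores, labels)
print("raw accuracy:", accuracy(cv_scores, labels, 1.0, 0.0))
print("transformed accuracy:", accuracy(cv_scores, labels, a, b))
```

In this toy case the fitted slope is negative, so the transformation flips the classifier's decisions. On data where the original scores already generalize, the same search would keep the slope positive, which is consistent with the abstract's remark that the remedy also applies to normal datasets.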


Similar resources

Need of Systems Approach for Biological Explanation of Anti-learnable Signatures

We present simple formal models explaining unusual properties of several biological classification tasks as follows. For these datasets the whole range of supervised learning techniques generate predictive models which classify independent test samples systematically below the performance of random guessing (hence the name anti-learning). We show that explanation of such “counter-intuitive” sup...


The Antiglycation Ability of Typical Medicinal Plants, Natural and Synthetic Compounds: A Review

Given the prevalence of diabetes and the increasing number of diabetics, it is essential to find medicines to decrease the chronic complications of diabetes. Several studies have demonstrated that chronic hyperglycemia and its complications are directly related to protein glycation. Thus, identifying natural inhibitors to stop glycation of proteins may play a crucial role in managing the chroni...


Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes strongly influence the Earth and the life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...


Expression of an Innate Immune Element (Mouse Hepcidin-1) in Baculovirus Expression System and the Comparison of Its Function with Synthetic Human Hepcidin-25

Hepcidin is an innate immune element that decreases iron absorption from the diet and iron release from macrophages. In contrast to chemical iron chelators, limited effort has been devoted to the specific use of hepcidin as a new drug for reducing iron overload. Hepcidin is produced in different biological systems. For instance, E. coli is used for human hepcidin expressio...




Publication date: 2007