The PKDD Discovery Challenges on Thrombosis Data

Author

  • Petr Berka
Abstract

The aim of the Discovery Challenge workshops held during PKDD conferences is to encourage a collaborative research effort in the analysis of real-world data. For PKDD’99 and PKDD2000, two data sets were available, one from the financial domain and one from the medical domain; for PKDD2001, only the medical data are used. There are two basic types of contributions to the Challenge: in “method oriented” papers the authors describe their own approach and use the data mainly for demonstration, whereas in “problem oriented” papers the authors try to solve a problem that can be of interest to the end user.

1 Discovery Challenges at the PKDD Conferences

The aim of the Discovery Challenge workshops held during PKDD conferences is to encourage a collaborative research effort in the analysis of real-world data. The idea came from Jan Zytkow, who suggested organizing such an event during PKDD’99 in Prague. In contrast to the competitive nature of the KDD Cups held within the KDD conferences, the Discovery Challenge emphasises cooperation.

Two data sets were available for the PKDD’99 and PKDD2000 Discovery Challenges. In the financial domain, the dataset describes clients of a bank, their accounts, transactions, permanent orders, granted loans and issued credit cards. In the medical domain, the dataset describes patients with collagen diseases. The PKDD2001 Challenge is organized only around the medical data.

Each participant could use any KDD technique and discover as much knowledge as possible. Ideally, each contribution includes the proposed business objectives (goals that may be of interest to database users), a brief summary of the data mining effort, a presentation of the discovered knowledge, and an explanation for database users of how they can apply the discovered knowledge.

2 Medical Domain: the Thrombosis Data

The Thrombosis Data for the PKDD1999 Discovery Challenge were organized into three tables: TSUM_A, TSUM_B and TSUM_C. The tables can be connected by the ID number, which is unique for each patient.
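As a rough illustration of connecting the three tables by patient ID, the sketch below builds a tiny in-memory SQLite database and joins it. Only the table names and the ID column come from the text; every other column name (sex, thrombosis, exam_date, test_name, value) and all rows are hypothetical toy values, not the real challenge schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three toy tables keyed by patient ID.

    Column names other than ID are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE TSUM_A (ID INTEGER PRIMARY KEY, sex TEXT);
        CREATE TABLE TSUM_B (ID INTEGER, exam_date TEXT, thrombosis INTEGER);
        CREATE TABLE TSUM_C (ID INTEGER, test_date TEXT, test_name TEXT, value REAL);
        """
    )
    # Toy rows: patient 2 has no special examination, patient 3 has no lab tests.
    conn.executemany("INSERT INTO TSUM_A VALUES (?, ?)",
                     [(1, "F"), (2, "M"), (3, "F")])
    conn.executemany("INSERT INTO TSUM_B VALUES (?, ?, ?)",
                     [(1, "1995-04-01", 1), (3, "1997-09-15", 0)])
    conn.executemany("INSERT INTO TSUM_C VALUES (?, ?, ?, ?)",
                     [(1, "1995-03-01", "testX", 4.1),
                      (1, "1995-03-20", "testX", 4.5),
                      (2, "1996-01-10", "testY", 0.3)])
    return conn


def join_by_patient_id(conn):
    """Return one row per patient: (ID, sex, thrombosis flag, number of lab tests).

    The LEFT JOIN keeps patients without a special examination (flag is None);
    the correlated subquery counts time-stamped lab tests per patient.
    """
    query = """
        SELECT a.ID,
               a.sex,
               b.thrombosis,
               (SELECT COUNT(*) FROM TSUM_C AS c WHERE c.ID = a.ID) AS n_lab_tests
        FROM TSUM_A AS a
        LEFT JOIN TSUM_B AS b ON b.ID = a.ID
        ORDER BY a.ID
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in join_by_patient_id(build_demo_db()):
        print(row)
```

A left join is used here because the special-examination table does not cover every patient; an inner join would silently drop the patients without those tests.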
Table TSUM_A gives basic information about patients (input by doctors). This dataset includes all patients (about 1000 records). Table TSUM_B gives special laboratory examinations (input by doctors, measured by the Laboratory on Collagen Diseases). This dataset does not include all the patients, only those who underwent these special tests. Table TSUM_C holds laboratory examinations stored in the Hospital Information System (from 1980 to March 1999); all the data are ordinary laboratory examinations and carry time stamps. The tests are not necessarily connected to thrombosis.

For the PKDD2000 Discovery Challenge, the data were restructured into 7 tables to eliminate problems with multi-valued attributes in the original tables (for details see the data description by Zytkow and Gupta in this volume). The same data tables are also used for this year's challenge.

3 The PKDD experience

Altogether ten papers on the analysis of the thrombosis data have been presented at the PKDD’99, PKDD2000 and PKDD2001 conferences. Most of the contributions deal with the classification of thrombosis, but there have also been papers dealing with temporal aspects of the data. We can distinguish two basic types of contributions. The “method/algorithm oriented” papers focus on describing a new approach or system and use the data more or less to demonstrate the features of the method. The “problem oriented” papers try to formulate (and solve) a problem that can be of interest to end users or domain experts. Tables 1-3 summarize all the papers in terms of the problem solved (task), the KDD steps described, the mining algorithms used and the system used. All the papers are available on the web at http://lisp.vse.cz/challenge.

Table 1. PKDD’99 results

| 1st author | KDD task | KDD steps | DM method | Tool |
|---|---|---|---|---|
| Beilken | correlations between lab. test + thrombosis | visualization | display correlations | InfoZoom (own) |
| Levin | predict yes/no thrombosis | description | association rules, ranking objects | WizWhy (own) |
| Taylor | predict thrombosis, diagnoses | preprocessing, classification | classification and regression trees | |

Table 2. PKDD 2000 results

| 1st author | KDD task | KDD steps | DM method | Tool |
|---|---|---|---|---|
| Meidan | predict yes/no thrombosis | description | association rules, ranking objects | WizWhy (own) |
| Tawfik | causal and temporal patterns | preprocessing, description, (classification) | statistical techniques (Bayesian networks) | Tetrad |

Table 3. PKDD 2001 results

| 1st author | KDD task | KDD steps | DM method | Tool |
|---|---|---|---|---|
| Boulicaut | classify collagen disease | preprocessing, description | association rules, classification rules | ac-miner-12 (own) |
| Coursac | classify thrombosis | preprocessing, classification | decision trees and rules | C5.0 |
| Jensen | classify thrombosis | CRISP-DM | neural networks, decision rules, sequence analysis, association rules | Clementine |
| Werner | classify severity of disease | classification | genetic programming | LilGP (own) |
| Zytkow | classify severity of disease | description, classification, interpretation | SQL, contingency tables | |

4 Thrombosis data at other challenges

Besides the PKDD conferences, other challenges have used the Thrombosis data as well. In September 1999, Shusaku Tsumoto organized a special session at the 38th SIG-FAI and the 45th SIG/KBS of the Japanese Society for Artificial Intelligence.


Similar resources

Rough Set Based Feature Extraction for Medical Data

Rough-set-based KDD tools are applied to analysis of PKDD’2001 Discovery Challenge data set thrombosis. We focus on the phase of the feature extraction, aiming at finding new attributes which enable better classification of new cases.


Medical ( Thrombosis ) Data Description

Collagen diseases are often dangerous and can be lethal. A severe complication common to those diseases of the auto-immune system is called thrombosis. It occurs when coagulation of blood clogs blood vessels. Data relevant to the analysis of patients with collagen diseases have been donated to the PKDD Discovery Challenge in the hope that the discovered knowledge will illuminate the mechanisms resp...


Song, H., & Flach, P. (2015). Model Reuse with Subgroup Discovery. In Proceedings of the ECML/PKDD 2015 Discovery Challenges: co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2015) (CEUR

In this paper we describe a method to reuse models with Model-Based Subgroup Discovery (MBSD), which is an extension of the Subgroup Discovery scheme. The task is to predict the number of bikes at a new rental station 3 hours in advance. Instead of training new models with the limited data from these new stations, our approach first selects a number of pre-trained models from old rental stations...


Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings


’ introduction : special issue of the

This special issue is a collection of papers that were submitted to the ECML/PKDD 2014 journal track and have been accepted for publication in “Data Mining and Knowledge Discovery”. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD, launched its journal track last year in 2013. In order to cover the full scope of the conference,...



Publication date: 2001