Performance Implication of Knowledge Discovery Techniques in Databases
Authors
Rajagopalan and Krovi
Abstract
This chapter introduces knowledge discovery techniques as a means of identifying critical trends and patterns for business decision support. It suggests that implementing these techniques effectively requires a careful assessment of the various data mining tools and algorithms available. Both statistical and machine-learning algorithms have been widely applied to discover knowledge from data. In this chapter, we describe some of these algorithms and investigate their relative performance on classification problems. Simulation-based results support the proposition that machine-learning algorithms outperform their statistical counterparts, albeit only under certain conditions. Further, the authors hope that the discussion of performance-related issues will foster a better understanding of the application and appropriateness of knowledge discovery techniques.
This chapter appears in the book Advanced Topics in Database Research, edited by Keng Siau. Copyright © 2003, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.
INTRODUCTION
The volume of data collected by businesses today is phenomenal and is increasing exponentially. The challenge is to integrate and correlate data related to online and offline sales, customer satisfaction surveys, and server log files. To this end, data mining (DM), the process of sifting through the mass of organizational (internal and external) data to identify patterns, is critical for decision support.
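The abstract compares statistical and machine-learning classifiers through simulation. This excerpt does not specify the chapter's algorithms or simulation design, so the Python sketch below is only an illustration under stated assumptions:

- Two-class linear discriminant analysis (LDA) stands in for the statistical approach.
- k-nearest neighbours (k-NN) stands in for the machine-learning approach.
- The synthetic data set has a circular class boundary.

The generator `make_data`, the helpers `fit_lda`, `fit_knn`, and `accuracy`, the sample sizes, `k=5`, and the random seed are all hypothetical choices made for this example. Nothing here is the chapter's own method.

```python
import math
import random
from collections import Counter


def make_data(n, rng, radius_sq=0.5):
    """Hypothetical generator: points uniform in [-1, 1]^2, labelled 1 inside a circle."""
    points, labels = [], []
    for _ in range(n):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        points.append((x, y))
        labels.append(1 if x * x + y * y < radius_sq else 0)
    return points, labels


def fit_lda(points, labels):
    """Statistical baseline: two-class LDA with a pooled covariance matrix."""
    groups = {c: [p for p, lab in zip(points, labels) if lab == c] for c in (0, 1)}
    means = {
        c: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
        for c, g in groups.items()
    }
    # Pooled within-class covariance entries of the 2x2 matrix S.
    sxx = sxy = syy = 0.0
    for c, g in groups.items():
        mx, my = means[c]
        for x, y in g:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(points) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    # Discriminant direction w = S^-1 (mu_1 - mu_0) via the closed-form 2x2 inverse.
    d0 = means[1][0] - means[0][0]
    d1 = means[1][1] - means[0][1]
    w = ((syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det)
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    log_prior_ratio = math.log(len(groups[1]) / len(groups[0]))

    def predict(p):
        # Class 1 if w.(x - midpoint) + log(prior_1 / prior_0) > 0.
        score = w[0] * (p[0] - mid[0]) + w[1] * (p[1] - mid[1]) + log_prior_ratio
        return 1 if score > 0 else 0

    return predict


def fit_knn(points, labels, k=5):
    """Machine-learning baseline: majority vote among the k nearest training points."""
    train = list(zip(points, labels))

    def predict(p):
        nearest = sorted(
            train, key=lambda t: (t[0][0] - p[0]) ** 2 + (t[0][1] - p[1]) ** 2
        )[:k]
        return Counter(lab for _, lab in nearest).most_common(1)[0][0]

    return predict


def accuracy(predict, points, labels):
    """Fraction of points whose predicted label matches the true label."""
    return sum(predict(p) == lab for p, lab in zip(points, labels)) / len(labels)


rng = random.Random(42)  # fixed seed so the run is reproducible
train_x, train_y = make_data(400, rng)
test_x, test_y = make_data(200, rng)

lda_acc = accuracy(fit_lda(train_x, train_y), test_x, test_y)
knn_acc = accuracy(fit_knn(train_x, train_y), test_x, test_y)
print(f"LDA accuracy: {lda_acc:.3f}  k-NN accuracy: {knn_acc:.3f}")
```

LDA can only draw a straight decision boundary, so the circular boundary assumed here handicaps it. Swapping the labelling rule for a linear one (for example, label 1 when `x + y > 0`) is one way to explore how such comparisons depend on the data-generating conditions. This echoes the abstract's "only under certain conditions" caveat.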
Effective data mining has several applications, such as fraud detection and bankruptcy prediction (Tam & Kiang, 1992; Lee, Han, & Kwon, 1996; Kumar, Krovi, & Rajagopalan, 1997), strategic decision-making (Nazem & Shin, 1999), and database marketing (Brachman, Khabaza, Kloesgen, Piatetsky-Shapiro, & Simoudis, 1996). Today, businesses have a unique opportunity to use such techniques for target marketing and customer relationship management. Analyzing the massive volumes of data that businesses collect can support intelligence-gathering efforts about their competitors, products, or markets. Intelligent tools based on rules derived from web mining can also play an important role in personalizing site content and presentation. Recently, there has been considerable interest in how to integrate and mine such data (Mulvenna, Anand, & Buchner, 2000; Brachman et al., 1996).
Business databases in general pose a unique problem for pattern extraction because of their complex nature. This complexity arises from anomalies such as discontinuity, noise, ambiguity, and incompleteness (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Historically, decision makers had to deduce patterns manually from information generated by query reporting systems. One level of analytical sophistication above this was the ability to examine the data and perform what-if and goal-seeking analyses. More recently, online analytical processing
Similar sources
Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)
Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to some factors which determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...
A Review of Data Mining Applications in the Health System
Introduction: Extensive amounts of data stored in medical databases require the development of specialized tools for accessing the data, data analysis, knowledge discovery, and the effective use of the data. Data mining is one of the most important methods. The article sketches the data mining techniques used and illustrates their applicability to medical diagnostic and prognostic problems. ...
The Expanded Implication Problem of
The implication problem is the problem of deciding whether a given set of dependencies entails other dependencies. Up to now, the entailment of excluded dependencies or independencies is only regarded on a metalogical level, which is not suitable for an automatic inference process. But, the inference of independencies is of great importance for new topics in database research like knowledge dis...
Discovery of Spatial Association Rules in Geographic Information Databases
Spatial data mining, i.e., discovery of interesting implicit knowledge in spatial databases, is an important task for understanding and use of spatial data and knowledge bases. In this paper, an efficient method for mining strong spatial association rules in geographic information databases is proposed and studied. A spatial association rule is a rule indicating certain association relationship among a...
Qualitative Discovery in Medical Databases
Implication rules have been used in uncertainty reasoning systems to confirm and draw hypotheses or conclusions. However, a major bottleneck in developing such systems lies in the elicitation of these rules. This paper empirically examines the performance of evidential inferencing with implication networks generated using a rule induction tool called KAT. KAT utilizes an algorithm for the statis...
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification
Both the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography, or other scientific equipment. Therefore, automated knowledge discovery is becoming increasingly important in spatial databases. So far, most of the methods for knowledge discovery in databases (KDD) have been based on relational database ...
Publication date: 2003