Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 1, February 2008
Abstract
This work implements an enhanced Bayesian classifier that performs better than the ordinary naïve Bayes classifier on domains and datasets of varying characteristics. Text classification is an active, ongoing research field within Artificial Intelligence (AI). It is defined as the task of learning methods for categorising collections of electronic text documents into their annotated classes based on their contents. A growing number of statistical approaches have been developed for text classification, including k-nearest neighbour classification, naïve Bayes classification, decision trees, rule induction, and the support vector machine (SVM), an algorithm that implements structural risk minimisation theory. Among these approaches, naïve Bayes classifiers have been widely used because of their simplicity. However, this generative method has been reported to be less accurate than discriminative methods such as the SVM. Some studies have nonetheless shown that the naïve Bayes classifier performs surprisingly well in many other domains with certain specialised characteristics. The main aim of this work is to quantify the weaknesses of traditional naïve Bayes classification and to introduce an enhanced Bayesian classification approach whose additional techniques outperform the traditional naïve Bayes classifier. Our research goal is to develop an enhanced Bayesian probabilistic classifier by introducing different tournament-structure ranking algorithms, together with a high-relevance keyword extraction facility and a facility that computes accurate weighting factors. These additions are intended to improve classification performance on specific datasets with different characteristics. Other studies have used general datasets, such as Reuters-21578 and 20_newsgroups, to validate the performance of their classifiers.
Our approach adapts easily to datasets that differ in the degree of similarity between classes, in the presence of multi-categorised documents, and in dataset organisation. As previously mentioned, we introduce several techniques: tournament-structure ranking algorithms, higher-relevance keyword extraction, and automatically computed document dependent (ACDD) weighting factors. Each technique responds differently when applied to datasets with different characteristics, but each has been shown to give outstanding performance in most cases. Based on our experimental results, we have successfully optimised our techniques for individual datasets with different characteristics.
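For orientation, the traditional naïve Bayes classifier that the abstract takes as its baseline can be sketched as follows. This is a minimal multinomial naïve Bayes with add-one (Laplace) smoothing, not the enhanced classifier described above: the function names `train_naive_bayes` and `classify`, the toy documents, and the choice of smoothing are all assumptions made for illustration, and the tournament ranking, keyword extraction, and ACDD weighting are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect class frequencies and per-class word counts from tokenised documents."""
    class_counts = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)      # word frequencies per class
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the class with the highest log posterior under the naive independence assumption."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue                                 # skip words never seen in training
            # Add-one smoothing (an assumption of this sketch) avoids log(0).
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
docs = [
    ["cheap", "pills", "offer"],
    ["meeting", "agenda", "notes"],
    ["cheap", "offer", "now"],
    ["project", "meeting", "today"],
]
labels = ["spam", "ham", "spam", "ham"]
model = train_naive_bayes(docs, labels)
```

Calling `classify(["cheap", "offer"], model)` scores each class by its log prior plus the smoothed log likelihood of each known word, and returns the highest-scoring label.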
Similar resources
Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 1, February 2008
This article introduces the design principle and implementation method of automatic finish-judgment software for sports competitions. Image recognition technology helps competition management achieve equality, justice, precision and high efficiency. This actualizes automatic judgment by means of a recognition model for sports imaging, which is based on the principle of radio fre...
Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 1, February 2008
On the basis of different expert knowledge structure, from the greatest factors in power limiting distribution, using the method combining Analytical Hierarchy Process (AHP) and fuzzy set theory, a fuzzy comprehensive group decision model for multi-objects and multi-zones power limiting distribution in peak load shifting is built up. The problem of power limiting distribution between multi-zone...
Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 1, February 2008
The adoption of the Internet in societies is becoming more important as more and more public and business services are delivered via the Internet. Understanding the dynamics of the adoption is vital in developing policies to stimulate greater adoption. This paper highlights the dynamics of internet adoption in a general society based on the segmentation of the society into subcultures according...
Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 2, May 2008
The Automated Teller Machine has become an integral part of our society. Using the ATM, however, can often be a frustrating experience. How often have we watched people in the queue ahead of us reinsert their card for another transaction? Why does this happen? Is there a design flaw in the user interface? It seems that many ATM navigation menus are not as intuitive or as ef...
Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 2, May 2008
The electric power line overhaul plan is an important issue in power systems and engineering practice. Particle swarm optimization is a new intelligent algorithm that has gradually been applied to power systems in recent years. This paper provides a related mathematical model to solve the problems in power line overhaul. Particle swarm optimization algorithm has advantages of less parameters setting...
Computer and Information Science, ISSN 1913-8989, Vol. 1, No. 1, February 2008
Xiaowen Xu, Management School, Xi’an Jiaotong University, Xi’an 710049, China. Jiayin Wang, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China. The research is supported by National Natural Science Foundation of China (No. 70702030) and National Undergraduate Innovation Experimental Pr...
Publication date: 2009