Text Classification by PNN-based Term Re-weighting
Abstract
Current approaches to feature selection for text classification aim to reduce the number of terms used to describe documents, so that documents can be classified and retrieved with greater ease and precision. A key shortcoming of these approaches is that, after ranking all terms with a feature selection measure (scoring function), they keep only the topmost terms to describe documents. Terms ranked just below the topmost terms are discarded to reduce computational cost, although in many cases they have considerable discriminative power that could improve classification precision. To address this issue, we propose a new feature weighting formalism that ties the topmost terms to these lower-ranked terms using probabilistic neural networks (PNNs). In the proposed method, the probabilistic neural networks are formed from the relative category distribution matrix, and the topmost terms are re-weighted and passed to a Rocchio classifier. This is achieved without increasing the dimensionality of the feature space. Through experiments on datasets from the Reuters news collection RCV1, we show that the proposed method is a significant supplement to statistical feature selection measures for better text classification at extreme term filtering ranges.
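The abstract does not give the exact network or weighting formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a stand-in scoring function (term frequency times the peak of the term's category distribution), a Gaussian Parzen kernel as the PNN-style similarity between category distributions, and the centroid form of the Rocchio classifier with the negative-example term omitted. All function names, the `sigma` value, and the toy documents are hypothetical.

```python
import math

def category_distribution(docs, labels, categories):
    """Per term: fraction of its occurrences in each category (rows sum to 1)."""
    counts = {}
    for tokens, label in zip(docs, labels):
        for t in tokens:
            counts.setdefault(t, {c: 0 for c in categories})[label] += 1
    freq = {t: sum(row.values()) for t, row in counts.items()}
    dist = {t: [row[c] / freq[t] for c in categories] for t, row in counts.items()}
    return dist, freq

def rank_terms(dist, freq):
    """Stand-in scoring function (assumption): frequency * peak category share."""
    scores = {t: freq[t] * max(dist[t]) for t in dist}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked, scores

def reweight(dist, scores, top_terms, low_terms, sigma=0.3):
    """PNN-style re-weighting (assumption): each top term gains weight from
    lower-ranked terms whose category distribution is close to its own,
    measured with a Gaussian (Parzen) kernel."""
    weights = {}
    for t in top_terms:
        support = 0.0
        for u in low_terms:
            d2 = sum((a - b) ** 2 for a, b in zip(dist[t], dist[u]))
            support += math.exp(-d2 / (2 * sigma ** 2))
        weights[t] = scores[t] * (1.0 + support / max(len(low_terms), 1))
    return weights

def vectorize(tokens, weights):
    """Document vector over the top terms only, so dimensionality stays at k."""
    return [weights[t] * tokens.count(t) for t in sorted(weights)]

def rocchio_train(docs, labels, weights):
    """Centroid form of Rocchio: one mean vector per category."""
    sums, n = {}, {}
    for tokens, label in zip(docs, labels):
        v = vectorize(tokens, weights)
        acc = sums.setdefault(label, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        n[label] = n.get(label, 0) + 1
    return {c: [x / n[c] for x in acc] for c, acc in sums.items()}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rocchio_predict(tokens, centroids, weights):
    v = vectorize(tokens, weights)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

# Toy data (hypothetical), keeping only the k = 2 topmost terms.
docs = [["oil", "oil", "barrel"], ["oil", "crude", "price"],
        ["goal", "goal", "team"], ["goal", "league", "match"]]
labels = ["energy", "energy", "sport", "sport"]
categories = ["energy", "sport"]

dist, freq = category_distribution(docs, labels, categories)
ranked, scores = rank_terms(dist, freq)
top_terms, low_terms = ranked[:2], ranked[2:]
weights = reweight(dist, scores, top_terms, low_terms)
centroids = rocchio_train(docs, labels, weights)
prediction_energy = rocchio_predict(["oil", "barrel"], centroids, weights)
prediction_sport = rocchio_predict(["goal", "team"], centroids, weights)
```

In this sketch the discarded terms never become features. They influence only the weights of the retained terms, which is one way to read the claim that the feature space does not grow.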
Similar resources
Balancing between over-weighting and under-weighting in supervised term weighting
Supervised term weighting can improve the performance of text categorization. One approach proven to be effective is to give more weight to terms with more imbalanced distributions across categories. This paper shows that supervised term weighting should not just assign large weights to imbalanced terms, but should also control the trade-off between over-weighting and under-weighting. Overweighting,...
The Role of Rare Terms in Enhancing the Performance of Polynomial Networks Based Text Categorization
In this paper, the role of rare or infrequent terms in enhancing the accuracy of English text categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms on the accuracy of PN-based text categorization, different term reduction criteria and different term weighting schemes were tested on the Reuters Corpus using PNs. Each term weight...
Comparative Study and Analysis of Supervised and Unsupervised Term Weighting Methods on Text Classification
Text classification is one of the booming areas of research, given the huge amount of electronic data available in the form of news articles, research articles, email messages, blogs, web pages, etc. Text representation is a vital step in text classification. In text representation, a term weighting method assigns appropriate weights to terms to obtain better performance; the term weighting metho...
Investigation of Term Weighting Schemes in Classification of Imbalanced Texts
The class imbalance problem in data plays a critical role in the use of machine learning methods for text classification, since feature selection methods, like machine learning methods, expect a homogeneous distribution. This study investigates two different kinds of feature selection metrics (one-sided and two-sided) as a global component of term weighting schemes (called tffs) in scenarios where...
Reducing Over-Weighting in Supervised Term Weighting for Sentiment Analysis
Recently, research on supervised term weighting has attracted growing attention in the fields of Traditional Text Categorization (TTC) and Sentiment Analysis (SA). Despite their impressive achievements, we show that existing methods more or less suffer from the problem of over-weighting. Overlooked by prior studies, over-weighting is a new concept proposed in this paper. To address this probl...
Publication date: 2011