Adapting Hidden Naive Bayes for Text Classification
Authors
Abstract
Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), however, the conditional independence assumption among features is often violated and therefore reduces classification performance. Among the numerous approaches to alleviating this assumption, structure extension has attracted the least attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence estimators; it is therefore an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all other qualified features. To learn HMNB, we propose a simple but effective algorithm that does not incur a high-computational-complexity structure-learning process. Our idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA); the resulting models are simply denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark datasets validate the effectiveness of HMNB, HCNB, and HOVA.
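For intuition, the hidden-parent idea can be sketched as follows. This is a reconstruction based on the published HNB formulation combined with MNB's word-frequency exponents, not the paper's exact equations; here d is a document, f_w the frequency of word w in d, hp_w the hidden parent of w, and c ranges over classes:

c(d) = \arg\max_{c} P(c) \prod_{w \in d} P(w \mid hp_w, c)^{f_w}, \qquad P(w \mid hp_w, c) = \sum_{w' \neq w} W_{w'w} \, P(w \mid w', c), \qquad \sum_{w' \neq w} W_{w'w} = 1

Each word's hidden parent is thus a weighted mixture of all other words, so pairwise dependencies are captured in a single model without an explicit structure search.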
Similar Resources
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid increase in the number of documents, the use of Text Document Classification (TDC) methods has become a crucial matter. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Transferring Naive Bayes Classifiers for Text Classification
A basic assumption in traditional machine learning is that the training and test data distributions should be identical. This assumption may not hold in many practical situations, yet we may be forced to rely on data from a different distribution to learn a prediction model. For example, this may be the case when it is expensive to label the data in a domain of interest, although in a related but ...
Hidden Naive Bayes
The conditional independence assumption of naive Bayes essentially ignores attribute dependencies and is often violated. On the other hand, although a Bayesian network can represent arbitrary attribute dependencies, learning an optimal Bayesian network from data is intractable. The main reason is that learning the optimal structure of a Bayesian network is extremely time consuming. Thus, a Baye...
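Although the snippet is cut off, the core equation of hidden naive Bayes is standard and worth stating here as a sketch following the published HNB formulation: each attribute A_i receives a hidden parent that mixes all other attributes, with weights derived from conditional mutual information:

P(a_i \mid a_{hp_i}, c) = \sum_{j \neq i} W_{ij} \, P(a_i \mid a_j, c), \qquad W_{ij} = \frac{I_P(A_i; A_j \mid C)}{\sum_{k \neq i} I_P(A_i; A_k \mid C)}

where I_P(A_i; A_j \mid C) = \sum_{a_i, a_j, c} P(a_i, a_j, c) \log \frac{P(a_i, a_j \mid c)}{P(a_i \mid c)\, P(a_j \mid c)} is the conditional mutual information. This captures attribute dependencies while avoiding the expensive structure search the snippet mentions.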
A Comparison of Event Models for Naive Bayes Text Classification
Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-g...
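Both event models are implemented in scikit-learn, which makes the contrast easy to demonstrate; the toy corpus and labels below are invented purely for illustration:

# Contrasting the two naive Bayes event models on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = ["cheap pills cheap offer", "team meeting at noon",
        "limited offer buy now", "project meeting notes"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham (illustrative)

vec = CountVectorizer()
X = vec.fit_transform(docs)  # sparse word-count matrix

mnb = MultinomialNB().fit(X, labels)  # multinomial: models word frequencies
bnb = BernoulliNB().fit(X, labels)    # Bernoulli: binarizes counts to presence/absence

test = vec.transform(["cheap meeting offer"])
print("multinomial:", mnb.predict(test))
print("bernoulli:  ", bnb.predict(test))

The multinomial model weights repeated words by their counts, while BernoulliNB (with its default binarize=0.0) records only whether each vocabulary word appears.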
A Term Association Translation Model for Naive Bayes Text Classification
Text classification (TC) has long been an important research topic in information retrieval (IR) related areas. In the literature, the bag-of-words (BoW) model has been widely used to represent a document in text classification and many other applications. However, BoW, which ignores the relationships between terms, offers a rather poor document representation. Some previous research has shown tha...
Journal
Journal title: Mathematics
Year: 2021
ISSN: 2227-7390
DOI: https://doi.org/10.3390/math9192378