Search results for: classification of text documents

Number of results: 21200175

2015
J. G. R. Sathiaseelan

Automatic text classification is a popular research topic in text mining, which tries to automatically classify text documents into pre-specified categories. Text mining involves several pre-processing and classification techniques. In this paper, we have analysed several feature selection methods with support ...

Journal: Journal of Advances in Computer Engineering and Technology 2015
Mozhgan Rahimirad Mohammad Mosleh Amir Masoud Rahmani

With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

2010
Boris Debić

This paper presents the design of a system for feature extraction and classification of news articles from Croatian news sources. An overview of supervised and unsupervised text classification and clustering machine learning techniques is presented. The techniques described are those most widely used for text classification tasks. The paper discusses a number of issues particular to text classi...

2017
Graham McDonald Craig MacDonald Iadh Ounis

Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. Howev...

2014
Sadiq Sani

The Vector Space Model (VSM) of text representation suffers from a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption, where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...

2014
Said Bahassine Mohamed Kissi Abdellah Madani

In this paper we conduct a comparative study between two stemming algorithms, the Khoja stemmer and our new stemmer, for Arabic text classification (categorization), using chi-square statistics for feature selection and focusing on a decision tree classifier. The evaluation used a corpus of 5070 documents independently classified into six categories: sport, entertainment, business, middle east...

Journal: Intell. Data Anal. 2010
Dong Li Anne Laurent Pascal Poncelet Mathieu Roche

Sentiment classification in text documents is an active data mining research topic in opinion retrieval and analysis. Different from previous studies concentrating on the development of effective classifiers, in this paper, we focus on the extraction and validation of unexpected sentences issued in sentiment classification. In this paper, we propose a general framework for determining unexpecte...

2010
Horacio Saggion Adam Funk

We describe a set of tools, resources, and experiments for opinion classification in business-related data sources in two languages. In particular, we concentrate on SentiWordNet text interpretation to produce word-, sentence-, and text-based sentiment features for opinion classification. We achieve good results in experiments using a supervised learning machine over syntactic and sentiment-based fea...

2018
Jiangning Chen John Dever Rundong Du

We define a new centroid estimator for text classification based on the KL-divergence of the classes. The score favors documents that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on several standard data sets indicate that the new method outperforms the traditional Naive Bayes classifier, especially ...
