classification of text documents

Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap Sampling

2014

Dharmendra S Panwar Kshitij Pathak

Although publicly accessible databases containing speech documents exist, the time and effort required to keep them up to date is often burdensome. To help identify the speaker of a speech when its text is available, text-mining tools from the machine learning discipline can be applied to this process. Here, we describe and evaluate document classification a...


Comparative Assessment of the Performance of Three WEKA Text Classifiers Applied to Arabic Text

2013

Abdullah H. Wahbeh Mohammed Al-Kabi

This research is conducted in order to compare the performance of three known text classification techniques namely, Support Vector Machine (SVM) classifier, Naïve Bayes (NB) classifier, and C4.5 Classifier. Text classification aims to automatically assign the text to a predefined category based on linguistic features, and content. These three techniques are compared using a set of Arabic text ...


Support Vector Machines for Text Categorization

2003

Atreya Basu Carolyn R. Watters Michael A. Shepherd

Text categorization is the process of sorting text documents into one or more predefined categories or classes of similar documents. Differences in the results of such categorization arise from the feature set chosen to base the association of a given document with a given category. Advocates of text categorization recognize that the sorting of text documents into categories of like documents r...


Improving Multi-Document Summarization via Text Classification

2017

Ziqiang Cao Wenjie Li Sujian Li Furu Wei

Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSu...


Text Document Classification: an Approach Based on Indexing

2012

B S Harish S Manjunath

In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel data structure called ‘Status Matrix’. Further, the corresponding classification technique has been proposed for efficient classification of tex...


New Product Development Based on Customer Knowledge Management

Thesis: Ministry of Science, Research and Technology - Tarbiat Modares University - Faculty of Engineering, 1387 SH (2008/09)

Zeinab Rezvani, Mohammad Mehdi Sepehri

The outcome of this research is a practical framework for the “idea generation phase of the new product development process based on customer knowledge”. The framework was then implemented in a part of Iran's n.a.b market, resulting in the segmentation and profiling of this market. The critical new product attributes and the bases of communication messages and promotion campaigns were also extracted. We have...


Machine Learning approach to Document Classification using Concept based Features

2015

Saranya Jothi D. Thenmozhi

Text mining refers to the process of deriving high-quality information from text. Text processing involves search and replace operations on text in electronic format. A number of approaches have been developed to represent and classify text documents. Most of these approaches try to attain good classification performance while representing a document only by its words. We propose a concept based methodology instea...


On the use of text classification methods for text summarisation

2013

Matias Garcia-Constantino

This thesis describes research work undertaken in the fields of text and questionnaire mining. More specifically, the research work is directed at the use of text classification techniques for the purpose of summarising the free text part of questionnaires. In this thesis text summarisation is conceived of as a form of text classification in that the classes assigned to text documents can be vi...


Combining Unigrams and Bigrams in Semi-Supervised Text Classification

2009

Igor Assis Braga Maria Carolina Monard Edson Takashi Matsubara

Unlabeled documents vastly outnumber labeled documents in text classification. For this reason, semi-supervised learning is well suited to the task. Representing text as a combination of unigrams and bigrams has not shown consistent improvements compared to using unigrams in supervised text classification. Therefore, a natural question is whether this finding extends to semi-supervised learning...


Fast Text Classification Using Sequential Sampling Processes

2001

Michael D. Lee

A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require levels of computation that prevent them from making sufficiently fast decisions in some applied setting. Using insights gained from examining the way humans make fast decisions when classifying text documents, two ne...
