Search results for: classification of text documents

Number of results: 21200175

2007
Nikitas N. Karanikolas Christos Skourlas

The hard problem of text classification usually has various aspects and potential solutions. In this paper, two main research directions for classifying narrative documents are considered. The first one is based on data mining and rule induction techniques, while the second combines the traditional Text Retrieval techniques (use of the vector space model,

2006
Ioannis Antonellis Christos Bouras Vassilis Poulopoulos Anastasios Zouzias

We explore scalability issues of the text classification problem, where (multi)labeled training documents are used to build classifiers that can assign a document to multiple classes. A new class of classification problems, called ‘scalable’, is introduced that models many problems from the area of Web mining. The property of scalability is defined as the abi...

2003
Yefeng Zheng Huiping Li David S. Doermann

In this paper we address the problem of the identification of text from noisy documents. We segment and identify handwriting from machine printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine printed...

2015
Suhad A. Yousif Islam Elkabani Rached Zantout

A massive number of documents is posted online every minute. The task of document classification requires extensive background work on the content of documents, where keyword-based matching alone may not be sufficient. Much research carried out in several languages has revealed significant results. However, Arabic documents still pose a great challenge due to the nature of ...

2006
Laila Khreisat

This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique employing a dissimilarity measure called the “Manhattan distance”, and Dice’s measure of similarity. The Dice measure was used for comparison purposes. Results show that N-gram text classification using the Dice measure outperforms classification using the Manhattan measure.
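The comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not state the n-gram length, how the frequency profiles are normalized, or how Dice's measure is applied, so the character trigrams, the relative-frequency profiles, and the set-based Dice coefficient below are all assumptions, and every function name is hypothetical rather than taken from the paper.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams (assumed n=3)."""
    # Normalize whitespace and case; this preprocessing is an assumption.
    text = " ".join(text.lower().split())
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def manhattan_distance(p, q):
    """Sum of absolute frequency differences over all n-grams (dissimilarity)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def dice_similarity(p, q):
    """Dice coefficient 2|A∩B| / (|A| + |B|) over the sets of distinct n-grams."""
    if not p and not q:
        return 0.0
    shared = len(set(p) & set(q))
    return 2.0 * shared / (len(p) + len(q))


def classify(text, class_profiles, n=3):
    """Return the class chosen by each measure as (by_manhattan, by_dice)."""
    doc = ngram_profile(text, n)
    by_manhattan = min(
        class_profiles, key=lambda c: manhattan_distance(doc, class_profiles[c])
    )
    by_dice = max(
        class_profiles, key=lambda c: dice_similarity(doc, class_profiles[c])
    )
    return by_manhattan, by_dice
```

In this sketch, a class profile would be built by calling `ngram_profile` on the concatenated training documents of each category. The smallest Manhattan distance and the largest Dice value each select a category, so the two measures can be compared on the same documents.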

2011
Seyyed Mohammad Reza Farshchi

The assignment of natural language texts to one or more predefined categories based on their content is an important component in many information organization and management tasks. This research proposes a novel approach to document classification that combines a competitive self-organizing neural text categorizer with new vectors that we call string vectors. Even ...

Thesis: Ministry of Science, Research and Technology - Tarbiat Modares University, 1390 (Solar Hijri)

Abstract: Literature is said to be beautiful words of poetry or prose that excite the reader’s or listener’s feelings. Certainly, to be effective, such a text should have certain characteristics. The four elements of literature (thought, imagination, emotion and style) make a text literary and effective. Emotion and imagination are specific elements of literary texts, while thought and style a...

2002
Hyo-Jung Oh Moon-Soo Chang Myung-Gil Jang Sung Hyon Myaeng

With the exponential growth of information on the WWW, it is becoming increasingly difficult to find and organize relevant documents. Automatic text classification has been considered as a solution to the problem with its focus mostly on the subject or content of text [1]. Recently, researchers have realized that user information needs are not just based on the subject of a document but also on...

2001
Aixin Sun Ee-Peng Lim

Hierarchical Classification refers to assigning one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. ...

2003
Núria Bel Cornelis H. A. Koster Marta Villegas

This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both in case a sufficient number of training examples is available for each new language and in the cas...
