Document representations for classification of short web-page descriptions
Similar resources
Document Representations for Classification of Short Web-Page Descriptions
Motivated by applying Text Categorization to classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers – Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from th...
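As a rough illustration of what a bag-of-words representation of a short Web-page description can look like, here is a minimal sketch. The tokenizer, the L2 normalization step, the function names, and the sample description are all assumptions made for illustration; the abstract does not say which variants the paper actually evaluates.

```python
from collections import Counter
import math
import re


def bag_of_words(text):
    """Map a text to raw term counts over lowercased alphabetic tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def l2_normalize(counts):
    """Scale term counts to unit Euclidean length (empty input gives {})."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    if norm == 0:
        return {}
    return {term: v / norm for term, v in counts.items()}


# Hypothetical short description, not taken from the paper's data set.
description = "Free web hosting and web design tools"
counts = bag_of_words(description)
vector = l2_normalize(counts)
```

Word order is discarded: only the count of each term survives, and normalization keeps longer descriptions from dominating shorter ones when vectors are compared.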
Classification of Document Page Images
Searching in a large heterogeneous collection of scanned document images often produces uncertain results in part because of the size of the collection and the lack of an ability to focus queries appropriately. Searching for documents by their type is a natural way to enhance the effectiveness of document retrieval in the workplace [2], and such a system is proposed in [4]. The goal of our work...
Incremental Document Clustering for Web Page Classification
Motivated by the benefits in organizing the documents in Web search engines, we consider the problem of automatic Web page classification. We employ clustering techniques. Each document is represented by a feature vector. By analyzing the clusters formed by these vectors, we can assign the documents within the same cluster to the same class automatically. Our contributions are the following: ...
Automatic Web Page Classification
The aim of this paper is to describe a method of automatic web page classification into semantic domains and its evaluation. The classification method exploits machine learning algorithms and several morphological as well as semantic text processing tools. In contrast to general text document classification, in web document classification there are often problems with short web pages. In this p...
Automatic Web Page Classification
To facilitate user browsing of the Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory Project (http://dmoz.org) manually maintain a hierarchical structure. While manual classification of web pages provides high accuracy, it is very expensive. To automatically include new emerging pages into these hierarchies, web page classification becomes a hot research topic in web infor...
Journal
Journal title: YUJOR
Year: 2008
ISSN: 0354-0243,1820-743X
DOI: 10.2298/yjor0801123r