Grouping by association: using associative networks for document categorization
Abstract
In this thesis we describe a method of using associative networks for automatic document grouping. Associative networks are networks of ideas or concepts in which each concept is linked to concepts that are semantically similar to it. By activating concepts in the network based on the text of a document and spreading this activation to related concepts, we can determine which concepts are related to the document, even if the document itself does not contain words linked directly to those concepts. Based on this information, we can group documents by the concepts they refer to. In the first part of the thesis we describe the method itself, as well as the details of the algorithms used in the implementation. We also discuss the theory on which the method is based and compare it to related methods. In the second part of the thesis we evaluate techniques for creating associative networks from easily accessible knowledge sources, as well as different methods for training the associative network. We also evaluate techniques for improving the extraction of concepts from documents, compare methods of spreading activation from concept to concept, and present a novel technique by which the extracted concepts can be used to categorize documents. We further extend the method of associative networks to multilingual document libraries and compare it to other state-of-the-art methods for document grouping. Finally, we present a practical application of associative networks, as implemented in a corporate environment in the form of the Pagelink Knowledge Centre. We demonstrate the practical usability of our work and discuss the advantages and disadvantages of the method of associative networks.
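The activate-and-spread idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy network, the link weights, the `decay` and `steps` parameters, and the function names `activate_from_text` and `spread_activation` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy network: concept -> {neighbouring concept: link weight}.
NETWORK = {
    "guitar": {"music": 0.8, "instrument": 0.9},
    "music": {"concert": 0.7, "guitar": 0.8},
    "instrument": {"guitar": 0.9},
    "concert": {"music": 0.7},
}


def activate_from_text(text, network):
    """Give one unit of activation per occurrence of a word that names a concept."""
    seeds = defaultdict(float)
    for word in text.lower().split():
        if word in network:
            seeds[word] += 1.0
    return dict(seeds)


def spread_activation(network, seeds, decay=0.5, steps=2):
    """Propagate activation along weighted links for a fixed number of steps.

    Assumed rule: each step, every concept in the frontier passes
    energy * weight * decay to each of its neighbours.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for concept, energy in frontier.items():
            for neighbour, weight in network.get(concept, {}).items():
                next_frontier[neighbour] += energy * weight * decay
        for concept, energy in next_frontier.items():
            activation[concept] += energy
        frontier = next_frontier
    return dict(activation)


seeds = activate_from_text("the guitar solo", NETWORK)
result = spread_activation(NETWORK, seeds)
```

In this toy run the text mentions only "guitar", yet "concert" ends up with non-zero activation through the intermediate concept "music", which is the effect the abstract describes for concepts a document never names directly.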
Similar resources
Document Categorization using Multilingual Associative Networks based on Wikipedia
Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...
Using Natural Language Processing to Improve Document Categorization with Associative Networks
Associative networks are a connectionist language model with the ability to handle large sets of documents. In this research we investigated the use of natural language processing techniques (part-of-speech tagging and parsing) in combination with Associative Networks for document categorization and compare the results to a TF-IDF baseline. By filtering out unwanted observations and preselectin...
Hierarchical Document Categorization Using Associative Networks
Associative networks are a connectionist language model with the ability to handle dynamic data. We used two associative networks to categorize random sets of related Wikipedia articles with only their raw text as input. We then compared the resulting categorization to a gold standard, the manual categorization by Wikipedia authors, and used a neural network as a baseline. We also determined a h...
ACNB: Associative Classification Mining Based on Naïve Bayesian Method
Integrating association rule discovery and classification in data mining brings a new approach known as associative classification. Associative classification is a promising approach that often constructs more accurate classification models (classifiers) than the traditional classification approaches such as decision trees and rule induction. In this research, the authors investigate the use of...
Associative Classification in Text Categorization
Text categorization has become one of the key techniques for handling and organizing text data. It is used to classify a new article into its most relevant category. In this paper, we propose a novel associative classification algorithm, ACTC, for text categorization. ACTC aims at extracting the k-best strongly correlated positive and negative association rules directly from the training set for cl...
Publication date: 2015