Improving Text Classification with LSI Using Background Knowledge
Abstract
We present work in progress that uses Latent Semantic Indexing (LSI) in conjunction with background knowledge and unlabeled examples to improve text classification accuracy. LSI's singular value decomposition (SVD) is performed on an expanded term-by-document matrix that includes both the labeled training examples and the unlabeled examples. We report classification accuracy on different data sets, both with and without background knowledge, and compare it with other known work.
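The idea of decomposing an expanded term-by-document matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_term_document_matrix`, `lsi_classify`), the raw term-count weighting, the whitespace tokenizer, and the nearest-neighbor cosine rule are all assumptions chosen for brevity, and the toy documents are invented for illustration.

```python
import numpy as np


def build_term_document_matrix(docs, vocab):
    """Return a |vocab| x |docs| matrix of raw term counts (assumed weighting)."""
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            if token in index:
                matrix[index[token], j] += 1
    return matrix


def lsi_classify(train_docs, train_labels, background_docs, query, k=2):
    """Label `query` with the class of its nearest training document in LSI space.

    The SVD is computed on the expanded matrix (training + background columns),
    but only the labeled training documents are candidates for the nearest neighbor.
    """
    all_docs = train_docs + background_docs
    vocab = sorted({tok for doc in all_docs for tok in doc.lower().split()})
    expanded = build_term_document_matrix(all_docs, vocab)

    # Truncated SVD: keep the k leading left singular vectors.
    u, s, _ = np.linalg.svd(expanded, full_matrices=False)
    u_k = u[:, : min(k, len(s))]

    # Project training documents and the query into the reduced space.
    train_vecs = u_k.T @ expanded[:, : len(train_docs)]
    query_vec = u_k.T @ build_term_document_matrix([query], vocab)[:, 0]

    # Cosine similarity between the query and each training document.
    norms = np.linalg.norm(train_vecs, axis=0) * np.linalg.norm(query_vec)
    sims = (train_vecs.T @ query_vec) / np.where(norms == 0, 1.0, norms)
    return train_labels[int(np.argmax(sims))], sims


if __name__ == "__main__":
    train = ["cat dog pet", "stock market trade"]
    labels = ["animals", "finance"]
    background = ["dog pet animal cat", "market trade money stock"]  # unlabeled
    label, sims = lsi_classify(train, labels, background, "dog animal")
    print(label, sims)
```

In this toy setup the term "animal" occurs only in the background text, so it enters the vocabulary and the reduced space only because the background documents are included in the decomposition.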
Similar resources
Sprinkled Latent Semantic Indexing for Text Classification with Background Knowledge
In text classification, one key problem is the inherent dichotomy of polysemy and synonymy; the other is the insufficient use of abundant, useful, but unlabeled text documents. To address these problems, we incorporate a sprinkling Latent Semantic Indexing (LSI) with background knowledge for text classification. The motivation comes from: 1) LSI is a popular technique for info...
Evaluation of Background Knowledge for Latent Semantic Indexing Classification
This paper presents work that evaluates background knowledge for use in improving accuracy for text classification using Latent Semantic Indexing (LSI). LSI’s singular value decomposition process can be performed on a combination of training data and background knowledge. Intuitively, the closer the background knowledge is to the classification task, the more helpful it will be in terms of crea...
Improving Methods for Single-label Text Categorization
As the volume of information in digital form increases, the use of Text Categorization techniques aimed at finding relevant information becomes more necessary. To improve the quality of the classification, I propose the combination of different classification methods. The results show that k-NN-LSI, the combination of k-NN with LSI, presents an average Accuracy on the five datasets that is highe...
Combining the Classifiers and LSI Method for Efficient and Accurate Text Classification
Text classification involves the assignment of predetermined categories to textual resources. Applications of text classification include recommendation systems, personalization, help desk automation, content filtering and routing, selective alerting, and training. This paper describes an experiment for improving the classification accuracy of a large text corpus by the use of dimensionality reduct...
Visualization of Text Document Corpus
From the automated text processing point of view, natural language is very redundant in the sense that many different words share a common or similar meaning. For a computer this can be hard to understand without some background knowledge. Latent Semantic Indexing (LSI) is a technique that helps in extracting some of this background knowledge from a corpus of text documents. This can also be viewed...
Publication date: 2001