classification of text documents

Sprinkling Topics for Weakly Supervised Text Classification

2014

Swapnil Hingmire, Sutanu Chakraborti

Supervised text classification algorithms require a large number of documents labeled by humans, which involves a labor-intensive and time-consuming process. In this paper, we propose a weakly supervised algorithm in which supervision comes in the form of labeling of Latent Dirichlet Allocation (LDA) topics. We then use this weak supervision to “sprinkle” artificial words into the training documents...


Research on Text Categorization Based on a Weakly-Supervised Transfer Learning Method

2012

Dequan Zheng, Chenghe Zhang, Geli Fei, Tiejun Zhao

This paper presents a weakly-supervised, transfer-learning-based text categorization method that does not require tagging new training documents when facing classification tasks in a new area. Instead, we can make use of already tagged documents in other domains to accomplish the automatic categorization task. By extracting linguistic information such as part-of-speech, semantic, co-occurrence o...


Feature Reduction for High-Precision Text Classification

2011

Yi-Xian Lin, Been-Chian Chien

Processing high-dimensional features is key to document analysis and text classification. Traditional techniques for feature selection or extraction rely heavily on the distribution of term features in the document set. Finding the significant features generally incurs a high computational cost. In this paper, we propose a new feature reduction method based on the analysis of discriminant coef...


Evaluation of Text Classification Algorithms for a Web-based Market Data Warehouse

2005

Carsten Felden, Peter Chamoni

Decision makers in enterprises cannot handle information flooding without serious problems. A market data information system (MAIS), which is the foundation of a decision support system for German energy trading, uses search and filter components to provide decision-relevant information from Web documents for enterprises. The already implemented filter component in the form of a Multilayer Perceptr...


Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach

2015

Marcos Antonio Mouriño García, Roberto Pérez Rodríguez, Luis E. Anido Rifón, George Perry

Automatic classification of text documents into a set of categories has many applications. Among them, the automatic classification of biomedical literature stands out as an important application for automatic document classification strategies. Biomedical staff and researchers have to deal with a lot of literature in their daily activities, so it would be useful to have a system that...


Determining Expert Research Areas with Multi-Instance Learning of Hierarchical Multi-Label Classification Model

2015

Tao Wu, Qifan Wang, Zhiwei Zhang, Luo Si

Automatically identifying the research areas of academic and industry researchers is an important task for building expertise organizations or search systems. In general, this task can be viewed as text classification that generates a set of research areas given evidence of a researcher's expertise, such as publication documents. However, this task is challenging because the evidence of a research area ma...


Strategic Needs of Iranian Students in Reading Literary and Non-Literary Texts via the Question-Generation Method

Thesis: Ministry of Science, Research and Technology - University of Isfahan - Faculty of Foreign Languages, 1389 SH

کتایون افضلی, محمد عموزاده, عباس اسلامی راسخ

The current study sets out 1) to investigate the strategic needs of Iranian EFL learners in reading literary and non-literary texts; 2) to shed some light on the differences between reading literary and non-literary texts; and 3) to specify the differences in the interaction of participants with texts while reading two literary subgenres (i.e., short stories and literary essays). To ...


Combining Learning and Word Sense Disambiguation for Intelligent User Profiling

2007

Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Pierpaolo Basile

Understanding user interests from text documents can provide support to personalized information recommendation services. Typically, these services automatically infer the user profile, a structured model of the user interests, from documents that were already deemed relevant by the user. Traditional keyword-based approaches are unable to capture the semantics of the user interests. This work p...


Human Document Classification Using Bags of Words

2006

Florian Wolf, Tomaso Poggio, Pawan Sinha

Humans are remarkably adept at classifying text documents into categories. For instance, while reading a news story, we are rapidly able to assess whether it belongs to the domain of finance, politics or sports. Automating this task would have applications for content-based search or filtering of digital documents. To this end, it is interesting to investigate the nature of information humans u...


Leveraging the Legacy of Conventional Libraries for Organizing Digital Libraries

2009

Arash Joorabchi, Abdulhussain E. Mahdi

With the significant growth in the number of available electronic documents on the Internet, intranets, and digital libraries, the need for developing effective methods and systems to index and organize E-documents is felt more than ever. In this paper we introduce a new method for automatic text classification for categorizing E-documents by utilizing classification metadata of books, journals...
