Search results for: tfidf vector space model

Number of results: 2616913

1999
Robert Cooley

Given a data set and a data mining task such as classification, there are two main reasons for performing feature space reduction. The first is to improve the accuracy of the algorithm. In a domain such as text mining, the common technique of parameterizing each document as a vector of words results in thousands of dimensions. The performance of many learning algorithms decreases as the dimensiona...

2006
Pensiri Manomaisupat, Bogdan Vrusias, Khurshid Ahmad

Automatic text categorization requires the construction of appropriate surrogates for documents within a text collection. The surrogates, often called document vectors, are used to train learning systems for categorising unseen documents. A comparison of different measures (tfidf and weirdness) for creating document vectors is presented together with two different state-of-the-art classifiers: s...

2014
Qiuli Qin, Xin Peng

With the rapid development of network and information technology, huge amounts of data are available on the internet. A major problem faced by most researchers is how to effectively filter information on a particular subject or field out of these data. In this paper, we try to build a focused crawler based on the vector space model and TFIDF text correlation analysis. We...
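
As a rough illustration of the kind of relevance scoring a crawler built on the vector space model and TFIDF might use, the sketch below weights terms by TF-IDF and compares a topic description with candidate pages by cosine similarity. This is not the authors' method, which the truncated abstract does not describe. The function names, the whitespace tokenization, the unsmoothed `log(n / df)` weighting, and the sample texts are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed weighting: relative term frequency times log(n / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical topic description and two hypothetical candidate pages.
topic = "focused crawler vector space model".split()
page_a = "a focused crawler ranks pages with a vector space model".split()
page_b = "recipes for baking bread at home".split()

vecs = tfidf_vectors([topic, page_a, page_b])
score_a = cosine(vecs[0], vecs[1])  # shares terms with the topic
score_b = cosine(vecs[0], vecs[2])  # shares no terms with the topic
```

Under these assumptions a crawler could follow links only from pages whose score exceeds a chosen threshold; the threshold itself would need tuning on real data.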

2013
ZHANG LEI

We present a novel scheme of spoken document classification based on locality-sensitive hashing, chosen for its ability to solve approximate near neighbor search in high-dimensional spaces. In the speech-text conversion stage, although a lattice can provide multiple hypotheses during speech recognition, it is too complex for extracting proper word information. A confusion network is adopted to improve word r...
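
To make the idea of locality-sensitive hashing for near neighbor search concrete, here is a minimal sketch of one common variant, random-hyperplane (sign) hashing for cosine similarity. The abstract does not say which hash family the paper uses, so this choice is an assumption, as are the function names, the 16-bit signature length, and the toy vector.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(dim=5, n_bits=16, seed=0)
doc = [1.0, 2.0, 0.5, 0.0, 3.0]      # toy document vector (assumed)
scaled = [2.0 * x for x in doc]      # same direction, different length
opposite = [-x for x in doc]         # opposite direction

sig_doc = signature(doc, planes)
sig_scaled = signature(scaled, planes)
sig_opposite = signature(opposite, planes)
```

Vectors pointing in similar directions tend to receive signatures with a small Hamming distance, so candidate neighbors can be found by comparing short bit strings instead of full high-dimensional vectors.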

Journal: International Journal of Intelligent Computing Research, 2011

Journal: International Journal of Computer Mathematics, 2014

Journal: JSW, 2014
Jun Long, Luda Wang, Zude Li, Zuping Zhang, Huiling Li, Guihu Zhao

The structured link vector model (SLVM) and its improved version depend on statistical term measures to implement XML document representation. As a result, they ignore the lexical semantics of terms and their mutual information, leading to text classification errors. This paper proposes an XML document representation method, WordNet-based lexical-semantic SLVM, to solve the problem. Using WordNet, thi...

2016
Chenxi Pang, Hai Zhao, Zhongyi Li

We introduce a monolingual query method that uses additional webpage data to improve translation quality, in response to the growing requirement for official use of statistical machine translation outputs. The motivation behind this method is that we can improve the readability of a sentence once and for all if we replace translated sentences with the most closely related human-written sentences. Based on vector spac...

1999
Carolyn J. Crouch

The importance of a thesaurus in the successful operation of an information retrieval system is well recognized. Yet techniques which support the automatic generation of thesauri remain largely undiscovered. This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm....

Journal: Mokslas - Lietuvos ateitis, 2010
