نتایج جستجو برای: tfidf vector space model
تعداد نتایج: 2616913 فیلتر نتایج به سال:
Given a data set and a data mining task such as classiication, there are two main reasons for performing feature space reduction. The rst is to improve the accuracy of the algorithm. In a domain such as text mining, the common technique of parameterizing each document as a vector of words results in thousands of dimensions. The performance of many learning algorithms decreases as the dimensiona...
Automatic text categorization requires the construction of appropriate surrogates for documents within a text collection. The surrogates, often called document vectors, are used to train learning systems for categorising unseen documents. A comparison of different measures (tfidf and weirdness) for creating document vectors is presented together with two different state-of-theart classifiers: s...
With the rapid development of network and information technology, there is a wealth of huge amounts of data on the internet. But it’s a major problem faced by the majority of researchers how to effectively filter out a particular subject or field of information from these data. In this paper, we try to builder a focused crawler based on vector space model and TFIDF text correlation analysis. We...
We present a novel scheme of spoken document classification based on locality sensitive hash because of its ability of solving the approximate near neighbor search in high dimensional spaces. In speechtext conversion stage, although lattice can provide multi-hypothesis during speech recognition, it is too complex to extract proper word information. Confusion network is adopted to improve word r...
Structured link vector model (SLVM) and its improved version depend on statistical term measures to implement XML document representation. As a result, they ignore the lexical semantics of terms and its mutual information, leading to text classification errors. This paper proposed a XML document representation method, WordNet-based lexical-semantic SLVM, to solve the problem. Using WordNet, thi...
We introduce a monolingual query method with additional webpage data to improve the translation quality for more and more official use requirement of statistical machine translation outputs. The motivation behind this method is that we can improve the readability of sentence once for all if we replace translation sentences with the most related sentences generated by human. Based on vector spac...
The importance of a thesaurus in the successful operation of an information retrieval system is well recognized. Yet techniques which support the automatic generation of thesauri remain largely undiscovered. This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm....
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید