Adaptive context trees and text clustering

Related resources

Adaptive context trees and text clustering

In the finite-alphabet context we propose four alternatives to fixed-order Markov models to estimate a conditional distribution. They consist in working with a large class of variable-length Markov models represented by context trees, and building an estimator of the conditional distribution with a risk of the same order as the risk of the best estimator for every model simultaneously, in a cond...

Text Categorization Using Adaptive Context Trees

A new way of representing texts written in natural language is introduced: a conditional probability distribution at the letter level, learned with a variable-length Markov model called the adaptive context tree model. Text categorization experiments demonstrate the ability of this representation to capture information about the semantic content of the text.
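To make the idea of a letter-level conditional distribution concrete, here is a minimal sketch of a plain variable-order Markov predictor. It is not the adaptive context tree estimator from the paper: it simply backs off to the longest context already seen in training and applies additive smoothing. The function names, `max_depth`, and `alpha` are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_counts(text, max_depth=3):
    """Count next-letter occurrences for every context of length 0..max_depth."""
    counts = defaultdict(Counter)
    for i, ch in enumerate(text):
        for d in range(max_depth + 1):
            if i - d < 0:
                break  # not enough preceding letters for a context this long
            counts[text[i - d:i]][ch] += 1
    return counts


def next_letter_distribution(counts, history, alphabet, max_depth=3, alpha=1.0):
    """Estimate P(next letter | history) from the longest suffix seen in training.

    Illustrative back-off rule with additive (alpha) smoothing; the paper's
    estimator selects and weights contexts differently.
    """
    context = ""
    for d in range(min(max_depth, len(history)), -1, -1):
        context = history[len(history) - d:]
        if context in counts:
            break
    seen = counts[context]
    total = sum(seen.values()) + alpha * len(alphabet)
    return {a: (seen[a] + alpha) / total for a in alphabet}


counts = train_counts("abababab", max_depth=2)
dist = next_letter_distribution(counts, "ab", alphabet="ab", max_depth=2)
```

Under these assumptions, a text could be represented by the table of such conditional distributions, and two texts compared through the distributions they induce.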

Context-Dependent Conflation, Text Filtering and Clustering

The presence of trivial words in text databases can impact record or concept (words/phrases) clustering adversely. Additionally, the determination of whether a word/phrase is trivial is context-dependent. The objective of the present paper is to demonstrate a context-dependent trivial word filter to improve clustering quality. Factor analysis was used as a context-dependent trivial word filte...

Annotated Suffix Trees for Text Clustering

In this paper an extension of tf-idf weighting on the annotated suffix tree (AST) structure is described. The new weighting scheme can be used for computing similarity between texts, which can further serve as an input to a clustering algorithm. We present preliminary tests of using AST for computing similarity of Russian texts and show a slight improvement in comparison to the baseline cosine similar...

Adaptive channel equalization using context trees

The maximum likelihood sequence estimator is the optimal receiver for the inter-symbol interference (ISI) channel with additive white noise. A receiver is demonstrated that estimates sequence likelihood using a variable order Markov model constructed from a crudely quantized training sequence. Receiver performance is relatively unaffected by heavy-tailed noise that can undermine the performance...

Journal

Journal title: IEEE Transactions on Information Theory

Year: 2001

ISSN: 0018-9448

DOI: 10.1109/18.930925