Analysis of Statistical Keyword Extraction Methods for Incremental Clustering

Authors

  • Rafael Geraldeli Rossi
  • Ricardo Marcondes Marcacini
  • Solange Oliveira Rezende
Abstract

Incremental clustering is a useful approach for organizing dynamic text collections. Because incremental clustering operates under time and space restrictions, textual documents must be preprocessed to retain only their most important information. Statistical methods that extract keywords from single documents are useful in this scenario. However, different statistical methods make different assumptions about the properties of keywords in a text, and they extract different sets of keywords. In this paper we analyze different keyword extraction methods and the impact of the number of keywords on the quality of incremental clustering. We also define a framework for statistical keyword extraction.
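The pipeline the abstract describes (reduce each document to a few keywords, then cluster the reduced documents one at a time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' framework: plain term frequency stands in for a statistical keyword extraction method, a leader-style single-pass procedure with Jaccard similarity stands in for the incremental clustering algorithm, and the names `STOPWORDS`, `extract_keywords`, `jaccard`, `incremental_cluster`, `k`, and `threshold` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on"}


def extract_keywords(text, k):
    """Return the k most frequent non-stopword terms of a single document.

    Term frequency is an assumed stand-in for a statistical keyword
    extraction method; it is not taken from the paper.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    # Descending frequency, then alphabetical order so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def incremental_cluster(docs, k=5, threshold=0.2):
    """Leader-style single-pass clustering: each document is processed once.

    A document joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"keywords": set of terms, "members": list of doc indices}
    for i, text in enumerate(docs):
        kws = extract_keywords(text, k)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(kws, cluster["keywords"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["keywords"] |= kws  # the cluster summary grows with its members
        else:
            clusters.append({"keywords": set(kws), "members": [i]})
    return clusters


docs = [
    "clustering text documents with incremental clustering of text",
    "incremental clustering organizes text documents",
    "gene expression patterns in microarray gene studies",
]
clusters = incremental_cluster(docs, k=3, threshold=0.2)
print([c["members"] for c in clusters])
```

In this sketch `k` plays the role of the number of keywords kept per document, so rerunning `incremental_cluster` with different values of `k` is one simple way to observe how that number changes the resulting clusters.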

Similar resources

Trees for Topic Detection

Extracting topic keywords from online text documents is important in text mining applications. In our work, extracted keywords are represented as a hierarchical topic tree. For this, we use an incremental clustering technique for incoming online documents. Moreover, we define a cluster-based measure similar to the tf-idf measure and a probabilistic inequality to determine subsum...

Keyword Extraction for Webpage Clusters

The volume of unstructured information presented on the Internet is constantly increasing, together with the total number of websites and their contents. To process this vast amount of information, it is important to distinguish different clusters of related webpages. Such clusters are used, for example, for template induction, keyword extraction, and recommendation algorithms. A variety of appl...

Log based Keyword Extraction and Spread based Clustering for an Efficient Information Searching

Today, efficient information search is very important for extracting and analyzing user requirements in the vast amount of web information. For this reason, this paper proposes a log-based keyword extraction method that finds the associated keywords in a certain domain. This paper also proposes a spread-based clustering method, which clusters the keywords with high association among the keyword-...

Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes

One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term ...

Graph Based Algorithms for Word Sense Induction and Disambiguation

This paper presents a survey of graph-based methods for word sense induction and disambiguation. Many areas of Natural Language Processing, such as Word Sense Disambiguation (WSD), text summarization, and keyword extraction, make use of graph-based methods. The idea behind the graph-based approach is to formulate the problems in a graph setting and apply clustering to obtain a set of clusters (senses). T...

Publication date: 2013