Thread Cleaning and Merging for Microblog Topic Detection
Abstract
Topic detection, a classic natural language processing task, has recently attracted growing research interest, largely because of the rapid development of microblogging. The most challenging issue in microblog topic detection is data sparsity. In this paper, the temporal-author-topic (TAT) model is designed to perform microblog topic detection in two phases. In the first phase, the TAT model is applied to clean each thread, that is, to filter noisy microblog texts out of it. In the second phase, the microblog texts within each thread are merged into a single thread text, and the TAT model is applied to these thread texts to find global topics. The new approach differs from the Hierarchical Agglomerative Clustering (HAC) algorithm in that it uses microblog threads to overcome the sparse data problem. Experimental results support our claims.
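The two-phase flow described in the abstract (clean each thread, then merge what remains into one thread text) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the `min_score` threshold, and above all the scoring rule, which is a simple token-overlap heuristic standing in for the TAT model and is not the authors' method.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real system would use a proper one).
    return text.lower().split()


def clean_thread(posts, min_score=0.3):
    """Phase 1 stand-in: drop posts that share few tokens with the rest of the thread.

    The paper uses the TAT model for this step; the overlap score below is
    only a hypothetical placeholder for it.
    """
    if len(posts) < 2:
        return list(posts)  # nothing to compare against, keep as-is
    tokenized = [tokenize(p) for p in posts]
    # Number of posts in the thread that contain each token.
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    kept = []
    for post, toks in zip(posts, tokenized):
        if not toks:
            continue
        shared = sum(1 for t in toks if doc_freq[t] > 1)
        if shared / len(toks) >= min_score:
            kept.append(post)
    return kept


def merge_thread(posts):
    # Phase 2: concatenate the surviving posts into one longer thread text.
    return " ".join(posts)


def build_thread_texts(threads, min_score=0.3):
    # Map each thread id to its cleaned, merged thread text.
    return {tid: merge_thread(clean_thread(posts, min_score))
            for tid, posts in threads.items()}


example_threads = {
    "t1": [
        "earthquake hits the city",
        "the earthquake damaged the city bridge",
        "buy cheap watches now",
    ]
}
thread_texts = build_thread_texts(example_threads)
```

In this toy thread the off-topic third post shares no tokens with the others and is filtered out, while the first two are merged into one document. A topic model would then be run over such merged thread texts, which are longer and therefore less sparse than individual posts.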
Related resources
Chinese Microblog Topic Detection Based on the Latent Semantic Analysis and Structural Property
Traditional topic detection methods cannot be applied directly to microblog topic detection, because microblog text is short, fragmentary, and grass-roots in style. In order to detect hot topics in microblog text effectively, we propose a microblog topic detection method based on the combination of latent semantic analysis and structural properties. According to the...
An Improved Topic Detection Method for Chinese Microblog Based On Incremental Clustering
A topic detection model based on hierarchical clustering for Chinese microblog is proposed in this paper. In order to minimize the impact of noise, we optimize the feature selection and weight computation method and use a new scoring method to filter out those topic-unrelated tweets. We also give an improved topic detection algorithm which uses a new vector distance calculation method and cente...
Finding Topic-Related Tweets Using Conversational Thread
Microblogs have gained more and more users around the world, and their popularity makes information spreading on microblogs among the most important and influential activities on the Internet. Search in microblogs is therefore a significant issue for both the academic and industrial worlds. Search in webpages has been studied for several decades, but for microblogs it is still an open and bran...
Late Data Fusion for Microblog Search
The character of microblog environments raises challenges for microblog search because relevancy becomes one of the many aspects for ranking documents. We concentrate on merging multiple ranking strategies at postretrieval time for the TREC Microblog task. We compare several state-of-the-art late data fusion methods, and present a new semi-supervised variant that accounts for microblog characte...
Measurement and Analysis of Burst Topic in Microblog
Owing to their immediacy and interactivity, microblogs provide the first communication platform for burst events. In this paper, we study user-oriented and message-oriented measurements of burst topics in Sina microblog. The measurements and analysis on a large-scale Sina microblog data set show that our proposed measurement method can measure the characteristics of user and message p...
Publication date: 2011