Facet Classification of Blogs: Know-Center at the TREC 2009 Blog Distillation Task
نویسندگان
چکیده
In this paper, we outline our experiments carried out at the TREC 2009 Blog Distillation Task. Our system is based on a plain text index extracted from the XML feeds of the TREC Blogs08 dataset. This index was used to retrieve candidate blogs for the given topics. The resulting blogs were classified using a Support Vector Machine that was trained on a manually labelled subset of the TREC Blogs08 dataset. Our experiments included three runs on different features: firstly on nouns, secondly on stylometric properties, and thirdly on punctuation statistics. The facet identification based on our approach was successful, although a significant number of candidate blogs were not retrieved at all.
منابع مشابه
RMIT at TREC 2010 Blog Track: Faceted Blog Distillation Task
This paper reports RMIT’s participation in the TREC Blog Track 2010. For the baseline task, we adopted the BM25 model implemented in the Zettair search engine to establish a retrieval system of blog posts based on topic relevance. We then experimented with a number of different approaches to aggregate the post similarity scores to retrieve the most relevant blogs. Similarly, for the faceted dis...
متن کاملA Study of Faceted Blog Distillation--PRIS at TREC 2009 Blog Track
This paper describes BUPT (pris) participation in faceted blog distillation task at Blog Track 2009. The system adopts a two-stage strategy in faceted blog distillation task. In the first stage, the system carries out a basic topic relevance retrieval to get the top k blogs for each query. In the second stage, different models are designed to judge the facets and ranking.
متن کاملHIT_LTRC at TREC 2010 Blog Track: Faceted Blog Distillation
This paper describes our participation in the faceted blog distillation task at Blog Track 2010. In our approach, indri toolkit is applied for basic topic relevance retrieval. Then the Maximum Entropy (ME) model is adopted to judge the relevance of each blog to specified facet. Feed faceted relevance is calculated by integrating the average relevance of all blogs within a feed and the average r...
متن کاملBIT at TREC 2010 Blog Track: Faceted Blog Distillation
This paper presents the work done for the TREC 2010 faceted blog distillation task. As the approach used in TREC 2009, a mixture of language models based on global representation is employed to rank the entire blogs by relevance and facets. The parameters in our approach are adjusted according to the experimental results in TREC 2009. In addition, we make use of the results evaluated in TREC 20...
متن کاملPersonal Blog Retrieval Using Opinion Features
Faceted blog distillation aims at finding blogs with recurring interest to a topic while satisfying a specific facet of interest. In this paper we focus on the personal facet and propose a method that uses opinion features as indicators of personal content. Experimental results on TREC BLOG08 data-set confirm our intuition that personal blogs are more opinionated.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009