SiSPI: A Short-Passage Clustering System
Abstract
We describe SiSPI, an unsupervised, incremental clustering tool that arranges short passages from one or more documents written in Brazilian Portuguese into clusters. To identify similar passages, SiSPI uses a statistical model named TF-ISF (Term Frequency Inverse Sentence Frequency). By grouping similar passages into the same cluster, SiSPI enables a subsequent alignment/fusion component to transform each cluster into a single sentence by fusing common information. We present a pilot experiment that evaluates the system's performance in the news domain. The results obtained suggest that SiSPI has potential. This work is supported by CNPq.
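The abstract names TF-ISF and an incremental clustering strategy but gives no formulas or parameters, so the following is only a minimal sketch under stated assumptions, not SiSPI's actual implementation. It assumes the common formulation in which a word's weight in a passage is its term frequency times log(N / sf), where N is the number of passages and sf is the number of passages containing the word. It further assumes cosine similarity, a single-pass assignment to the most similar cluster, and a hypothetical similarity threshold of 0.3. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(passage):
    # Assumption: a naive lowercase word tokenizer; the real system may
    # apply stemming or stopword removal for Brazilian Portuguese.
    return re.findall(r"\w+", passage.lower())


def tf_isf_vectors(passages):
    # Build one sparse TF-ISF vector (dict: word -> weight) per passage.
    tokenized = [tokenize(p) for p in passages]
    n = len(tokenized)
    sentence_freq = Counter()
    for tokens in tokenized:
        sentence_freq.update(set(tokens))  # count each word once per passage
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: tf[w] * math.log(n / sentence_freq[w]) for w in tf})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_incrementally(passages, threshold=0.3):
    # Single pass: each passage joins the most similar existing cluster if
    # the similarity reaches the (hypothetical) threshold, otherwise it
    # starts a new cluster. Simplification: ISF is computed once over all
    # passages instead of being updated as passages arrive.
    vectors = tf_isf_vectors(passages)
    clusters = []  # each cluster: {"members": [indices], "centroid": dict}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [i], "centroid": dict(vec)})
        else:
            best["members"].append(i)
            # The centroid is kept as the sum of member vectors; cosine
            # similarity is scale-invariant, so no averaging is needed.
            for w, x in vec.items():
                best["centroid"][w] = best["centroid"].get(w, 0.0) + x
    return [cluster["members"] for cluster in clusters]
```

Calling cluster_incrementally on a list of passages returns lists of passage indices, one list per cluster. In a pipeline like the one the abstract describes, each such list would then be handed to the alignment/fusion component.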
Similar resources
Fujitsu Laboratories TREC7 Report
In our first participation in TREC, our focus was on improving the basic ranking systems and applying text clustering techniques for query expansion. We tested a variety of techniques including reference measures, passage retrieval, and data fusion for the basic ranking systems. Some techniques were used in the official run, others were not used because of time limitations. We applied the text cl...
Boosting Passage Retrieval through Reuse in Question Answering
Question Answering (QA) is an important emerging field in Information Retrieval. In a QA system, the archive of questions previously put to the system forms a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to improve performance in answering future related factoid questions. I...
Word Image Matching as a Technique for Degraded Text Recognition
A technique is presented that determines equivalences between word images in a passage of text. A clustering procedure is applied to group visually similar words. Initial hypotheses for the identities of words are then generated by matching the word groups to language statistics that predict the frequency at which certain words will occur. This is followed by a recognition step that assigns ide...
Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide an efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over the long-term period. However, the measurement of solar radiation is not available to all countries in the world due to some technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...
Language Model Passage Retrieval for Question-Oriented Multi-Document Summarization
Question-oriented text summarization aims to produce an informative short description in response to the given queries. This is somewhat similar to the goal of question answering, which retrieves exact answers from large raw text collections. In this paper, we present a resource-free and training-data-free summarization model for the DUC multi-document summarization task. Similar as last ...
Publication date: 2008