MSRA Columbus at GeoCLEF 2006
Abstract
This paper describes the participation of the Columbus Project of Microsoft Research Asia (MSRA) in GeoCLEF 2006, the cross-language geographical retrieval track of the Cross Language Evaluation Forum. We extract locations from the corpus with a gazetteer- and rule-based approach and use MSRA’s IREngine as our text search engine. Our system considers both text indexing and geo-indexing (implicit location indexing and grid indexing). We participated only in the monolingual English evaluation (EN-EN) and submitted five runs, each based on a different method: MSRAWhitelist, MSRAManual, MSRAExpansion, MSRALocal and MSRAText.
In MSRAWhitelist, we manually expanded the locations that our system did not recognize (such as "former Yugoslavia") into several countries. In MSRAManual, building on MSRAWhitelist, we manually modified several queries that were worded too much like natural language and whose keywords seldom appear in the corpus. In MSRAExpansion, we first search the corpus with the original queries, then extract the locations from the returned documents and count how many times each location appears in them. Finally, we take the 10 most frequent location names and combine them with the original geo-terms in the queries; this may, however, introduce unrelated locations. In MSRALocal, we use neither the white list nor query expansion to expand the query locations, and rely only on our location extraction module to extract the locations automatically from the queries. In MSRAText, we process the queries with our pure text search engine, IREngine, alone.
The experimental results show that MSRAManual is the best of the five runs, followed by MSRAWhitelist. MSRALocal and MSRAText perform similarly. MSRAExpansion performs worst because of the unrelated locations it introduces.
One conclusion is that extracting the locations from the topics automatically, with no further processing, does not improve retrieval performance significantly. Another is that automatic query expansion weakens performance, because the topics are too difficult to handle and the corpus may not be large enough; automatic query expansion might perform better on a web-scale corpus. We also find that performance improves significantly when the queries are formed manually.
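The MSRAExpansion procedure described in the abstract (search with the original query, extract locations from the returned documents, count them, keep the 10 most frequent, and merge them with the query's own geo-terms) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy gazetteer, and the rule for merging duplicates are all hypothetical, and the retrieval step itself is assumed to have already produced the list of returned documents.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer; the real system uses a full gazetteer plus rules.
TOY_GAZETTEER = {"serbia", "croatia", "bosnia"}


def toy_extract_locations(text):
    """Return every word of `text` found in the toy gazetteer (assumption:
    single-word place names only, case-insensitive lookup)."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() in TOY_GAZETTEER]


def expand_geo_terms(query_geo_terms, returned_docs, extract_locations, top_k=10):
    """Combine the query's geo-terms with the top_k most frequent locations
    found in the documents returned for the original query."""
    counts = Counter()
    for doc in returned_docs:
        # Count every occurrence of each location across the returned documents.
        counts.update(extract_locations(doc))

    expanded = list(query_geo_terms)
    seen = {term.lower() for term in expanded}
    for name, _ in counts.most_common(top_k):
        # Skip locations that are already part of the original query.
        if name.lower() not in seen:
            expanded.append(name)
            seen.add(name.lower())
    return expanded


if __name__ == "__main__":
    docs = [
        "Floods hit Serbia and Croatia.",
        "Serbia and Croatia report damage.",
        "Aid arrives in Bosnia and Serbia.",
    ]
    print(expand_geo_terms(["Yugoslavia"], docs, toy_extract_locations, top_k=2))
```

Because the added names come purely from frequency counts, any location that happens to be common in the returned documents is appended whether or not it is relevant to the topic, which is the risk of unrelated locations that the abstract points out.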
Similar resources
MSRA Columbus at GeoCLEF2007
This paper describes the participation of Columbus Project of Microsoft Research Asia (MSRA) in GeoCLEF2007 (a cross-language geographical retrieval track which is part of Cross Language Evaluation Forum). This is the second time we participate in this event. Since the queries in GeoCLEF2007 are similar to those in GeoCLEF2006, we leverage most of the methods that we used in GeoCLEF2006, includ...
Exploring LDA-Based Document Model for Geographic Information Retrieval
Latent Dirichlet Allocation (LDA) model, a formal generative model, has been used to improve ad-hoc information retrieval recently. However, its feasibility and effectiveness for geographic information retrieval has not been explored. This paper proposes an LDA-based document model to improve geographic information retrieval by inheriting the LDA model with text retrieval model. The proposed mo...
NICTA I2D2 Group at GeoCLEF 2006
We report on the experiments undertaken by the NICTA I2D2 Group as part of GeoCLEF 2006, as well as post-GeoCLEF evaluations and improvements to the submitted system. In particular, we used techniques to assign probabilistic likelihoods to geographic candidates for each identified geo-term, and a probabilistic IR engine. A normalisation process that adjusts term weights, so as to prevent expand...
TALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing
This paper describes our experiments on the Geographical Query Parsing pilot-task for English at GeoCLEF 2007. Our system uses some modules of a Geographical Information Retrieval system presented at GeoCLEF 2006 [3] and modified for GeoCLEF 2007. The system uses deep linguistic analysis and Geographical Knowledge to perform the task.
Monolingual Retrieval Experiments with Spatial Restrictions at GeoCLEF 2007
The participation of the University of Hildesheim focused on the monolingual German and English tasks of GeoCLEF 2007. Based on the results of GeoCLEF 2005 and GeoCLEF 2006, the weighting and expansion of geographic named entities (NE) and Blind Relevance Feedback were combined. This year an improved model for German Named Entity Recognition was evaluated.
Publication date: 2006