University of Hagen at CLEF 2005: Towards a Better Baseline for NLP Methods in Domain-Specific Information Retrieval

Author

  • Johannes Leveling
Abstract

The third participation of the University of Hagen at the German Indexing and Retrieval Test (GIRT) task of the Cross Language Evaluation Campaign (CLEF 2005) aims at providing a better baseline for experiments with natural language processing (NLP) methods in domain-specific information retrieval (IR). Our monolingual experiments with the German document collection are based on a setup that combines several methods to achieve better performance. The setup includes an entry vocabulary module (EVM), query expansion with semantically related concepts, and a blind feedback technique. The monolingual experiments focus on comparing two techniques for constructing database queries: creating a 'bag of words', and creating a semantic network by means of a syntactico-semantic parser that performs a deep linguistic analysis of the query. The best performance in the official experiments, 0.2875 mean average precision (MAP), was achieved by a setup using staged logistic regression, query expansion with semantically related concepts, an entry vocabulary module, a deep linguistic analysis of the query, and blind feedback. Additional experiments showed a performance improvement when switching to the basic Okapi BM25 search (0.3878 MAP). For the bilingual experiments, the English topics are translated into German queries with several machine translation services available online (Systran, Free translation, WorldLingo, and Promt). Each set of translated topics is processed separately with the same techniques as in the monolingual experiments. The best bilingual performance was achieved with query translation by Promt followed by a simple keyword extraction from the translation (0.2399 MAP with the staged logistic regression approach vs. 0.2807 MAP with Okapi BM25).
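All results in the abstract are reported as mean average precision (MAP). As a reminder of what that figure measures, the following is a minimal sketch of the standard MAP definition. It is not the evaluation tool used for the official CLEF runs, and the topic and document identifiers are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one topic: the mean of precision@k taken at
    every rank k where a relevant document appears, divided by the total
    number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """MAP: the mean of the per-topic average precision over all judged topics.
    A topic with no retrieved documents contributes an AP of 0."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: two topics with ranked result lists and judgments.
example_runs = {
    "topic-1": ["d1", "d2", "d3", "d4"],
    "topic-2": ["d5", "d6"],
}
example_qrels = {
    "topic-1": {"d1", "d3"},
    "topic-2": {"d6"},
}
example_map = mean_average_precision(example_runs, example_qrels)
```

In this toy case the first topic has an average precision of (1/1 + 2/3) / 2 and the second of 1/2, so a MAP of 0.2875 versus 0.3878 reflects a difference averaged over all topics rather than on any single query.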

Similar Resources

Dublin City University at CLEF 2007: Cross Language Speech Retrieval (CL-SR) Experiments

The Dublin City University participated in the CLEF 2007 CL-SR English task. For CLEF 2007 we concentrated primarily on the issues of topic translation, combining this with search field combination and pseudo relevance feedback methods used for our CLEF 2006 submissions. Topics were translated into English using the Yahoo! BabelFish free online translation service combined with domain-specific ...

University of Hagen at CLEF 2004: Indexing and Translating Concepts for the GIRT Task

This paper describes the work done at the University of Hagen for our participation at the German Indexing and Retrieval Test (GIRT) task of the CLEF 2004 evaluation campaign. We conducted both monolingual and bilingual information retrieval experiments. For monolingual experiments with the German document collection, the focus is on applying and comparing three indexing methods targeting full ...

University of Hagen at CLEF2006: Reranking Documents for the Domain-specific Task

This paper describes the participation of the IICS group at the domain-specific task (GIRT) of the CLEF campaign 2006. The focus of our retrieval experiments is on trying to increase precision by reranking documents in an initial result set. The reranking method is based on antagonistic terms, i.e. terms with a semantics different from the terms in a query, for example antonyms or cohyponyms of...

Applying Light Natural Language Processing to Ad-Hoc Cross Language Information Retrieval

In the CLEF 2005 Ad-Hoc Track we experimented with language-specific morphosyntactic processing and light Natural Language Processing (NLP) for the retrieval of Bulgarian, French, Italian, English and Greek.

Domain-Specific Russian Retrieval: A Baseline Approach

Berkeley group 2 chose to perform some very straightforward experiments in retrieval of Russian documents using queries derived from topics in all three languages. Thus we performed two runs with monolingual Russian retrieval and one cross-lingual run each with German topics and English topics. Query translation was done using the online PROMT translator (www.translate.ru). Monolingual results ...

Publication date: 2005