Deploying Semantic Resources for Open Domain Question Answering
Abstract
This thesis investigates how semantic resources can be deployed to improve the accuracy of an open domain question answering (QA) system. Two types of semantic resources are used to answer factoid questions: (1) Semantic parsing techniques are applied to analyze questions for semantic structures and to find phrases in the knowledge source that match these structures. (2) Ontologies are used to extract terms from questions and corpus sentences and to enrich these terms with semantically similar concepts. These resources were integrated into the Ephyra QA framework and compared to previously developed syntactic answer extraction approaches.

A semantic extractor for factoid answers was devised that generates semantic representations of the question and of phrases in the corpus and extracts answer candidates from phrases that are similar to the question. Different query generation techniques are used to retrieve relevant text passages from the corpus, ranging from simple keyword queries, through compound terms expanded with synonyms, to specific query strings built from predicate-argument structures. A fuzzy similarity metric compares semantic structures at the level of key terms by measuring their pairwise syntactic and semantic similarities, and it aggregates these term similarities into an overall similarity score. This mechanism is flexible and robust to parsing errors, and it maximizes the recall of the semantic answer extractor. Score normalization and combination techniques allow merging answer candidates found with different semantic and syntactic extraction strategies.

Several ontologies are used to extract compound terms from questions and answer sentences and to expand terms with alternative representations. (1) A framework for domain-specific ontologies allows integrating expert knowledge on restricted domains. (2) WordNet is used as an open-domain resource of ontological knowledge. (3) A new approach for automatically learning semantic relations between entities and events in a textual corpus is introduced. Semantic structures are extracted from the corpus with a semantic parser and are subsequently transformed into a semantic network that reveals relations between the entities and events in the corpus.

These semantic query generation and answer extraction techniques were assessed on factoid questions from past TREC evaluations, using the Web as a large open domain corpus as well as a local domain-specific document collection. The evaluation results show that the semantic extraction approach has a higher precision than Ephyra's syntactic answer extractors and that a hybrid approach of semantic and syntactic answer extractors outperforms each individual technique. Furthermore, the query expansion techniques can be combined with existing syntactic extractors to boost their accuracy.
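The fuzzy similarity metric described in the abstract (pairwise term similarities aggregated into one score) can be illustrated with a minimal sketch. This is not the thesis's actual implementation: the synonym table, the 0.9 synonym weight, the use of `difflib.SequenceMatcher` for syntactic similarity, and the best-match averaging are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for an ontology such as WordNet.
SYNONYMS = {
    "invent": {"create", "devise"},
    "telephone": {"phone"},
}


def syntactic_similarity(a: str, b: str) -> float:
    """Character-level string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(a: str, b: str) -> float:
    """1.0 for identical terms, 0.9 (assumed weight) for listed synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set()):
        return 0.9
    return 0.0


def term_similarity(a: str, b: str) -> float:
    """Take the stronger of the syntactic and semantic signals."""
    return max(syntactic_similarity(a, b), semantic_similarity(a, b))


def structure_similarity(question_terms: list[str], phrase_terms: list[str]) -> float:
    """Average, over question terms, of the best match among phrase terms."""
    if not question_terms or not phrase_terms:
        return 0.0
    best = [max(term_similarity(q, p) for p in phrase_terms) for q in question_terms]
    return sum(best) / len(best)


if __name__ == "__main__":
    question = ["invent", "telephone"]
    print(structure_similarity(question, ["devise", "phone"]))
    print(structure_similarity(question, ["banana", "river"]))
```

Because each question term only needs one reasonably close counterpart in the phrase, a score of this kind degrades gradually when a term is missing or misparsed rather than dropping to zero, which is the sort of robustness to parsing errors the abstract attributes to its metric.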
Similar resources
ScoQAS: A Semantic-based Closed and Open Domain Question Answering System
Question Answering (QA) has reappeared in research activities and in companies over the past years. We present an architecture of a Semantic-based closed and open domain Question Answering System (ScoQAS) over ontology resources (not free text) with two different prototypes: an ontology-based closed domain prototype and an open domain prototype over Linked Open Data (LOD) resources. Both scenarios are presented, disc...
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in a general open domain QA setting can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Do it your own (DIY) Jeopardy Question Answering System
The evolution and maturity of semantic technologies techniques and frameworks are bringing functionalities which were once considered academic or prototypical into real-life applications. Products such as IBM Watson [1] and Siri are examples of applications which are heavily leveraged on state-of-the-art semantic technologies. These systems provide a synthesis of the functionalities which are a...
Boosting Passage Retrieval through Reuse in Question Answering
Question Answering (QA) is an emerging important field in Information Retrieval. In a QA system the archive of previous questions asked from the system makes a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to help and gain performance in answering future related factoid questions. I...
Publication date: 2007