Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora
Authors
Abstract
In this paper, we propose the Automatic Taxonomy Construction from Text (ATCT) framework for building taxonomies from text-based Web corpora. The framework consists of several processing steps. First, domain terms are extracted using a filtering method. Next, Word Sense Disambiguation (WSD) is optionally applied to determine the senses of these terms. Then the resulting concepts are arranged in a hierarchy by means of a subsumption technique. We construct taxonomies with and without WSD and investigate the effect of WSD on the quality of the concepts' type-of relations, using an evaluation framework based on a golden taxonomy. We find that WSD improves the quality of the built taxonomy in terms of the taxonomic F-Measure.
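The abstract does not say which subsumption rule the framework uses. As a rough illustration only, the sketch below assumes a document co-occurrence rule in the style of Sanderson and Croft: a term x subsumes a term y when P(x | y) meets a threshold and P(y | x) is smaller. The function name `subsumption_pairs`, the 0.8 threshold, and the toy documents are all hypothetical and are not taken from the paper.

```python
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    docs: iterable of sets of terms, one set per document.
    Assumed rule (not confirmed by the paper): x subsumes y when
    P(x | y) >= threshold and P(y | x) < P(x | y).
    """
    docs = [set(d) for d in docs]

    # Map each term to the indices of the documents that contain it.
    occurrences = {}
    for i, terms in enumerate(docs):
        for term in terms:
            occurrences.setdefault(term, set()).add(i)

    pairs = []
    for x, y in permutations(sorted(occurrences), 2):
        both = len(occurrences[x] & occurrences[y])
        p_x_given_y = both / len(occurrences[y])
        p_y_given_x = both / len(occurrences[x])
        if p_x_given_y >= threshold and p_y_given_x < p_x_given_y:
            pairs.append((x, y))
    return pairs


# Toy corpus (hypothetical): each set holds the terms found in one document.
docs = [
    {"animal", "dog"},
    {"animal", "cat"},
    {"animal", "dog", "poodle"},
    {"animal"},
]
pairs = subsumption_pairs(docs)
for parent, child in pairs:
    print(f"{parent} -> {child}")
```

The pairs include transitive links (here "animal" subsumes "poodle" as well as "dog"), so turning them into a taxonomy would still require keeping only the most specific parent for each term. When WSD is applied, the same rule could in principle operate on sense-tagged concepts instead of raw terms.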
Similar resources
A semantic approach for extracting domain taxonomies from text
In this paper we present a framework for the automatic building of a domain taxonomy from text corpora, called Automatic Taxonomy Construction from Text (ATCT). This framework comprises four steps. First, terms are extracted from a corpus of documents. From these extracted terms the ones that are most relevant for a specific domain are selected using a filtering approach in the second step. Thi...
An Automatic Method for Generating Sense Tagged Corpora
The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy results when large corpora are available to develop context rules, to train and test them. This paper presents a novel approach to automatically generate ar...
Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus
Wednesday, June 15th 8:00 Conference Registration (Registration desk) 8:45 Session 1: Large-Scale Online Linguistic Resources (I) Chair: "Text Categorization Based on Subtopic Clusters" Francis Chik, Robert Luk, Korris Chung "Automatic Filtering of Bilingual Corpora for Statistical Machine Translation" Shahram Khadivi, Hermann Ney "The Role of Word Sense Disambiguation in Automated Text Categor...
Inforex - a web-based tool for text corpus management and semantic annotation
The aim of this paper is to present a system for semantic text annotation called Inforex. Inforex is a web-based system designed for managing and annotating text corpora on the semantic level including annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing ...
Automatic Acquisition of Sense Tagged Corpora
An important problem in Natural Language Processing is identifying the correct sense of a word in a particular context. Thus far, statistical methods have been considered the best techniques in word sense disambiguation. Unfortunately, these methods produce high accuracy results only for a small number of preselected words. The reduced applicability of statistical methods is due basically to the...
Publication date: 2011