Small in Size, Big in Precision: A Case for Using Language-Specific Lexical Resources for Word Sense Disambiguation
Abstract
Linked open data (LOD) presents an ideal platform for connecting the multilingual lexical resources used in natural language processing (NLP) tasks, but the use of machine translation to fill in gaps in lexical coverage for resource-poor languages means that large amounts of data are potentially unverified. For graph-based word sense disambiguation (WSD), one approach has been to first translate terms into English in order to disambiguate using richer, fuller lexical knowledge bases (LKBs) such as WordNet. In this paper, we show that this approach actually creates more ambiguity and is far less accurate than using language-specific resources, which, regardless of their smaller size, can provide results comparable in accuracy to the state-of-the-art reported for graph-based WSD in English. For LOD, this demonstrates the importance of continuing to grow and extend language-specific resources in order to continually verify and reintegrate them as accurate resources.
Similar resources
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
Published vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles
Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...
Unsupervised Disambiguation for a Multilingual Medical Information System using UMLS
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using the Unified Medical Language System (UMLS). We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relationships between terms given...
Semi-Automatic Extension of Large-Scale Linguistic Knowledge Bases
Linguistic resources are essential for the success of many AI tasks. Building a new lexical resource from scratch or combining heterogeneous resources is not only complex and time-consuming, but can also lead to knowledge inconsistency and redundancy. In this paper, we present a novel method for the large-scale semantic enrichment of a computational linguistic resource. To this end, with the ai...
Publication date: 2015