ITRI-00-37 Semi-automatic construction of multilingual lexicons

نویسنده

  • Lynne Cahill
چکیده

The construction of lexicons for NLP applications is a potentially very expensive task, but a crucially important one, especially in multilingual applications. The automation of the task from generic data sources or corpora is as yet largely impractical for most applied systems. In this paper we describe a methodology for the semi-automation of the task, used in the CLIME project to develop bilingual lexicons for generation in a restricted domain. We go on to discuss ways in which the same methodology has been used to develop lexicons for a range of applications.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Toward an Architecture for the Global Wordnet Initiative

— Enhancing the development of multilingual lexicons is of foremost importance for intercultural collaboration to take place, as multilingual lexicons are the cornerstone of several multilingual applications. However, the development and maintenance of large-scale, robust multilingual dictionaries is a tantalizing task. Moreover, Semantic Web's growing interest towards the availability of high-...

متن کامل

Fostering Intercultural Collaboration: A Web Service Architecture for Cross-Fertilization of Distributed Wordnets

Enhancing the development of multilingual lexicons is of foremost importance for intercultural collaboration to take place, as multilingual lexicons are the cornerstone of several multilingual applications. However, the development and maintenance of large-scale, robust multilingual dictionaries is a tantalizing task. In this paper we present a tool, based on a web service architecture, enablin...

متن کامل

Two Principles and Six Techniques for Rapid Mt Development

In this paper we describe a range of techniques used at NMSU CRL for accelerating the development of MT systems. These techniques enable semi-automatic development of a number of components of a multilingual MT system, thereby enabling rapid deployment of MT capabilities in a new language. First, we describe the core multi-engine, multilingual architecture that enables the different techniques ...

متن کامل

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages

The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resource...

متن کامل

Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet

The paper reports on a series of experiments conducted in order to test the feasibility of automatically generating synsets for Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons and then these lexicons were compared to the wordnet...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000