Lexicon-Driven Machine Translation
Abstract
Machine Translation (MT) systems have historically relied upon explicit grammars in order to analyze the source text and reproduce it in the target language. In this paper, we argue for a style of MT in which the focus of processing is at the level of the lexicon, rather than the grammar. This approach to translation allows an analyzer to map source sentences into an interlingual form, which then can be mapped (perhaps after intermediate inferencing steps) back into target sentence(s) which are paraphrase-equivalent to the original. Advantages of the approach include: 1) the possibility for different paraphrases of the original; 2) the capability for multi-sentence expression of the original when no single word (e.g., a verb) exists in the target language which spans the same meaning complex as a word in the source; 3) a uniform approach to word sense disambiguation and anaphoric reference resolution; and, most importantly, 4) the possibility for robust handling of ungrammatical and ellipsed source text.

1.0 Introduction: Lexicon-Driven Machine Translation

Systems designed for the Machine Translation (MT) of texts between languages have traditionally relied upon explicit grammars of both the source and target languages, in order to analyze the source text and produce well-formed target-language sentences (cf., e.g., [Tuck84]). Grammar-based systems have been reasonably successful at production-quality MT. The nature of grammatically driven processing leads to certain problems, however. First of all, an explicit grammar tends to make the system's computation excessively top-down. Thus, it is usually not particularly robust under deviant (e.g., ungrammatical, telegraphic or ellipsed) input. Moreover, explicit-grammar approaches tend to be overly concerned with the form of the language, rather than its content. Issues of preservation of meaning between source and target texts tend to get downgraded.
In this paper, we shall argue for an alternative style of MT in which the focus of processing for both input and output texts is at the level of the lexicon, i.e., the words and phrases of a language, rather than its grammar. No extensive experimentation in MT has been performed within this paradigm (but see, for example, [Wile81, Lyti84]). We shall suggest, however, that the approach very naturally allows for meaning-preserving MT, and provides solutions for difficult problems such as word-sense disambiguation, anaphora resolution, the need for circumlocution when lexical equivalents of source words are not available, etc. Language analysis, in particular, has a very strong bottom-up nature in the approach to be described. Such analyzers [e.g., Ries75, Birn81, Cull84, Dyer84] tend to produce fragmented meaning structures for ungrammatical or ellipsed inputs. Thus, there is the possibility for diagnosis of the fragments, in order to determine a reasonable reading of the input [e.g., Boot85a, Boot85b]. We illustrate the approach with a toy MT system which translates Ukrainian texts into English. The overall methodology for designing language interfaces in this paradigm is detailed in [Cull85].

* The research described here was performed while this author was visiting the EE&CS Department, Princeton University, Princeton, NJ 08544.
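The analyze-to-interlingua-then-generate flow described above can be pictured with a very small sketch. It is not the system from the paper: the lexicon entries, the slot names (`actor`, `action`, `object`), and the word-order heuristic are all assumptions made up for illustration. It only shows the general idea that each word contributes its own piece of meaning bottom-up, so that an ellipsed input still yields a partial frame that can be inspected or rendered.

```python
# Hypothetical source lexicon: each word carries its own meaning contribution.
SOURCE_LEXICON = {
    "хлопець": {"concept": "BOY", "kind": "entity"},
    "читає": {"concept": "READ", "kind": "action"},
    "книгу": {"concept": "BOOK", "kind": "entity"},
}

# Hypothetical target lexicon: one English phrase per interlingual concept.
TARGET_LEXICON = {"BOY": "the boy", "READ": "reads", "BOOK": "the book"}


def analyze(sentence):
    """Build an interlingual frame word by word (bottom-up, no grammar)."""
    frame = {"actor": None, "action": None, "object": None, "unknown": []}
    for raw in sentence.lower().split():
        word = raw.strip(".,!?")
        entry = SOURCE_LEXICON.get(word)
        if entry is None:
            # Words missing from the lexicon are recorded, not fatal.
            frame["unknown"].append(word)
        elif entry["kind"] == "action":
            frame["action"] = entry["concept"]
        elif frame["action"] is None and frame["actor"] is None:
            # Assumed heuristic: an entity seen before the action is the actor.
            frame["actor"] = entry["concept"]
        else:
            frame["object"] = entry["concept"]
    return frame


def generate(frame):
    """Render whichever slots were filled, using the target lexicon."""
    parts = []
    for slot in ("actor", "action", "object"):
        concept = frame[slot]
        if concept is not None:
            parts.append(TARGET_LEXICON[concept])
    return " ".join(parts)


full = analyze("Хлопець читає книгу.")
ellipsed = analyze("читає книгу")  # subject omitted
print(generate(full))
print(ellipsed)
```

For the ellipsed input the `actor` slot stays empty rather than causing a parse failure, which is the kind of fragment a later diagnostic step could try to complete from context.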
Similar resources
Corpus-Driven Bilingual Lexicon Extraction
This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the processes of alignment at different levels of granularity. The paper concludes with some suggestions for future work. 1 Machine T...
Creating Term and Lexicon Entries from Phrase Tables
It is a common understanding that machine translation systems need to be adapted to the domain and text type they are supposed to translate. For knowledge-driven systems, such adaptation is done by means of lexicon update: The domain terminology is identified, and coded as a special additional lexicon repository, loaded at runtime. In the age of data-driven technology, terminology is extracted ...
Iterative refinement of lexicon and phrasal alignment
In a data-driven machine translation system, the lexicon is a core component. Sometimes it is used directly in translation, and sometimes in building other resources, such as a phrase table. But up to now little attention has been paid to how the information contained in these resources can also be used backwards to help build or improve the lexicon. The system we propose here alternates lexicon b...
A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation
In this work we give a detailed comparison of the impact of the integration of discriminative and trigger-based lexicon models in state-of-the-art hierarchical and conventional phrase-based statistical machine translation systems. As both types of extended lexicon models can grow very large, we apply certain restrictions to discard some of the less useful information. We show how these restrictio...
Automated Translation between Lexicon and Corpora
In this work we will show the role of lexical resources in machine translation processes, giving several examples after a brief overview of Machine Translation studies. Then we will advocate the need for a richer lexicon in MT processes and sketch a methodology to obtain it through a mix of corpus-based and machine learning approaches.
Improving the Performance of an Example-Based Machine Translation System Using a Domain-specific Bilingual Lexicon
In this paper, we study the impact of using a domain-specific bilingual lexicon on the performance of an Example-Based Machine Translation system. We conducted experiments for the English-French language pair on in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents), and we compared the results of the Example-Based ...
Publication date: 2007