bilingual lexicon

Extracting Invertible Translations from Pre-Aligned Texts

2010

Michael Carl

This paper presents an approach to extract invertible translations from pre-aligned bilingual texts. The extracted set of invertible translations is unambiguous because each string occurs only once in either language side. Two variants of the algorithm are presented, using different knowledge resources. The knowledge-rich variant of the algorithm makes use of a bilingual lexicon in addition to a m...
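
The invertibility property described in this abstract (each string occurs only once on either language side) can be illustrated with a small sketch. This is not the paper's extraction algorithm, which the truncated abstract does not specify; the function name `invertible_pairs` and the sample word pairs are hypothetical and only show what an unambiguous, invertible set looks like.

```python
from collections import Counter


def invertible_pairs(pairs):
    """Keep only pairs whose source and target strings each occur exactly once.

    Illustrative filter only: it checks the one-to-one property on a list of
    candidate (source, target) pairs; it does not extract pairs from text.
    """
    unique = set(pairs)  # collapse verbatim repeats of the same pair
    src_counts = Counter(s for s, _ in unique)
    tgt_counts = Counter(t for _, t in unique)
    return sorted(
        (s, t)
        for s, t in unique
        if src_counts[s] == 1 and tgt_counts[t] == 1
    )


# Hypothetical candidate pairs; "bank" is ambiguous on the source side.
candidates = [
    ("house", "Haus"),
    ("bank", "Bank"),
    ("bank", "Ufer"),
    ("home", "Heim"),
    ("house", "Haus"),
]

result = invertible_pairs(candidates)
# Both "bank" pairs are dropped, so the remaining set can be read in either direction.
```

Because every remaining string appears once per side, the result can be inverted into a target-to-source mapping without collisions.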

Minimally Supervised Multilingual Taxonomy and Translation Lexicon Induction

2008

Nikesh Garera, David Yarowsky

We present a novel algorithm for the acquisition of multilingual lexical taxonomies (including hyponymy/hypernymy, meronymy and taxonomic cousinhood), from monolingual corpora with minimal supervision in the form of seed exemplars using discriminative learning across the major WordNet semantic relationships. This capability is also extended robustly and effectively to a second language (Hindi) ...

Lexical Functions And Machine Translation

1994

Dirk Heylen, Kerry G. Maxwell, Marc Verhagen

This paper discusses the lexicographical concept of lexical functions (Mel'čuk and Zolkovsky, 1984) and their potential exploitation in the development of a machine translation lexicon designed to handle collocations. We show how lexical functions can be thought to reflect cross-linguistic meaning concepts for collocational structures and their translational equivalents, and therefore suggest t...

Measuring Comparability of Multilingual Corpora Extracted from Wikipedia

2011

Pablo Gamallo Otero, Isaac González López

Comparable corpora can be used for many linguistic tasks such as bilingual lexicon extraction. By improving the quality of comparable corpora, we improve the quality of the extraction. This article describes some strategies to build comparable corpora from Wikipedia and proposes a measure of comparability. Experiments were performed on Portuguese, Spanish, and English Wikipedia.

Tools and Methods for Computational Lexicology

Journal: Computational Linguistics 1987

Roy J. Byrd, Nicoletta Calzolari, Martin Chodorow, Judith L. Klavans, Mary S. Neff, Omneya A. Rizk

This paper presents a set of tools and methods for acquiring, manipulating, and analyzing machine-readable dictionaries. We give several detailed examples of the use of these tools and methods for particular analyses. A novel aspect of our work is that it allows the combined processing of multiple machine-readable dictionaries. Our examples describe analyses of data from Webster's Seventh Colle...

Cross-Lingual Semantic Similarity of Words as the Similarity of Their Semantic Word Responses

2013

Ivan Vulic, Marie-Francine Moens

We propose a new approach to identifying semantically similar words across languages. The approach is based on an idea that two words in different languages are similar if they are likely to generate similar words (which includes both source and target language words) as their top semantic word responses. Semantic word responding is a concept from cognitive science which addresses detecting mos...

Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

2012

Akihiro Tamura, Taro Watanabe, Eiichiro Sumita

This paper proposes a novel method for lexicon extraction that extracts translation pairs from comparable corpora by using graph-based label propagation. In previous work, it was established that performance drastically decreases when the coverage of a seed lexicon is small. We resolve this problem by utilizing indirect relations with the bilingual seeds together with direct relations, in which ...

Ranking Translation Candidates Acquired from Comparable Corpora

2013

Rima Harastani, Béatrice Daille, Emmanuel Morin

Domain-specific bilingual lexicons extracted from domain-specific comparable corpora provide for one term a list of ranked translation candidates. This study proposes to re-rank these translation candidates. We suggest that a term and its translation appear in comparable sentences that can be extracted from domain-specific comparable corpora. For a source term and a list of translation candidate...

Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge

2012

Ivan Vulic, Marie-Francine Moens

In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashio...

Constraint-Based Bilingual Lexicon Induction for Closely Related Languages

2016

Arbi Haza Nasution, Yohei Murakami, Toru Ishida

The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on the assumption, we propose...
