Search results for: mizan english persian parallel corpus

Number of results: 413519

2013
Laxmi Kashyap Malhar Kulkarni

Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...

2011
Špela Vintar Darja Fišer

The paper describes an innovative approach to expanding the domain coverage of wordnet by exploiting multiple resources. In the experiment described here, we use a large monolingual Slovene corpus of texts from the domain of informatics to harvest terminology, and a parallel English-Slovene corpus and an online dictionary as bilingual resources to facilitate the mapping of terms to th...

2013
Quoc Hung Ngo Werner Winiwarter Bartholomäus Wloka

Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...

Journal: CoRR 2017
Zahra Mousavi Heshaam Faili

This paper presents an automated supervised method for Persian wordnet construction. Using a Persian corpus and a bilingual dictionary, the initial links between Persian words and Princeton WordNet synsets are generated. These links are later classified as correct or incorrect by a trained classification system that employs seven features. The whole method is just a classification...

2003
Martin Čmejrek Jan Cuřín Jiří Havelka

We present an approach using treebanks in machine translation. Our experiment in Czech-English machine translation is an attempt to develop a full machine translation system based on dependency trees (Dependency Based Machine Translation, DBMT). We use the following resources: Prague Dependency Treebank, a newly created Czech-English parallel corpus of Penn Treebank, English monolingual corpus,...

2010
Orphée De Clercq Maribel Montero Perez

After three years of work the Dutch Parallel Corpus (DPC) project has reached an end. The finalized corpus is a ten-million-word high-quality sentence-aligned bidirectional parallel corpus of Dutch, English and French, with Dutch as central language. In this paper we present the corpus and try to formulate some basic data collection principles, based on the work that was carried out for the pro...

Journal: LLC 2014
Sara Laviosa

Firstly, Lidun Hareide and Knut Hofland describe through practical advice the compilation process of The Norwegian Spanish Parallel Corpus (NSPC) created at the University of Bergen (Norway), as well as preliminary findings from ongoing and planned research based on it. The corpus is primarily constructed for research in Translation Studies, and is built to be roughly comparable to the Spanish-...

Journal: Fundam. Inform. 2014
Cuong Hoang Anh-Cuong Le Phuong-Thai Nguyen Son Bao Pham Tu-Bao Ho

Automatically building a large bilingual corpus that contains millions of words is always a challenging task. In particular, in the case of low-resource languages, it is difficult to find an existing parallel corpus that is large enough for building a real statistical machine translation system. However, comparable non-parallel corpora are richly available on the Internet, such as in Wikipedia...

2010
Do Thi Ngoc Diep Laurent Besacier Eric Castelli

This paper presents an unsupervised method for extracting parallel sentence pairs from a comparable corpus. A translation system is used to mine the comparable corpus and to detect parallel sentence pairs. An iterative process is implemented not only to increase the number of extracted parallel sentence pairs but also to improve the overall quality of the translation system. A comparison betwee...

2007
William H. Fletcher

This paper details the author’s plans for and progress with compiling and analyzing a new gigaword English corpus from the web to complement his BNC-based online database “Phrases in English”. This new corpus represents the principal English-speaking countries in proportion to their population and will be linguistically annotated with the CLAWS4 tagger using a PoS-tagset comparable to those of ...
