Search results for: mizan english persian parallel corpus

Number of results: 413519

2013
Francisco Guzman Hassan Sajjad Stephan Vogel Ahmed Abdelali

In this paper, we introduce a new parallel corpus of subtitles of educational videos: the AMARA corpus for online educational content. We crawl a multilingual collection of community-generated subtitles, and present the results of processing the Arabic–English portion of the data, which yields a parallel corpus of about 2.6M Arabic and 3.9M English words. We explore different approaches to align t...

Disciplinary studies on metadiscourse in academic texts have come a rather long way (since the 1980s) to afford an awareness of the ways authors strive to signal their insights into their materials as well as their audience. However, few comprehensive corpus-based studies to date have provided a starting point for shaping our understanding of subdisciplinary and paradigmatic diversities within ...

2016
Michal Ziemski Marcin Junczys-Dowmunt Bruno Pouliquen

This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russi...

Journal: Language Resources and Evaluation 2011
Guy De Pauw Peter Waiganjo Wagacha Gilles-Maurice de Schryver

Research in machine translation and corpus annotation has greatly benefited from the increasing availability of word-aligned parallel corpora. This paper presents ongoing research on the development and application of the SAWA corpus, a two-million-word English-Swahili parallel corpus. We describe the data collection phase and zero in on the difficulties of finding appropriate and easily access...

2010
Guy De Pauw Peter Waiganjo Wagacha Gilles-Maurice de Schryver

Even though the Bantu language of Swahili is spoken by more than fifty million people in East and Central Africa, it is surprisingly resource-scarce from a language technological point of view, an unfortunate situation that holds for most, if not all, languages on the continent. The increasing amount of digitally available, vernacular data has prompted researchers to investigate the applicabilit...

2013
Hui-Chuan Lu Yu-Hsin Chu

In the development of corpus linguistics, the creation of corpora has had a critical role in corpus-based studies. The majority of created corpora have been associated with English and native languages, while other languages and types of corpora have received relatively less attention. Because an increasing number of corpora have been constructed, and each corpus is constructed for a definite p...

2013
Anup Kumar Kolya Santanu Pal Asif Ekbal Sivaji Bandyopadhyay

This paper examines the impact of event and event-actor alignment in an English-Bengali phrase-based Statistical Machine Translation (PB-SMT) system. Initially, events and event actors are identified from an English and Bengali parallel corpus. For event and event-actor identification in English, we proposed a hybrid technique, which was carried out within the TimeML framework. Events in Bengali...

2014
Liang Tian Derek F. Wong Lidia S. Chao Paulo Quaresma Francisco Oliveira Lu Yi

A parallel corpus is a valuable resource for cross-language information retrieval and data-driven natural language processing systems, especially for Statistical Machine Translation (SMT). However, most existing parallel corpora for Chinese are restricted to in-house use, while others are domain specific and limited in size. To a certain degree, this limits SMT research. This paper describes the ...

2012
Tayebeh Mosavi Miangah Ali Delavar Khalafi

In this paper we present a rather novel unsupervised method for part-of-speech (hereafter POS) disambiguation, which has been applied to Persian. This method, known as the Iterative Improved Feedback (IIF) model, is a heuristic one and uses as input only a raw corpus of Persian together with all possible tags for every word in that corpus. During the process of tagging, the algorithm passes through sever...

2011
Patricia Sotelo Dios

This paper presents an ongoing research project that involves the compilation and exploitation of the multimedia corpus of subtitled films Veiga as a method to investigate the practice of English intralingual subtitling and English-Galician interlingual subtitling. Our project draws on recent work in corpus-based translation studies and its applications in the field of audiovisual translation a...
