Mizan English-Persian Parallel Corpus

Investigating Frequency and Distribution of Transition Markers in English and Persian Research Articles in Applied Linguistics: Focusing on Their Introduction Sections

2012

Leila Bahrami

The pressure to produce work in English and to publish internationally has increased in recent years. However, many non-native writers may be excluded from the web of global scholarship because of defective rhetorical organization and discourse structure in their work. This study aimed to investigate the frequency and distribution of transition markers (TMs) in introduction sections o...


The CNGL-DCU-Prompsit Translation Systems for WMT13

2013

Raphaël Rubino, Antonio Toral, Santiago Cortes Vaíllo, Jun Xie, Xiaofeng Wu, Stephen Doherty, Qun Liu

This paper presents the experiments conducted by the Machine Translation group at DCU and Prompsit Language Engineering for the WMT13 translation task. Three language pairs are considered: Spanish-English and French-English in both directions and German-English in that direction. For the Spanish-English pair, the use of linguistic information to select parallel data is investigated. For the Fren...


A Statistical Approach to Persian Light Verb Constructions

2008

Kim Gerdes

This article presents the linguistic bases of Persian light verb constructions and shows the corpus-based construction of lists of collocates for some common Persian verbs. The proposed methods of corpus construction are language independent, and the good results on a relatively small corpus of 20 million words confirm the power of association measures based on the hypergeometric distribution. ...


Improving Persian Text Classification and Clustering Using Persian Thesaurus

2012

Hamid Parvin, Atousa Dahbashi, Sajad Parvin, Behrouz Minaei-Bidgoli

This paper proposes an innovative approach to improve the classification performance of Persian texts. The proposed method uses a thesaurus as a knowledge source to obtain more representative word frequencies in the corpus. Two types of word relationships in the thesaurus are considered. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Ex...


The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine

2016

Mariana L. Neves, Antonio Jimeno-Yepes, Aurélie Névéol

The biomedical scientific literature is a rich source of information not only in the English language, for which it is more abundant, but also in other languages, such as Portuguese, Spanish and French. We present the first freely available parallel corpus of scientific publications for the biomedical domain. Documents from the "Biological Sciences" and "Health Sciences" categories were retriev...


Word Representation or Word Embedding in Persian Text

Journal: CoRR 2017

Siamak Sarmady, Erfan Rahmani

Text processing is one of the sub-branches of natural language processing. Recently, machine learning and neural network methods have received greater attention. For this reason, the representation of words has become very important. This article is about word representation, or converting words into vectors, in Persian text. In this research, GloVe, CBOW and skip-gram ...


Constructing a Turkish-English Parallel TreeBank

2014

Olcay Taner Yildiz, Ercan Solak, Onur Görgün, Razieh Ehsani

In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. In the corpus, we manually generated parallel trees for about 5,000 sentences from Penn Treebank. English sentences in our set have a maximum of 15 tokens, including punctuation. We constrained the translated trees to the reordering of the children and th...


A Comparative Pragmatic Analysis of the Speech Act of “Disagreement” across English and Persian

Thesis: Ministry of Science, Research and Technology - Yazd University - Faculty of Foreign Languages, 1391 (Solar Hijri)

Aida Elhamian, Hamid Allami, Ali Mohammad Fazilatfar

The speech act of disagreement is among the speech acts that have received the least attention in the field of pragmatics. This study investigates how power relations, social distance, formality of the context, gender, and language proficiency (for EFL learners) influence disagreement and politeness strategies. The participants of the study were 200 male and female native Persian s...


Deontic Modality in Lithuanian Translations of EU Legislation

Journal: Vertimo studijos 2022

The present article is a corpus-driven investigation into deontic modality in the Lithuanian translations of EU legislation. It builds on data from the EUR-Lex2/2016 parallel corpus. The article discusses pivotal points of deontic modality and its realisation in English and Lithuanian. It then presents the corpus and the most frequent lexical bundles through which deontic modality is expressed in the subcorpus. The discussion section analyses how the source text bundles were translated


Study on the English Corresponding Unit of Chinese Clause

2016

Wenhe Feng, Yi Yang, Yancui Li, Xia Li, Han Ren

This paper annotates the English corresponding units of Chinese clauses in Chinese-English translation and statistically analyzes them. First, based on Chinese clause segmentation, we segment the English target text into corresponding units (clauses) to get a Chinese-to-English clause-aligned parallel corpus. Then, we annotate the grammatical properties of the English corresponding clauses in the ...
