Persian parallel corpus

Collocation Extraction using Parallel Corpus

2012

Kavosh Asadi Atui Heshaam Faili Kaveh Assadi Atuie

This paper presents a novel method to extract the collocations of the Persian language using a parallel corpus. The method is applicable whenever a parallel corpus exists between the target language and any high-resource language. Without the need for an accurate parser for the target side, it aims to parse the sentences to capture long-distance collocations and to generate more precise results. A traini...

A Hybrid Accurate Alignment Method for Large Persian-English Corpus Construction Based on Statistical Analysis and Lexicon/Persian Word Net

Journal: International Journal of Information Science and Management

Mohammad Bagher Dastgheib (Ph.D. candidate, Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran); Seyed Mostafa Fakhrahmad (Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran); Mansour Zolghadri Jahromi (Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran)

A bilingual corpus is considered a very important knowledge source and an inevitable requirement for many natural language processing (NLP) applications in which two languages are involved. For some languages, such as Persian, the lack of such resources is much more significant. Several applications, including statistical and example-based machine translation, need bilingual corpora, in which lar...

Building and Incorporating Language Models for Persian Continuous Speech Recognition Systems

2006

Mohammad Bahrani Hossein Sameti Nazila Hafezi H. Movassagh

This paper describes building statistical language models for Persian from a corpus and incorporating them into a Persian continuous speech recognition (CSR) system. We used the Persian Text Corpus to build the language models. First, we preprocessed the corpus texts by correcting the different orthographies of words. Also, the number of POS tags was decreased by clustering POS tag...

PersianSMT: A first attempt to English-Persian Statistical Machine Translation

2010

Mohammad Taher Pilevar Heshaam Faili

In this paper, an attempt to develop a phrase-based statistical machine translation system between English and Persian (PersianSMT) is described. Creating the largest English-Persian parallel corpus yet presented, built from movie subtitles, is part of this work. Two major goals are pursued here: the first is to show the main problems observed in the output of the PersianSMT syste...

A hidden Markov model for Persian part-of-speech tagging

2011

Morteza Okhovvat Behrouz Minaei-Bidgoli

Part-of-speech tagging is one of the important tasks in language processing. Despite this importance, and although numerous models have been presented for different languages, few works have addressed Persian. In this paper, a part-of-speech tagging system for a Persian corpus using a hidden Markov model is proposed. To achieve this goal, the main aspects of Pers...

A Description of Persian Deixis

Thesis: 1375

پروانه فرخنده, محمد دبیرمقدم

The significance of the study of deixis was then mentioned. The purpose of the present study from the outset was to provide a comprehensive overview of all kinds of deixis in Persian, describing and defining each in turn while considering them structurally and semantically. Chapter two consisted of two main parts. A review of the English studies in this respect, besides presenting Persian liter...

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

2015

Khadijeh Khoshnavataher Vahid Zarrabi Salar Mohtaj Habibollah Asghari

The task of text alignment corpus construction at the PAN 2015 competition consists of preparing a plagiarism corpus that provides various obfuscation types and degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for the construction of a monolingual Persian plagiarism corpus that can...

A Persian Part-Of-Speech Tagger Based on Morphological Analysis

2010

Mahdi Mohseni Behrouz Minaei-Bidgoli

This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech (POS) tagging system. This is a main part of a process for expanding a large Persian corpus called Peykare (or Textual Corpus of Persian Language). Peykare is arranged into two parts: annotated and unannotated. We use the annotated part in order to create an automatic morphological analyze...

Overview of the 3rd Author Profiling Task at PAN 2015

2015

Francisco M. Rangel Pardo Fabio Celli Paolo Rosso Martin Potthast Benno Stein Walter Daelemans

In this paper we describe and evaluate the corpora submitted to the PAN 2015 shared task on plagiarism detection for text alignment. We received mono- and cross-language corpora in the following languages (pairs): English, Persian, Chinese, and Urdu-English, English-Persian. We present an independent section for each submitted corpus including statistics, discussion of the obfuscation techniques ...

PAN 2015 Shared Task on Plagiarism Detection: Evaluation of Corpora for Text Alignment: Notebook for PAN at CLEF 2015

2015

Marc Franco-Salvador Imene Bensalem Enrique Flores Parth Gupta Paolo Rosso

In this paper we describe and evaluate the corpora submitted to the PAN 2015 shared task on plagiarism detection for text alignment. We received mono- and cross-language corpora in the following languages (pairs): English, Persian, Chinese, and Urdu-English, English-Persian. We present an independent section for each submitted corpus including statistics, discussion of the obfuscation techniques ...
