Search results for: persian parallel corpus
Number of results: 300,662
Finding an appropriate dataset for natural language processing applications is one of the main challenges for researchers in this field. The issue is even more problematic for non-Latin languages, especially Persian. Access to an appropriate dataset that can be used in the development of practical language-processing programs helps us validate the obtained results and provide the fea...
Sense-tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are unavailable for many resource-scarce languages, including Persian. The shortage of efficient, reliable linguistic res...
This study is an attempt to carry out a comparative analysis using Natural Semantic Metalanguage (henceforth NSM). The offering routines of native Persian speakers were compared with those of native American English speakers to see whether they provide evidence for the applicability of the NSM model, which is claimed to be universal. The descriptive technique was the cultural scripts approach, using ...
Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequently problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by over a hundred million people worldwide. We first present ...
This paper presents an ongoing project whose goal is to create a freely available dependency treebank for Persian. The data is taken from the Bijankhan corpus, which is already annotated for parts of speech, and a syntactic dependency annotation based on the Stanford Typed Dependencies is added through a bootstrapping procedure involving the open-source dependency parser MaltParser. We report pr...
The study of compliments has attracted the attention of many scholars (e.g., Goffman 1971; Lakoff 1973; Brown and Levinson 1978; Amouzadeh 2001; Golato 2002; Sharifian 2005) and has become a major issue in the area of interactional sociolinguistics. To date, many models of politeness have been put forward in the literature. In this study, Brown and Levinson’s (1978, 1987) politeness model was u...
This article studies different aspects of a new approach to word sense disambiguation using statistical information gained from a monolingual corpus of the target language. Here, the source language is English and the target is Persian, and the disambiguation method can be applied directly in an English-to-Persian machine translation system to resolve lexical ambiguity problems in this sys...
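The core idea in this abstract (choosing among candidate target-language translations using statistics from a monolingual target-language corpus) can be sketched with simple co-occurrence counts. This is an illustrative toy, not the paper's actual method: the corpus, the placeholder tokens, and the scoring function are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Toy monolingual target-language corpus; tokens are placeholders
# (t_bank1 = "riverbank" sense, t_bank2 = "financial bank" sense),
# not real Persian words.
corpus = [
    "t_river t_water t_fish",
    "t_money t_account t_loan",
    "t_money t_account t_bank2",
    "t_river t_water t_bank1",
]

# Count within-sentence co-occurrences of token pairs.
cooc = Counter()
for sent in corpus:
    for a, b in combinations(sent.split(), 2):
        cooc[frozenset((a, b))] += 1

def best_translation(candidates, context_translations):
    """Pick the candidate target word that co-occurs most often with the
    target-language translations of the source sentence's context words."""
    return max(
        candidates,
        key=lambda c: sum(cooc[frozenset((c, w))] for w in context_translations),
    )

# English "bank" appearing near "money" and "account":
# the monetary-sense candidate scores higher.
print(best_translation(["t_bank1", "t_bank2"], ["t_money", "t_account"]))  # t_bank2
```

A real system would replace the toy corpus with a large Persian corpus and the placeholder tokens with dictionary translations of the source words.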
In this paper, we describe our text alignment algorithm, which achieved first rank in the Persian Plagdet 2016 competition. The Persian Plagdet corpus includes several obfuscation strategies. Information about the type of obfuscation helps plagiarism detection systems choose the most suitable algorithm for each type. For this purpose, we use an SVM classifier to classify documents ac...
The task of plagiarism detection is to find passages of text-reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to pr...
Statistical machine translation (SMT) suffers from various problems, which are exacerbated when training data is in short supply. In this paper we address the data sparsity problem in the Farsi (Persian) language and introduce a new parallel corpus, TEP++. Compared to previous results, the new dataset is more efficient for Farsi SMT engines and yields better output. In our experiments using TEP+...