Search results for: persian parallel corpus

Number of results: 300662

2015
Mohamadreza Mahmoodvand Maryam Hourali

Finding an appropriate dataset for natural language processing applications is one of the main challenges for researchers in this field. The problem is more acute for non-Latin languages, especially Persian. Access to an appropriate dataset that can be used to develop practical language processing programs helps us to validate the obtained results and provide the fea...

2011
Bahareh Sarrafzadeh Nikolay Yakovets Nick Cercone Aijun An

Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...

2013
Amin Karimnia

This study is an attempt to carry out a comparative analysis using Natural Semantic Metalanguage (henceforth NSM). The offering routine patterns of native Persian speakers were compared with those of native American English speakers to see whether they provide evidence for the applicability of the NSM model, which is claimed to be universal. The descriptive technique was the cultural scripts approach, using ...

2016
Hanieh Poostchi Ehsan Zare Borzeshi Mohammad Abdous Massimo Piccardi

Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present ...

2011
Mojgan Seraji Beáta Megyesi Joakim Nivre Jon Dehdari

This paper presents an ongoing project whose goal is to create a freely available dependency treebank for Persian. The data is taken from the Bijankhan corpus, which is already annotated for parts of speech, and a syntactic dependency annotation based on the Stanford Typed Dependencies is added through a bootstrapping procedure involving the open-source dependency parser MaltParser. We report pr...

2011
Amin Karimnia Akbar Afghari

The study of compliments has attracted the attention of many scholars (e.g., Goffman 1971; Lakoff 1973; Brown and Levinson 1978; Amouzadeh 2001; Golato 2002; Sharifian 2005) and has become a major issue in the area of interactional sociolinguistics. To date, many models of politeness have been put forward in the literature. In this study, Brown and Levinson’s (1978, 1987) politeness model was u...

Journal: LLC 2005
Tayebeh Mosavi Miangah Ali Delavar Khalafi

This article studies different aspects of a new approach to word sense disambiguation using statistical information gained from a monolingual corpus of the target language. Here, the source language is English and the target is Persian, and the disambiguation method can be applied directly in an English-to-Persian machine translation system for solving lexical ambiguity problems in this sys...

2016
Fatemeh Mashhadirajab Mehrnoush Shamsfard

In this paper, we describe our text alignment algorithm, which ranked first in the Persian PlagDet 2016 competition. The Persian PlagDet corpus includes several obfuscation strategies. Information about the type of obfuscation helps plagiarism detection systems to use the most suitable algorithm for each type. For this purpose, we use SVM neural network for classification of documents ac...

2016
Habibollah Asghari Salar Mohtaj Omid Fatemi Heshaam Faili Paolo Rosso Martin Potthast

The task of plagiarism detection is to find passages of text-reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to pr...

2015
Peyman Passban Andy Way Qun Liu

Statistical machine translation (SMT) suffers from various problems, which are exacerbated where training data is in short supply. In this paper we address the data sparsity problem in the Farsi (Persian) language and introduce a new parallel corpus, TEP++. Compared to previous results, the new dataset is more efficient for Farsi SMT engines and yields better output. In our experiments using TEP+...
