Search results for: persian parallel corpus
Number of results: 300,662
Finding an appropriate dataset for natural language processing applications is one of the main challenges for researchers in this field. The issue is even more problematic for non-Latin languages, especially Persian. Access to an appropriate dataset that can be used in the development of practical language-processing programs helps us validate the obtained results and provide the fea...
Sense-tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are unavailable for many resource-scarce languages, including Persian. The shortage of efficient, reliable linguistic res...
This study is an attempt to carry out a comparative analysis using Natural Semantic Metalanguage (henceforth NSM). The offering routines of native Persian speakers were compared with those of native American English speakers to see whether they provide evidence for the applicability of the NSM model, which is claimed to be universal. The descriptive technique was the cultural scripts approach, using ...
Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequently problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by over a hundred million people worldwide. We first present ...
This paper presents an ongoing project whose goal is to create a freely available dependency treebank for Persian. The data is taken from the Bijankhan corpus, which is already annotated for parts of speech, and a syntactic dependency annotation based on the Stanford Typed Dependencies is added through a bootstrapping procedure involving the open-source dependency parser MaltParser. We report pr...
The study of compliments has attracted the attention of many scholars (e.g., Goffman 1971; Lakoff 1973; Brown and Levinson 1978; Amouzadeh 2001; Golato 2002; Sharifian 2005) and has become a major issue in the area of interactional sociolinguistics. To date, many models of politeness have been put forward in the literature. In this study, Brown and Levinson’s (1978, 1987) politeness model was u...
This article studies different aspects of a new approach to word sense disambiguation using statistical information gained from a monolingual corpus of the target language. Here, the source language is English and the target is Persian, and the disambiguation method can be applied directly in an English-to-Persian machine translation system to resolve lexical ambiguity problems in this sys...
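The core idea in this abstract (choosing among candidate target-language translations using statistics from a monolingual target-language corpus) can be sketched with simple co-occurrence counts. This is an illustrative toy, not the paper's actual method: the corpus, the placeholder tokens, and the scoring function are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Toy monolingual target-language corpus; tokens are placeholders
# (t_bank1 = "riverbank" sense, t_bank2 = "financial bank" sense),
# not real Persian words.
corpus = [
    "t_river t_water t_fish",
    "t_money t_account t_loan",
    "t_money t_account t_bank2",
    "t_river t_water t_bank1",
]

# Count within-sentence co-occurrences of token pairs.
cooc = Counter()
for sent in corpus:
    for a, b in combinations(sent.split(), 2):
        cooc[frozenset((a, b))] += 1

def best_translation(candidates, context_translations):
    """Pick the candidate target word that co-occurs most often with the
    target-language translations of the source sentence's context words."""
    return max(
        candidates,
        key=lambda c: sum(cooc[frozenset((c, w))] for w in context_translations),
    )

# English "bank" appearing near "money" and "account":
# the monetary-sense candidate scores higher.
print(best_translation(["t_bank1", "t_bank2"], ["t_money", "t_account"]))  # t_bank2
```

A real system would replace the toy corpus with a large Persian corpus and the placeholder tokens with dictionary translations of the source words.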
In this paper, we describe our text alignment algorithm, which achieved first rank in the Persian Plagdet 2016 competition. The Persian Plagdet corpus includes several obfuscation strategies. Information about the type of obfuscation helps plagiarism detection systems choose the most suitable algorithm for each type. For this purpose, we use an SVM classifier to classify documents ac...
The task of plagiarism detection is to find passages of text-reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to pr...
Statistical machine translation (SMT) suffers from various problems, which are exacerbated when training data is in short supply. In this paper we address the data sparsity problem in the Farsi (Persian) language and introduce a new parallel corpus, TEP++. Compared to previous results, the new dataset is more efficient for Farsi SMT engines and yields better output. In our experiments using TEP+...