Linguistic Resources and Topic Models for the Analysis of Persian Poems

Authors

  • Ehsaneddin Asgari
  • Jean-Cédric Chappelier
Abstract

This paper describes the use of Natural Language Processing tools, mainly probabilistic topic modeling, to study semantics (word correlations) in a collection of roughly 18k Persian poems by 30 different poets. For this study, we put considerable effort into preprocessing and into developing a large-scope lexicon supporting both modern and ancient Persian. In the analysis step, we obtained meaningful results regarding the correlation between poets and topics, their evolution through time, and the correlation between the topics and the metre used in the poems. This work should thus provide valuable results to literature researchers, especially those working on stylistics or comparative literature.

1 Context and Objectives

The purpose of this work is to use Natural Language Processing (NLP) tools, among which probabilistic topic models (Buntine, 2002; Blei et al., 2003; Blei, 2012), to study word correlations in a special type of Persian poem called “Ghazal” (غزل), one of the most popular Persian poem forms, originating in the 6th Arabic century.

Ghazal is a poetic form consisting of rhythmic couplets with a rhyming refrain (see Figure 1). Each couplet consists of two phrases, called hemistichs. The syllables in all of the hemistichs of a given Ghazal follow the same pattern of heavy and light syllables. Such a pattern introduces a musical rhythm, called metre. Metre is one of the most important properties of Persian poems and the reason why the usual rules of Persian grammar can be violated in poems, especially the order of the parts of speech. There exist about 300 metres in Persian poems, 270 of which are rare; the vast majority of poems are composed in only 30 metres (Mojiry and Minaei-Bidgoli, 2008).

Figure 1: Elements of a typical Ghazal (by Hafez, calligraphed by K. Khoroush). Note that Persian is written right to left.

Ghazal traditionally deals with just one subject, each couplet focusing on one idea. The words in a couplet are thus highly correlated. However, depending on the rest of the couplets, the message of a couplet can often be interpreted differently because of the many literary techniques found in Ghazals, e.g. metaphors, homonyms, personification, paradox, alliteration.

For this study, we downloaded from the Ganjoor poems website (http://ganjoor.net/), with free permission to use, a Ghazal collection corresponding to 30 poets, from Hakim Sanai (1080) to Rahi Moayyeri (1968), with a total of 17,939 Ghazals containing about 170,000 couplets. The metres, as determined by experts (Shamisa, 2004), are also provided for most poems.
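To make the idea of a probabilistic topic model concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling. It is an illustration only: the excerpt above does not state which inference method, number of topics, or hyperparameters the authors used, so the function name `fit_lda`, the values of `alpha` and `beta`, and the toy English word lists standing in for tokenized poems are all assumptions.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper, toy scale)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word v in topic k
    topic_total = [0] * num_topics                              # all tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: topic mixture per document, word distribution per topic.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + vocab_size * beta) for n in row]
        for k, row in enumerate(topic_word)
    ]
    return vocab, theta, phi


# Invented toy corpus: each list stands in for one tokenized poem.
toy_poems = [
    "wine cup tavern wine cup".split(),
    "tavern wine cup tavern".split(),
    "garden rose nightingale rose".split(),
    "nightingale garden rose garden".split(),
]

vocab, theta, phi = fit_lda(toy_poems, num_topics=2)
for k, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_words}")
```

Each row of `theta` is the estimated topic mixture of one poem, and each row of `phi` is the estimated word distribution of one topic. Grouping the rows of `theta` by poet, date, or metre is one plausible way to examine the kinds of correlations the abstract describes.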

Related resources

Genre analysis of literature research article abstracts: A cross-linguistic, cross-cultural study

Following Swales’s (1981) works on genre analysis, studies on different sections of Research Articles (RAs) in various languages and fields abound; however, only scant attention has been directed toward abstracts written in Persian, and in the field of literature. Moreover, claims made by Lores (2004) regarding the correspondence of two types of abstracts with different ...

Confirming the themes and interpretive unity of Ghazal poetry using topic models

We apply topic modeling to classifying the genre of Ghazal, a form common in Persian poetry. We show that a classifier based on automatically-generated topics exposes important information with only a small performance penalty: the top discriminative topics can be manually aligned with themes prevalent in the associated genres, as identified by scholars of literature. We also weigh in on a long...

Application of environmental-cultural features in the contemporary Persian literature of Mazandaran toward strengthening the local culture from the perspective of the poem of Asadollah Emadi, Ali Akbar Mahjorian and Khali Gheisari

Contemporary environmental poetry is a subjective kind of poetry with an organic totality in which tradition and modernism are challenged clearly. Environmental poetry is on the peak of the pyramid of local literature and is regarded as the background for classical poetry. Highlighting environmental ideas and creating such a room in the linguistic environment creates a specific piece ...

A Linguistic Study on the Translation of Parvin E’tesami’s Poems into English Using Catford’s Category Shifts

The present study aimed to investigate the translation into English by Alaeddin Pazargadi of Parvin E’tesami’s poems; in particular, it attempted to analyze the structural elements such as verbs, nouns, pronouns, adjectives, adverbs, articles, conjunctions, prepositions, and interjections in them. Considering the relationship between Linguistics and Translation Studies, the theoretical framewor...

Syntactic Structures and Rhetorical Functions of Electrical Engineering, Psychiatry, and Linguistics Research Article Titles in English and Persian: A Cross-linguistic and Cross-disciplinary Study

A research article (RA) title is the first and foremost feature that attracts the reader's attention, the feature from which she/he may decide whether the whole article is worth reading. The present study attempted to investigate syntactic structures and rhetorical functions of RA titles written in English and Persian and published in journals in three disciplines of Electrical Engineering, Psy...

Publication date: 2013