Design and implementation of Persian spelling detection and correction system based on Semantic

Authors

Abstract:

Persian has special features in electronic text (graphemes, homophones, and multi-shape joining characters). As a result, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g., English or German). Spelling tools are widely used for editing user texts such as emails and documents in text editors, and developing Persian tools allows Persian software to check spelling and reduce errors in electronic texts. In this work, we review spelling detection and correction methods, especially for the Persian language. The proposed algorithm consists of two steps. The first step is non-word error detection and correction using an intelligent scoring algorithm. The second step is real-word error detection and correction. We propose a spelling system, "Perspell", for Persian non-word and real-word errors that uses a hybrid scoring system and a language model optimized by a lexicon. The scoring system combines lexical and semantic features, and the weights of these features are optimized in a learning phase on a training dataset. Perspell is compared with known Persian spellchecker systems and outperforms them in detection and correction precision. The proposed system can also detect and correct real-word errors, an open challenge that is complicated and time-consuming in Persian. In assessing the proposed method, the F-measure improved significantly (about 10%) for detecting and correcting Persian words. We used a Persian language model with bootstrapping and smoothing to overcome data sparseness and lack of data. The bootstrapping is built using a Persian dictionary, and we further used word sense disambiguation to select the correct replacement word.
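The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the Perspell implementation: the toy corpus, the fixed weights `W_LEX` and `W_LM`, the `MARGIN` threshold, and all function names are assumptions made for illustration. The paper learns its feature weights and uses word sense disambiguation, whereas this sketch substitutes an add-one smoothed bigram model as the only context signal.

```python
from collections import Counter

# Toy training corpus (illustrative only; not the paper's data).
CORPUS = [
    ["من", "کتاب", "خواندم"],  # "I read a book"
    ["او", "کتاب", "خرید"],    # "he/she bought a book"
    ["من", "کباب", "خوردم"],   # "I ate kebab"
]
UNIGRAMS = Counter(w for sent in CORPUS for w in sent)
BIGRAMS = Counter(pair for sent in CORPUS for pair in zip(sent, sent[1:]))
LEXICON = sorted(UNIGRAMS)  # sorted for deterministic tie-breaking
W_LEX, W_LM = 0.5, 0.5      # hypothetical weights; the paper learns them
MARGIN = 1.5                # hypothetical threshold for real-word swaps


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bigram_prob(prev_word, word):
    """Add-one smoothed estimate of P(word | prev_word)."""
    return (BIGRAMS[(prev_word, word)] + 1) / (UNIGRAMS[prev_word] + len(LEXICON))


def context_prob(tokens, i, word):
    """Probability of `word` at position i given its left and right neighbours."""
    p = 1.0
    if i > 0:
        p *= bigram_prob(tokens[i - 1], word)
    if i + 1 < len(tokens):
        p *= bigram_prob(word, tokens[i + 1])
    return p


def score(tokens, i, cand):
    """Hybrid score: lexical similarity plus language-model context."""
    lexical = 1.0 / (1 + edit_distance(tokens[i], cand))
    return W_LEX * lexical + W_LM * context_prob(tokens, i, cand)


def correct(tokens, max_dist=2):
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            # Step 2 (real-word): swap only if a close lexicon word fits
            # the context clearly better than the original word.
            best, best_p = tok, MARGIN * context_prob(out, i, tok)
            for w in LEXICON:
                if edit_distance(tok, w) == 1:
                    p = context_prob(out, i, w)
                    if p > best_p:
                        best, best_p = w, p
        else:
            # Step 1 (non-word): rank nearby lexicon words by hybrid score;
            # keep the token unchanged if no candidate is close enough.
            cands = [w for w in LEXICON if edit_distance(tok, w) <= max_dist]
            best = max(cands, key=lambda w: score(out, i, w), default=tok)
        out[i] = best
    return out
```

On this toy data, `correct(["من", "کتاپ", "خواندم"])` handles a non-word error and `correct(["من", "کباب", "خواندم"])` handles a real-word error; both should return the first corpus sentence. A realistic system would need a much larger lexicon and corpus, and the fixed weights here stand in for the learned ones described in the abstract.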


Similar resources

The role of semantic and communicative translation on reading comprehension of scientific texts

The following null hypothesis was proposed: H: there is no significant difference between the use of semantically or communicatively translated scientific texts. To test the null hypothesis, a number of procedures were taken. First, two passages were selected from sourcebooks of the food and nutrition industry and gardening disciplines. Each, in turn, was followed by a number of comprehension quest...


On the comparison of keyword and semantic-context methods of learning new vocabulary meaning

The rationale behind the present study is that particular learning strategies produce more effective results when applied together. The present study tried to investigate the efficiency of the semantic-context strategy along with a technique called the keyword method. To clarify the point, the current study sought to find an answer to the following question: are the keyword and semantic-context metho...


Simulation and design of electronic processing circuit for restaurants e-procurement system

The poor orientation of restaurants toward information technology still leaves many unsolved issues with regard to customers. One of these problems which lead the appeal list of later, and have a negative impact on the prestige of the restaurant is the case when the latter does not respond on time to the customers’ needs, and which causes their dissatisfaction. This issue is really sensiti...


English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...


Design and implementation of a WEBGIS-based recommendation system based on context-awareness for tourism planning

Today, tourism is one of the most lucrative industries in the world. Due to the large amount of information that exists about the points of interest (POI) of a city, the tourist is faced with an overload of information. As a result, a recommender system is needed to recommend suitable tourist places to the tourist in the shortest time. In order to offer a better offer, the interests and contex...




Journal

Volume 16, Issue 3

Pages 117-128

Publication date: 2019-12


