Word Sense Disambiguation in Hindi Language Using Hyperspace Analogue to Language and Fuzzy C-Means Clustering

Authors

  • Devendra K. Tayal
  • Leena Ahuja
  • Shreya Chhabra
Abstract

Word Sense Disambiguation (WSD) is the task of assigning the most appropriate sense to a polysemous word within a given context. Many supervised, unsupervised and semi-supervised approaches have been devised for this problem, particularly for the English language. For Hindi, however, comparatively little work has been done. This paper presents a new approach to disambiguation in Hindi. To train the system, Hindi text is converted into Hyperspace Analogue to Language (HAL) vectors, which map each word into a high-dimensional space. To handle the fuzziness involved in disambiguating words, we apply the Fuzzy C-Means clustering algorithm to form clusters denoting the various contexts in which the polysemous word may occur. The test data is then mapped into the high-dimensional space created during the training phase. We test our approach on a corpus created from Hindi news articles and Wikipedia. We compare it with other significant approaches in the literature, and the experimental results indicate that it outperforms all previous work on the Hindi language.
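
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the window size, the distance weighting (`window - distance + 1`, the usual HAL convention), the averaging of word vectors into a context vector, the fuzzifier `m = 2`, and the toy English tokens are all assumptions made for illustration. The function names `hal_vectors`, `context_vector` and `fuzzy_c_means` are hypothetical.

```python
import numpy as np


def hal_vectors(tokens, window=5):
    """Build HAL vectors from a token list.

    Cell (w, p) accumulates a weight each time p precedes w within the
    window; closer words get larger weights. A word's vector is its row
    (preceding context) concatenated with its column (following context).
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            prev = tokens[pos - dist]
            m[index[word], index[prev]] += window - dist + 1
    return vocab, np.hstack([m, m.T])


def context_vector(context, vocab, vectors):
    """Assumption: represent a context as the mean HAL vector of its known words."""
    index = {w: i for i, w in enumerate(vocab)}
    rows = [vectors[index[w]] for w in context if w in index]
    return np.mean(rows, axis=0) if rows else np.zeros(vectors.shape[1])


def fuzzy_c_means(x, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Standard Fuzzy C-Means: returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero at a center
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


# Toy data, invented for illustration only (not from the paper's corpus).
tokens = "river bank water flows near river bank money bank loan deposit money bank".split()
vocab, vectors = hal_vectors(tokens, window=3)
contexts = [["river", "water", "flows"], ["money", "loan", "deposit"]]
x = np.array([context_vector(ctx, vocab, vectors) for ctx in contexts])
centers, memberships = fuzzy_c_means(x, c=2)
```

Each row of `memberships` gives the degree to which a context belongs to each cluster, which is how the fuzzy formulation expresses uncertainty about a word's sense instead of forcing a single hard assignment.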


Similar resources

Kernel Fuzzy C-Means Clustering for Word Sense Disambiguation in

Word sense disambiguation (WSD) in biomedical texts is important. The majority of existing research focuses on supervised learning methods and knowledge-based approaches. Implementing these methods requires a significant human-annotated corpus, which is not easily obtained. In this paper, we developed an unsupervised system for WSD in biomedical texts. First, we predefine the number of ...

Mining Association Rules Based Approach to Word Sense Disambiguation for Hindi Language

These days, language is a hindrance to realizing the advantages of the Information Technology revolution in India. There is therefore a need for adequate measures to perform natural language processing (NLP) so that users can interact with computer-based systems through a natural language like Hindi. This paper presents a new Word Sense Disambiguation method associated wi...

Word Sense Disambiguation in Bengali applied to Bengali-Hindi Machine Translation

We have developed a word sense disambiguation (WSD) system for the Bengali language and applied it to obtain the correct lexical choice in Bengali-Hindi machine translation. We are not aware of any existing system for Bengali WSD. Since there is no sense-annotated Bengali corpus or sufficiently large parallel corpus for the Bengali-Hindi language pair, we had to use an unsupervised approach. We use a...

An Unsupervised Approach to Chinese Word Sense Disambiguation Based on Hownet

The research on word sense disambiguation (WSD) has great theoretical and practical significance in many fields of natural language processing (NLP). This paper presents an unsupervised approach to Chinese word sense disambiguation based on Hownet (an electronic Chinese lexical resource). In our approach, contexts that include ambiguous words are converted into vectors by means of a second-orde...

Web Based Hindi to Punjabi Machine Translation System

Hindi and Punjabi are closely related languages with many similarities in syntax and vocabulary. Both languages originated from Sanskrit, which is one of the oldest languages. In terms of speakers, Hindi is the third most widely spoken language and Punjabi is the twelfth. Punjabi is mostly used in Northern India and in some areas of Pakist...

Publication date: 2015