Developing Punjabi Morphology, Corpus and Lexicon

Authors

  • Muhammad Humayoun
  • Aarne Ranta
Abstract

We describe an implementation of morphology, the development of a corpus, and the building of a lexicon for the Punjabi language. Such resources are building blocks for language technology tasks ranging from part-of-speech tagging to machine translation. They matter all the more because Punjabi is an under-resourced language. We release these resources as open source.

Similar resources

A Punjabi Grammar Checker

This article describes grammar checking software developed to detect grammatical errors in Punjabi texts and to suggest corrections where appropriate. The system uses a full-form lexicon for morphological analysis and rule-based systems for part-of-speech tagging and phrase chunking. The system supported by a set of carefully devised erro...

A Grammar Checking System for Punjabi

This article describes the grammar checking system developed to detect various grammatical errors in Punjabi texts. The system uses a full-form lexicon for morphological analysis and applies rule-based approaches to part-of-speech tagging and phrase chunking. The system follows a novel approach of performing agreement checks at phrase and clause levels using the gramm...

A Morphological Lexicon for the Persian Language

We introduce PerLex, a large-coverage, freely available morphological lexicon for the Persian language. We describe the main features of Persian morphology and the way we have represented it within the Alexina formalism, on which PerLex is based. We focus on the methodology we used for constructing lexical entries from various sources, as well as the problems related to typographic norm...

Identification of Prosodic Features of Punjabi for Enhancing the Pronunciation Lexicon Specification (PLS) for Voice Browsing

Voice browsing requires a speech interface framework. Pronunciation Lexicon Specification (PLS) 1.0 is a recommendation of the Voice Browser Working Group of the W3C (World Wide Web Consortium): a machine-readable specification of pronunciation information that can be used for speech technology development. This global PLS standard is applicable across European and Asian languages and this specification ...

Identification and Separation of Complex Sentences from Punjabi Language

Complex sentences make up a major part of the Punjabi language. All large sentences are of either compound or complex type. Detailed analysis of complex sentences helps in processing the Punjabi language by computer. This study will help in identifying and separating complex sentences from a Punjabi corpus. Also this study will be helpful in developing other NLP applica...

Publication date: 2010