Software Tools for Morphological Tagging of Zulu Corpora and Lexicon Development

Authors

  • Sonja E. Bosch
  • Laurette Pretorius

Abstract

This paper discusses aspects of an ongoing project to develop grammatical and lexical resources for Zulu with sufficient coverage for unrestricted text. We explain how the basic software tools of computational morphology are used in linguistic processing, more specifically for automatic word-form recognition and morphological tagging of the growing stock of electronic text corpora of a Bantu language such as Zulu. We also show how a machine-readable lexicon is in turn enhanced with the information acquired and extracted through such corpus analysis.
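As a rough illustration of the tag-then-enhance loop described above, the following is a minimal Python sketch. It is not the authors' software: it uses plain prefix matching rather than, for example, a finite-state transducer, and it assumes a hypothetical toy list of Zulu noun-class prefixes and a hypothetical seed stem lexicon. It also ignores morphophonological alternations. All names (`NOUN_PREFIXES`, `SEED_LEXICON`, `analyse`, `guess`, `tag_corpus`, `enhance_lexicon`) and the `N<class>:prefix+stem` tag format are illustrative inventions. For each matching prefix, `analyse` keeps only the class readings that the stem's lexicon entry allows. For words that receive no analysis, `guess` proposes (stem, class) candidates that can be fed back into the lexicon.

```python
from collections import Counter

# Hypothetical toy fragment: noun-class prefix -> classes it can mark.
# Real Zulu has more classes and allomorphs, plus sound changes that
# simple string matching cannot capture.
NOUN_PREFIXES = {
    "umu": ["1", "3"],  # shared by classes 1 and 3, hence ambiguous
    "um": ["1", "3"],   # shorter allomorph, as in um-fana
    "aba": ["2"],
    "imi": ["4"],
    "in": ["9"],
    "isi": ["7"],
    "izi": ["8"],
    "izin": ["10"],
}

# Hypothetical seed lexicon: noun stem -> classes the stem is known to take.
SEED_LEXICON = {
    "ntu": {"1", "2"},   # umuntu 'person', abantu 'people'
    "fana": {"1", "2"},  # umfana 'boy', abafana 'boys'
    "thi": {"3", "4"},   # umuthi 'tree', imithi 'trees'
    "ja": {"9", "10"},   # inja 'dog', izinja 'dogs'
}


def analyse(word, lexicon):
    """Return every 'N<class>:prefix+stem' analysis the lexicon licenses."""
    analyses = []
    for prefix, classes in NOUN_PREFIXES.items():
        if word.startswith(prefix):
            stem = word[len(prefix):]
            for cls in classes:
                # The stem's lexicon entry discards impossible class readings.
                if cls in lexicon.get(stem, set()):
                    analyses.append(f"N{cls}:{prefix}+{stem}")
    return sorted(analyses)


def guess(word):
    """Propose (stem, class) candidates for a word the lexicon cannot analyse."""
    return [
        (word[len(prefix):], cls)
        for prefix, classes in NOUN_PREFIXES.items()
        if word.startswith(prefix) and len(word) > len(prefix)
        for cls in classes
    ]


def tag_corpus(tokens, lexicon):
    """Tag each token and count guesser candidates from unknown words."""
    tagged, candidates = [], Counter()
    for token in tokens:
        word = token.lower()
        analyses = analyse(word, lexicon)
        if analyses:
            tagged.append((token, analyses))
        else:
            tagged.append((token, ["UNKNOWN"]))
            candidates.update(guess(word))
    return tagged, candidates


def enhance_lexicon(lexicon, candidates, min_count=1):
    """Return a copy of the lexicon extended with candidates seen at least min_count times."""
    enhanced = {stem: set(classes) for stem, classes in lexicon.items()}
    for (stem, cls), count in candidates.items():
        if count >= min_count:
            enhanced.setdefault(stem, set()).add(cls)
    return enhanced


if __name__ == "__main__":
    corpus = ["Umuntu", "abafana", "imithi", "izinja", "isikole", "izikole"]
    tagged, candidates = tag_corpus(corpus, SEED_LEXICON)
    for token, analyses in tagged:
        print(token, analyses)
    lexicon = enhance_lexicon(SEED_LEXICON, candidates)
    print(analyse("isikole", lexicon))  # ['N7:isi+kole']
```

In this toy run, the entry for *ntu* narrows the ambiguous *umu-* prefix (classes 1 and 3) to a single class 1 reading of *umuntu*. The unknown words *isikole* and *izikole* yield the candidates (*kole*, 7) and (*kole*, 8). A naive guesser like this one also overgenerates; for example, it proposes both class 1 and class 3 for any unknown *um(u)-* word. In a real workflow, such candidates would therefore be reviewed before they are added to the lexicon.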

Similar resources

First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin

Although lexicography of Latin has a long tradition dating back to ancient grammarians, and almost all Latin grammars devote to wordformation at least one part of the section(s) concerning morphology, none of the currently available lexical resources and NLP tools for Latin feature a wordformation-based organization of the Latin lexicon. In this paper, we describe the first steps towards the semi-au...

Exploiting Cross-Linguistic Similarities in Zulu and Xhosa Computational Morphology

This paper investigates the possibilities that cross-linguistic similarities and dissimilarities between related languages offer in terms of bootstrapping a morphological analyser. In this case an existing Zulu morphological analyser prototype (ZulMorph) serves as basis for a Xhosa analyser. The investigation is structured around the morphotactics and the morphophonological alternations of the ...

Containing overgeneration in Zulu computational morphology

The development of a large-coverage, computational morphological analyser for Zulu requires the modelling not only of the regular phenomena often associated with word formation, but also the idiosyncratic behaviour that may occur in Zulu morphology. This paper discusses the application of an existing rule-based, finite-state morphological analyser prototype ZulMorph in semi-automating the minin...

Multi-source morphosyntactic tagging for spoken Rusyn

This paper deals with the development of morphosyntactic taggers for spoken varieties of the Slavic minority language Rusyn. As neither annotated corpora nor parallel corpora are electronically available for Rusyn, we propose to combine existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish and adapt them to Rusyn. Using MarMoT as tagging toolki...

Philippine Language Resources: Trends and Directions

We present the diverse research activities on Philippine languages from all over the country, with focus on the Center for Language Technologies of the College of Computer Studies, De La Salle University, Manila, where the majority of the work is conducted. These projects include the formal representation of Philippine languages and the processes involving these languages. Language representation ...

Publication date: 2004