Towards a reference tagset for Japanese
Abstract
This is a progress report on ongoing research aimed at proposing a ‘reference’ morphosyntactic part-of-speech tagset for the Japanese language. Such a tagset should be linguistically motivated, explicit, broadly applicable, and computationally tractable. Being well defined, it should also be easy to adapt in specific ways (e.g. limited, extended or modified). The author is currently attempting to apply the Standards for tagsets (Leech and Wilson 1999, originally proposed in EAGLES, 1996) in designing a reference tagset for Japanese.
Similar resources
Universal Dependencies for Japanese
We present an attempt to port the international syntactic annotation scheme, Universal Dependencies, to the Japanese language in this paper. Since the Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies by using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon Un...
Examining the difficulty pathways of can-do statements from a localized version of the CEFR
The Japanese adaptation of the Common European Framework of Reference (CEFR-J) is a tailored version of the Common European Framework of Reference (CEFR), designed to better meet the needs of Japanese learners of English. The CEFR-J, like the CEFR, uses illustrative descriptors known as can-do statements, that describe achievement goals for five skills (listening, reading, spoken ...
Czech Morphological Tagset Revisited
A lot of natural language processing is built on top of some solid morphological annotation. In this paper we present an update of the Czech morphological tagset as given by the analyzer Ajka that has been used for academic as well as commercial purposes for more than a dozen years. The revision reacts to rather practical issues that we had to face during development of subsequent tools for NLP, pa...
Reusable Tagset Conversion Using Tagset Drivers
Part-of-speech or morphological tags are important means of annotation in a vast number of corpora. However, different sets of tags are used in different corpora, even for the same language. Tagset conversion is difficult, and solutions tend to be tailored to a particular pair of tagsets. We propose a universal approach that makes the conversion tools reusable. We also provide an indirect evalu...
BIS Annotation Standards With Reference to Konkani Language
The Bureau of Indian Standards (BIS) Part Of Speech (POS) tagset has been prepared for the Indian Languages by the POS Tag Standardization Committee of Department of Information Technology (DIT), New Delhi, India. The BIS POS tagset aims to ensure standardization in the POS tagging of all the Indian Languages. It has been used for POS tagging in the Indian Languages Corpora Initiative (ILCI) pr...
Publication date: 2001