Tagging Unknown Proper Names Using Decision Trees

نویسندگان

  • Frédéric Béchet
  • Alexis Nasr
  • Franck Genet
چکیده

This paper describes a supervised learning method to automatically select from a set of noun phrases, embedding proper names of different semantic classes, their most distinctive features. The result of the learning process is a decision tree which classifies an unknown proper name on the basis of its context of occurrence. This classifier is used to estimate the probability distribution of an out of vocabulary proper name over a tagset. This probability distribution is itself used to estimate the parameters of a stochastic part of speech tagger.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Approach to Proper Name Tagging for German

This paper presents an incremental method for the tagging of proper names in German newspaper texts. The tagging is performed by the analysis of the syntactic and textual contexts of proper names together with a morphological analysis. The proper names selected by this process supply new contexts which can be used for finding new proper names, and so on. This procedure was applied to a small Ge...

متن کامل

Automatic Semantic Tagging of Unknown Proper Names

Implemented methods for proper names recognition rely on large gazetteers of common proper nouns and a set of heuristic rules (e.g. Mr. as an indicator of a PERSON entity type). Though the performance of current PN recognizers is very high (over 90%), it is important to note that this problem is by no means a "solved problem". Existing systems perform extremely well on newswire corpora by virtu...

متن کامل

Proper Nouns Recognition in Arabic Crime Text Using Machine Learning Approach

Named Entity Recognition (NER) identifies proper nouns in a text and categorizes it as a distinct kind of named entities. This function enables the extraction of peoples name, locations, organizations, and currencies. Several research abound in this area in Arabic NER is concerned. However, recognizing Arabic named entities is challenging due to the complexity in the Arabic language. These comp...

متن کامل

Amazigh Part-of-speech Tagging Using Markov Models and Decision Trees

The main goal of this work is the implementation of a new tool for the Amazigh part of speech tagging using Markov Models and decision trees. After studying different approaches and problems of part of speech tagging, we have implemented a tagging system based on TreeTagger a generic stochastic tagging tool, very popular for its efficiency. We have gathered a working corpus, large enough to ens...

متن کامل

A Comparison of Three Machine Learning Methods for Amazigh POS Tagging

Part of speech tagging (POS tagging) has a crucial role in different fields of natural language processing (NLP) including Speech Recognition, Natural Language Parsing, Information Retrieval and Multi Words Term Extraction. This paper describes a set of experiments involving the application of three state-of the-art part-of-speech taggers to Amazigh texts, using a tagset of 28 tags. The taggers...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000