A Maximum Entropy Approach to Identifying

Author

  • Adwait Ratnaparkhi
Abstract

We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
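
The classification task described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the feature set, the toy training sentences, and the function names (`candidates`, `labelled`, `train`, `split_sentences`) are all invented for the example. The weights are fit by plain stochastic gradient ascent on a binary logistic (two-class maximum entropy) model, which is a stand-in and not necessarily the estimation procedure used in the paper.

```python
import math
import re

# Toy annotated corpus (hypothetical): one sentence per entry, so the last
# character of each entry is a true boundary and every other mark is not.
TRAIN = [
    "Dr. Smith met Mr. Jones.",
    "The price rose 3.5 percent.",
    "Is Mrs. Lee here?",
    "Yes!",
    "Dr. Brown paid Mr. Lee 2.5 dollars.",
]

def candidates(text):
    """Yield (index, feature_set) for every '.', '?' or '!' in text."""
    for m in re.finditer(r"[.?!]", text):
        i = m.start()
        left = re.search(r"\S*\Z", text[:i]).group()          # token part before the mark
        right = re.match(r"\s*(\S*)", text[i + 1:]).group(1)  # next token, if any
        at_end = i + 1 == len(text)
        # Illustrative features only; none of them is taken from the paper.
        yield i, frozenset({
            "bias",
            "punct=" + text[i],
            "prefix=" + left.lower(),
            "prefix_short" if len(left) <= 3 else "prefix_long",
            # End of text is treated like a following capital (an assumption).
            "next_cap" if (not right or right[0].isupper()) else "next_other",
            "space_after" if (at_end or text[i + 1].isspace()) else "no_space_after",
        })

def labelled(sentences):
    """Join sentences with spaces and label each candidate 1 (boundary) or 0."""
    text, ends, offset = " ".join(sentences), set(), 0
    for s in sentences:
        ends.add(offset + len(s) - 1)  # index of the sentence-final mark
        offset += len(s) + 1           # +1 for the joining space
    return [(feats, 1 if i in ends else 0) for i, feats in candidates(text)]

def prob(w, feats):
    """P(boundary | features) under a binary logistic model."""
    z = sum(w.get(f, 0.0) for f in feats)
    z = max(-30.0, min(30.0, z))       # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=300, lr=0.5):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    w = {}
    for _ in range(epochs):
        for feats, y in examples:
            step = lr * (y - prob(w, feats))
            for f in feats:
                w[f] = w.get(f, 0.0) + step
    return w

def split_sentences(text, w):
    """Cut text after every mark the model scores above 0.5."""
    out, start = [], 0
    for i, feats in candidates(text):
        if prob(w, feats) > 0.5:
            out.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        out.append(tail)
    return out

if __name__ == "__main__":
    weights = train(labelled(TRAIN))
    print(split_sentences("Mr. Lee met Dr. Jones. Is Mrs. Smith here?", weights))
```

The demo sentence deliberately reuses only tokens that appear next to punctuation in the toy corpus, so it shows the mechanics of training and splitting rather than any evidence of generalization. Retraining for a new domain, in this sketch, amounts to replacing `TRAIN` with sentences from that domain.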

Similar resources

Identifying Biomagnetic Sources in the Brain by the Maximum Entropy Approach

Magnetoencephalographic (MEG) measurements record magnetic fields generated from neurons while information is being processed in the brain. The inverse problem of identifying sources of biomagnetic fields and deducing their intensities from MEG measurements is ill-posed when the number of field detectors is far less than the number of sources. This problem is less severe if there is already a r...

Plant Classification in Images of Natural Scenes Using Segmentations Fusion

This paper presents a novel approach to automatically classifying and identifying tree leaves using image segmentation fusion. With the development of mobile devices and remote access, automatic plant identification in images taken in natural scenes has received much attention. Image segmentation plays a key role in most plant identification methods, especially in complex background images. Wher...

Initial Study on Automatic Identification of Speaker Role in Broadcast News Speech

Identifying a speaker’s role (anchor, reporter, or guest speaker) is important for finding the structural information in broadcast news speech. We present an HMM-based approach and a maximum entropy model for speaker role labeling using Mandarin broadcast news speech. The algorithms achieve classification accuracy of about 80% (compared to the baseline of around 50%) using the human transcripti...

Predicting the Potential Habitat Distribution of Crataegus Pontica C. Koch, Using a Combined Modeling Approach in Lorestan Province

Habitat degradation is one of the important causes of plant species extinction. Modeling techniques are widely used for identifying the potential habitats of different plant species. Thus, the purpose of the current study was to determine potential habitats of Zalzalak in Lorestan Province. Species presence data and 23 environmental variables were collected in Lorestan Province. Correlation analysis ...

Modeling of the Maximum Entropy Problem as an Optimal Control Problem and its Application to Pdf Estimation of Electricity Price

In this paper, the continuous optimal control theory is used to model and solve the maximum entropy problem for a continuous random variable. The maximum entropy principle provides a method to obtain least-biased probability density function (Pdf) estimation. In this paper, to find a closed form solution for the maximum entropy problem with any number of moment constraints, the entropy is consi...

A local approach to the entropy of countable fuzzy partitions

This paper defines and investigates the ergodic properties of the entropy of a countable partition of a fuzzy dynamical system at different points of the state space. It ultimately introduces the local fuzzy entropy of a fuzzy dynamical system and proves it to be an isomorphism invariant.

Publication date: 1997