Chinese Word Segmentation Based On Direct Maximum Entropy Model
نویسنده
چکیده
Chinese word segmentation is a fundamental and important issue in Chinese information processing. In order to find a unified approach for Chinese word segmentation, the author develop a Chinese lexical analyzer PCWS using direct maximum entropy model. The paper presents the general description of PCWS, as well as the result and analysis of its performance at the Second International Chinese Word Segmentation Bakeoff.
منابع مشابه
Chinese Word Boundaries Detection Based on Maximum Entropy Model
Among the language texts in natural language, Chinese texts are written in a continuous way with ideographic characters. Unlike other western language texts such as English, Portuguese, etc., delimiters are used to specify the word boundaries. Hence, for any Chinese information processing system such as automatic question and answering, web information retrieval, text to speech conversion, mach...
متن کاملCombination of Machine Learning Methods for Optimum Chinese Word Segmentation
This article presents our recent work for participation in the Second International Chinese Word Segmentation Bakeoff. Our system performs two procedures: Out-ofvocabulary extraction and word segmentation. We compose three out-of-vocabulary extraction modules: Character-based tagging with different classifiers – maximum entropy, support vector machines, and conditional random fields. We also co...
متن کاملChinese Word Segmentation Based on an Approach of Maximum Entropy Modeling
In this paper, we described our Chinese word segmentation system for the 3rd SIGHAN Chinese Language Processing Bakeoff Word Segmentation Task. Our system deal with the Chinese character sequence by using the Maximum Entropy model, which is fully automatically generated from the training data by analyzing the character sequences from the training corpus. We analyze its performance on both close...
متن کاملChinese Word Segmentation with Maximum Entropy and N-gram Language Model
This paper presents the Chinese word segmentation systems developed by Speech and Hearing Research Group of National Laboratory on Machine Perception (NLMP) at Peking University, which were evaluated in the third International Chinese Word Segmentation Bakeoff held by SIGHAN. The Chinese character-based maximum entropy model, which switches the word segmentation task to a classification task, i...
متن کاملA Maximum Entropy Chinese Character-Based Parser
The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank (“CTB” henceforth). Word-based parse trees in CTB are first converted into characterbased trees, where word-level part-ofspeech (POS) tags become constituent labels and character-level tags are derived from word-level POS tags. A maximum entropy parser is then trained on the character-based corpu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005