Search results for: lexical segmentation

Number of results: 95920

2008
Degen Huang Xiao Sun Shidou Jiao Lishuang Li Zhuoye Ding Ru Wan

This paper presents the Chinese lexical analysis systems developed by the Natural Language Processing Laboratory at Dalian University of Technology, which were evaluated in the 4th International Chinese Language Processing Bakeoff. The systems adopt a hybrid HMM and CRF model that combines a character-based model with a word-based model in a directed graph. Both the closed and open t...

2000
V. Di Lecce G. Dimauro A. Guerriero S. Impedovo G. Pirlo A. Salzo

This paper presents a new hybrid approach for legal amount recognition on Italian bank checks. It exploits the observation that a legal amount can be described as a sequence of 'core' groups of words separated by suitable 'separator' words. Therefore, an analytical strategy is used to segment the amount into 'core' groups of words that are then recognized according to a global approac...

2011
Hyoung-Gyu Lee Joo-Young Lee Min-Jeong Kim Hae-Chang Rim Joong-Hwi Shin Young-Sook Hwang

In this paper, we propose a phrase segmentation model for phrase-based statistical machine translation. We observed that good translation candidates generated by a conventional phrase-based SMT decoder have lexical cohesion and show more uniform translation for each phrase segment. Based on this observation, we propose a novel phrase segmentation model using collocation between two adjacent ...

Journal: TACL 2014
Nathan Schneider Emily Danchik Chris Dyer Noah A. Smith

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments ...

2017
Younes Samih Mohammed Attia Mohamed Eldesouki Ahmed Abdelali Hamdy Mubarak Laura Kallmeyer Kareem Darwish

The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmentation of words into their constituent tokens is an important processing step for natural language processing. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks without any norma...

2007
Sebastien Cuendet Elizabeth Shriberg Benoit Favre James Fung Dilek Hakkani-Tur

Information retrieval techniques for speech are based on those developed for text, and thus expect structured data as input. An essential task is to add sentence boundary information to the otherwise unannotated stream of words output by automatic speech recognition systems. We analyze sentence segmentation performance as a function of feature types and transcription (manual versus automatic) f...

2017
Franco Alberto Cardillo Marcello Ferro Claudia Marzi Vito Pirrelli

Machine learning offers two basic strategies for morphology induction: lexical segmentation and surface word relation. The first one assumes that words can be segmented into morphemes. Inducing a novel inflected form requires identification of morphemic constituents and a strategy for their recombination. The second approach dispenses with segmentation: lexical representations form par...

2004
Iñaki Alegria Olatz Ansa Xabier Artola Nerea Ezeiza Koldo Gojenola Ruben Urizar

This paper describes the representation of Basque Multiword Lexical Units and the automatic processing of Multiword Expressions. After discussing and stating which kinds of multiword expressions we consider for processing at the current stage of the work, we present the representation schema of the corresponding lexical units in a general-purpose lexical database. Due to its expressive power, th...

2010
Yvonne Tsai

Text analysis involves the deconstruction of information within a text. This includes text structure, text pattern, linguistic features, lexical analysis, and syntactic analysis. This research took as its starting point the bottom-up approach of analysing the lexical features, syntactic features, and textual features of patent abstracts for comprehensive coverage of text analysis. Several tools...

2006
Sébastien Cuendet

The sentence segmentation task is a classification task that aims at inserting sentence boundaries in a sequence of words. One of the applications of sentence segmentation is to detect the sentence boundaries in the sequence of words that is output by an automatic speech recognition system (ASR). The purpose of correctly finding the sentence boundaries in ASR transcriptions is to make it possib...
