Search results for: lexical clusters

Number of results: 143359

2013
Olutobi Owoputi Brendan T. O'Connor Chris Dyer Kevin Gimpel Nathan Schneider Noah A. Smith

We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (m...

1997
Masayuki Yamada Yasuhiro Komori Tetsuo Kosaka Hiroki Yamamoto

This paper describes a high-speed algorithm for a speech recognizer based on speaker cluster HMM. The speaker cluster HMM, which makes it possible to deal with variety among speakers, has been reported to show good performance. However, the amount of computation grows in proportion to the number of clusters when the speaker cluster HMM is used in speaker-independent recognition, where the recognition proc...

2004
Dragomir R. Radev Jahna Otterbacher Zhu Zhang

Clusters of multiple news stories related to the same topic exhibit a number of interesting properties. For example, when documents have been published at various points in time or by different authors or news agencies, one finds many instances of paraphrasing, information overlap and even contradiction. The current paper presents the Cross-document Structure Theory (CST) Bank, a collection of ...

2016
Janelle Szary Michael N. Jones

Semantic fluency tasks have increasingly been used to probe the structure of human memory, adopting methodologies from the ecological foraging literature to describe memory as a trajectory through semantic space. Clusters of semantically related items are often produced together, and the transitions between these clusters of semantically related items are consistent with theories of optimal for...

2014
Pashutan Modaresi Philipp Gross

In this work we describe our approach to solve the author verification problem introduced in the PAN 2014 Author Identification task. The author verification task presents participants with a set of problems where each problem consists of a set of documents written by the same author and a questioned document with an unknown author. The task is then to decide whether the questioned document has...

2015
Karl Stratos Michael Collins

We tackle the question: how much supervision is needed to achieve state-of-the-art performance in part-of-speech (POS) tagging, if we leverage lexical representations given by the model of Brown et al. (1992)? It has become a standard practice to use automatically induced “Brown clusters” in place of POS tags. We claim that the underlying sequence model for these clusters is particularly well-s...

2012
Juliette Blevins Andrew Pawley

Kalam is a Trans New Guinea language of Papua New Guinea. Kalam has two distinct vowel types: full vowels /a e o/, which are of relatively long duration and stressed, and reduced central vowels, which are shorter and often unstressed, and occur predictably within word-internal consonant clusters and in monoconsonantal utterances. The predictable nature of the reduced vowels has led earlier rese...

2014
Martin Riedl Richard Steuer Christian Biemann

This paper introduces a distributional thesaurus and sense clusters computed on the complete Google Syntactic N-grams, which is extracted from Google Books, a very large corpus of digitized books published between 1520 and 2008. We show that a thesaurus computed on such a large text basis leads to much better results than using smaller corpora like Wikipedia. We also provide distributional thes...

2013
Chan-Chia Hsu

This study takes a corpus-based approach to examine twenty Chinese verbs that have been found to coerce their NP complements into an event type (cf. Lin et al. 2009), with an aim of creating a coercion profile for each verb. A cluster analysis is further conducted on the coercion profiles. The resulting clusters in our analysis show a bi-directional distribution: the verbs in Cluster 1 are foun...

2016
Guillaume Jacquet Maud Ehrmann Ralf Steinberger Jaakko Väyrynen

This paper reports on an approach and experiments to automatically build a cross-lingual multi-word entity resource. Starting from a collection of millions of acronym/expansion pairs for 22 languages where expansion variants were grouped into monolingual clusters, we experiment with several aggregation strategies to link these clusters across languages. Aggregation strategies make use of string...
