Search results for: text linguistic

Number of results: 208900

2006
Kun Yu Sadao Kurohashi Hao Liu Toshiaki Nakazawa

This paper describes our word segmentation system and named entity recognition (NER) system for participating in the third SIGHAN Bakeoff. Both of them are based on character tagging, but use different tag sets and different features. Evaluation results show that our word segmentation system achieved 93.3% and 94.7% F-score in UPUC and MSRA open tests, and our NER system got 70.84% and 81.32% F...

2003
Stephanie Strassel David Miller Kevin Walker Christopher Cieri

This paper describes ongoing efforts at Linguistic Data Consortium to create shared resources for improved speech-to-text technology. Under the DARPA EARS program, technology providers are charged with creating STT systems whose outputs are substantially richer and much more accurate than is currently possible. These aggressive program goals motivate new approaches to corpus creation and distrib...

Journal: International Journal of Information Science and Management
Zahra Abdolhosseini (Department of Computer Engineering, Alzahra University, Tehran, Iran), Mohammad Reza Keyvanpour (Department of Computer Engineering, Alzahra University, Tehran, Iran)

Persian natural language processing (NLP) researchers face many limitations in accessing linguistic tools suitable for text processing. Therefore, research in Persian text processing is very limited. Since a dataset is an important requirement for experiments and their evaluation, we aimed to create appropriate corpora for information retrieval and natural language processing in Persian. th...

2004
Ulrik Sandborg-Petersen

Emdros is a text database engine for linguistic analysis or annotation of text. It is especially applicable in corpus linguistics for storing and retrieving linguistic analyses of text, at any linguistic level. Emdros implements the EMdF text database model and the MQL query language. In this paper, I present both, and give an example of how Emdros can be useful in computational linguistics.

2010
Zhiyi Song Stephanie Strassel Gary Krug Kazuaki Maeda

Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at Linguistic Data Consortium to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged a...

Journal: Speech Communication 2001
Steven Bird Jonathan Harrington

The growth in the use of speech corpora has benefited in the last 10 years from the establishment of data centres, such as the Linguistic Data Consortium (LDC), the European Language Resources Association (ELRA), the Japanese Language Resource Consortium (GSK: Gengo Shigen Kyouyuukikou), and multi-site annotation initiatives, such as the ToBI system for prosodic annotation and the DAMSL system ...

2010
Alvin F. Martin Craig S. Greenberg

The 2010 NIST Speaker Recognition Evaluation continues a series of evaluations of text independent speaker detection begun in 1996. It utilizes the newly collected Mixer-6 and Greybeard Corpora from the Linguistic Data Consortium. Major test conditions to be examined include variations in channel, speech style, vocal effort, and the effect of speaker aging over a multi-year period. A new primar...

2012
Zhiyi Song Safa Ismael Stephen Grimes David S. Doermann Stephanie Strassel

We describe efforts to create corpora to support development and evaluation of handwriting recognition and translation technology. LDC has developed a stable pipeline and infrastructures for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT. We collect handwritten samples of pre-processed Arabic and Chinese data that has been already tra...

2016
Olga Uryupina Ron Artstein Antonella Bristot Federica Cavicchio Kepa Joseba Rodríguez Massimo Poesio

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, a considerable effort had been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phen...
