text database

Comparative phonetic analysis and phoneme recognition for Afrikaans, English and Xhosa using the African Speech Technology telephone speech databases

Journal: South African Computer Journal, 2004

Thomas Niesler, Philippa H. Louw

This paper concerns the Afrikaans, English and Xhosa speech databases recently developed as part of the African Speech Technology project. The three corpora are analysed and compared in terms of their phonetic content, diversity and mutual overlap. Connected phoneme recognition systems are subsequently developed and tested in each language.


Querying Linguistic Trees

Journal: Journal of Logic, Language and Information, 2010

Catherine Lai, Steven Bird

Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffe...


Multi-Dimensional Data Acquisition for Integrated Acoustic Information Research

2002

Nobuo Kawaguchi, Shigeki Matsubara, Kazuya Takeda, Fumitada Itakura

The Center for Integrated Acoustic Information Research (CIAIR) at Nagoya University has been collecting various kinds of speech corpora for both acoustic modeling and speech modeling. The corpora include multimedia data collected in a moving-car environment, children's voices recorded during video gaming, room acoustics at multiple points, head related transfer functions of multiple subj...


A Text Mining Approach for the Extraction of Kinetic Information from Literature

2015

Ana Alão Freitas, Hugo Costa, Miguel Rocha, Isabel Rocha

Systems biology has fostered interest in the use of kinetic models to better understand the dynamic behavior of metabolic networks in a wide variety of conditions. Unfortunately, in most cases, data available in different databases are not sufficient for the development of such models, since a significant part of the relevant information is still scattered in the literature. Thus, it becomes es...


A database design for a TTS synthesis system using lexical diphones

2004

Tanya Lambert, Andrew P. Breen

Database designs based on the premise that there are about 2000 diphones in English, as stated in many publications and on-line documents, are likely to yield a diphone database that fails to capture some important phonological phenomena of English. This paper proposes a TTS database built from diphones inclusive of their syllabic stress; we term these units lexical dip...


Text-independent speaker identification and verification using the TIMIT database

1998

Nuala C. Ward, Dominik R. Dersch

This paper presents a neural network inspired approach to speaker recognition using speaker models constructed from full data sets. A similarity measure between data sets is used for text-independent speaker identification and verification. In order to reduce the computational effort in calculating the similarity measure, a fuzzy Vector Quantisation procedure is applied. This method has previou...


A Model for Interoperability: XML Documents as an RDF Database

2004

Gary F. Simons, Brian Fitzsimons, William D. Lewis, Scott O. Farrar, Alexis Lanham, Hector Gonzalez

We propose a model for a Resource Description Framework (RDF) database for interlinear glossed text (IGT) created from documents encoded in the Extensible Markup Language (XML) using markup metaschemas. A metaschema, constructed using the Semantic Interpretation Language (SIL) (Simons 2004), maps XML-encoded documents to a common, semantically rich RDF database. The RDF database in turn can be searc...


Text Classification of Formatted Text Documents

2002

We describe a multiclass text classification system for formatted text messages contained in the Rich Text Format fields of a structured database of military documents. This system uses a part-of-speech tagger and a rule-based classifier to classify 80 different types of formatted messages.


Text to Phoneme Conversion in Persian Using Smooth Ergodic Hidden Markov Model

2004

F. Hendessi

In developing a text-to-speech system, it is well known that the accuracy of the information extracted from a text is crucial to producing high-quality synthesized speech. In this paper, a Persian text-to-speech system is studied. The system uses the speech waveform concatenation method, which is comparatively mature in text-to-speech synthesis. This paper describes the innovation introduced into the text ...


Korean Text Generation from Database for Homeshopping Sites

2001

Ji-Eun Roh, Sin-Jae Kang, Jong-Hyeok Lee

This paper describes a text generation system, XExplainer, which can dynamically produce a description of commodities in Korean from a relational database for homeshopping sites. We focus on how to generate well-written texts through several generation stages in the marketing domain. The generated text was evaluated using several criteria, such as content completeness, structural coherence, con...
