Diagnostic techniques for spoken keyword discovery

Authors

  • Peter F. Schulam
  • Murat Akbacak
Abstract

Keyword discovery is an unsupervised technology for processing collections of speech and capturing repeated patterns. It offers a solution for unsupervised content analysis tasks, especially when the acoustic and lexical characteristics are not known in advance, or when there is little or no data with which to model them statistically. In these situations, keyword discovery can find potentially important words for further analysis using minimal resources. Unfortunately, keyword discovery performance depends heavily on the quality of the features used to characterize the raw signal and on the alignment algorithm used to find similar feature subsequences. It is not yet fully understood which features and alignment algorithms work well in different scenarios and for different tasks, and there are very few diagnostic techniques for improving our understanding. In this paper, we present two diagnostic measurements that directly assess the quality of alignments between feature sequences, independently of how the alignments are used downstream. We argue that such diagnostic techniques are valuable for intrinsically assessing speech features and alignment algorithms for keyword detection.

Similar resources

Robust Event Detection From Spoken Content In Consumer Domain Videos

In this paper, we propose an innovative integrated approach to leverage available spoken content while detecting events in consumer-generated multimedia data (i.e., YouTube videos). Spoken content in consumer videos exhibits several challenges. For example, unlike Broadcast News, the spoken audio is typically not labeled. Also, the audio track in consumer videos tends to be noisy and the spoken...

Information Discovery on Electronic Health Records Using Authority Flow Techniques

BACKGROUND As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the q...

Comparing Isolately Spoken Keywords Queries for Japanese Spoken D

This paper describes a Japanese spoken document retrieval system that uses voice input queries. We prepare two types of spoken queries: isolately spoken keywords and spontaneously spoken queries. To solve a mis-recognition problem of spoken queries, N-best hypotheses of transcripts of queries are used, and keyword candidates are selected from them by mutual information between recognized words....

Spoken Document Retrieval by Contents Complement and Keyword Expansion Using Subordinate Concept for NTCIR-SpokenDoc

We report on the result of investigating which relationship is important among hypernym and hyponym relationships in retrieval keyword expansion. Moreover, we report the effect of the keyword expansion and the contents complement for spoken document retrieval for SCR lecture retrieval task and SCR passage retrieval task. Spoken Document Retrieval by contents complement and keyword expansion usi...

Distributed Chinese Keyw Verification for Spoken Wireless Envir

With the rapid developments of wireless communications, it is highly desired for users to access the network information with spoken dialogue interface via hand-held devices at any time, from anywhere. One possible approach towards this goal is to perform speech feature extraction at the hand-held devices (the clients) and have all other recognition tasks and dialogue functions absorbed by the ...

Publication date: 2014