Accurate endpointing with expected pause duration

Authors

  • Baiyang Liu
  • Björn Hoffmeister
  • Ariya Rastrow
Abstract

In an online automatic speech recognition system, the role of the endpoint detector is to infer when a user has finished speaking a query. Accurate and low-latency endpoint detection is crucial for natural voice interaction. Classic voice activity detector (VAD) based approaches monitor the incoming audio and trigger when a sufficiently long pause is detected. Such approaches are typically limited by their inability to distinguish between within-sentence and end-of-sentence pauses. In this paper, we propose an endpoint detection algorithm that is integrated with the speech recognition process, leveraging acoustic and language model information in order to distinguish between within-sentence and end-of-sentence pauses. Unlike other integrated approaches that are based on the highest-scoring active recognition hypothesis, the proposed algorithm computes the expected pause duration over all active hypotheses, which leads to a more reliable pause duration prediction. We show that our method achieves significantly higher accuracy and lower latency in comparison to standard approaches for endpoint detection.
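
The contrast drawn in the abstract, between reading the pause length off the single best hypothesis and taking an expectation over all active hypotheses, can be illustrated with a small sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: each active hypothesis is represented as a hypothetical `(log_score, trailing_silence_frames)` pair, the weights are softmax-normalized log scores, and the function names are invented.

```python
import math


def best_path_pause_duration(hypotheses):
    """Trailing-silence length of the single highest-scoring hypothesis."""
    return max(hypotheses, key=lambda h: h[0])[1]


def expected_pause_duration(hypotheses):
    """Score-weighted mean trailing-silence length over all active hypotheses.

    hypotheses: list of (log_score, trailing_silence_frames) pairs.
    This representation is an assumption, not taken from the paper.
    """
    if not hypotheses:
        return 0.0
    # Subtract the maximum log score before exponentiating for numerical stability.
    max_log = max(score for score, _ in hypotheses)
    weights = [math.exp(score - max_log) for score, _ in hypotheses]
    total = sum(weights)
    weighted = sum(w * pause for w, (_, pause) in zip(weights, hypotheses))
    return weighted / total


# Hypothetical beam: the best path has been in silence for 30 frames,
# but two close competitors still interpret the recent audio as speech.
beam = [(-10.0, 30), (-10.5, 0), (-11.0, 0)]
best = best_path_pause_duration(beam)
expected = expected_pause_duration(beam)
```

In this toy beam the best-path estimate is 30 frames, while the expectation is roughly half of that because the competing hypotheses carry substantial weight. An endpointer that thresholds the expected value would therefore wait longer before triggering than one that trusts the best path alone.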

Related resources

Automatic Utterance Segmentation in Spontaneous Speech

As applications incorporating speech recognition technology become widely used, it is desirable for such systems to interact naturally with their users. For such natural interaction to occur, recognition systems must be able to accurately detect when a speaker has finished speaking. This research presents an analysis combining lower and higher level cues to perform the utterance endpointing tas...

Optimizing Endpointing Thresholds using Dialogue Features in a Spoken Dialogue System

This paper describes a novel algorithm to dynamically set endpointing thresholds based on a rich set of dialogue features to detect the end of user utterances in a dialogue system. By analyzing the relationship between silences in user’s speech to a spoken dialogue system and a wide range of automatically extracted features from discourse, semantics, prosody, timing and speaker characteristics,...

ACL-08: HLT Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

This paper describes a novel algorithm to dynamically set endpointing thresholds based on a rich set of dialogue features to detect the end of user utterances in a dialogue system. By analyzing the relationship between silences in user’s speech to a spoken dialogue system and a wide range of automatically extracted features from discourse, semantics, prosody, timing and speaker characteristics,...

Pause duration and variability in read texts

Generating natural-sounding synthetic speech from text requires dividing the text into IPs and assigning pauses between those phrases. A difficulty facing attempts to model pauses quantitatively is the high degree of variability exhibited by speakers in pause placement and duration. The present study seeks to investigate if Synchronous Speech (speech elicited when two speakers are asked to ...

Publication date: 2015