Robust Speech Detection with Heteroscedastic Discriminant Analysis Applied to the Time-frequency Energy
Authors
Abstract
In this paper, we propose a robust speech detection algorithm that applies Heteroscedastic Discriminant Analysis (HDA) to the Time-Frequency Energy (TFE). The TFE consists of the log energy in the time domain, the log energy in the fixed band 250-3500 Hz, and the log energies of the Mel-scale frequency bands. A bottom-up algorithm with automatic threshold adjustment is used for accurate word boundary detection. Compared with algorithms based on the time-domain energy [1], the ATF parameter [2], and the energy combined with the LDA-MFCC parameter [3], the proposed algorithm performs better under different types of noise.
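The three TFE components named in the abstract can be illustrated for a single frame with the minimal sketch below. It is not the authors' implementation: the function names, the 8 kHz sample rate, the Hamming window, the natural logarithm, the triangular Mel filters, and the choice of 8 Mel bands are all assumptions made for illustration, and the HDA projection and the bottom-up boundary detection are not shown.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters equally spaced on the Mel scale from 0 Hz to Nyquist.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def tfe_features(frame, sample_rate=8000, n_mels=8, band=(250.0, 3500.0), eps=1e-10):
    """Hypothetical TFE vector for one frame:
    [log time-domain energy, log 250-3500 Hz band energy, log Mel band energies]."""
    frame = np.asarray(frame, dtype=float)
    n_fft = frame.size

    # 1) Log energy in the time domain.
    log_e_time = np.log(np.sum(frame ** 2) + eps)

    # Power spectrum of the windowed frame (Hamming window is an assumption).
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    # 2) Log energy in the fixed band 250-3500 Hz.
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    log_e_band = np.log(np.sum(power[in_band]) + eps)

    # 3) Log energies of the Mel-scale frequency bands.
    log_e_mel = np.log(mel_filterbank(n_mels, n_fft, sample_rate) @ power + eps)

    return np.concatenate(([log_e_time, log_e_band], log_e_mel))
```

Calling `tfe_features` on a 256-sample frame returns a vector of length `2 + n_mels`. In the approach the abstract describes, such a vector would then be passed through the HDA transform before thresholding.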
Similar resources
Robust speech/non-speech detection using LDA applied to MFCC
In speech recognition, speech/non-speech detection must be robust to noise. In this work, a new method for speech/non-speech detection using Linear Discriminant Analysis (LDA) applied to Mel Frequency Cepstrum Coefficients (MFCC) is presented. The energy is the most discriminant parameter between noise and speech. But with this single parameter, the speech/non-speech detection system detects...
Robust speech/non-speech detection using LDA applied to MFCC for continuous speech recognition
Continuous speech recognition applications need precise detection because the number of words to recognize is unknown and vocabulary words can be short. The speech/non-speech detection must be robust to the boundary precision. In this work, a new approach to evaluating detection algorithms for continuous speech recognition is presented. The speech/non-speech detection using energy parameter combin...
Acoustic and Data-driven Features for Robust Speech Activity Detection
In this paper we evaluate different features for speech activity detection (SAD). Several signal processing techniques are used to derive acoustic features that capture attributes of speech useful in differentiating speech segments in noise. The acoustic features include short-term spectral features, long-term modulation features both derived using Frequency Domain Linear Prediction (FDLP), and...
Long Span Features and Minimum Phoneme Error Heteroscedastic Linear Discriminant Analysis
In this paper we explore the effect of long-span features, resulting from concatenating multiple speech frames and projecting the resulting vector onto a subspace using Linear Discriminant Analysis (LDA) techniques. We show that LDA is not always effective in selecting the optimal combination of long-span features, and introduce a discriminative feature analysis method that seeks to minimize ph...
Audio classification using dominant spatial patterns in time-frequency space
This paper presents a novel audio discrimination algorithm using spatial features in time-frequency (TF) space. Three types of audio signals (speech, music without vocals, and music with background vocals) are considered for classification. The audio segment is transformed into the TF domain, yielding a spatial illustration of its energy. Nonnegative matrix factorization (NMF) is applied to...
Publication date: 2002