Monoids: efficient segmental features for speech recognition
Abstract
Recently, there has been interest in speech recognition with log-linear models that use features for whole segments, for example, words. The segmentation is often taken from a conventional speech recogniser. However, this limits the performance of moving to a new model. An alternative is to find the optimal segmentation. This requires acoustic features for all possible segments, which a recently proposed method extracts efficiently. It shares computation between features for segments with the same start time. This is useful when all segments are considered, but feature extraction still takes quadratic time in the length of the utterance. A more realistic strategy for decoding would prune the hypothesis space. This report therefore proposes a new, more flexible class of features. When features for all segments are required, extracting them has the same time complexity, but when only a limited number of segments are considered, they allow more re-use of computation. A specific subclass of features of interest derives from the total weight of a hidden Markov model (HMM) or a similar finite-state model. This report shows how to compute scores efficiently for such a finite-state model with weights in any semiring.
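As an illustration of the idea in the abstract, the following minimal sketch shows why a monoid structure allows re-use of computation. It is not the report's implementation. It assumes that each frame contributes a square matrix of transition weights over the states of a finite-state model, with entries in a semiring; such matrices form a monoid under semiring matrix multiplication, so the matrix for a segment can be assembled from the matrices of any sub-segments that tile it. All names (`Semiring`, `segment_matrix`, `total_weight`) and the toy weights are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class Semiring:
    """A semiring given by its two operations and their identities."""
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)), with -inf acting as zero."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
LOG = Semiring(logaddexp, lambda a, b: a + b, -math.inf, 0.0)
TROPICAL = Semiring(max, lambda a, b: a + b, -math.inf, 0.0)  # max-plus


def identity(sr: Semiring, n: int) -> Matrix:
    """Identity element of the matrix monoid (represents an empty segment)."""
    return [[sr.one if i == j else sr.zero for j in range(n)] for i in range(n)]


def matmul(sr: Semiring, a: Matrix, b: Matrix) -> Matrix:
    """Matrix product in the semiring; this is the monoid operation."""
    out = [[sr.zero] * len(b[0]) for _ in range(len(a))]
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc = sr.zero
            for k in range(len(b)):
                acc = sr.plus(acc, sr.times(a[i][k], b[k][j]))
            out[i][j] = acc
    return out


def segment_matrix(sr: Semiring, frames: List[Matrix], start: int, end: int) -> Matrix:
    """Combine the per-frame matrices for frames start .. end-1."""
    acc = identity(sr, len(frames[0]))
    for t in range(start, end):
        acc = matmul(sr, acc, frames[t])
    return acc


def total_weight(sr: Semiring, m: Matrix, initial: List[float], final: List[float]) -> float:
    """Sum (in the semiring) over all paths from initial to final states."""
    acc = sr.zero
    for i, w_in in enumerate(initial):
        for j, w_out in enumerate(final):
            acc = sr.plus(acc, sr.times(sr.times(w_in, m[i][j]), w_out))
    return acc


# Toy two-state, left-to-right model with identical weights for every frame.
frames = [[[0.5, 0.5], [0.0, 1.0]] for _ in range(4)]

# Associativity lets the matrix for frames 0..3 be built from two cached halves.
left = segment_matrix(REAL, frames, 0, 2)
right = segment_matrix(REAL, frames, 2, 4)
whole = matmul(REAL, left, right)
score = total_weight(REAL, whole, [1.0, 0.0], [0.0, 1.0])
```

Because the combination is associative, `left` and `right` could be cached and shared by any longer segment that contains them, not only by segments with the same start time. Swapping `REAL` for `LOG` or `TROPICAL` (with weights converted to logarithms) gives the log-domain total or the best-path score from the same code.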
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Fusion of global statistical and segmental spectral features for speech emotion recognition
Speech emotion recognition is an interesting and challenging speech technology, which can be applied to broad areas. In this paper, we propose to fuse the global statistical and segmental spectral features at the decision level for speech emotion recognition. Each emotional utterance is individually scored by two recognition systems, the global statistics-based and segmental spectrum-based syst...
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...