Speech Emotion Recognition Using Scalogram Based Deep Structure

نویسندگان

  • K. Aghajani Department of Engineering and Technology, University of Mazandaran, Babolsar, Iran
چکیده مقاله:

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concatenated Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The CNN can be used to learn local salient features from speech signals, images, and videos. Moreover, the RNNs have been used in many sequential data processing tasks in order to learn long-term dependencies between the local features. A combination of these two gives us the advantage of the strengths of both networks. In the proposed method, CNN has been applied directly to a scalogram of speech signals. Then, the attention-mechanism-based RNN model was used to learn long-term temporal relationships of the learned features. Experiments on various data such as RAVDESS, SAVEE, and Emo-DB demonstrate the effectiveness of the proposed SER method.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms

We present a new implementation of emotion recognition from the para-lingual information in the speech, based on a deep neural network, applied directly to spectrograms. This new method achieves higher recognition accuracy compared to previously published results, while also limiting the latency. It processes the speech input in smaller segments – up to 3 seconds, and splits a longer input into...

متن کامل

Emotion recognition using imperfect speech recognition

This paper investigates the use of speech-to-text methods for assigning an emotion class to a given speech utterance. Previous work shows that an emotion extracted from text can convey complementary evidence to the information extracted by classifiers based on spectral, or other non-linguistic features. As speech-to-text usually presents significantly more computational effort, in this study we...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

Emotion Recognition from Speech Using IG-Based Feature Compensation

This paper presents an approach to feature compensation for emotion recognition from speech signals. In this approach, the intonation groups (IGs) of the input speech signals are extracted first. The speech features in each selected intonation group are then extracted. With the assumption of linear mapping between feature spaces in different emotional states, a feature compensation approach is ...

متن کامل

Emotion Recognition from Speech using Teager based DSCC Features

Emotion recognition from speech has emerged as an important research area in the recent past. The purpose of speech emotion recognition system is to automatically classify speaker's utterances into seven emotional states including anger, boredom, disgust, fear, happiness, sadness and neutral. The speech samples are from Berlin emotional database and the features extracted from these uttera...

متن کامل

Physical Features Based Speech Emotion Recognition Using Predictive Classification

In the era of data explosion, speech emotion plays crucial commercial significance. Emotion recognition in speech encompasses a gamut of techniques starting from mechanical recording of audio signal to complex modeling of extracted patterns. Most challenging part of this research purview is to classify the emotion of the speech purely based on the physical characteristics of the audio signal in...

متن کامل

منابع من

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}


عنوان ژورنال

دوره 33  شماره 2

صفحات  285- 292

تاریخ انتشار 2020-02-01

با دنبال کردن یک ژورنال هنگامی که شماره جدید این ژورنال منتشر می شود به شما از طریق ایمیل اطلاع داده می شود.

میزبانی شده توسط پلتفرم ابری doprax.com

copyright © 2015-2023