mel frequency cepstrum mfcc

Audio Classification Based on Sparse Coefficients

2011

Syed Zubair, Wenwu Wang

Audio signal classification is usually done using conventional signal features such as mel-frequency cepstrum coefficients (MFCC), line spectral frequencies (LSF), and short-time energy (STM). Learned dictionaries have shown promising capability for creating sparse representations of a signal and hence have the potential to be used for extracting signal features. In this paper,...

Improved Frame Level Features and SVM Supervectors Approach for the Recognition of Emotional States from Speech: Application to categorical and dimensional states

Journal: CoRR 2013

Imen Trabelsi, Dorra Ben Ayed Mezghanni, Noureddine Ellouze

The purpose of a speech emotion recognition system is to classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral and happiness. Speech features that are commonly used in speech emotion recognition (SER) rely on global utterance-level prosodic features. In our work, we evaluate the impact of frame-level feature extraction. The speech samples are fro...

MFCC Based Text-Dependent Speaker Identification Using BPNN

2014

S. Nandyal

Speech processing has emerged as one of the important application areas of digital signal processing. Research fields within speech processing include speech recognition, speaker recognition, speech synthesis, and speech coding. Speaker recognition is one of the most useful and popular biometric recognition techniques in the world, especially in areas in which security is a major conc...

Text-Independent Speaker Authentication with Spiking Neural Networks

2007

Simei Gomes Wysoski, Lubica Benusková, Nikola K. Kasabov

This paper presents a novel system that performs text-independent speaker authentication using new spiking neural network (SNN) architectures. Each speaker is represented by a set of prototype vectors that is trained with the standard Hebbian rule and a winner-takes-all approach. For every speaker there is a separate spiking network that computes normalized similarity scores of MFCC (Mel Frequency C...

A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users

2015

Gert Dekkers, Toon van Waterschoot, Bart Vanrumste, Bert Van Den Broeck, Jort F. Gemmeke, Hugo Van hamme, Peter Karsmakers

In this paper, a multi-channel speech enhancement framework for distant speech acquisition in noisy and reverberant environments for Non-negative Matrix Factorization (NMF)-based Automatic Speech Recognition (ASR) is proposed. The system is evaluated for its use in an assistive vocal interface for physically impaired and speech-impaired users. The framework utilises the Spatially Pre-processed S...

Acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments

2001

Hong Kook Kim, Richard C. Rose, Hong-Goo Kang

This paper presents a set of acoustic feature pre-processing techniques that are applied to improve automatic speech recognition (ASR) performance on the Aurora 2 noisy speech recognition task. The principal contribution of this paper is an approach for cepstrum-domain feature compensation in ASR which is motivated by techniques for decomposing speech and noise that were originally developed ...

A Configurable Accelerator for Keyword Spotting Based on Small-Footprint Temporal Efficient Neural Network

Journal: Electronics 2022

Keyword spotting (KWS) plays a crucial role in human–machine interactions involving smart devices. In recent years, temporal convolutional networks (TCNs) have performed outstandingly with less computational complexity in comparison with classical convolutional neural network (CNN) methods. However, it remains challenging to achieve a trade-off between a small-footprint model and high accuracy for the edge deployment o...

Simulation of Voice Feature Extraction Using Mel-Frequency Cepstrum Coefficient (Simulasi Ekstraksi Fitur Suara menggunakan Mel-Frequency Cepstrum Coefficient)

Journal: Jurnal Sains dan Informatika 2022

Speaking is the easiest and most widely used means of communication between humans. The development of human-computer interfaces that support a similar dialogue with machines is the inspiration behind speech recognition systems. One such algorithm is the Mel-frequency cepstral coefficient (MFCC). This paper describes all stages of the MFCC technique along with a brief description of each process. In this study, ...
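
The stages of a typical MFCC pipeline can be sketched as follows. This is a minimal illustration, not the implementation from the paper above: the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `mfcc`) are hypothetical, and the parameter values (pre-emphasis 0.97, 400-sample frames with a 160-sample hop, 512-point FFT, 26 filters, 13 coefficients) are commonly used defaults assumed here for a 16 kHz signal.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # 1. Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Log of the mel filterbank energies
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 5. DCT-II (unnormalized) decorrelates the log energies; keep the first n_ceps
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T
```

Calling `mfcc(signal, 16000)` on a one-dimensional NumPy array returns one row of 13 coefficients per frame. Production code would usually rely on an established audio library rather than a hand-written pipeline like this one.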

Low Resource Language Analysis Using Deep Learning Algorithm for Gender Classification

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing 2023

Voice signals are the essential input source for applications based on human and computer interaction technology. Gender identification through voice is one of the most challenging tasks. For signal analysis, deep learning algorithms provide an alternative to traditional and conventional classification. To identify gender as female, male, or ‘first-time’ transgender, the algorithm is used to improve the robustness of the model wi...

International Journal of Advanced Studies in Computer Science and Engineering (IJASCSE), Volume 6, Issue 01, 2017

2017

Roy Rudolf Huizen, Jazi Eko Istiyanto, Agfianto Eko Putra

This research was conducted to develop a method for identifying voice utterances that change because of aging over an interval of 10 to 25 years. The change in voice utterances caused by aging can be extracted with MFCC (Mel Frequency Cepstrum Coefficient). However, the compatibility level of the feature may drop to 55%. While the o...
