Mel Frequency Cepstral Coefficients (MFCC)

MTM at MediaEval 2013 Violent Scenes Detection: Through Acoustic-visual Transform

2013

Bruno do Nascimento Teixeira

This paper describes the MTM team's participation in the MediaEval 2013 campaign. We submitted one run at shot level that explores the spatial correlation between acoustic and visual features. Motion features are computed to represent the video. The Mel Frequency Cepstral Coefficients (MFCC) of the acoustic signal, together with their first- and second-order derivatives, are used to represent the audio. One main...
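The abstract does not say how the first- and second-order MFCC derivatives are computed. A common choice is the regression ("delta") formula d_t = Σ_{k=1..N} k (c_{t+k} − c_{t−k}) / (2 Σ_{k=1..N} k²). The second-order features are then the delta of the deltas. The sketch below assumes that formula, a window half-width N = 2, and edge frames repeated at the boundaries. It is an illustration only and is not the MTM team's implementation. The helper names `deltas` and `stack_with_deltas` are hypothetical.

```python
def deltas(frames, n=2):
    """Regression-based delta of a sequence of feature vectors.

    frames: list of equal-length lists (one MFCC vector per frame).
    n: half-width of the regression window (assumed value, not from the paper).
    Boundary frames are repeated so every frame has n neighbours on each side.
    """
    if not frames:
        return []
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    # Pad by repeating the first and last frame n times.
    padded = [frames[0]] * n + list(frames) + [frames[-1]] * n
    out = []
    for t in range(len(frames)):
        c = t + n  # position of frame t inside the padded list
        out.append([
            sum(k * (padded[c + k][d] - padded[c - k][d]) for k in range(1, n + 1)) / denom
            for d in range(dim)
        ])
    return out


def stack_with_deltas(mfcc, n=2):
    """Concatenate static MFCCs with their first- and second-order deltas."""
    d1 = deltas(mfcc, n)
    d2 = deltas(d1, n)  # second order = delta of the deltas
    return [a + b + c for a, b, c in zip(mfcc, d1, d2)]
```

With 13 static coefficients per frame, `stack_with_deltas` yields the familiar 39-dimensional vector. Other window sizes, or simple frame differences, are equally common choices.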

Short-Term Spectral Feature Extraction and Their Fusion in Text Independent Speaker Recognition: A Review

2014

Ruchi Chaudhary

The paper gives an overview of text-independent short-term feature extraction methods for speaker recognition systems in clean as well as noisy environments, and of their fusion at different levels. The basics of feature extraction, an essential component of any speaker recognition system, are discussed along with their variants. The evolution of the conventional methods to ‘State-of-the...

Identifying Perceptually Similar Languages Using Teager Energy Based Cepstrum

Journal: Engineering Letters, 2008

Hemant A. Patil, Tapan Kumar Basu

identifying an unknown language from the test utterances. In this paper, a new feature extraction method, viz., Teager Energy Based Mel Frequency Cepstral Coefficients (T-MFCC), is developed for the identification of perceptually similar languages. Finally, a language identification (LID) system is presented for Hindi and Urdu (perceptually similar Indian languages) to demonstrate the effectiveness of the newly proposed feature ...
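T-MFCC builds on the discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The abstract does not say exactly where the operator enters the MFCC pipeline, so the sketch below shows only the operator itself. It does not reproduce the authors' full T-MFCC feature. The function name `teager_energy` is hypothetical. The demo uses a pure tone, for which the operator is known to be constant and equal to A²·sin²(Ω), so it depends on both amplitude and frequency.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1].

    Returns len(x) - 2 values (interior samples only); how to treat the two
    boundary samples is left to the caller.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Demo: for x[n] = A*cos(omega*n + phi) every output equals A**2 * sin(omega)**2.
A, OMEGA = 2.0, 0.3
tone = [A * math.cos(OMEGA * n + 0.1) for n in range(50)]
psi = teager_energy(tone)
```

Because each output uses only three neighbouring samples, the operator tracks energy changes almost instantaneously. This is the property Teager-energy-based features typically exploit.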

Design and Evaluation of Semantic Mood Models for Music Recommendation using Editorial Tags

2013

Mathieu Barthet, David Marston, Chris Baume, György Fazekas, Mark B. Sandler

In this paper we present and evaluate two semantic music mood models relying on metadata extracted from over 180,000 production music tracks sourced from I Like Music (ILM)’s collection. We performed non-metric multidimensional scaling (MDS) analyses of mood stem dissimilarity matrices (1 to 13 dimensions) and devised five different mood tag summarisation methods to map tracks in the dimensiona...

Robust algorithms for speech reconstruction on mobile devices

2005

Xu Shao

This thesis is concerned with reconstructing an intelligible time-domain speech signal from speech recognition features, such as Mel-frequency cepstral coefficients (MFCCs), in a distributed speech recognition (DSR) environment. The initial reconstruction methods in this thesis require, in addition to MFCC vectors, fundamental frequency and voicing information. In the later parts of the thesis t...

Classification of video genre using audio

2001

Matthew Roach, John S. D. Mason

In this paper we propose an approach to high-level classification of video into genre: sport, cartoon, news, commercial and music. An important issue for automatic high-level classification systems is the amount of time needed to classify a video. Here we investigate classification performance as a function of the test sequence length. In addition we present performance against different orders...

The vocal tract as a biometric: output measures, interrelationships, and efficacy

2015

Peter French, Paul Foulkes, Philip Harrison, Vincent Hughes, Louisa Stevens

This paper explores methods for characterising individual voices using different vocal tract output measures. Mel frequency cepstral coefficients (MFCCs), long-term formant distributions (LTFDs) and scores based on vocal profile analysis (VPA) of long-term supralaryngeal settings were extracted from the same corpus of recordings. Distances between speakers were calculated and used to test the i...

Modified Mel Filter Bank to Compute MFCC of Subsampled Speech

Journal: CoRR, 2014

Kiran Kumar Bhuvanagiri, Sunil Kumar Kopparapu

Mel Frequency Cepstral Coefficients (MFCCs) are the most widely used speech features in speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We also propose a stronger metric that effectively captures the correlation between the MFCCs of original speech and the MFCCs of resampled speech. It is found that the pr...
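For context, the sketch below shows the conventional Mel filter bank and MFCC computation that this abstract takes as its starting point. It is not the modified filter bank proposed by Bhuvanagiri and Kopparapu. It makes several assumptions:

- the HTK-style mel formula 2595·log10(1 + f/700) (other variants exist);
- triangular filters spaced evenly on the mel scale;
- an unnormalised DCT-II of the log filter energies;
- illustrative values of 26 filters, a 512-point FFT and a 16 kHz sampling rate.

All function names are hypothetical. Because `f_max` defaults to the Nyquist frequency, a bank built for a lower sampling rate spans a narrower band than one built for the original rate.

```python
import math


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other mel variants are in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters evenly spaced on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights (one per FFT bin).
    """
    if f_max is None:
        f_max = sample_rate / 2.0  # Nyquist frequency
    n_bins = n_fft // 2 + 1
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 equally spaced mel points: left edge, centre, right edge per filter.
    edges = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(n_filters):
        left, centre, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power_spectrum(power, bank, n_ceps=13, floor=1e-10):
    """Log mel filter energies followed by an unnormalised DCT-II."""
    energies = [max(sum(w * p for w, p in zip(row, power)), floor) for row in bank]
    logs = [math.log(e) for e in energies]
    m = len(logs)
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
            for i in range(n_ceps)]


# Illustrative settings (assumed, not taken from the paper).
bank = mel_filterbank(26, 512, 16000)
ceps = mfcc_from_power_spectrum([1.0] * 257, bank)
```

In practice `power` would be the squared magnitude of a windowed frame's FFT. Library front ends differ in details such as filter normalisation, the mel formula and DCT scaling, so coefficient values are not directly comparable across toolkits.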

Auditory image model features for automatic speech recognition

2005

Mario E. Munich, Qiguang Lin

Conventional speech recognition engines extract Mel Frequency Cepstral Coefficient (MFCC) features from incoming speech. This paper presents a novel approach to feature extraction in which speech is processed according to the Auditory Image Model, a model of human psychoacoustics. We first describe the proposed front end, then we present recognition results obtained with the TIMIT database. Com...

Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing

2006

Sourabh Ravindran, David V. Anderson, Malcolm Slaney

In this paper we study the noise-robustness of mel-frequency cepstral coefficients (MFCCs) and explore ways to improve their performance in noisy conditions. Improvements based on a more accurate model of the early auditory system are suggested to make the MFCC features more robust to noise while preserving their class discrimination ability. Speech versus non-speech classification and speech r...
