A Gammatone-based Psychoacoustical Modeling Approach for Speech and Audio Coding

Authors

  • Ghassan Charestan
  • Richard Heusdens

Abstract

We propose a new approach for modeling auditory masking based on gammatone filters, for application areas including speech/audio coding and audio watermarking. Besides its use of gammatone filters, this model differs from existing audio coding psychoacoustical models (e.g., those used in MPEG) in that it takes into account the contributions of a range of filters when computing the distortion, rather than considering only the filter receiving the most distortion. This is more in line with known psychoacoustical data, and it more adequately describes the different masking behavior of tonal versus noisy components without the need for a separate tonality detector.

Keywords: audio/speech coding, auditory masking, distortion detectability, psychoacoustical model, gammatone filter bank.
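The abstract's central distinction is between judging distortion by the single most-excited filter and by combining contributions from a range of filters. The following is a minimal pure-Python sketch of that contrast, not the authors' model. It assumes a 4th-order gammatone bank (bandwidth factor b = 1.019, the Glasberg and Moore ERB formula, centre frequencies spaced uniformly on the ERB-rate scale), a per-filter distortion-to-masker energy ratio with a hypothetical floor `c_a`, and no calibration to measured masking thresholds. All function names are illustrative.

```python
import math
import random


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_space(f_lo, f_hi, n):
    """Return n centre frequencies spaced uniformly on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def from_rate(r):
        return (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = to_rate(f_lo), to_rate(f_hi)
    return [from_rate(lo + (hi - lo) * k / (n - 1)) for k in range(n)]


def gammatone_ir(fc_hz, fs, duration=0.04, order=4, b=1.019):
    """Sampled gammatone impulse response, scaled to unit gain at fc_hz.

    g(t) = t^(order-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)
    """
    h = []
    for k in range(int(duration * fs)):
        t = k / fs
        env = t ** (order - 1) * math.exp(-2.0 * math.pi * b * erb(fc_hz) * t)
        h.append(env * math.cos(2.0 * math.pi * fc_hz * t))
    # Magnitude of the DTFT at the centre frequency, used for normalisation.
    w = 2.0 * math.pi * fc_hz / fs
    re = sum(v * math.cos(w * k) for k, v in enumerate(h))
    im = sum(v * math.sin(w * k) for k, v in enumerate(h))
    gain = math.hypot(re, im)
    return [v / gain for v in h]


def fir_filter(h, x):
    """Causal FIR filtering; output is truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]


def energy(x):
    return sum(v * v for v in x)


def detectability(masker, error, bank, c_a=1e-6):
    """Per-filter distortion-to-masker energy ratios, combined two ways.

    Returns (sum over all filters, maximum over filters). The floor c_a is a
    hypothetical stand-in for an absolute-threshold term (an assumption).
    """
    ratios = [energy(fir_filter(h, error)) / (energy(fir_filter(h, masker)) + c_a)
              for h in bank]
    return sum(ratios), max(ratios)


# Illustrative demo: a 1 kHz tone masker and weak white-noise distortion.
fs = 8000
centres = erb_space(200.0, 3000.0, 20)
bank = [gammatone_ir(fc, fs) for fc in centres]
n = int(0.05 * fs)
masker = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(n)]
rng = random.Random(0)
error = [0.01 * rng.gauss(0.0, 1.0) for _ in range(n)]
d_sum, d_max = detectability(masker, error, bank)
print(f"summed over filters: {d_sum:.4g}, single worst filter: {d_max:.4g}")
```

Because every per-filter ratio is non-negative, the summed measure is never smaller than the single-filter maximum, and it grows when distortion is spread over many filters even if no single filter dominates. How the paper actually weights and thresholds these contributions is not stated in the abstract, so this sketch should not be read as its exact formulation.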

Similar resources

Wideband speech and audio coding using gammatone filter banks

Considerable research attention has been directed towards speech and audio coding algorithms capable of producing high-quality coded speech and audio; however, few of these use signal representations that account for temporal as well as spectral detail. This paper presents a new technique for 16 kHz wideband speech and audio coding, whereby analysis and synthesis are performed using a linear ph...

Application of a Physiological Ear Model to Irrelevance Reduction in Audio Coding

A previously published physiological ear model is applied as the perceptual model of an audio coder complying with the ISO/MPEG-2 AAC standard. The achieved subjective sound quality is compared to results from an optimized psychoacoustical model. Significant deviations of the generated masked thresholds from the physiological ear model and the psychoacoustical model are evaluated with respect to p...

Auditory-inspired sparse representation of audio signals

This article deals with the generation of auditory-inspired spectro-temporal features aimed at audio coding. To do so, we first generate sparse audio representations we call spikegrams, using projections on gammatone/gammachirp kernels that generate neural spikes. Unlike Fourier-based representations, these representations are powerful at identifying auditory events, such as onsets, offsets, tr...

Reconstructing audio signals from modified non-coherent hilbert envelopes

In this paper, we present a speech and audio analysis-synthesis method based on a Basilar Membrane (BM) model. The audio signal is represented in this method by the Hilbert envelopes of the responses to complex gammatone filters uniformly spaced on a critical band scale. We show that for speech and audio signals, a perceptually equivalent signal can be reconstructed from the envelopes alone b...

Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

The full modulation spectrum is a high-dimensional representation of one-dimensional audio signals. Most previous research in automatic speech recognition converted this very rich representation into the equivalent of a sequence of short-time power spectra, mainly to simplify the computation of the posterior probability that a frame of an unknown speech signal is related to a specific state. In...

Publication date: 2001