Analysis of spectrogram image methods for sound event classification
Authors
Abstract
The time-frequency spectrogram representation of an audio signal can be visually analysed by a trained researcher to recognise any underlying sound events, a process called “spectrogram reading”. However, this has not become a popular approach for automatic classification, as the field is driven by Automatic Speech Recognition (ASR), where frame-based features are popular. Unlike speech, sound events typically have a more distinctive time-frequency representation, with the energy concentrated in a small number of spectral components. This makes them more suitable for classification based on their visual signature and allows techniques to be borrowed from the related field of image processing. Recently, a range of techniques has emerged that extract image processing-inspired features from the spectrogram for sound event classification. In this paper, we introduce the idea and structure behind six recent spectrogram image methods and analyse their performance on a large database containing 50 different environmental sounds, giving a standardised comparison that is not often available in sound event classification tasks.
Related resources
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of an SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
Sound-Event Classification Using Pseudo-Color CENTRIST Feature and Classifier Selection
Sound-event classification often extracts features from an image-like spectrogram. Recent approaches such as the spectrogram image feature and the subband-power-distribution image feature extract local statistics such as mean and variance from the spectrogram. We argue that such simple image statistics cannot capture the complex texture details of the spectrogram well. Thus, we propose to extract pseudo-co...
Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation
This paper addresses the problem of sound event classification, focusing on feature extraction methods which are robust in noisy environments. In the real world, sound events are easily exposed to noise, which corrupts their distinctive temporal and spectral characteristics. Therefore, extracting robust features to represent these characteristics is important in achieving good classi...
Overlapping Sound Event Recognition using Local Spectrogram Features with the Generalised Hough Transform
We present a novel approach for recognition of overlapping sound events based on the Generalised Hough Transform (GHT) – a technique commonly used for object recognition in the domain of image processing. Unlike our previous work on image-based sound event classification, where we focussed on global image features, here we extract local features from detected interest-points in the spectrogram....
Sound Event Recognition in Unstructured Environments using Spectrogram Image Processing
The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014