Onset Detection Exploiting Wavelet Transform with Bidirectional Long Short-term Memory Neural Networks
Authors
Abstract
A plethora of onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. This paper concerns the use of the Wavelet Packet Transform to exploit multi-resolution time-frequency features. We apply early fusion in the feature space by combining Wavelet Packet Energy Coefficients and auditory spectral features. The features are then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained on a large database of onset data covering various genres and onset types. Owing to its data-driven nature, our approach does not require the onset detection method and its parameters to be tuned to a particular type of music.

1. ALGORITHM DESCRIPTION

The algorithm can be divided into three parts. First, the audio data is transformed into the frequency domain via a Discrete Wavelet Packet Transform (DWPT) with 22 bands (Table 1 shows the frequency division) and via two parallel STFTs with two different window sizes. Energy-based information and its evolution in time are obtained from these transformations, leading to the final feature set. Second, the features are used as inputs to the BLSTM network, which produces an onset activation function at its output. Finally, the network output is post-processed by thresholding and peak picking in order to obtain the onset positions. Figure 1 shows this basic signal flow. The individual blocks are described in more detail in the following sections.

This document is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. http://creativecommons.org/licenses/by-nc-sa/3.0/ © 2013 The Authors.

Figure 1. General block scheme: Signal, feature extraction (Auditory Spectral Feat. and WPEC), BLSTM Network, Peak detection, Onsets.
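The final post-processing stage (thresholding followed by peak picking) can be illustrated with a minimal sketch. The function name `pick_onsets`, the threshold of 0.5, and the toy activation values are assumptions made for illustration; the text does not specify the authors' threshold or peak-picking rule. The frame rate of 100 fps is taken from Section 1.1.

```python
def pick_onsets(activation, threshold=0.5, fps=100):
    """Return onset times in seconds from a per-frame activation function.

    A frame counts as an onset if its activation reaches `threshold`
    (hypothetical value), is strictly greater than the previous frame,
    and is at least as large as the next frame (so a flat peak is
    reported only once).
    """
    onsets = []
    for i in range(1, len(activation) - 1):
        value = activation[i]
        if (value >= threshold
                and value > activation[i - 1]
                and value >= activation[i + 1]):
            onsets.append(i / fps)  # convert frame index to seconds
    return onsets


# Toy network output: peaks at frames 2 and 6; the bump at frame 8
# stays below the threshold and is ignored.
activation = [0.0, 0.1, 0.9, 0.2, 0.1, 0.6, 0.7, 0.3, 0.4, 0.1]
onsets = pick_onsets(activation, threshold=0.5, fps=100)
print(onsets)
```

In practice the threshold would be chosen on held-out data, and a minimum distance between consecutive onsets is often enforced as well; neither detail is given in the text.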
1.1 Feature Extraction

Discrete input audio files, sampled at Fs = 44.1 kHz, were used for our experiments. A new feature set is obtained by exploiting the wavelet transformation (cf. Figure 2) to compute Wavelet Packet Energy Coefficients (WPEC). The discrete input audio signal is segmented into overlapping frames of W46 = 2048 samples, taken at a rate of 100 fps. The log-energy of each frame is calculated before the Hamming window is applied, as follows:
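A minimal sketch of the framing step described above, using only the parameters stated in the text (Fs = 44.1 kHz, 2048-sample frames, 100 fps, which gives a hop of 441 samples). The log-energy definition used here (natural log of the summed squared samples plus a small constant) is an assumption and may differ from the authors' formula; the helper names, the zero-padding of the last frames, and the sine test signal are likewise illustrative.

```python
import math

FS = 44100         # sampling rate from the text (Hz)
FRAME_LEN = 2048   # W46 in the text, about 46 ms at 44.1 kHz
FPS = 100          # frame rate from the text
HOP = FS // FPS    # 441 samples between consecutive frame starts


def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, frame_len=FRAME_LEN, hop=HOP):
    """Yield overlapping frames; short tail frames are zero-padded (a choice made here)."""
    for start in range(0, len(signal), hop):
        frame = signal[start:start + frame_len]
        yield frame + [0.0] * (frame_len - len(frame))


def log_energy(frame, eps=1e-10):
    """Assumed definition: log of the summed squared samples; eps avoids log(0)."""
    return math.log(sum(x * x for x in frame) + eps)


# One second of a 440 Hz sine as a stand-in for real audio.
signal = [math.sin(2 * math.pi * 440 * n / FS) for n in range(FS)]
window = hamming(FRAME_LEN)

energies = []
windowed_frames = []
for frame in frame_signal(signal):
    energies.append(log_energy(frame))  # energy is taken before windowing, as in the text
    windowed_frames.append([x * w for x, w in zip(frame, window)])

print(len(energies), round(energies[0], 2))
```

The windowed frames would then be passed to the DWPT and STFT stages, which are not sketched here.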
Similar resources
Onset Detection Exploiting Adaptive Linear Prediction Filtering in Dwt Domain with Bidirectional Long Short-term Memory Neural Networks
The following short paper presents an experimental algorithm for onset detection which applies feature extraction in the wavelet domain and auditory spectral features to Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks for decision-making. The presented algorithm exploits multi-resolution time-frequency features via the discrete wavelet transformation to decompose the input...
Onset Detection for Piano Music Transcription Based on Neural Networks
Onset detection refers to the task of determining the physical starting time of notes or other musical events as they occur in a music recording. Various kinds of onset detection methods have been proposed in recent years. The goal of this paper is to choose a relatively appropriate method for onset detection. The neural network is discussed, especially the advanced bidirectional long short-ter...
Universal Onset Detection with Bidirectional Long Short-Term Memory Neural Networks
Many different onset detection methods have been proposed in recent years. However, those that perform well tend to be highly specialised for certain types of music, while those that are more widely applicable give only moderate performance. In this paper we present a new onset detector with superior performance and temporal precision for all kinds of music, including complex music mixes. It is ...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Short term electric load prediction based on deep neural network and wavelet transform and input selection
Electricity demand forecasting is one of the most important factors in the planning, design, and operation of competitive electrical systems. However, most of the load forecasting methods are not accurate. Therefore, in order to increase the accuracy of the short-term electrical load forecast, this paper proposes a hybrid method for predicting electric load based on a deep neural network with a...
Journal title:
Volume / Issue
Pages -
Publication date: 2013