Limited data speaker identification

Authors

  • H S JAYANNA
  • MAHADEVA PRASANNA
Abstract

This paper addresses the task of identifying a speaker using limited training and testing data. A speaker identification system is viewed as four stages, namely analysis, feature extraction, modelling and testing, and its performance depends on the techniques employed in each stage. As different experiments demonstrate, when training and testing data are limited, the existing techniques in each stage do not provide good performance. This work demonstrates the following: multiple frame size and rate (MFSR) analysis provides improvement in the analysis stage; a combination of mel frequency cepstral coefficients (MFCC), their temporal derivatives (Δ, ΔΔ), linear prediction residual (LPR) and linear prediction residual phase (LPRP) features provides improvement in the feature extraction stage; and a combination of learning vector quantization (LVQ) and Gaussian mixture model – universal background model (GMM–UBM) provides improvement in the modelling stage. The performance is further improved by integrating the proposed techniques at the respective stages and combining the evidence from them at the testing stage. To achieve this, we propose strength voting (SV), weighted Borda count (WBC) and supporting systems (SS) as combining methods at the abstract, rank and measurement levels, respectively. Finally, the proposed hierarchical combination (HC) method, which integrates these three methods, provides significant improvement in performance. Based on these explorations, this work proposes a scheme for speaker identification under limited training and testing data.
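To make the rank-level combination concrete, the following is a minimal sketch of a generic weighted Borda count in Python. It is not taken from the paper: the function name `weighted_borda_count`, the speaker labels, the per-system weights and the point scheme (top rank earns n - 1 points, last rank earns 0) are all assumptions chosen for illustration, and the authors' exact formulation may differ.

```python
from collections import defaultdict


def weighted_borda_count(rankings, weights):
    """Fuse ranked speaker lists from several systems (generic sketch).

    rankings: list of lists, each ordering speaker labels best-first.
    weights:  one non-negative weight per system (hypothetical values).
    Returns the speaker labels sorted from best to worst fused score.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, speaker in enumerate(ranking):
            # Assumed point scheme: top rank earns n - 1 points, last earns 0.
            scores[speaker] += weight * (n - 1 - position)
    # Highest fused score first; ties are broken by label for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical rankings from three systems built on different features.
rankings = [
    ["spk2", "spk1", "spk3"],  # e.g. an MFCC-based system
    ["spk1", "spk2", "spk3"],  # e.g. an LPR-based system
    ["spk1", "spk3", "spk2"],  # e.g. an LPRP-based system
]
weights = [0.5, 0.3, 0.2]  # assumed reliabilities, not from the paper

fused = weighted_borda_count(rankings, weights)
print(fused[0])  # identified speaker after fusion
```

In this example the most heavily weighted system prefers `spk2`, but the other two systems both rank `spk1` first, so the fused ranking places `spk1` on top. This illustrates why rank-level fusion can correct an error made by a single system.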

Similar resources

Integrating speaker identification and learning with adaptive speech recognition

Presently, speaker adaptive systems are the state of the art in automatic speech recognition. A general baseline model is adapted to the current speaker during recognition in order to improve the quality of the results obtained. However, the adaptation procedure needs to be able to distinguish between data from different speakers. Therefore, in a general speaker adaptive recognizer speaker recog...

Augmentation of adaptation data

Linear regression based speaker adaptation approaches can significantly improve Automatic Speech Recognition (ASR) accuracy for a target speaker. However, when the available adaptation data is limited to a few seconds, the accuracy of the speaker-adapted models is often worse than that of speaker-independent models. In this paper, we propose an approach to select a set of reference speakers ac...

On the use of nearest feature line for speaker identification

As a new pattern classification method, Nearest Feature Line (NFL) provides an effective way to tackle the sort of pattern recognition problems where only limited data are available for training. In this paper, we explore the use of NFL for speaker identification in terms of limited data and examine how the NFL performs in such a vexing problem of various mismatches between training and test. I...

Speaker tracking in an unsupervised speech controlled system

In this paper we present a technique to increase the robustness of a self-learning speech controlled system comprising speech recognition, speaker identification and speaker adaptation. Our goal is the automatic personalization of a speech controlled device for groups of 5-10 recurring speakers. Speakers should be identified and tracked across speaker turns only by their voice patterns. Efficie...

Closed-Set Speaker Identification Based on a Single Word Utterance: An Evaluation of Alternative Approaches

The problem of closed-set speaker identification based on a single spoken word from a limited vocabulary is relevant to several current and futuristic interactive multimedia applications. In this paper, we evaluate the effectiveness of several potential solutions using an isolated word speech corpus. In addition to evaluating the text-dependent and text-constrained variants of the Gaussian Mixt...

Publication date: 2010