I-Vector/PLDA Variants for Text-Dependent Speaker Recognition
Abstract
The i-vector/PLDA approach currently dominates the field of text-independent speaker recognition, and the question of how to translate this methodology to the text-dependent domain has recently become an active area of research. The essential difference between the two fields is that speaker recognition with very short enrollment and test utterances is possible in the text-dependent case but not in the text-independent case. The i-vector representation of short utterances turns out to be very sensitive to their phonetic content, and this introduces a major source of nuisance variability when i-vectors are used in text-dependent speaker recognition. We show how, despite this complication, i-vector extractors can be successfully trained on short utterances (rather than on whole conversation sides, as is usually done) and how this source of nuisance variability can be dealt with in a PLDA classifier by making the PLDA model parameters phrase-dependent. Our results show that this phrase-dependent version of PLDA is capable of outperforming the speaker-phrase version of PLDA presented in [8] on the RSR2015 dataset. We also give a detailed account of uncertainty propagation in PLDA, and we show that it combines very successfully with phrase-dependent PLDA.
Preprint submitted to Computer, Speech and Language, November 6, 2013.
Related articles
Text-dependent speaker recognition using PLDA with uncertainty propagation
In this paper, we apply and enhance the i-vector-PLDA paradigm to text-dependent speaker recognition. Due to its origin in text-independent speaker recognition, this paradigm does not make use of the phonetic content of each utterance. Moreover, the uncertainty in the i-vector estimates should be taken into account in the PLDA model, due to the short duration of the utterances. To bridge this g...
End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA
Recently several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this w...
Fast scoring for PLDA with uncertainty propagation via i-vector grouping
The i-vector/PLDA framework has gained huge popularity in text-independent speaker verification. This approach, however, lacks the ability to represent the reliability of i-vectors. As a result, the framework performs poorly when presented with utterances of arbitrary duration. To address this problem, a method called uncertainty propagation (UP) was proposed to explicitly model the reliability...
Duration dependent covariance regularization in PLDA modeling for speaker verification
In this paper, we present a covariance regularized probabilistic linear discriminant analysis (CR-PLDA) model for text-independent speaker verification. In the conventional simplified PLDA modeling, the covariance matrix used to capture the residual energies is globally shared for all i-vectors. However, we believe that the point estimated i-vectors from longer speech utterances may be more acc...
PLDA in the I-Supervector Space for Text-Independent Speaker Verification
In this paper, we advocate the use of the uncompressed form of i-vector and depend on subspace modeling using probabilistic linear discriminant analysis (PLDA) in handling the speaker and session (or channel) variability. An i-vector is a low-dimensional vector containing both speaker and channel information acquired from a speech segment. When PLDA is used on an i-vector, dimension reduction i...
Publication date: 2013