Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech

Abstract

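Speaker verification has been studied mostly under the single-talker condition. It is adversely affected in the presence of interfering speakers. Inspired by the study on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that is able to pay selective auditory attention to the target speaker. This target speaker verification (tSV) framework jointly optimizes a speaker attention module and a speaker representation module via multi-task learning. We study four different target speaker embedding schemes under the tSV framework. The experimental results show that all four target speaker embedding schemes significantly outperform other competitive solutions for multi-talker speech. Notably, the best tSV speaker embedding scheme achieves 76.0% and 55.3% relative improvements over the baseline system on the WSJ0-2mix-extr and Libri2Mix corpora in terms of equal-error-rate for 2-talker speech, while the performance of tSV for single-talker speech is on par with that of a traditional speaker verification system, which is trained and evaluated under the same single-talker condition.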
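The abstract reports results as relative improvements in equal-error-rate (EER). As a rough illustration of what those two quantities mean, the sketch below approximates the EER by sweeping a decision threshold over trial scores and then computes a relative improvement between two EER values. It is not the paper's evaluation code: the function names, the threshold-sweep approximation, and all scores and EER values are hypothetical and chosen only for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Hypothetical helper for illustration; real toolkits usually
    interpolate on the ROC/DET curve instead.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the operating point where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, in percent, of new_eer over baseline_eer."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Made-up similarity scores, NOT data from the paper.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))

# Made-up EER values (in percent), NOT results from the paper.
print(relative_improvement(8.0, 2.0))
```

Under this definition, a relative improvement of 76.0% means the EER fell to 24% of the baseline value; it says nothing about the absolute EER of either system.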

Related articles

Auditory measures of selective and divided attention in young and older adults using single-talker competition.

In this study, two experiments were conducted on auditory selective and divided attention in which the listening task involved the identification of words in sentences spoken by one talker while a second talker produced a very similar competing sentence. Ten young normal-hearing (YNH) and 13 elderly hearing-impaired (EHI) listeners participated in each experiment. The type of attention cue used...

Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech. In this paper, we propose and evaluate several architectures to address this problem under the assumption that only a single channel of mixed signal is available. Our technique extends permutation invariant training (PI...

The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.

Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically...

Unsupervised segmentation and verification of multi-speaker conversational speech

This paper presents our approach to unsupervised multispeaker conversational speech segmentation. Speech segmentation is obtained in two steps that employ different techniques. The first step performs a preliminary segmentation of the conversation analyzing fixed length slices, and assumes the presence in every slice of one or two speakers. The second step clusters the segments obtained by the ...

Single-speaker/multi-speaker co-channel speech classification

The demand for content-based management and real-time manipulation of audio data is constantly increasing. This paper presents a method to identify temporal regions, in a segment of co-channel speech, as being either single-speaker or multispeaker speech. The state of the art approach for this purpose is the kurtosis. In this paper, a set of complementary time-domain and frequency-domain featur...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3100682