Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech
Authors
Abstract
Speaker verification has been studied mostly under the single-talker condition, and it is adversely affected in the presence of interference speakers. Inspired by the study of target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that is able to pay selective auditory attention to the target speaker. This target speaker verification (tSV) framework jointly optimizes a speaker attention module and a speaker representation module via multi-task learning. We study four different target speaker embedding schemes under the tSV framework. The experimental results show that all four schemes significantly outperform other competitive solutions for multi-talker speech. Notably, the best target speaker embedding scheme achieves 76.0% and 55.3% relative improvements over the baseline system on the WSJ0-2mix-extr and Libri2Mix corpora, respectively, in terms of equal-error-rate for 2-talker speech, while its performance on single-talker speech is on par with that of a traditional speaker verification system trained and evaluated under the same single-talker condition.
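The abstract reports results in terms of equal-error-rate (EER) and relative improvement over a baseline. As a minimal illustrative sketch (not the paper's evaluation code), the snippet below shows how EER is commonly computed from trial scores, and how a relative improvement such as the reported 76.0% is derived; the baseline and system EER values in the usage comment are hypothetical.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: sweep thresholds over the observed scores and return the
    operating point where false-acceptance rate (FAR) and
    false-rejection rate (FRR) are closest (averaged at the crossing)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = target trial
    best_far, best_frr, best_gap = 1.0, 1.0, np.inf
    for t in np.sort(np.unique(scores)):
        accept = scores >= t
        far = np.mean(accept[~labels]) if (~labels).any() else 0.0
        frr = np.mean(~accept[labels]) if labels.any() else 0.0
        if abs(far - frr) < best_gap:
            best_far, best_frr, best_gap = far, frr, abs(far - frr)
    return (best_far + best_frr) / 2

def relative_improvement(baseline_eer, system_eer):
    """Relative EER reduction in percent, as reported in the abstract."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer

# Hypothetical numbers: a baseline EER of 10.0% reduced to 2.4%
# corresponds to a 76.0% relative improvement.
print(relative_improvement(10.0, 2.4))
```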
Similar resources
Auditory measures of selective and divided attention in young and older adults using single-talker competition.
In this study, two experiments were conducted on auditory selective and divided attention in which the listening task involved the identification of words in sentences spoken by one talker while a second talker produced a very similar competing sentence. Ten young normal-hearing (YNH) and 13 elderly hearing-impaired (EHI) listeners participated in each experiment. The type of attention cue used...
Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training
Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech. In this paper, we propose and evaluate several architectures to address this problem under the assumption that only a single channel of mixed signal is available. Our technique extends permutation invariant training (PI...
The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene.
Attending to one speaker in multi-speaker situations is challenging. One neural mechanism proposed to underlie the ability to attend to a particular speaker is phase-locking of low-frequency activity in auditory cortex to speech's temporal envelope ("speech-tracking"), which is more precise for attended speech. However, it is not known what brings about this attentional effect, and specifically...
Unsupervised segmentation and verification of multi-speaker conversational speech
This paper presents our approach to unsupervised multispeaker conversational speech segmentation. Speech segmentation is obtained in two steps that employ different techniques. The first step performs a preliminary segmentation of the conversation analyzing fixed length slices, and assumes the presence in every slice of one or two speakers. The second step clusters the segments obtained by the ...
Single-speaker/multi-speaker co-channel speech classification
The demand for content-based management and real-time manipulation of audio data is constantly increasing. This paper presents a method to identify temporal regions, in a segment of co-channel speech, as being either single-speaker or multispeaker speech. The state of the art approach for this purpose is the kurtosis. In this paper, a set of complementary time-domain and frequency-domain featur...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3100682