Welcome to the first CN-Celeb speaker recognition challenge, CNSRC 2022! The challenge aims to probe how well current speaker recognition methods work in real-world scenarios, including in-the-wild complexity and real-time processing speed.
The challenge is based on CN-Celeb, a free multi-genre speaker recognition dataset with the most real-world complexity so far. The dataset covers multiple genres of speech, including entertainment, interview, singing, play, movie, vlog, live broadcast, speech, drama, recitation and advertisement, and exhibits real-world noise, strong and overlapping background speakers, significant variations in speaking style, time-varying and cross-channel problems, and long-short test scenarios. CNSRC 2022 is open now. Please check the detailed information about the challenge below.
2022-Jan-30 CNSRC 2022 Registration System is open now.
2022-Feb-22 The Development Set for Task 2 SR is released now.
2022-Feb-22 The Baselines for Task 1 and Task 2 are open now.
2022-Mar-07 The Evaluation Set for Task 2 SR is released now.
2022-Apr-15 The submission deadline for system results is 16 May, 12:00 PM (UTC).
2022-May-03 The submission deadline for special session papers to the Odyssey 2022 CNSRC special session is 20 May, 12:00 PM (UTC).
2022-May-03 The submission deadline for system description to CNSRC 2022 is 30 May, 12:00 PM (UTC).
2022-May-16 The submission deadline for system results is postponed to 18 May, 12:00 (UTC).
2022-May-21 Important: Notification to participants.
2022-June-30 The Metadata for Task 2 SR is released now.
CNSRC 2022 defines two tasks: speaker verification (SV) and speaker retrieval (SR).
The objective of the SV task is to improve performance on the standard CN-Celeb evaluation set. Depending on the data used for system development, two tracks are defined for the SV task, a fixed track and an open track:
Fixed Track, where only the CN-Celeb training set is allowed for training/tuning the system.
Open Track, where any data sources can be used for developing the system, except the CN-Celeb evaluation set.
The purpose of the SR task is to find the utterances spoken by a target speaker in a large data pool, given enrollment data for that speaker. Each target speaker forms a retrieval request. Each target speaker has 1 enrollment utterance and 10 test utterances. The non-target set contains a large number of utterances from multiple sources. The target and non-target utterances are pooled together, and participants are required to design a retrieval system that finds the top-10 candidates for each target speaker and lists them in descending order of LLR score. Participants can use any data sources to train their system, except the CN-Celeb evaluation set.
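The ranking step of a retrieval request can be sketched as follows. This is only an illustration: the `scores` mapping, the utterance IDs and the function name `top_candidates` are hypothetical, and neither the scoring model nor the submission file format is covered here.

```python
from typing import Dict, List


def top_candidates(scores: Dict[str, float], n: int = 10) -> List[str]:
    """Return the IDs of the n highest-scoring pool utterances, best first.

    scores maps a pool-utterance ID to its score (for example an LLR)
    against one target speaker's enrollment utterance.
    """
    # Sort by descending score; break ties by ID so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [utt_id for utt_id, _ in ranked[:n]]


# Hypothetical scores for a pool of 12 utterances.
pool_scores = {f"utt{i:02d}": float(i) for i in range(12)}
print(top_candidates(pool_scores))
```

The tie-break on the utterance ID is a design choice of this sketch, made only so that repeated runs give the same list.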
The primary metric for SV performance evaluation is the minimum Detection Cost Function (minDCF). First, the detection cost function is defined as follows:
$$C_{Det}(\theta) = C_{Miss} \times P_{Miss}(\theta) \times P_{Target} + C_{FalseAlarm} \times P_{FalseAlarm}(\theta) \times (1 - P_{Target})$$
where $P_{Miss}(\theta)$ is the miss rate and $P_{FalseAlarm}(\theta)$ is the false alarm rate with the decision threshold set to $\theta$. $C_{Miss}$ and $C_{FalseAlarm}$ are the costs of a missed detection and a spurious detection, respectively; $P_{Target}$ is the prior probability of the specified target speaker. Then minDCF is obtained by minimizing $C_{Det}(\theta)$ with respect to $\theta$, setting $C_{Miss} = C_{FalseAlarm} = 1$ and $P_{Target} = 0.01$:
$$\mathrm{minDCF} = \min_{\theta} C_{Det}(\theta)$$
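A minimal sketch of the minDCF computation is given below. It assumes the usual weighted-sum form of the detection cost, C_Miss × P_Miss × P_Target + C_FalseAlarm × P_FalseAlarm × (1 − P_Target), and that a trial is accepted when its score is at or above the threshold. The function names are hypothetical, and the sketch does not normalize the cost; it is not the official evaluation toolkit.

```python
from typing import List, Sequence, Tuple


def error_rates(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (P_miss, P_fa) at every distinct threshold.

    labels: 1 for target trials, 0 for non-target trials.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    # Sweep the threshold from above the highest score downwards.
    trials = sorted(zip(scores, labels), key=lambda t: -t[0])
    rates = [(1.0, 0.0)]  # threshold above every score: everything rejected
    accepted_targets = 0
    accepted_nontargets = 0
    i = 0
    while i < len(trials):
        current = trials[i][0]
        # Trials with the same score are accepted together.
        while i < len(trials) and trials[i][0] == current:
            if trials[i][1] == 1:
                accepted_targets += 1
            else:
                accepted_nontargets += 1
            i += 1
        rates.append((1.0 - accepted_targets / n_target,
                      accepted_nontargets / n_nontarget))
    return rates


def min_dcf(scores: Sequence[float], labels: Sequence[int],
            p_target: float = 0.01, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Minimum of the detection cost over all thresholds."""
    return min(
        c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        for p_miss, p_fa in error_rates(scores, labels)
    )


# Toy trial list: two target and two non-target trials.
print(min_dcf([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The defaults mirror the challenge settings (both costs equal to 1, target prior 0.01); only the observed scores need to be tried as thresholds because the error rates change nowhere else.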
Besides minDCF, the SV performance is also evaluated/analyzed in two ways:
• Equal Error Rate (EER). EER is defined as the common value of $P_{Miss}$ and $P_{FalseAlarm}$ at the threshold where the two are equal, formally $P_{Miss}(\theta^*) = P_{FalseAlarm}(\theta^*)$, where $\theta^*$ is the decision threshold that achieves the balance. EER is used as the auxiliary metric and should be reported in the system description.
• Detection Error Tradeoff (DET) curve. The DET curve is a curve in a two-dimensional space whose two axes represent $P_{Miss}$ and $P_{FalseAlarm}$, respectively. The DET curve reflects the trade-off between misses and false alarms, and presents the performance of the system at the various operating points determined by $\theta$.
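The two auxiliary analyses can be sketched in a few lines. The code below is an illustration under stated assumptions, not the official toolkit: a trial is accepted when its score is at or above the threshold, the EER is approximated by the operating point where the miss and false alarm rates are closest, and the function names are hypothetical. The threshold sweep is quadratic in the number of trials, which is fine for a toy example but too slow for a full trial list.

```python
from typing import List, Sequence, Tuple


def det_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (P_miss, P_fa) for each candidate threshold (the DET curve points).

    labels: 1 for target trials, 0 for non-target trials.
    """
    target = [s for s, l in zip(scores, labels) if l == 1]
    nontarget = [s for s, l in zip(scores, labels) if l == 0]
    if not target or not nontarget:
        raise ValueError("need both target and non-target trials")

    # Every observed score, plus a threshold that rejects everything.
    thresholds = sorted(set(scores)) + [float("inf")]
    points = []
    for thr in thresholds:
        p_miss = sum(1 for s in target if s < thr) / len(target)
        p_fa = sum(1 for s in nontarget if s >= thr) / len(nontarget)
        points.append((p_miss, p_fa))
    return points


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER: average of P_miss and P_fa where they are closest."""
    p_miss, p_fa = min(det_points(scores, labels), key=lambda p: abs(p[0] - p[1]))
    return (p_miss + p_fa) / 2.0


# Toy trial list: two target and two non-target trials.
print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

With a finite trial list the two rates rarely cross exactly, which is why the sketch averages them at the closest point; interpolating between neighbouring points is a common alternative.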
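The performance of the SR system is measured in terms of Mean Average Precision (mAP). For a single target speaker $i$, suppose there are $M$ test utterances overall, and the system outputs at most $N$ top candidates for each retrieval request. For the top-$k$ case, the precision is defined as:
$$P_{i,k} = \frac{TP_{i,k}}{TP_{i,k} + FP_{i,k}}$$
where $TP_{i,k}$ and $FP_{i,k}$ are the numbers of top-$k$ candidates that do and do not belong to speaker $i$, respectively.
The AP of top-$N$ is defined as the precision averaged over the top-$k$ ($k = 1, 2, \ldots, N$) cases:
$$AP_i = \frac{1}{N} \sum_{k=1}^{N} P_{i,k}$$
Then mAP is computed as the AP averaged over all the target speakers:
$$\mathrm{mAP} = \frac{1}{S} \sum_{i=1}^{S} AP_i$$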
where $S$ is the number of target speakers. For CNSRC 2022, the parameters are as follows: the number of target speakers is $S = 5$ in SR.dev and $S = 25$ in SR.eval, the number of test utterances per target speaker is $M = 10$, and the maximum number of candidates output by the SR system is $N = 10$.
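A minimal sketch of the mAP computation follows. It assumes that the top-k precision is the fraction of the first k returned candidates that belong to the target speaker, and that a list shorter than k simply contributes no further hits. The function names and utterance IDs are hypothetical; the official evaluation toolkit remains the reference.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Fraction of the top-k candidates that belong to the target speaker.

    Assumes the ranked list contains no duplicate utterance IDs.
    """
    hits = sum(1 for utt_id in ranked[:k] if utt_id in targets)
    return hits / k


def average_precision(ranked: Sequence[str], targets: Set[str], n: int = 10) -> float:
    """Precision averaged over the top-k cases, k = 1..n."""
    return sum(precision_at_k(ranked, targets, k) for k in range(1, n + 1)) / n


def mean_average_precision(requests: List[Tuple[Sequence[str], Set[str]]],
                           n: int = 10) -> float:
    """AP averaged over all retrieval requests (one per target speaker)."""
    return sum(average_precision(r, t, n) for r, t in requests) / len(requests)


# Toy example with two target speakers and n = 2 candidates per request.
requests = [
    (["a1", "x7"], {"a1", "a2"}),  # first candidate correct, second wrong
    (["b1", "b2"], {"b1", "b2"}),  # both candidates correct
]
print(mean_average_precision(requests, n=2))
```

Because the precision is averaged over every k up to N, a correct candidate placed early in the list contributes to more terms than one placed late, so the ordering of the returned candidates matters.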
The evaluation toolkit can be downloaded from [Here].
In the fixed track, only the CN-Celeb training set may be used for system development. It contains 797 speakers from CN-Celeb1.dev and 1996 speakers from CN-Celeb2. Participants can download the dataset from OpenSLR.
In the open track, any data sources and tools can be used to develop the system.
In addition, to better verify the reliability and generalizability of each submitted system, an extra validation test is set up using a small validation set (SV.val) without labels.
Participants can obtain the SV.val set by clicking the following button and then signing the data user agreement. Once it is signed, the data download links will be sent automatically to your registered email address.
Two datasets will be released: SR.dev and SR.eval. Each dataset contains two parts:
(1) Target speakers and associated enrollment data;
(2) Utterance pool that contains utterances of the target speakers as well as a large number of non-target utterances.
SR.dev will be provided to the participants for system development, while SR.eval will be released for system evaluation.
Participants can obtain the datasets by clicking the following button and then signing the data user agreement. Once it is signed, the data download links will be sent automatically to your registered email address.
Participants must sign up for an evaluation account where they can perform various activities such as registering for the evaluation, signing the data user agreement, as well as uploading the submission and system description.
Once the account has been created, registration can be completed online. Registration is free for all individuals and institutes. Normally the registration takes effect immediately, but the organizers may check the registration information and ask participants to provide additional information to validate it.
To sign up for an evaluation account, please click Quick Registration.
The organizers prepared multiple baseline systems to demonstrate the process of training/evaluation required by the challenge. All the baseline systems are open-sourced.
For the fixed track SV system, three baselines are published with [Kaldi] [ASV-Subtools] [Sunine]. These baseline recipes can easily be adapted to develop an open-track system by adding more training data (excluding CN-Celeb.E).
Participants should submit their results via the submission system. Once the submission is completed, it will be shown in the Leaderboard, and all participants can check their positions. For each task and each track, participants can submit their results no more than 10 times.
All valid submissions are required to be accompanied by a system description, submitted via the submission system. All the system descriptions will be published on the web page of the CNSRC 2022 workshop.
In the system description, participants are allowed to hide their name and affiliation. The submission deadline for system description to CNSRC 2022 is 30 May, 12:00 PM (UTC). The template for system description can be downloaded here.
Participants are encouraged to formulate their system description as an Odyssey 2022 special session paper, submitted via EasyChair submission system.
The submitted paper will be subject to the same review process as regular papers of Odyssey 2022. Considering the time cost of the review process, the submission system will be open from 16 May, 12:00 PM (UTC) to 20 May, 12:00 PM (UTC). The template for the Odyssey special session paper can be found here.
Note that each accepted paper MUST be covered by a full registration of Odyssey 2022.
Note that the ranks in the current Leaderboard are not the FINAL ranks. The FINAL Leaderboard will be announced at the CNSRC 2022 Workshop at Odyssey 2022.
All participants in the Task 1 SV tracks are required to submit the result on the extra validation set, obtained with their FINAL submission system. The result on the validation set will not affect the Leaderboard ranking. Note that this extra validation test is obligatory; otherwise the submission will be regarded as invalid.
The submission deadline for this extra validation test is 12 June, 12:00 (UTC).
|Mid Feb|Registration System Open.|
|Late Feb|Development Set for Task 2 SR Release.|
|Mid Mar|Evaluation Set for Task 2 SR Release.|
|Mid Mar|Submission System and Leaderboard Open.|
|16 May|Deadline for Submission of Results.|
|20 May|Deadline for Submission of Special Session Paper to Odyssey 2022 CNSRC Special Session.|
|30 May|Deadline for Submission of System Description to CNSRC 2022.|
|27 Jun|CNSRC 2022 workshop at Odyssey 2022.|
Dong Wang, Tsinghua University, Beijing, China
Qingyang Hong, Xiamen University, Xiamen, China
Lantian Li, Tsinghua University, Beijing, China
Wenqiang Du, Tsinghua University, Beijing, China
Yang Zhang, Tsinghua University, Shenzhen, China
Tao Jiang, TalentedSoft, Xiamen, China
Hui Bu, AISHELL, Beijing, China
Xin Xu, AISHELL, Beijing, China
Please contact e-mail firstname.lastname@example.org if you have any queries.
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants No.61633013 and No.62171250.