CN-Celeb-AV is a multi-genre audio-visual person recognition dataset covering 11 different genres in the real world, collected from multiple Chinese open media sources.
1,136 Speakers
CN-Celeb-AV contains speech from Chinese celebrities.
420,000+ Utterances
CN-Celeb-AV covers multiple genres of speech, including entertainment, interview, singing, play, movie, vlog, live broadcast, speech, drama, recitation and advertisement.
650+ Hours
CN-Celeb-AV includes both full-modality and partial-modality challenges, which reflect the scenarios of most real applications.
Dev-F: a development set with full-modality information, containing both audio and visual information
Eval-F: an evaluation set with full-modality information, containing both audio and visual information
Eval-P: an evaluation set with partial-modality information, in which some segments have corrupted or entirely missing audio or visual information
The dataset consists of three subsets, Dev-F, Eval-F and Eval-P. For each subset, we provide video and audio files and speaker meta-data. There is no overlap among the three subsets. Dev-F contains more than 93,000 segments from 689 Chinese celebrities, Eval-F contains more than 17,000 segments from 197 Chinese celebrities, and Eval-P contains more than 308,000 segments from 250 Chinese celebrities.
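Because the three subsets do not overlap, the per-subset figures can be summed as a quick consistency check against the headline numbers. The minimal Python sketch below encodes the stated counts in a plain dictionary (an illustrative structure assumed here, not a file shipped with the dataset). The segment counts are "more than" figures, so their sum is only a lower bound.

```python
# Stated per-subset figures for CN-Celeb-AV (illustrative encoding, not an official file).
# Segment counts are lower bounds ("more than N"); speaker counts are exact.
SUBSETS = {
    "Dev-F": {"segments": 93_000, "speakers": 689},
    "Eval-F": {"segments": 17_000, "speakers": 197},
    "Eval-P": {"segments": 308_000, "speakers": 250},
}


def totals(subsets):
    """Sum segment and speaker counts across subsets.

    Speaker counts can be added directly only because the subsets do not overlap.
    """
    segments = sum(s["segments"] for s in subsets.values())
    speakers = sum(s["speakers"] for s in subsets.values())
    return segments, speakers


segments, speakers = totals(SUBSETS)
print(f"{speakers:,} speakers, more than {segments:,} segments")
```

Running it prints `1,136 speakers, more than 418,000 segments`: the speaker total matches the headline figure exactly, and the segment lower bound is consistent with the 420,000+ utterances quoted above.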
All the resources contained in the dataset are free for researchers. The copyright remains with the original owners of the audio/video.
No commercial usage is permitted.
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 62171250.
CN-Celeb is a multi-genre dataset covering 11 different genres in the real world, collected from multiple Chinese open media sources.
3,000 Speakers
CN-Celeb contains speech from Chinese celebrities.
600,000+ Utterances
CN-Celeb covers multiple genres of speech, including entertainment, interview, singing, play, movie, vlog, live broadcast, speech, drama, recitation and advertisement.
1,200+ Hours
CN-Celeb presents a complex long-short challenge that reflects the scenarios of most real applications.
The dataset consists of two subsets, CN-Celeb1 and CN-Celeb2. For each subset, we provide audio files and speaker meta-data. There is no overlap between the two subsets. CN-Celeb1 contains more than 125,000 utterances from 997 Chinese celebrities, and CN-Celeb2 contains more than 520,000 utterances from 1,996 Chinese celebrities.
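The per-subset counts add up to 997 + 1,996 = 2,993 speakers, which the headline rounds to 3,000. The no-overlap property can be checked from the speaker meta-data with simple set intersections. The sketch below is a hypothetical helper that assumes the speaker IDs of each subset have already been loaded into Python sets. The toy IDs are made up for illustration and do not reflect the real ID format.

```python
from itertools import combinations


def overlapping_speakers(subsets):
    """Return {(a, b): shared_ids} for every pair of subsets that share a speaker ID.

    `subsets` maps a subset name to the set of speaker IDs it contains
    (hypothetical input; building it from the released meta-data is left to the reader).
    An empty result means the subsets are speaker-disjoint.
    """
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(subsets.items(), 2):
        shared = ids_a & ids_b  # set intersection
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy, hypothetical speaker IDs; not the real CN-Celeb identifiers.
toy = {
    "CN-Celeb1": {"spk001", "spk002"},
    "CN-Celeb2": {"spk003", "spk004"},
}
print(overlapping_speakers(toy))  # {}
```

An empty dictionary means no speaker ID appears in more than one subset. Because the helper checks every pair, it applies equally to the three CN-Celeb-AV subsets.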
All the resources contained in the dataset are free for research institutes and individuals. The copyright remains with the original owners of the audio/video. No commercial usage is permitted.
Publications based on the dataset are welcome to cite the following papers:
Y. Fan, J. W. Kang, L. T. Li, K. C. Li, H. L. Chen, S. T. Cheng, P. Y. Zhang, Z. Y. Zhou, Y. Q. Cai, D. Wang*. CN-Celeb: A Challenging Chinese Speaker Recognition Dataset. ICASSP, 2020.
L. T. Li, R. Q. Liu, J. W. Kang, Y. Fan, H. Cui, Y. Q. Cai, R. Vipperla, T. F. Zheng, D. Wang*. CN-Celeb: multi-genre speaker recognition. Speech Communication, 2022.
We are hosting the first CN-Celeb Speaker Recognition Challenge (CNSRC) at Odyssey 2022 (The Speaker and Language Recognition Workshop). CNSRC aims to evaluate how well current speaker recognition methods work in real-world scenarios, which typically involve in-the-wild complexity and require real-time processing speed. CNSRC consists of two parts: an evaluation challenge and an accompanying workshop. Both the challenge and the workshop have their own dedicated websites.
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61633013 and 62171250.