Academic Talk: Speech Recognition Development: A Dataset and Benchmark Perspective
Speaker: Dr. Guoguo Chen, SpeechColab
Time: December 14, 2021 (Tuesday), 2:00-3:00 PM
Venue: Tencent Meeting 842 5204 8009
Title: Speech Recognition Development: A Dataset and Benchmark Perspective
Abstract: The previous decade saw remarkable development in automatic speech recognition technologies. While many technical articles explain these improvements from the model point of view, the impact of datasets and benchmarks on speech recognition development is not well studied. In this talk, we first investigate the contribution of datasets and benchmarks to speech recognition development. We then introduce a large-scale English speech recognition dataset named GigaSpeech. We will demonstrate the data creation pipeline, as well as initial benchmarks on this dataset. Finally, we close the talk by outlining our ongoing work on speech recognition benchmarks.
Speaker Bio: Dr. Chen holds a Ph.D. degree in Electrical and Computer Engineering from Johns Hopkins University and a B.Eng. degree in Electronic Engineering from Tsinghua University. During his Ph.D., he spent 5 years at the Center for Language and Speech Processing, Johns Hopkins University, where he worked on various aspects of speech recognition and was one of the key contributors to the open-source speech recognition toolkit Kaldi and the open-source deep learning toolkit CNTK. He was an author of LibriSpeech, one of the most cited (2,500+ Google Scholar citations) speech recognition datasets/benchmarks. He also spent two summers at Google Inc., where he developed the prototype of Android's wake word detection engine for "Okay Google", serving billions of Android/Google Home users. After graduation, Dr. Chen co-founded KITT.AI, a 2017 CBInsights AI 100 company, which was acquired by Baidu. In 2020, Dr. Chen co-founded Seasalt.ai. Dr. Chen also initiated SpeechColab, a volunteer organization for the speech recognition community, which released GigaSpeech, one of the largest speech recognition datasets, covering 10,000 hours of transcribed audio and 33,000 hours of total audio for speech recognition research.