Academic Talk: Investigating Sequence-Level Normalisation for CTC-Like End-to-End ASR
Speaker: Zeyu Zhao (赵泽宇), University of Edinburgh
Time: 2022/06/02 [Thursday] 17:00-18:00
Venue: Tencent Meeting 352-212-245
Title: Investigating Sequence-Level Normalisation for CTC-Like End-to-End ASR
Abstract:
End-to-end Automatic Speech Recognition (E2E ASR) significantly simplifies the training process of an ASR model. Connectionist Temporal Classification (CTC) is one of the most popular methods for E2E ASR training. Implicitly, CTC has a unique topology which is very useful for sequence modelling. However, we find that changing to another topology makes it even more effective. In this paper, we propose a new CTC-like method for E2E ASR training that modifies the topology of the original CTC so that the well-known abuse of the blank label in CTC can be resolved theoretically. Because we change the topology, a normalisation term becomes necessary, which makes the form of the final loss function similar to Maximum Mutual Information (MMI); we hence name our method MMI-CTC. In addition to maximising the posterior probability of the target sequence, the normalisation enables models to explicitly minimise the probability of competing hypotheses at the word sequence level. Our experimental results show that MMI-CTC is more efficient than CTC, and that the normalisation is essential for sequence training.
Speaker bio: Zeyu Zhao is a PhD student at the Centre for Speech Technology Research, School of Informatics, University of Edinburgh, supervised by Dr. Peter Bell. He received his master's degree in 2020 from the Department of Electronic Engineering, Tsinghua University, supervised by Dr. Wei-Qiang Zhang, and his bachelor's degree from Beijing Institute of Technology in 2017. His research interests lie in end-to-end speech recognition.