Seminar Announcement: Advancing Transformer Transducer for Speech Recognition on Large-Scale Dataset: Efficient Streaming and LM Adaptation
Speaker: Xie Chen, Associate Professor, Shanghai Jiao Tong University
Time: Tuesday, November 30, 2021, 2:00–3:00 PM
Venue: Tencent Meeting, ID 842 5204 8009
Title: Advancing Transformer Transducer for Speech Recognition on Large-Scale Dataset: Efficient Streaming and LM Adaptation
Abstract: Recent years have witnessed the great success of end-to-end (E2E) models in speech recognition, especially neural Transducer-based models, owing to their streaming capability and promising performance. However, to replace the traditional hybrid system, which is widely deployed and remains the mainstream system in the speech community, several key challenges must still be addressed. Among them, efficient streaming and domain adaptation are two essential issues in developing E2E ASR models. In this talk, I will introduce our recent efforts on these two aspects for neural Transducer models. We proposed an approach called "attention mask is all you need to design" for efficient training and streaming of the Transformer-Transducer model. In addition, we designed a novel model architecture, the "factorized neural transducer", for efficient language model adaptation. Experiments on a large-scale dataset (65k hours) demonstrate the effectiveness of the two proposed approaches, achieving controllable and low latency as well as significant WER improvement from LM adaptation on text data.
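The attention-mask idea above can be illustrated with a small sketch: a chunk-wise self-attention mask in which each frame attends only to frames in its own chunk and a limited number of previous chunks, bounding both latency and computation. This is a minimal illustration of the general technique, not the exact design from the talk; the function name and parameters (`chunk_size`, `num_left_chunks`) are assumptions for the example.

```python
import numpy as np

def streaming_attention_mask(num_frames, chunk_size, num_left_chunks):
    """Build a boolean self-attention mask for chunk-wise streaming.

    mask[i, j] is True when query frame i may attend to key frame j.
    Each frame sees every frame in its own chunk plus up to
    `num_left_chunks` preceding chunks, which caps the look-back
    window and keeps per-step latency controllable.
    """
    chunk_idx = np.arange(num_frames) // chunk_size
    q = chunk_idx[:, None]  # chunk index of each query frame
    k = chunk_idx[None, :]  # chunk index of each key frame
    # Allow attention only within [q - num_left_chunks, q] chunks.
    return (k <= q) & (k >= q - num_left_chunks)

# 8 frames, chunks of 2, one left-context chunk:
mask = streaming_attention_mask(num_frames=8, chunk_size=2, num_left_chunks=1)
```

At inference time such a mask lets the same Transformer run in streaming mode, since no frame depends on future chunks; widening `num_left_chunks` trades latency and memory for accuracy.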
Speaker Bio: Xie Chen is currently a Tenure-Track Associate Professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University, China. He obtained his Bachelor's degree in Electronic Engineering from Xiamen University in 2009, his Master's degree in Electronic Engineering from Tsinghua University in 2012, and his PhD in Information Engineering from the University of Cambridge (U.K.) in 2017. Prior to joining SJTU, he worked at the University of Cambridge as a Research Associate from 2017 to 2018, and in the Speech and Language research group at Microsoft as a researcher from 2018 to 2021. His main research interest lies in deep learning, especially its applications to speech processing, including speech recognition and synthesis.