Tsinghua University Wei-Qiang Zhang--Home--Selected Conference Publications

Wei-Qiang Zhang

Name (Simplified Chinese):Wei-Qiang Zhang
Name (English):Wei-Qiang Zhang
E-Mail:
School/Department:Department of Electronic Engineering
Business Address:Room 5-111, Rohm Building
Contact Information:+86-10-62781847
Degree:Doctoral degree
Alma Mater:Tsinghua University
Teacher College:DZGCX
Discipline:Signal and Information Processing

Selected Conference Publications

Y. Yang, Z. Song, J. Zhuo, M. Cui, J. Li, B. Yang, Y. Du, Z. Ma, X. Liu, Z. Wang, K. Li, S. Fan, K. Yu, W.-Q. Zhang, G. Chen, and X. Chen, “GigaSpeech 2: An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement,” in Proc. ACL, 2025, pp. 2673–2686. doi: 10.18653/v1/2025.acl-long.135.
K. Jia, J. Li, K. Li, and W.-Q. Zhang, “Whisper-based multilingual Alzheimer’s disease detection and improvements for low-resource language,” in Proc. Interspeech, 2025.
Q. Sun, Z. Qiu, Y. Pu, J. Li, X. Chen, and W.-Q. Zhang, “PPGs-BERT: Leveraging phoneme sequence and BERT for Alzheimer’s disease detection from spontaneous speech,” in Proc. Interspeech, 2025.
Y. Pu, X. Liu, G. Zhang, Z. Yan, W.-Q. Zhang, and X. Chen, “Empowering large language models for end-to-end speech translation leveraging synthetic data,” in Proc. Interspeech, 2025.
W. Liang, R. Zhang, X. Zhang, Y. Ma, and W.-Q. Zhang, “DepressGEN: Synthetic data generation framework for depression detection,” in Proc. Interspeech, 2025.
Y. Pu and W.-Q. Zhang, “Integrating pause information with word embeddings in language models for Alzheimer’s disease detection from spontaneous speech,” in Proc. ICASSP, 2025. doi: 10.1109/ICASSP49660.2025.10888563.
Z. Wan, Z. Qiu, Y. Liu, and W.-Q. Zhang, “Metadata-enhanced speech emotion recognition: Augmented residual integration and co-attention in two-stage fine-tuning,” in Proc. ICASSP, 2025. doi: 10.1109/ICASSP49660.2025.10890812.
Z. Chen, Y.-F. Shao, Y. Ma, M. Wei, L. Zhang, and W.-Q. Zhang, “Improving acoustic scene classification in low-resource conditions,” in Proc. ICASSP, 2025. doi: 10.1109/ICASSP49660.2025.10888928.
A. Jiang, X. Zheng, B. Han, Y. Qiu, P. Fan, W.-Q. Zhang, L. Cheng, and J. Liu, “Adaptive prototype learning for anomalous sound detection with partially known attributes,” in Proc. ICASSP, 2025. doi: 10.1109/ICASSP49660.2025.10889514.
B. Han, W. Huang, Z. Chen, A. Jiang, P. Fan, L. Cheng, Z. Lv, J. Liu, W.-Q. Zhang, and Y. Qian, “Data-efficient low-complexity acoustic scene classification via distilling and progressive pruning,” in Proc. ICASSP, 2025. doi: 10.1109/ICASSP49660.2025.10890296.
K. Pang, M. Bai, J. Yang, W.-Q. Zhang, M. Jiang, and Y. Huang, “Winstega: An adaptive robust enhancement framework for generative linguistic steganography,” in Proc. ICASSP, 2025. doi: 10.1109/ICASSP49660.2025.10888944.
B. Han, Z. Lv, A. Jiang, W. Huang, Z. Chen, Y. Deng, J. Ding, C. Lu, W.-Q. Zhang, P. Fan, J. Liu, and Y. Qian, “Exploring large scale pre-trained models for robust machine anomalous sound detection,” in Proc. ICASSP, 2024, pp. 1327–1330. doi: 10.1109/ICASSP48485.2024.10447183.
J. Li and W.-Q. Zhang, “Whisper-based transfer learning for Alzheimer disease classification: Leveraging speech segments with full transcripts as prompts,” in Proc. ICASSP, 2024, pp. 11211–11215. doi: 10.1109/ICASSP48485.2024.10448004.
H. Wang, G. Hu, G. Lin, W.-Q. Zhang, and J. Li, “Simul-Whisper: Attention-guided streaming Whisper with truncation detection,” in Proc. Interspeech, 2024, pp. 4483–4487. doi: 10.21437/Interspeech.2024-1814.
J. Li, Y. Pu, Q. Sun, and W.-Q. Zhang, “Improving Whisper’s recognition performance for under-represented language Kazakh leveraging unpaired speech and text,” in Proc. Interspeech, 2024, pp. 2514–2518. doi: 10.21437/Interspeech.2024-1790.
A. Jiang, B. Han, Z. Lv, Y. Deng, W.-Q. Zhang, X. Chen, Y. Qian, J. Liu, and P. Fan, “AnoPatch: Towards better consistency in machine anomalous sound detection,” in Proc. Interspeech, 2024, pp. 107–111. doi: 10.21437/Interspeech.2024-1761.
X. Zheng, A. Jiang, B. Han, Y. Qian, P. Fan, J. Liu, and W.-Q. Zhang, “Improving anomalous sound detection via low-rank adaptation fine-tuning of pre-trained audio models,” in Proc. SLT, 2024, pp. 979–984. doi: 10.1109/SLT61566.2024.10832335.
A. Jiang, Y. Shi, P. Fan, W.-Q. Zhang, and J. Liu, “CoopASD: Cooperative machine anomalous sound detection with privacy concerns,” in Proc. GLOBECOM, 2024, pp. 346–351. doi: 10.1109/GLOBECOM52923.2024.10901774.
X. Chen, Y. Pu, J. Li, and W.-Q. Zhang, “Cross-lingual Alzheimer’s disease detection based on paralinguistic and pre-trained features,” in Proc. ICASSP, 2023. doi: 10.1109/ICASSP49357.2023.10095522.
A. Jiang, W.-Q. Zhang, Y. Deng, P. Fan, and J. Liu, “Unsupervised anomaly detection and localization of machine audio: A GAN-based approach,” in Proc. ICASSP, 2023. doi: 10.1109/ICASSP49357.2023.10096813.
H. Wang, S. Wang, W.-Q. Zhang, and J. Bai, “DistilXLSR: A light weight cross-lingual speech representation model,” in Proc. Interspeech, 2023, pp. 2273–2277. doi: 10.21437/Interspeech.2023-1444.
H. Wang, S. Wang, W.-Q. Zhang, H. Suo, and Y. Wan, “Task-agnostic structured pruning of speech representation models,” in Proc. Interspeech, 2023, pp. 231–235. doi: 10.21437/Interspeech.2023-1442.
Z. Cui, W. Wu, C. Zhang, W.-Q. Zhang, and J. Wu, “Transferring speech-generic and depression-specific knowledge for Alzheimer’s disease detection,” in Proc. ASRU, 2023. doi: 10.1109/ASRU57964.2023.10389785.
Y. Wang, C. Tang, Z. Ma, Z. Zheng, X. Chen, and W.-Q. Zhang, “Exploring effective distillation of self-supervised speech models for automatic speech recognition,” in Proc. ASRU, 2023. doi: 10.1109/ASRU57964.2023.10389746.
Q. Hou, A. Jiang, W.-Q. Zhang, P. Fan, and J. Liu, “Decoupling detectors for scalable anomaly detection in AIoT systems with multiple machines,” in Proc. GLOBECOM, 2023, pp. 5943–5948. doi: 10.1109/GLOBECOM54140.2023.10436800.
J. Zhao, H. Wang, J. Li, S. Chai, G. Wang, G. Chen, and W.-Q. Zhang, “The THUEE system description for the IARPA OpenASR21 challenge,” in Proc. Interspeech, 2022. doi: 10.21437/Interspeech.2022-269.
J. Zhao, G. Shi, G.-B. Wang, and W.-Q. Zhang, “Automatic speech recognition for low-resource languages: The THUEE systems for the IARPA OpenASR20 evaluation,” in Proc. ASRU, 2021, pp. 335–341. doi: 10.1109/ASRU51503.2021.9688260.
L. Xue, K. Song, D. Wu, X. Tan, N. L. Zhang, T. Qin, W.-Q. Zhang, and T.-Y. Liu, “DeepRapper: Neural rap generation with rhyme and rhythm modeling,” in Proc. ACL, 2021, pp. 69–81. doi: 10.18653/v1/2021.acl-long.6.
G. Chen, S. Chai, G. Wang, J. Du, W.-Q. Zhang, C. Weng, D. Su, D. Povey, J. Trmal, J. Zhang, M. Jin, S. Khudanpur, S. Watanabe, S. Zhao, W. Zou, X. Li, X. Yao, Y. Wang, Y. Wang, Z. You, and Z. Yan, “GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio,” in Proc. Interspeech, 2021, pp. 3670–3674. doi: 10.21437/Interspeech.2021-1965.
J. Zhao, Z. Lv, A. Han, G. Wang, G. Shi, J. Kang, J. Yan, P. Hu, S. Huang, and W.-Q. Zhang, “The TNT team system descriptions of Cantonese and Mongolian for IARPA OpenASR20,” in Proc. Interspeech, 2021, pp. 4344–4348. doi: 10.21437/Interspeech.2021-1063.
H. Yu, J. Zhao, S. Yang, Z. Wu, Y. Nie, and W.-Q. Zhang, “Language recognition based on unsupervised pretrained models,” in Proc. Interspeech, 2021, pp. 3271–3275. doi: 10.21437/Interspeech.2021-807.
Y. Yan, X. Tan, B. Li, G. Zhang, T. Qin, S. Zhao, Y. Shen, W.-Q. Zhang, and T.-Y. Liu, “Adaptive text to speech for spontaneous style,” in Proc. Interspeech, 2021, pp. 4668–4672. doi: 10.21437/Interspeech.2021-584.
K. He, Y. Shen, W.-Q. Zhang, and J. Liu, “Staged training strategy and multi-activation for audio tagging with noisy and sparse multi-label data,” in Proc. ICASSP, 2020, pp. 631–635. doi: 10.1109/ICASSP40776.2020.9053776.
J. Xie, R. Yan, S. Xiao, L. Peng, M. T. Johnson, and W.-Q. Zhang, “Dynamic temporal residual learning for speech recognition,” in Proc. ICASSP, 2020, pp. 7709–7713. doi: 10.1109/ICASSP40776.2020.9054653.
Z. Zhao and W.-Q. Zhang, “End-to-end keyword search based on attention and energy scorer for low resource languages,” in Proc. Interspeech, 2020, pp. 2587–2591. doi: 10.21437/Interspeech.2020-2613.
R. Li, T. Liang, D. Song, Y. Liu, Y. Wu, C. Xu, P. Ouyang, X. Zhang, X. Chen, W.-Q. Zhang, S. Yin, and L. He, “THUEE system for NIST SRE19 CTS challenge,” in Proc. Interspeech, 2020, pp. 2232–2236. doi: 10.21437/Interspeech.2020-1245.
Z. Li, L. He, J. Li, L. Wang, and W.-Q. Zhang, “Towards discriminative representations and unbiased predictions: Class-specific angular softmax for speech emotion recognition,” in Proc. Interspeech, 2019, pp. 1696–1700. doi: 10.21437/Interspeech.2019-1683.
K. He, Y. Shen, and W.-Q. Zhang, “Hierarchical pooling structure for weakly labeled sound event detection,” in Proc. Interspeech, 2019, pp. 3624–3628. doi: 10.21437/Interspeech.2019-2049.
H. Yang and W.-Q. Zhang, “Music genre classification using duplicated convolutional layers in neural networks,” in Proc. Interspeech, 2019, pp. 3382–3386. doi: 10.21437/Interspeech.2019-1298.
Y. Shen, K. He, and W.-Q. Zhang, “Learning how to listen: A temporal-frequential attention model for sound event detection,” in Proc. Interspeech, 2019, pp. 2563–2567. doi: 10.21437/Interspeech.2019-2045.
J. Kang, W.-Q. Zhang, and J. Liu, “Gated convolutional networks based hybrid acoustic models for low resource speech recognition,” in Proc. ASRU, 2017, pp. 157–164. doi: 10.1109/ASRU.2017.8268930.
Z.-Q. Lv, J. Kang, W.-Q. Zhang, and J. Liu, “An LSTM-CTC based verification system for proxy-word based OOV keyword search,” in Proc. ICASSP, 2017, pp. 5655–5659. doi: 10.1109/ICASSP.2017.7953239.
Y. Tian, L. He, M. Cai, W.-Q. Zhang, and J. Liu, “Deep neural networks based speaker modeling at different levels of phonetic granularity,” in Proc. ICASSP, 2017, pp. 5440–5444. doi: 10.1109/ICASSP.2017.7953196.
X.-K. Yang, D. Qu, W.-L. Zhang, and W.-Q. Zhang, “The NDSC transcription system for the 2016 multi-genre broadcast challenge,” in Proc. SLT, 2016, pp. 273–278. doi: 10.1109/SLT.2016.7846276.
Z.-Q. Lv, M. Cai, W.-Q. Zhang, and J. Liu, “A novel discriminative score calibration method for keyword search,” in Proc. Interspeech, 2016, pp. 745–749. doi: 10.21437/Interspeech.2016-606.
Y. Tian, M. Cai, H. Liang, W.-Q. Zhang, and J. Liu, “Improving deep neural networks based speaker verification using unlabeled data,” in Proc. Interspeech, 2016, pp. 1863–1867. doi: 10.21437/Interspeech.2016-614.
Z.-Q. Lv, M. Cai, C. Lu, J. Kang, L.-K. Hui, W.-Q. Zhang, and J. Liu, “Improved system fusion for keyword search,” in Proc. ASRU, 2015, pp. 231–236. doi: 10.1109/ASRU.2015.7404799.
M. Cai, Z.-Q. Lv, B.-L. Song, Y.-Z. Shi, W.-L. Wu, C. Lu, W.-Q. Zhang, and J. Liu, “The THUEE system for the OpenKWS14 keyword search evaluation,” in Proc. ICASSP, 2015, pp. 4734–4738. doi: 10.1109/ICASSP.2015.7178869.
J. Kang, C. Lu, M. Cai, W.-Q. Zhang, and J. Liu, “Neuron sparseness versus connection sparseness in deep neural network for large vocabulary speech recognition,” in Proc. ICASSP, 2015, pp. 4954–4958. doi: 10.1109/ICASSP.2015.7178913.
Y.-Z. Shi, W.-Q. Zhang, M. Cai, and J. Liu, “Variance regularization of RNNLM for speech recognition,” in Proc. ICASSP, 2014, pp. 4931–4935. doi: 10.1109/ICASSP.2014.6854532.
W.-W. Liu, W.-Q. Zhang, Y.-Z. Shi, A. Ji, J. Xu, and J. Liu, “Improved phonotactic language recognition based on RNN feature reconstruction,” in Proc. ICASSP, 2014, pp. 5359–5363. doi: 10.1109/ICASSP.2014.6854619.
W.-W. Liu, W.-Q. Zhang, and J. Liu, “Phonotactic language identification based on time-gap-weighted lattice kernels,” in Proc. Interspeech, 2014, pp. 3022–3026. doi: 10.21437/Interspeech.2014-606.
W.-L. Zhang, D. Qu, W.-Q. Zhang, and B.-C. Li, “Speaker adaptation based on sparse and low-rank eigenphone matrix estimation,” in Proc. Interspeech, 2014, pp. 2792–2796. doi: 10.21437/Interspeech.2014-496.
Z.-Y. Li, W.-Q. Zhang, W.-W. Liu, Y. Tian, and J. Liu, “Text-independent speaker verification via state alignment,” in Proc. Odyssey, 2014, pp. 68–72. doi: 10.21437/Odyssey.2014-10.
Y.-Z. Shi, W.-Q. Zhang, M. Cai, and J. Liu, “Temporal kernel neural network language model,” in Proc. ICASSP, 2013, pp. 8247–8251. doi: 10.1109/ICASSP.2013.6639273.
W.-Q. Zhang, Z.-Y. Li, W. Liu, and J. Liu, “THU-EE system fusion for the NIST 2012 speaker recognition evaluation,” in Proc. Interspeech, 2013, pp. 2474–2478. doi: 10.21437/Interspeech.2013-413.
W. Liu, W.-Q. Zhang, Z.-Y. Li, and J. Liu, “Parallel absolute-relative feature based phonotactic language recognition,” in Proc. Interspeech, 2013, pp. 59–63. doi: 10.21437/Interspeech.2013-38.
W.-L. Zhang, W.-Q. Zhang, and B.-C. Li, “Compact acoustic modeling based on acoustic manifold using a mixture of factor analyzers,” in Proc. ASRU, 2013, pp. 37–42. doi: 10.1109/ASRU.2013.6707702.
Z.-Y. Li, W.-Q. Zhang, L. He, and J. Liu, “Complementary combination in i-vector level for language recognition,” in Proc. Odyssey, 2012, pp. 334–337. Available: https://www.isca-archive.org/odyssey_2012/li12_odyssey.html
Y.-Z. Shi, W.-Q. Zhang, and J. Liu, “Robust audio fingerprinting based on local spectral luminance maxima scheme,” in Proc. Interspeech, 2011, pp. 2485–2488. doi: 10.21437/Interspeech.2011-636.
W.-L. Zhang, W.-Q. Zhang, and B.-C. Li, “Speaker adaptation based on speaker-dependent eigenphone estimation,” in Proc. ASRU, 2011, pp. 48–52. doi: 10.1109/ASRU.2011.6163904.
W.-Q. Zhang, Y. Deng, L. He, and J. Liu, “Variant time-frequency cepstral features for speaker recognition,” in Proc. Interspeech, 2010, pp. 2122–2125. doi: 10.21437/Interspeech.2010-160.
S. Meng, W.-Q. Zhang, and J. Liu, “Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression,” in Proc. Interspeech, 2010, pp. 685–688. doi: 10.21437/Interspeech.2010-260.
J. Yang, J. Liu, and W.-Q. Zhang, “A fast query by humming system based on notes,” in Proc. Interspeech, 2010, pp. 2898–2901. doi: 10.21437/Interspeech.2010-753.
W.-Q. Zhang, Y. Shan, and J. Liu, “Multiple background models for speaker verification,” in Proc. Odyssey, 2010, pp. 47–51. Available: https://www.isca-archive.org/odyssey_2010/zhang10_odyssey.html
W.-Q. Zhang and J. Liu, “Two-stage method for specific audio retrieval,” in Proc. ICASSP, 2007, pp. IV-85–88. doi: 10.1109/ICASSP.2007.367169.
