Research

Publications

Ø  G. Chen, S. Chai, G. Wang, J. Du, W.-Q. Zhang, C. Weng, D. Su, D. Povey, J. Trmal, J. Zhang, M. Jin, S. Khudanpur, S. Watanabe, S. Zhao, W. Zou, X. Li, X. Yao, Y. Wang, Y. Wang, Z. You, and Z. Yan, “GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio,” in Proc. Interspeech, Brno, Czechia, Aug. 2021, pp. 3670-3674. doi: 10.21437/Interspeech.2021-1965.

Ø  J. Zhao, Z. Lv, A. Han, G. Wang, G. Shi, J. Kang, J. Yan, P. Hu, S. Huang, and W.-Q. Zhang, “The TNT team system descriptions of Cantonese and Mongolian for IARPA OpenASR20,” in Proc. Interspeech, Brno, Czechia, Aug. 2021, pp. 4344-4348. doi: 10.21437/Interspeech.2021-1063.

Ø  H. Yu, J. Zhao, S. Yang, Z. Wu, Y. Nie, and W.-Q. Zhang, “Language recognition based on unsupervised pretrained models,” in Proc. Interspeech, Brno, Czechia, Aug. 2021, pp. 3271-3275. doi: 10.21437/Interspeech.2021-807.

Ø  Y. Yan, X. Tan, B. Li, G. Zhang, T. Qin, S. Zhao, Y. Shen, W.-Q. Zhang, and T.-Y. Liu, “Adaptive text to speech for spontaneous style,” in Proc. Interspeech, Brno, Czechia, Aug. 2021, pp. 4668-4672. doi: 10.21437/Interspeech.2021-584.

Ø  L. Xue, K. Song, D. Wu, X. Tan, N. L. Zhang, T. Qin, W.-Q. Zhang, and T.-Y. Liu, “DeepRapper: Neural rap generation with rhyme and rhythm modeling,” in Proc. ACL-IJCNLP, Bangkok, Thailand, Aug. 2021, pp. 69-81. doi: 10.18653/v1/2021.acl-long.6.

Ø  Z. Zhao and W.-Q. Zhang, “End-to-end keyword search system based on attention mechanism and energy scorer for low resource languages,” Neural Networks, vol. 139, pp. 326-334, Jul. 2021. doi: 10.1016/j.neunet.2021.04.002.

Ø  K. He, Y. Shen, W.-Q. Zhang, and J. Liu, “Staged training strategy and multi-activation for audio tagging with noisy and sparse multi-label data,” in Proc. ICASSP, Barcelona, Spain, May 4-8, 2020, pp. 631-635.

Ø  J. Xie, R. Yan, S. Xiao, L. Peng, M. T. Johnson, and W.-Q. Zhang, “Dynamic temporal residual learning for speech recognition,” in Proc. ICASSP, Barcelona, Spain, May 4-8, 2020, pp. 7709-7713.

Ø  Z. Zhao and W.-Q. Zhang, “End-to-end keyword search based on attention and energy scorer for low resource languages,” in Proc. Interspeech, Shanghai, China, Oct. 25-29, 2020, pp. 2587-2591.

Ø  R. Li et al., “THUEE system for NIST SRE19 CTS challenge,” in Proc. Interspeech, Shanghai, China, Oct. 25-29, 2020, pp. 2232-2236.

Ø  S. Chai, W.-Q. Zhang, C. Lv, and Z. Yang, “An end-to-end model based on multiple neural networks with data augmentation for keyword spotting,” International Journal of Asian Language Processing, vol. 30, no. 2, Art. no. 2050006, Jun. 2020.

Ø  Z. Li, L. He, J. Li, L. Wang, and W.-Q. Zhang, “Towards discriminative representations and unbiased predictions: Class-specific angular softmax for speech emotion recognition,” in Proc. Interspeech, Graz, Austria, Sept. 15-19, 2019, pp. 1696-1700.

Ø  K. He, Y. Shen, and W.-Q. Zhang, “Hierarchical pooling structure for weakly labeled sound event detection,” in Proc. Interspeech, Graz, Austria, Sept. 15-19, 2019, pp. 3624-3628.

Ø  H. Yang and W.-Q. Zhang, “Music genre classification using duplicated convolutional layers in neural networks,” in Proc. Interspeech, Graz, Austria, Sept. 15-19, 2019, pp. 3382-3386.

Ø  Y. Shen, K. He, and W.-Q. Zhang, “Learning how to listen: A temporal-frequential attention model for sound event detection,” in Proc. Interspeech, Graz, Austria, Sept. 15-19, 2019, pp. 2563-2567.

Ø  C. Lu, Y. Liu, W.-Q. Zhang, and S. Zhang, “Tightness of a new and enhanced semidefinite relaxation for MIMO detection,” SIAM Journal on Optimization, vol. 29, no. 1, pp. 719-742, Jan. 2019.

Ø  C. Lu, Z. Deng, W.-Q. Zhang, and S.-C. Fang, “Argument division based branch-and-bound algorithm for unit-modulus constrained complex quadratic programming,” Journal of Global Optimization, vol. 70, no. 1, pp. 171-187, Jan. 2018.

Ø  X.-K. Yang, L. He, D. Qu, and W.-Q. Zhang, “Semi-supervised minimum redundancy maximum relevance feature selection for audio classification,” Multimedia Tools and Applications, vol. 77, no. 1, pp. 713-739, Jan. 2018.

Ø  X. Yang, D. Qu, W.-L. Zhang, and W.-Q. Zhang, “An adapted data selection for deep learning-based audio segmentation in multi-genre broadcast channel,” Digital Signal Processing, vol. 81, pp. 8-15, Oct. 2018.

Ø  J. Kang, W.-Q. Zhang, W.-W. Liu, J. Liu, and M. T. Johnson, “Advanced recurrent network-based hybrid acoustic models for low resource speech recognition,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2018, Art. no. 6, Jul. 2018.

Ø  J. Kang, W.-Q. Zhang, and J. Liu, “Gated convolutional networks based hybrid acoustic models for low resource speech recognition,” in Proc. ASRU, Okinawa, Japan, Dec. 16-20, 2017, pp. 157-164.

Ø  Z.-Q. Lv, J. Kang, W.-Q. Zhang, and J. Liu, “An LSTM-CTC based verification system for proxy-word based OOV keyword search,” in Proc. ICASSP, New Orleans, USA, Mar. 5-9, 2017, pp. 5655-5659.

Ø  Y. Tian, L. He, M. Cai, W.-Q. Zhang, and J. Liu, “Deep neural networks based speaker modeling at different levels of phonetic granularity,” in Proc. ICASSP, New Orleans, USA, Mar. 5-9, 2017, pp. 5440-5444.

Ø  X.-K. Yang, D. Qu, W.-L. Zhang, and W.-Q. Zhang, “The NDSC transcription system for the 2016 multi-genre broadcast challenge,” in Proc. SLT, San Diego, USA, Dec. 13-16, 2016, pp. 273-278.

Ø  Z.-Q. Lv, M. Cai, W.-Q. Zhang, and J. Liu, “A novel discriminative score calibration method for keyword search,” in Proc. Interspeech, San Francisco, USA, Sept. 8-12, 2016, pp. 745-749.

Ø  Y. Tian, M. Cai, H. Liang, W.-Q. Zhang, and J. Liu, “Improving deep neural networks based speaker verification using unlabeled data,” in Proc. Interspeech, San Francisco, USA, Sept. 8-12, 2016, pp. 1863-1867.

Ø  W.-W. Liu, M. Cai, W.-Q. Zhang, J. Liu, and M. T. Johnson, “Discriminative boosting algorithm for diversified front-end phonotactic language recognition,” Journal of Signal Processing Systems, vol. 82, no. 2, pp. 229-239, Feb. 2016.

Ø  Z.-Q. Lv, M. Cai, C. Lu, J. Kang, L.-K. Hui, W.-Q. Zhang, and J. Liu, “Improved system fusion for keyword search,” in Proc. ASRU, Scottsdale, USA, Dec. 13-17, 2015, pp. 231-236.

Ø  M. Cai, Z.-Q. Lv, B.-L. Song, Y.-Z. Shi, W.-L. Wu, C. Lu, W.-Q. Zhang, and J. Liu, “The THUEE system for the OpenKWS14 keyword search evaluation,” in Proc. ICASSP, Brisbane, Australia, Apr. 19-24, 2015, pp. 4734-4738.

Ø  J. Kang, C. Lu, M. Cai, W.-Q. Zhang, and J. Liu, “Neuron sparseness versus connection sparseness in deep neural network for large vocabulary speech recognition,” in Proc. ICASSP, Brisbane, Australia, Apr. 19-24, 2015, pp. 4954-4958.

Ø  Z.-Y. Li, W.-Q. Zhang, and J. Liu, “Multi-resolution time frequency feature and complementary combination for short utterance speaker recognition,” Multimedia Tools and Applications, vol. 74, no. 3, pp. 937-953, Feb. 2015.

Ø  Y.-Z. Shi, W.-Q. Zhang, M. Cai, and J. Liu, “Variance regularization of RNNLM for speech recognition,” in Proc. ICASSP, Florence, Italy, May 4-9, 2014, pp. 4931-4935.

Ø  W.-W. Liu, W.-Q. Zhang, Y.-Z. Shi, A. Ji, J. Xu, and J. Liu, “Improved phonotactic language recognition based on RNN feature reconstruction,” in Proc. ICASSP, Florence, Italy, May 4-9, 2014, pp. 5359-5363.

Ø  W.-W. Liu, W.-Q. Zhang, and J. Liu, “Phonotactic language identification based on time-gap-weighted lattice kernels,” in Proc. Interspeech, Singapore, Sept. 14-18, 2014, pp. 3022-3026.

Ø  W.-L. Zhang, D. Qu, W.-Q. Zhang, and B.-C. Li, “Speaker adaptation based on sparse and low-rank eigenphone matrix estimation,” in Proc. Interspeech, Singapore, Sept. 14-18, 2014, pp. 2792-2796.

Ø  W.-Q. Zhang, W.-W. Liu, Z.-Y. Li, Y.-Z. Shi, and J. Liu, “Spoken language recognition based on gap-weighted subsequence kernels,” Speech Communication, vol. 60, pp. 1-12, May 2014.

Ø  Y.-Z. Shi, W.-Q. Zhang, J. Liu, and M. Johnson, “Efficient one-pass decoding with NNLM for speech recognition,” IEEE Signal Processing Letters, vol. 21, no. 4, pp. 377-381, Apr. 2014.

Ø  Y.-Z. Shi, W.-Q. Zhang, M. Cai, and J. Liu, “Temporal kernel neural network language model,” in Proc. ICASSP, Vancouver, Canada, May 26-31, 2013, pp. 8247-8251.

Ø  W.-Q. Zhang, Z.-Y. Li, W. Liu, and J. Liu, “THU-EE system fusion for the NIST 2012 speaker recognition evaluation,” in Proc. Interspeech, Lyon, France, Aug. 25-29, 2013, pp. 2474-2478.

Ø  W. Liu, W.-Q. Zhang, Z.-Y. Li, and J. Liu, “Parallel absolute-relative feature based phonotactic language recognition,” in Proc. Interspeech, Lyon, France, Aug. 25-29, 2013, pp. 59-63.

Ø  W.-L. Zhang, W.-Q. Zhang, and B.-C. Li, “Compact acoustic modeling based on acoustic manifold using a mixture of factor analyzers,” in Proc. ASRU, Olomouc, Czech Republic, Dec. 8-12, 2013, pp. 37-42.

Ø  W.-L. Zhang, D. Qu, W.-Q. Zhang, and B.-C. Li, “Rapid speaker adaptation using compressive sensing,” Speech Communication, vol. 55, no. 10, pp. 950-963, Nov.-Dec. 2013.

Ø  Z.-Y. Li, W.-Q. Zhang, L. He, and J. Liu, “Complementary combination in i-vector level for language recognition,” in Proc. Odyssey, Singapore, Jun. 25-28, 2012, pp. 334-337.

Ø  W.-L. Zhang, W.-Q. Zhang, B.-C. Li, D. Qu, and M. T. Johnson, “Bayesian speaker adaptation based on a new hierarchical probabilistic model,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 7, pp. 2002-2015, Sept. 2012.

Ø  Y.-Z. Shi, W.-Q. Zhang, and J. Liu, “Robust audio fingerprinting based on local spectral luminance maxima scheme,” in Proc. Interspeech, Florence, Italy, Aug. 27-31, 2011, pp. 2485-2488.

Ø  W.-L. Zhang, W.-Q. Zhang, and B.-C. Li, “Speaker adaptation based on speaker-dependent eigenphone estimation,” in Proc. ASRU, Hawaii, USA, Dec. 11-15, 2011, pp. 48-52.

Ø  W.-Q. Zhang, L. He, Y. Deng, J. Liu, and M. Johnson, “Time-frequency cepstral feature and constrained heteroscedastic linear discriminant analysis for language recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 266-272, Feb. 2011.

Ø  W.-Q. Zhang, Y. Deng, L. He, and J. Liu, “Variant time-frequency cepstral features for speaker recognition,” in Proc. Interspeech, Makuhari, Japan, Sept. 26-30, 2010, pp. 2122-2125.

Ø  S. Meng, W.-Q. Zhang, and J. Liu, “Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression,” in Proc. Interspeech, Makuhari, Japan, Sept. 26-30, 2010, pp. 685-688.

Ø  J. Yang, J. Liu, and W.-Q. Zhang, “A fast query by humming system based on notes,” in Proc. Interspeech, Makuhari, Japan, Sept. 26-30, 2010, pp. 2898-2901.

Ø  W.-Q. Zhang, Y. Shan, and J. Liu, “Multiple background models for speaker verification,” in Proc. Odyssey, Brno, Czech Republic, Jun. 28-Jul. 1, 2010, pp. 47-51.

Ø  W.-Q. Zhang, T. Hou, and J. Liu, “Discriminative score fusion for language identification,” Chinese Journal of Electronics, vol. 19, no. 1, pp. 124-128, Jan. 2010.

Ø  W.-Q. Zhang and J. Liu, “An equalized heteroscedastic linear discriminant analysis algorithm,” IEEE Signal Processing Letters, vol. 15, pp. 585-588, 2008.

Ø  R. Tao, W.-Q. Zhang, and E.-Q. Chen, “Two-stage method for joint time delay and Doppler shift estimation,” IET Radar, Sonar & Navigation, vol. 2, no. 1, pp. 71-77, Feb. 2008.

Ø  R. Tao, B. Deng, W.-Q. Zhang, and Y. Wang, “Sampling and sampling rate conversion of band limited signals in the fractional Fourier transform domain,” IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 158-171, Jan. 2008.

Ø  W.-Q. Zhang and J. Liu, “Two-stage method for specific audio retrieval,” in Proc. ICASSP, Hawaii, USA, Apr. 15-20, 2007, pp. IV-85-88.