Chinese multi-dialect speech recognition based on instruction tuning
8 May 2024
Timin Ding, Kai Sun, Xu Zhang, Jian Yu, Degen Huang
Proceedings Volume 13162, Fourth Symposium on Pattern Recognition and Applications (SPRA 2023); 131620A (2024) https://doi.org/10.1117/12.3030013
Event: Fourth Symposium on Pattern Recognition and Applications (SPRA2023), 2023, Napoli, Italy
Abstract
Chinese dialect speech recognition helps preserve and pass on regional culture and enables more convenient, customized services, giving it broad application prospects. In recent years, end-to-end speech recognition methods have demonstrated strong performance in dialect recognition. However, training a model on a single dialect dataset causes it to miss the acoustic and linguistic commonalities that dialects share at a broader level. Conversely, directly training a single model on data from multiple dialects overlooks the differences between dialect texts, which degrades the model's performance. To address this issue, this paper proposes a Chinese multi-dialect speech recognition method based on instruction tuning. By prepending a different instruction set to the texts of each dialect, the model can learn the commonalities among dialects of the same language while preserving the differences between dialect texts. This paper also attempts to enhance the model's text generation capability by using an additional language model to rescore the model outputs. We conducted experiments on the Common Voice dataset using the Whisper model. The results show that, compared with direct mixed training, the instruction fine-tuning method and the rescoring method reduced the Word Error Rate (WER) by 13.44% and 21.18%, respectively.
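The three ideas named in the abstract (an instruction placed before each dialect text, rescoring with an additional language model, and WER as the metric) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dialect labels, the instruction wording, the `lm_weight` value, and the function names `add_instruction`, `rescore`, and `wer` are all assumptions made for illustration, and the language model is represented only by a caller-supplied scoring function.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical instruction strings; the abstract does not give the actual wording
# or the list of dialects used in the experiments.
INSTRUCTIONS: Dict[str, str] = {
    "mandarin": "Transcribe the Mandarin speech:",
    "cantonese": "Transcribe the Cantonese speech:",
}


def add_instruction(dialect: str, transcript: str) -> str:
    """Prepend the dialect-specific instruction to a target transcript."""
    return f"{INSTRUCTIONS[dialect]} {transcript}"


def rescore(
    nbest: Sequence[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,  # assumed interpolation weight, not from the paper
) -> str:
    """Return the hypothesis with the highest ASR score plus weighted LM score.

    Each n-best entry is (text, asr_log_score); lm_score returns a log-score.
    """
    best = max(nbest, key=lambda hyp: hyp[1] + lm_weight * lm_score(hyp[0]))
    return best[0]


def wer(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word error rate: token-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    prev = list(range(len(hypothesis) + 1))  # distances for an empty reference
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion of a reference token
                    curr[j - 1] + 1,  # insertion of a hypothesis token
                    prev[j - 1] + (ref_tok != hyp_tok),  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(reference)
```

In this sketch, training targets would be built with `add_instruction` so that a single model sees every dialect but with a distinguishing prefix, and `rescore` would be applied to the n-best list produced at decoding time. For Chinese text, passing a list of characters to `wer` gives a character-level error rate; how the paper tokenizes text for WER is not stated in the abstract.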
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Timin Ding, Kai Sun, Xu Zhang, Jian Yu, and Degen Huang "Chinese multi-dialect speech recognition based on instruction tuning", Proc. SPIE 13162, Fourth Symposium on Pattern Recognition and Applications (SPRA 2023), 131620A (8 May 2024); https://doi.org/10.1117/12.3030013
KEYWORDS
Education and training
Data modeling
Speech recognition
Performance modeling
Acoustics
Modeling
Transformers