ISSN 1673-159X

CN 51-1686/N


Construction of Spoken Language Recognition Models Based on Prompt Methods and Knowledge Distillation Methods


     

    Abstract: The prompt method is an effective technique for leveraging pre-trained language models: only a small number of examples are needed for the model to perform a new natural language task. This article presents a novel speech recognition model, SpokenPrompt-KD, based on prompting and knowledge distillation. The model uses the Wav2Vec model to convert speech into a text-embedding format recognizable by pre-trained language models, thereby extending the language model's few-shot learning capability to speech recognition. It also employs knowledge distillation to transfer knowledge from a teacher language model to a student speech model, improving the model's accuracy on speech understanding tasks. Experimental results show that after pre-training on a 100-hour dataset, the model achieves 88.4% accuracy on classification tasks, demonstrating that a model with this few-shot learning capability is feasible and effective in the speech recognition domain.
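The abstract's teacher-student transfer is not specified in detail here; a common formulation is the temperature-scaled distillation loss of Hinton et al., in which the student is trained to match the teacher's softened output distribution. The following is a minimal sketch of that standard loss, not the paper's exact objective; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by temperature^2, as is conventional, so gradient magnitudes
    stay comparable when the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=-1,
    )
    return (temperature ** 2) * kl.mean()
```

In a SpokenPrompt-KD-style setup, `teacher_logits` would come from the pre-trained language model reading the transcript and `student_logits` from the speech model reading the Wav2Vec embeddings of the corresponding audio; the loss is zero when the two distributions agree.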

     
