20 February 2024 Application of speech recognition and autoreference models for logging tasks
Yaroslav A. Shentsov, Tatiana Y. Chernysheva, Galina B. Barskaya
Proceedings Volume 13065, Third International Conference on Optics, Computer Applications, and Materials Science (CMSD-III 2023); 1306503 (2024) https://doi.org/10.1117/12.3024859
Event: Third International Conference on Optics, Computer Applications, and Materials Science (CMSD-III 2023), 2023, Dushanbe, Tajikistan
Abstract
Protocols (written records of meetings and events) play an important role in decision-making across many fields. The effectiveness of organizations and teams depends directly on the quality of these records and on how quickly they are written, so automating the process is a pressing task. This article proposes an approach to logging that processes existing audio recordings of meetings or events with an ensemble of artificial intelligence models: a pre-trained acoustic speech recognition model based on the "Quartznet" architecture, an N-gram language model built with the "KenLM" tool, and the Russian-language model "RuBERT" retrained for extractive summarization. The algorithms behind these models are examined, and data collection, retraining, and the implementation of the selected models are described. Quality metrics for the speech recognition system are compared, and the implemented systems are analyzed. To make the developed system usable, an HTTP web server exposing an API was deployed. Finally, the results of the developed automatic logging system are demonstrated; it can extract timestamps of spoken words, distinguish speakers in audio recordings, and reduce the resulting text to a specified percentage.
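As an illustration of the final step only, reducing a transcript to a specified percentage by extractive selection can be sketched as follows. This is not the authors' implementation: the sentence scorer here is a simple word-frequency heuristic standing in for the retrained RuBERT model, and the names `split_sentences`, `summarize`, and `ratio` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Split text after sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(text: str, ratio: float) -> str:
    """Keep about `ratio` of the sentences, preserving their original order."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Assumption: a word-frequency score replaces the neural sentence scorer.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(word for words in tokenized for word in words)
    scores = [
        sum(freq[word] for word in words) / max(len(words), 1)
        for words in tokenized
    ]
    # Number of sentences to keep, never fewer than one.
    keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order so the summary reads like the source.
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(transcript, 0.3)` would keep roughly 30% of the sentences. Because the selection is extractive, every sentence in the output appears verbatim in the input, which is what allows word-level timestamps from the recognizer to remain attached to the shortened text.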
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yaroslav A. Shentsov, Tatiana Y. Chernysheva, and Galina B. Barskaya "Application of speech recognition and autoreference models for logging tasks", Proc. SPIE 13065, Third International Conference on Optics, Computer Applications, and Materials Science (CMSD-III 2023), 1306503 (20 February 2024); https://doi.org/10.1117/12.3024859
KEYWORDS
Education and training

Speech recognition

Systems modeling

Data modeling

Acoustics

Detection and tracking algorithms

Data processing
