15 December 2022 MSA: an end-to-end scene text spotter with mask-supervised-attention
Xin Chen, Chaoyong Peng, Chunrong Qiu, Xiaorong Gao, Yu Zhang, Min Xu
Proceedings Volume 12478, Thirteenth International Conference on Information Optics and Photonics (CIOP 2022); 124781Q (2022) https://doi.org/10.1117/12.2654503
Event: Thirteenth International Conference on Information Optics and Photonics (CIOP 2022), 2022, Xi'an, China
Abstract
End-to-end (E2E) scene text recognition (the joint detection and recognition of text in natural images) is developing rapidly, and joint optimization strategies for image-sequence alignment have become a research hotspot. Existing methods are either difficult to train or require costly character-level annotations. In this paper, a novel end-to-end scene text recognition framework is proposed. Built on a Swin-Transformer (Swin-T) FPN backbone network, the model uses instance segmentation to obtain a text mask and binarizes it to locate the polygon boundaries of the text directly. Meanwhile, to address the low fitting efficiency of the text sequence recognition module, we design a self-monitoring Mask-Supervised Attention (MSA) mechanism that improves the fitting speed and fitting accuracy of the recognition module, thereby improving the joint performance of E2E text recognition. The results show that, on the E2E text recognition task, the proposed model improves the F-measure over other typical models by 2.3%, 2.7%, and 11.9% on ICDAR 2015 with the strong, weak, and generic lexicons, respectively, and by 4.8% and 9.9% on Total-text with the full lexicon and with no lexicon, respectively.
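As a rough illustration of the "binarize the text mask, then locate its boundary" step mentioned in the abstract, the following is a minimal NumPy sketch. It is not the authors' implementation: the threshold of 0.5, the function names `binarize_mask` and `boundary_pixels`, and the use of 4-connectivity are all assumptions made for illustration. It only marks boundary pixels; turning them into an ordered polygon would additionally need a contour-tracing step, which is not shown here.

```python
import numpy as np


def binarize_mask(prob_mask, threshold=0.5):
    """Turn a soft text mask into a boolean mask.

    The threshold value is an assumption; the abstract does not state one.
    """
    return np.asarray(prob_mask) >= threshold


def boundary_pixels(binary_mask):
    """Mark foreground pixels that touch background (4-connectivity).

    Foreground pixels on the image border also count as boundary pixels,
    because the mask is padded with background before the neighbour check.
    """
    fg = np.asarray(binary_mask, dtype=bool)
    padded = np.pad(fg, 1, mode="constant", constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours
    # are all foreground.
    interior = (
        padded[:-2, 1:-1]
        & padded[2:, 1:-1]
        & padded[1:-1, :-2]
        & padded[1:-1, 2:]
    )
    return fg & ~interior


# Toy example: a 4x5 rectangular "text region" inside a 6x7 probability map.
prob = np.zeros((6, 7))
prob[1:5, 1:6] = 0.9
mask = binarize_mask(prob)
edge = boundary_pixels(mask)
```

In this toy case `edge` is true only on the outer ring of the rectangle, which is the set of points a polygon-fitting step would start from.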
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xin Chen, Chaoyong Peng, Chunrong Qiu, Xiaorong Gao, Yu Zhang, and Min Xu "MSA: an end-to-end scene text spotter with mask-supervised-attention", Proc. SPIE 12478, Thirteenth International Conference on Information Optics and Photonics (CIOP 2022), 124781Q (15 December 2022); https://doi.org/10.1117/12.2654503
KEYWORDS
Performance modeling
Data modeling
Image segmentation
Feature extraction
Systems modeling
Visual process modeling
Network architectures