19 March 2023
Multilingual semantic fusion network for text recognition in the wild
Celi Lou, Minglei Tong, Liang Xue, Sisil Kumarawadu
Abstract

Most current scene text recognition approaches train their language model on a text dataset that is far sparser than those used in natural language processing, which leaves the language model inadequately trained. We therefore propose a simple transformer encoder–decoder model, the multilingual semantic fusion network (MSFN), that leverages prior linguistic knowledge to learn robust language features. First, we label the text dataset with forward sequences, backward sequences, and subwords, which are extracted by tokenization with linguistic information. We then introduce a multilingual model into the decoder, with three channels corresponding to the three label types of the dataset. The outputs of the channels are fused to produce a more accurate final result. In experiments, MSFN achieves state-of-the-art performance across six benchmark datasets, and extensive ablation studies demonstrate the effectiveness of the proposed method. Code is available at https://github.com/lclee0577/MLViT.
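To make the labeling step in the abstract concrete, the following minimal Python sketch builds three label sequences for a single word: forward characters, backward characters, and subwords. It is an illustration only, not the authors' implementation. The toy vocabulary `TOY_VOCAB`, the greedy longest-match tokenizer `greedy_subwords`, and the helper `make_labels` are all hypothetical names introduced here; the abstract does not say which tokenizer or vocabulary MSFN uses.

```python
from typing import Dict, List, Set

# Hypothetical toy subword vocabulary (an assumption for illustration;
# the actual tokenizer and vocabulary are not given in the abstract).
TOY_VOCAB: Set[str] = {"read", "ing", "un", "able", "s"}


def greedy_subwords(word: str, vocab: Set[str]) -> List[str]:
    """Segment a word by greedy longest match against vocab.

    Any position with no matching vocabulary entry falls back to a
    single character, so the pieces always concatenate to the word.
    """
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and shrink toward one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def make_labels(word: str, vocab: Set[str] = TOY_VOCAB) -> Dict[str, List[str]]:
    """Build the three label sequences for one word (illustrative only)."""
    return {
        "forward": list(word),             # left-to-right characters
        "backward": list(reversed(word)),  # right-to-left characters
        "subword": greedy_subwords(word, vocab),
    }


if __name__ == "__main__":
    for name, seq in make_labels("reading").items():
        print(name, seq)
```

In this sketch each of the three sequences would serve as the target for one decoder channel. How the channel outputs are fused is not specified in the abstract, so it is deliberately left out here.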

© 2023 SPIE and IS&T
Celi Lou, Minglei Tong, Liang Xue, and Sisil Kumarawadu "Multilingual semantic fusion network for text recognition in the wild," Journal of Electronic Imaging 32(2), 023015 (19 March 2023). https://doi.org/10.1117/1.JEI.32.2.023015
Received: 24 August 2022; Accepted: 20 February 2023; Published: 19 March 2023
KEYWORDS
Transformers
Semantics
Performance modeling
Computer programming
Data modeling
Education and training
Visual process modeling