Visual domain knowledge-based multimodal zoning for textual region localization in noisy historical document images
24 December 2021
Chulwoo Pack, Leen-Kiat Soh, Elizabeth Lorang
Abstract

Document layout analysis, or zoning, is important for textual content analysis tasks such as optical character recognition. Zoning document images such as digitized historical newspaper pages is challenging because of the noise and poor quality of the images. Effective data-driven approaches, such as those leveraging deep learning, have recently been proposed, but they require larger training sets and thus incur the additional cost of ground truthing. We propose a zoning solution that incorporates a knowledge-driven document representation, the gravity map, into a multimodal deep learning framework to reduce the amount of time and data required for training. First, we generate a gravity map for each image, considering the centroid distance and area between a cell in a Voronoi tessellation and its content, to encode visual domain knowledge of the zoning task. Second, we inject the gravity maps into a deep convolutional neural network (DCNN) during training as an additional modality to boost performance. We report on two investigations using two state-of-the-art DCNN architectures and three datasets: two sets of historical newspapers and one set of born-digital contemporary documents. Evaluations show that our solution achieved comparable segmentation accuracy with fewer training epochs and less training data than a naïve training scheme.
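
The abstract does not give the gravity-map formula, so the following is only a rough sketch of the general idea under stated assumptions: content regions are supplied as bounding boxes, the Voronoi tessellation is approximated on the pixel grid by nearest-centroid assignment, and each cell receives a hypothetical score that grows with the content-to-cell area ratio and shrinks with the distance between the content centroid and the cell centroid. The function name `gravity_map`, the box input format, and the score itself are illustrative assumptions, not the authors' method.

```python
from math import hypot


def gravity_map(width, height, boxes):
    """Return a height x width grid of per-cell scores (hypothetical formula).

    boxes: list of (x0, y0, x1, y1) content bounding boxes in pixel
    coordinates, with x1 and y1 exclusive. Assumed to lie inside the image
    and to have distinct centroids.
    """
    # Centroid and pixel area of each content region.
    contents = [
        (((x0 + x1 - 1) / 2.0, (y0 + y1 - 1) / 2.0), (x1 - x0) * (y1 - y0))
        for x0, y0, x1, y1 in boxes
    ]

    # Discrete Voronoi tessellation: each pixel joins the nearest content centroid.
    labels = [[0] * width for _ in range(height)]
    sums = [[0.0, 0.0, 0] for _ in contents]  # sum of x, sum of y, pixel count
    for y in range(height):
        for x in range(width):
            k = min(
                range(len(contents)),
                key=lambda i: hypot(x - contents[i][0][0], y - contents[i][0][1]),
            )
            labels[y][x] = k
            sums[k][0] += x
            sums[k][1] += y
            sums[k][2] += 1

    # Assumed score: (content area / cell area) / (1 + centroid distance).
    scores = []
    for (centroid, area), (sx, sy, n) in zip(contents, sums):
        if n == 0:  # cell received no pixels; leave its score at zero
            scores.append(0.0)
            continue
        cell_centroid = (sx / n, sy / n)
        dist = hypot(centroid[0] - cell_centroid[0], centroid[1] - cell_centroid[1])
        scores.append((area / n) / (1.0 + dist))

    # Paint every pixel with the score of the cell it belongs to.
    return [[scores[k] for k in row] for row in labels]
```

A map produced this way has the same spatial size as the page image, so it could in principle be supplied to a network alongside the image as a second input; how the paper actually fuses the two modalities is not stated in the abstract.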

© 2021 SPIE and IS&T 1017-9909/2021/$28.00
Chulwoo Pack, Leen-Kiat Soh, and Elizabeth Lorang "Visual domain knowledge-based multimodal zoning for textual region localization in noisy historical document images," Journal of Electronic Imaging 30(6), 063028 (24 December 2021). https://doi.org/10.1117/1.JEI.30.6.063028
Received: 24 May 2021; Accepted: 10 December 2021; Published: 24 December 2021
KEYWORDS
Visualization
Data modeling
Image segmentation
Performance modeling
Image fusion
Optical character recognition
Image quality