3 April 1997 Performance evaluation of document layout analysis algorithms on the UW data set
Jisheng Liang, Ihsin T. Phillips, Robert M. Haralick
Proceedings Volume 3027, Document Recognition IV; (1997) https://doi.org/10.1117/12.270067
Event: Electronic Imaging '97, 1997, San Jose, CA, United States
Abstract
This paper discusses a performance evaluation protocol for layout analysis. The University of Washington English Document Image Database-III (UW-III) contains 1600 English document images with manually edited ground truth of entity bounding boxes. These bounding boxes enclose text and non-text zones, text lines, and words. We describe a performance metric that compares detected entities with the ground truth in terms of their bounding boxes. The Document Attribute Format Specification is used as the standard data representation. The protocol is intended to serve as a model for using the UW-III database to evaluate document analysis algorithms. A set of layout analysis algorithms that detect different entities has been tested on this data set with the performance metric, and the evaluation results are presented.
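The abstract does not spell out the metric itself, so the following is only a minimal sketch of how detected entities might be compared with ground truth by their bounding boxes. It assumes a generic area-overlap criterion: a detected box and a ground-truth box match when their intersection covers a chosen fraction of both. The `Box` type, the `evaluate` function, and the 0.9 threshold are hypothetical choices for illustration, not the definitions used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes, or 0.0 if they do not overlap."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return w * h if w > 0 and h > 0 else 0.0


def evaluate(ground_truth: List[Box], detected: List[Box],
             threshold: float = 0.9) -> Dict[str, int]:
    """Count one-to-one matches, misses, and false alarms.

    Assumption: two boxes "match" when the intersection covers at least
    `threshold` of the area of each box. This is an illustrative rule only.
    """
    gt_hits: List[List[int]] = [[] for _ in ground_truth]
    det_hits: List[List[int]] = [[] for _ in detected]

    for i, g in enumerate(ground_truth):
        for j, d in enumerate(detected):
            inter = intersection_area(g, d)
            if inter == 0.0:
                continue
            if inter >= threshold * g.area() and inter >= threshold * d.area():
                gt_hits[i].append(j)
                det_hits[j].append(i)

    # A correct detection is a ground-truth box matched by exactly one
    # detected box that itself matches no other ground-truth box.
    correct = sum(
        1 for js in gt_hits
        if len(js) == 1 and len(det_hits[js[0]]) == 1
    )
    missed = sum(1 for js in gt_hits if not js)
    false_alarms = sum(1 for matches in det_hits if not matches)
    return {"correct": correct, "missed": missed, "false_alarms": false_alarms}


if __name__ == "__main__":
    gt = [Box(0, 0, 10, 10), Box(20, 0, 30, 10)]
    det = [Box(0, 0, 10, 10), Box(50, 50, 60, 60)]
    print(evaluate(gt, det))
```

The same routine could be applied separately to zones, text lines, and words, since each entity type is described by bounding boxes. Requiring coverage of both boxes keeps the sketch simple, but it means split and merge errors are not distinguished from misses and false alarms.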
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jisheng Liang, Ihsin T. Phillips, and Robert M. Haralick "Performance evaluation of document layout analysis algorithms on the UW data set", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); https://doi.org/10.1117/12.270067
CITATIONS
Cited by 28 scholarly publications.
KEYWORDS
Image segmentation
Detection and tracking algorithms
Data modeling
Databases
Error analysis
Algorithm development
Image processing algorithms and systems
