DRR research beyond commercial off-the-shelf OCR software: a survey
17 January 2005
Proceedings Volume 5676, Document Recognition and Retrieval XII; (2005) https://doi.org/10.1117/12.581881
Event: Electronic Imaging 2005, San Jose, California, United States
Abstract
After decades of research, Optical Character Recognition (OCR) has entered a relatively mature stage. Commercial off-the-shelf (COTS) OCR software packages have become powerful tools in Document Recognition and Retrieval (DRR) applications. One question naturally arises: what areas are left for new DRR research beyond COTS OCR software? This question has been discussed extensively at recent conferences. This paper attempts to answer it through a systematic survey of recently reported DRR projects, as well as our own Digital Content Re-Mastering (DCRM) research at HP Labs. The survey shows that custom DRR research is still greatly needed to achieve better accuracy and reliability, to extract complementary content, and to support downstream information retrieval. Several concrete observations follow from this survey. First, basic character and word recognition is now mostly handled by COTS software, with a few exceptions. Second, system-level research on reliability and guaranteed accuracy can seldom be replaced by COTS software. Third, document-level structure understanding still has much room to grow. Fourth, post-OCR information retrieval still poses many challenging research problems.
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaofan Lin "DRR research beyond commercial off-the-shelf OCR software: a survey", Proc. SPIE 5676, Document Recognition and Retrieval XII, (17 January 2005); https://doi.org/10.1117/12.581881
CITATIONS
Cited by 5 scholarly publications.
KEYWORDS
Optical character recognition
Commercial off the shelf technology
Image segmentation
Video
Reliability
Error analysis
Information fusion