We present Intelligent Indexing: a general, scalable, collaborative approach to indexing and transcription of non-machine-readable documents that exploits visual consensus and group labeling while harnessing human recognition and domain expertise. In our system, indexers work directly on the page and, with minimal context switching, can navigate the page, enter labels, and interact with the recognition engine. Interaction with the recognition engine occurs through preview windows that allow the indexer to quickly verify and correct recommendations. This style of interaction avoids the tedium and inefficiency of conventional post-correction and editing. Intelligent Indexing is a trainable system that improves over time and can provide benefit even without prior knowledge. A user study was performed to compare Intelligent Indexing to a basic, manual indexing system. Volunteers reported that using Intelligent Indexing was less mentally fatiguing and more enjoyable than the manual indexing system. The results also show that it significantly reduces (by 30.2%) the time required to index census records while maintaining comparable accuracy. A video demonstration is available at http://youtube.com/gqdVzEPnBEw.
We describe a system for indexing census records in tabular documents, with the goal of recognizing the content
of each cell, including both headers and handwritten entries. Each document is automatically rectified, registered,
and scaled to a known template, after which lines and fields are detected and delimited as cells in tabular
form. Whole-word or whole-phrase recognition of noisy machine-printed text is performed using a glyph library,
which greatly increases efficiency and accuracy (approaching 100%) while avoiding the problems inherent
in traditional OCR approaches. Constrained handwriting recognition for a single author reaches accuracies as high
as 98% and 94.5% for the Gender and Birthplace fields, respectively. Multi-author accuracy (currently 82%) can
be improved with a larger training set. Active integration of user feedback in the system will accelerate
the indexing of records while providing a tightly coupled learning mechanism for system improvement.
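
As a rough illustration of whole-word recognition against a glyph library, the sketch below compares a binarized cell image with stored word images and defers to a human indexer when no entry matches well enough. It is not the system described above: the pixel-agreement score, the `threshold` parameter, and the names `similarity` and `recognize` are all assumptions made for this example, and it presumes the cell has already been registered and scaled to the same size as the library entries.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def similarity(a: Image, b: Image) -> float:
    """Return the fraction of pixels that agree between two same-sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    agree = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return agree / total


def recognize(
    cell: Image, library: Dict[str, Image], threshold: float = 0.9
) -> Tuple[Optional[str], float]:
    """Match a cell against every whole-word entry in the library.

    Returns (label, score) for the best match, or (None, score) when the
    best score is below the threshold (hypothetical policy: such cells
    would be passed to a human indexer for labeling).
    """
    if not library:
        raise ValueError("glyph library is empty")
    best_label, best_score = max(
        ((label, similarity(cell, glyph)) for label, glyph in library.items()),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy 2x3 "word images"; real entries would be full-resolution cell crops.
library = {
    "Male": [[1, 0, 1], [1, 1, 1]],
    "Female": [[1, 1, 1], [1, 0, 0]],
}
noisy_cell = [[1, 0, 1], [1, 1, 0]]  # "Male" with one pixel flipped

print(recognize(noisy_cell, library, threshold=0.8))
print(recognize(noisy_cell, library, threshold=0.9))
```

With the lower threshold the noisy cell is labeled "Male"; with the stricter one it is left unlabeled, which is where a corrected label from the indexer could be added to the library as a new entry.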