This paper describes a system for script identification of handwritten
word images. The system is divided into two main
phases: training and testing. The training phase performs
moment-based feature extraction on the training word images
and generates their corresponding feature vectors. The testing
phase extracts moment features from a test word image
and classifies it into one of the candidate script classes using
information from the trained feature vectors. Experiments
are reported on handwritten word images from three scripts:
Latin, Devanagari and Arabic. Three different classifiers are
evaluated on a dataset of 12000 word images in the
training set and 7942 word images in the testing set. Results show
the strength of the approach: all three classifiers consistently
achieve an accuracy of over 97%.
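As an illustration of the two phases described above, the following is a minimal sketch and not the system's actual implementation. It assumes binarized word images given as lists of 0/1 rows, uses scale-normalized central moments of order 2 and 3 as the feature vector, and substitutes a 1-nearest-neighbour rule for the three classifiers, which the abstract does not name. The function names `moment_features` and `classify` and the toy labels are hypothetical.

```python
from math import dist


def moment_features(image):
    """Return scale-normalized central moments eta_pq for 2 <= p + q <= 3.

    `image` is a list of rows of 0/1 pixels (1 = ink). This input format
    is an assumption made for the sketch.
    """
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    m00 = len(pixels)
    if m00 == 0:
        raise ValueError("image contains no ink pixels")
    # Centroid; subtracting it makes the moments translation invariant.
    cx = sum(x for x, _ in pixels) / m00
    cy = sum(y for _, y in pixels) / m00
    features = []
    for p, q in [(2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3)]:
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        # Dividing by m00^(1 + (p+q)/2) normalizes for scale.
        features.append(mu / m00 ** (1 + (p + q) / 2))
    return features


def classify(test_image, training_set):
    """Assign the label of the nearest training feature vector (1-NN).

    `training_set` is a list of (feature_vector, label) pairs. The 1-NN
    rule is an assumption standing in for the unnamed classifiers.
    """
    query = moment_features(test_image)
    _, label = min(training_set, key=lambda item: dist(item[0], query))
    return label


# Training phase: extract feature vectors from labelled toy images.
horizontal = [[1, 1, 1, 1, 1]]
vertical = [[1], [1], [1], [1], [1]]
training_set = [(moment_features(horizontal), "horizontal"),
                (moment_features(vertical), "vertical")]

# Testing phase: a shorter, shifted horizontal stroke.
test_image = [[0, 0, 0, 0, 0, 0],
              [0, 1, 1, 1, 1, 0]]
predicted = classify(test_image, training_set)
```

In a real script identifier the labels would be the script classes and the training set would hold one feature vector per training word image.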
This paper describes an OCR-based technique for word
spotting in Devanagari printed documents. The system
accepts a Devanagari word as input and returns a sequence
of word images that are ranked according to their
similarity with the input query. The methodology involves
line and word separation, pre-processing document
words, word recognition using OCR and similarity
matching. We demonstrate a Block Adjacency Graph
(BAG) based document cleanup in the pre-processing
phase. During word recognition, multiple recognition hypotheses
are generated for each document word using a
font-independent Devanagari OCR. The similarity matching
phase uses a cost-based model to match the word
input by a user against the OCR results. Experiments are
conducted on document images from the publicly available
ILT and Million Book Project datasets. The technique
achieves an average precision of 80% for 10 queries and
67% for 20 queries for a set of 64 documents containing
5780 word images. The paper also presents a comparison
of our method with template-based word spotting techniques.
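The similarity matching step can be sketched as follows. This is an illustrative assumption rather than the paper's cost model: each word image is scored by the smallest edit distance between the query and any of its OCR hypotheses, and images are returned in order of increasing cost. The names `edit_distance` and `rank_word_images`, the image identifiers, and the Latin placeholder strings standing in for Devanagari words are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def rank_word_images(query, hypotheses_by_image):
    """Rank word images by their best-matching OCR hypothesis.

    `hypotheses_by_image` maps an image id to a non-empty list of OCR
    hypothesis strings. Using the minimum edit distance as the matching
    cost is an assumption made for this sketch.
    """
    scored = [(min(edit_distance(query, h) for h in hyps), image_id)
              for image_id, hyps in hypotheses_by_image.items()]
    return [image_id for _, image_id in sorted(scored)]


# Placeholder data: three word images with their OCR hypotheses.
hypotheses = {
    "w1": ["bharat", "bnarat"],
    "w2": ["bharti"],
    "w3": ["nagar"],
}
ranking = rank_word_images("bharat", hypotheses)
```

Keeping several hypotheses per word image lets a correct lower-ranked OCR result still produce a low matching cost.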
Transcript mapping, or text alignment with handwritten documents, is the automatic alignment of words in a text file with word images in a handwritten document. Such a mapping has applications in fields ranging from machine learning, where large quantities of truth data are required for evaluating handwriting recognition algorithms, to data mining, where word image indexes are used in ranked retrieval of scanned documents in a digital library. The alignment also aids "writer identity" verification algorithms. Interfaces that display scanned handwritten documents may use this alignment to highlight manuscript tokens when a person examines the corresponding transcript word. We propose an adaptation of the True DTW dynamic programming algorithm for English handwritten documents. Our primary contribution is the integration of dissimilarity scores from a word-model word recognizer with the Levenshtein distance between the recognized word and the lexicon word as the cost metric in the DTW algorithm, which leads to a fast and accurate alignment. The reported results confirm the effectiveness of our approach.
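The idea of combining recognizer dissimilarity with Levenshtein distance inside a DTW cost can be sketched as below. This uses the standard DTW recurrence, not the True DTW adaptation itself, and it makes several assumptions: the recognizer output is simulated as a (top word, dissimilarity) pair per word image, the Levenshtein distance is normalized by the longer word length, and the two terms are combined as a weighted sum with a hypothetical `weight` parameter.

```python
def levenshtein(a, b):
    """Levenshtein distance with unit edit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def align(transcript, recognized, weight=0.5):
    """Align transcript words to word images with standard DTW.

    `recognized[j]` is a (top_word, dissimilarity) pair for word image j.
    The weighted-sum cost is an assumption made for this sketch.
    Returns a list of (transcript_index, image_index) pairs.
    """
    n, m = len(transcript), len(recognized)
    inf = float("inf")

    def cost(i, j):
        word, dissimilarity = recognized[j]
        longest = max(len(transcript[i]), len(word), 1)
        lev = levenshtein(transcript[i], word) / longest
        return weight * dissimilarity + (1 - weight) * lev

    # D[i][j] = cheapest cost of aligning the first i words to the first j images.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(i - 1, j - 1) + min(D[i - 1][j - 1],
                                               D[i - 1][j],
                                               D[i][j - 1])

    # Backtrack from the end, following the cheapest predecessor each time.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return path


transcript = ["the", "quick", "brown", "fox"]
recognized = [("the", 0.1), ("quack", 0.4), ("brown", 0.2), ("fax", 0.3)]
alignment = align(transcript, recognized)
```

Here the recognizer misreads two words, yet the Levenshtein term keeps the cost of the correct pairing low enough for the alignment to follow the diagonal.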