This paper proposes a supervised approach to document flow segmentation, applied to real-world heterogeneous
documents. Our algorithm treats the flow of documents as pairs of consecutive pages and studies the relationship
between the two pages of each pair. First, sets of features are extracted from the pages, and we propose an
approach that models each pair of pages as a single feature vector. This representation is passed to a binary
classifier, which labels the relationship as either segmentation or continuity. In the case of segmentation, we
consider that a complete document has been found, and the analysis of the flow continues by starting a new document.
In the case of continuity, both pages are assigned to the same document and the analysis continues along the flow.
If it is uncertain whether the relationship between the two pages should be classified as continuity or
segmentation, the pair is rejected and the pages analyzed up to that point are considered a "fragment". This first
classification stage already gives good results, approaching 90% on certain documents, which is high for this level
of the system.
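The decision loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the pair representation (absolute feature differences), the `score` callable returning a segmentation probability, and the rejection thresholds `low` and `high` are all hypothetical, since the abstract does not specify them. The pages left at the end of the flow are labeled a document here, which is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Page = Sequence[float]  # hypothetical: each page is already a feature vector


def pair_features(a: Page, b: Page) -> List[float]:
    # Assumption: a pair of pages is modeled by absolute feature differences.
    return [abs(x - y) for x, y in zip(a, b)]


def segment_flow(
    pages: Sequence[Page],
    score: Callable[[List[float]], float],
    low: float = 0.4,
    high: float = 0.6,
) -> List[Tuple[str, List[int]]]:
    """Group page indices into documents and fragments.

    `score` returns a (hypothetical) probability that the pair is a
    segmentation point. Scores strictly between `low` and `high` are rejected.
    """
    if not pages:
        return []
    groups: List[Tuple[str, List[int]]] = []
    current = [0]
    for i in range(1, len(pages)):
        p = score(pair_features(pages[i - 1], pages[i]))
        if p >= high:
            # Segmentation: the pages so far form a complete document.
            groups.append(("document", current))
            current = [i]
        elif p <= low:
            # Continuity: the page belongs to the current document.
            current.append(i)
        else:
            # Rejection: the pages analyzed so far become a fragment.
            groups.append(("fragment", current))
            current = [i]
    # Assumption: whatever remains at the end of the flow is a document.
    groups.append(("document", current))
    return groups
```

In practice `score` would wrap a trained binary classifier; any function mapping a pair vector to a value in [0, 1] fits this sketch.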
We present a feature selection and weighting method for medieval handwriting images that relies on codebooks of
shapes of small character strokes (graphemes obtained by decomposing the manuscripts).
These codebooks help simplify the automation of the analysis, the transcription of manuscripts, and the
recognition of styles or writers. Our approach provides precise feature weighting by genetic algorithms and a
high-performance methodology that uses graph coloring to categorize grapheme shapes into codebooks,
which are in turn applied to CBIR (Content-Based Image Retrieval) in a mixed handwriting database containing
pages from different writers, historical periods, and levels of quality. We show how coupling these two
mechanisms, feature weighting and grapheme classification, can offer a better separation of the forms to be
categorized by exploiting their grapho-morphological, density, and significant-orientation characteristics.
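One way to picture how feature weights and graph coloring can interact is the sketch below. It is a hypothetical illustration, not the method of the paper: it assumes that graphemes are graph vertices, that an edge joins two graphemes whose weighted Euclidean distance exceeds a `threshold`, and that a simple greedy coloring is used, so that graphemes sharing a color are mutually similar and form one codebook class. The weights are taken as given (for example, produced by a genetic algorithm); the feature vectors and the threshold are made-up inputs.

```python
import math
from typing import List, Sequence


def weighted_distance(
    a: Sequence[float], b: Sequence[float], w: Sequence[float]
) -> float:
    # Weighted Euclidean distance; `w` holds one weight per feature.
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def color_graphemes(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float,
) -> List[int]:
    """Return one color (class index) per grapheme via greedy coloring."""
    n = len(features)
    # Assumption: an edge marks two graphemes as too dissimilar to share a class.
    conflicts = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if weighted_distance(features[i], features[j], weights) > threshold:
                conflicts[i].add(j)
                conflicts[j].add(i)
    colors = [-1] * n
    for v in range(n):
        # Pick the smallest color not used by an already-colored neighbor.
        used = {colors[u] for u in conflicts[v] if colors[u] != -1}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors
```

With features `[[0, 0], [0, 1], [10, 0], [10, 1]]` and equal weights, the first feature dominates the grouping; setting its weight to zero makes the second feature decide instead, which is the kind of effect the weighting step is meant to control.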