Document clustering: applications in a collaborative digital library
16 January 2006
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, Hassan Alam
Proceedings Volume 6067, Document Recognition and Retrieval XIII; 60670K (2006) https://doi.org/10.1117/12.650161
Event: Electronic Imaging 2006, San Jose, California, United States
Abstract
This paper introduces a document clustering method within a commercial document repository, FileShare(R). FileShare(R) is a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a simple Internet browser (e.g., Microsoft(R) Internet Explorer(R), Netscape(R), or Opera(R)). As the number of documents in a digital library grows, displaying them in this environment becomes a major challenge. This paper proposes a document clustering method that uses a modified version of the traditional K-Means algorithm, combined with lexical chaining, to categorize documents in the FileShare(R) repository by theme. The proposed algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
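For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional K-Means algorithm applied to documents. It is not the authors' modified algorithm: it omits lexical chaining entirely and represents each document as a plain bag-of-words vector. The function names, the cosine similarity measure, the farthest-first seeding, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Centroid of a non-empty list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts term by term
    n = len(vectors)
    return Counter({t: c / n for t, c in total.items()})

def kmeans(docs, k, max_iter=20):
    """Plain K-Means over documents; assumes 1 <= k <= len(docs)."""
    vectors = [tf_vector(d) for d in docs]
    # Deterministic farthest-first seeding: start from document 0, then
    # repeatedly add the document least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = (i for i in range(len(vectors)) if i not in seeds)
        seeds.append(max(
            candidates,
            key=lambda i: -max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels

# Hypothetical sample documents with two obvious themes.
docs = [
    "image scanner ocr image recognition",
    "ocr recognition scanner page image",
    "budget invoice payment finance",
    "finance payment budget report invoice",
]
labels = kmeans(docs, 2)
print(labels)  # the two imaging documents share one label, the two finance documents the other
```

A theme-aware variant along the lines the abstract describes would presumably replace the raw term vectors with features derived from lexical chains, which requires a thesaurus or similar lexical resource that this sketch does not include.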
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fuad Rahman, Aman Kumar, Yuilya Tarnikova, and Hassan Alam "Document clustering: applications in a collaborative digital library", Proc. SPIE 6067, Document Recognition and Retrieval XIII, 60670K (16 January 2006); https://doi.org/10.1117/12.650161
KEYWORDS
Digital libraries
Databases
Distance measurement
Internet
Genetic algorithms
Human-machine interfaces
Visualization