29 January 2007 Combining text clustering and retrieval for corpus adaptation
Proceedings Volume 6500, Document Recognition and Retrieval XIV; 65000P (2007) https://doi.org/10.1117/12.703646
Event: Electronic Imaging 2007, San Jose, CA, United States
Abstract
Application-relevant text data are valuable in many natural language applications. Using such data can significantly improve vocabulary selection and language modeling, which are widely employed in automatic speech recognition, intelligent input methods, and similar systems. In some situations, however, relevant data are hard to collect, and the scarcity of application-relevant training text makes these natural language processing tasks difficult. In this paper, we propose an approach that uses only a small set of application-specific text and combines unsupervised text clustering with text retrieval techniques to find relevant text in a large, unorganized corpus, thereby adapting the training corpus toward the application area of interest. To evaluate the relevance of the acquired text, and thus to validate the effectiveness of the corpus adaptation approach, we train n-gram statistical language models on the retrieved text and test them on the application-specific text. The language models trained from the ranked text bundles yield well-discriminated perplexities on the application-specific text. Preliminary experiments on short message text and a large unorganized corpus demonstrate the performance of the proposed methods.
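The evaluation idea in the abstract, scoring each retrieved text bundle by the perplexity its language model assigns to the application-specific text, can be illustrated with a minimal sketch. The abstract does not state the n-gram order or the smoothing method, so the add-one smoothed bigram model below is an assumption, and the names `train_bigram`, `perplexity`, and `rank_bundles` are hypothetical helpers rather than anything taken from the paper.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences.

    Assumption: a bigram model; the abstract only says "n-gram".
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(model, sentences, vocab_size):
    """Perplexity of `sentences` under an add-one smoothed bigram model.

    Assumption: add-one (Laplace) smoothing, chosen only for brevity.
    """
    unigrams, bigrams = model
    log_prob, n_predictions = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
            log_prob += math.log(p)
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)


def rank_bundles(bundles, target):
    """Rank text bundles by perplexity on the target text, lowest first."""
    vocab = {w for sents in list(bundles.values()) + [target]
             for s in sents for w in s.split()}
    # Predictable symbols: every word plus the end marker "</s>".
    vocab_size = len(vocab) + 1
    scores = {name: perplexity(train_bigram(sents), target, vocab_size)
              for name, sents in bundles.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    bundles = {
        "sms": ["see you at lunch", "call me later"],
        "news": ["the market fell sharply today",
                 "parliament passed the budget"],
    }
    target = ["see you later", "call me at lunch"]
    for name, ppl in rank_bundles(bundles, target):
        print(name, round(ppl, 2))
```

A lower perplexity suggests that a bundle is closer to the application domain, which is the sense in which the ranked bundles are described as having well-discriminated perplexities. A practical system would likely use a higher-order model and a stronger smoothing scheme than this sketch.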
© (2007) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Feng He and Xiaoqing Ding "Combining text clustering and retrieval for corpus adaptation", Proc. SPIE 6500, Document Recognition and Retrieval XIV, 65000P (29 January 2007); https://doi.org/10.1117/12.703646
KEYWORDS
Performance modeling
Systems modeling
Expectation maximization algorithms
Statistical modeling
Algorithm development
Data modeling
Speech recognition