Semantic space captioner: generating image captions step by step
Chenhao Zhu, Xia Ye, Qiduo Lu
Abstract

Image captioning is a popular research direction at the intersection of machine vision and natural language processing. Most existing image captioning methods adopt an encoder–decoder structure in which the image is encoded and fed into a decoder that generates a description of the image content. Although existing methods achieve strong results on natural images, there is still much room for improvement in describing fine details. We propose the semantic space captioner model, which introduces the concept of dense captioning into image captioning and uses contrastive language-image pretraining (CLIP) as the encoder for both text and images. Dense captions are generated for image regions and used as an extra semantic space during decoding to enrich the final caption. Experimental results show that our model outperforms existing methods in capturing image details and is able to generate diverse and meaningful captions. It also achieves competitive scores on standard MSCOCO metrics.
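The two-stage pipeline described in the abstract can be sketched as follows. Every component here is a toy stand-in (an assumption for illustration): a real system would use a trained dense-captioning model, a CLIP text/image encoder, and a trained language decoder, none of which are reproduced here.

```python
# Illustrative sketch of the "semantic space captioner" data flow:
# (1) generate dense captions for image regions,
# (2) embed them into an extra semantic space,
# (3) let the decoder consume both the image embedding and that space.
# All functions are hypothetical placeholders, not the authors' code.

def dense_captions(image_regions):
    """Stage 1 stand-in: one short caption per detected region."""
    return [f"a region showing {r}" for r in image_regions]

def encode_text(text, dim=8):
    """Stand-in for a CLIP-style text encoder: maps text to a
    fixed-size vector (a toy character-sum embedding, not learned)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def semantic_space(region_captions, dim=8):
    """Mean-pool region-caption embeddings into one semantic vector."""
    vecs = [encode_text(c, dim) for c in region_captions]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def decode_caption(image_embedding, semantic_vector, region_captions):
    """Stage 2 stand-in: a trained decoder would attend over the image
    embedding and the semantic space; here we simply fuse the dense
    captions into one sentence to show where the detail comes from."""
    details = ", ".join(
        c.removeprefix("a region showing ") for c in region_captions
    )
    return f"an image with {details}"

regions = ["a dog", "a red ball", "green grass"]
caps = dense_captions(regions)
sem = semantic_space(caps)
print(decode_caption([0.0] * 8, sem, caps))
```

The point of the sketch is the data flow: region-level captions are produced first, embedded into a shared semantic space, and only then does the final decoder generate the global caption conditioned on both signals.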

© 2022 SPIE and IS&T
Chenhao Zhu, Xia Ye, and Qiduo Lu "Semantic space captioner: generating image captions step by step," Journal of Electronic Imaging 31(6), 063021 (17 November 2022). https://doi.org/10.1117/1.JEI.31.6.063021
Received: 26 June 2022; Accepted: 27 October 2022; Published: 17 November 2022
CITATIONS
Cited by 1 scholarly publication.
KEYWORDS
Computer programming, Image processing, Image retrieval, Visual process modeling, Visualization, Data modeling, Lutetium