18 April 2018 Deep hierarchical attention network for video description
Shuohao Li, Min Tang, Jun Zhang
Abstract
Pairing video with natural language description remains a challenge in computer vision and machine translation. Inspired by image description, which uses an encoder–decoder model to reduce a visual scene to a single sentence, we propose a deep hierarchical attention network for video description. The proposed model uses a convolutional neural network (CNN) and a bidirectional LSTM network as encoders, while a hierarchical attention network serves as the decoder. Compared with the encoder–decoder models used in video description, the bidirectional LSTM network can capture the temporal structure among video frames. Moreover, the hierarchical attention network has an advantage over a single-layer attention network in global context modeling. To make a fair comparison with other methods, we evaluate the proposed architecture with different types of CNN structures and decoders. Experimental results on standard datasets show that our model outperforms state-of-the-art techniques.
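The abstract does not specify how the attention layers are arranged, so the following is only a minimal sketch of the general idea of attending at two levels, not the authors' decoder. It assumes frame features have already been produced by an encoder, and it omits the CNN, the bidirectional LSTM, and all learned parameters. The names `attend`, `hierarchical_attend`, and `segment_len` are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, vectors):
    """Single-layer soft attention: weights are softmax(query . vector)."""
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return context, weights


def hierarchical_attend(query, frame_features, segment_len):
    """Two-level attention (an assumed layout, not taken from the paper).

    Level 1 attends over the frames inside each fixed-length segment.
    Level 2 attends over the resulting segment summaries.
    """
    segments = [
        frame_features[i:i + segment_len]
        for i in range(0, len(frame_features), segment_len)
    ]
    summaries = [attend(query, segment)[0] for segment in segments]
    return attend(query, summaries)


# Toy input: four 2-dimensional frame features and a decoder query vector.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
context, segment_weights = hierarchical_attend(query, frames, segment_len=2)
```

Here `context` is a single vector summarizing the whole clip and `segment_weights` holds one weight per segment. A single-layer attention network would instead call `attend(query, frames)` once over all frames.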
© 2018 SPIE and IS&T 1017-9909/2018/$25.00
Shuohao Li, Min Tang, and Jun Zhang "Deep hierarchical attention network for video description," Journal of Electronic Imaging 27(2), 023027 (18 April 2018). https://doi.org/10.1117/1.JEI.27.2.023027
Received: 3 January 2018; Accepted: 30 March 2018; Published: 18 April 2018
CITATIONS
Cited by 1 scholarly publication and 1 patent.
KEYWORDS
Video, Computer programming, Video acceleration, Data modeling, Performance modeling, Machine vision, Systems modeling
