Prediction of visual saliency in video with deep CNNs

Souad Chaabouni; Jenny Benois-Pineau; Ofer Hadar

doi:10.1117/12.2238956

28 September 2016


Proceedings Volume 9971, Applications of Digital Image Processing XXXIX; 99711Q (2016) https://doi.org/10.1117/12.2238956
Event: SPIE Optical Engineering + Applications, 2016, San Diego, California, United States

Abstract

Prediction of visual saliency in images and video is an actively researched topic. Target applications include quality assessment of multimedia services in a mobile context, video compression, and recognition of objects in video streams. In mobile and egocentric settings, visual saliency models cannot rely only on bottom-up features, as suggested by feature integration theory, and the central bias hypothesis does not hold either. In this case, the top-down component of human visual attention becomes prevalent, and visual saliency can be predicted from previously seen data. Deep Convolutional Neural Networks (CNNs) have proven to be a powerful tool for predicting salient areas in still images. In our work we also focus on the sensitivity of the human visual system to residual motion in video. We design a Deep CNN architecture whose input primary maps are the color values of pixels and the magnitude of local residual motion. Complementary contrast maps yield a slight increase in accuracy compared with using color and residual motion only. The experiments show that the choice of input features for the Deep CNN depends on the visual task: when the interest lies in dynamic content, the 4K model with residual motion is more efficient, while for object recognition in egocentric video the purely spatial input is more appropriate.
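The abstract does not specify how residual motion is computed. A common reading is that it is the optical flow that remains after compensating for global (camera) motion. The sketch below illustrates that reading under explicit assumptions: a dense optical-flow field is already available, global motion follows a six-parameter affine model fitted by plain least squares, and the resulting magnitude is stacked with RGB into a per-pixel input of four values. The affine model, the least-squares fit, and the function names (`fit_affine_global_motion`, `residual_motion_magnitude`, `stack_input`) are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Tuple

# Assumed layouts (hypothetical): flow[y][x] = (u, v) in pixels/frame,
# image[y][x] = (r, g, b) with values in [0, 1].
Flow = List[List[Tuple[float, float]]]
Image = List[List[Tuple[float, float, float]]]


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(m, rhs)]  # augmented copy; inputs untouched
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine_global_motion(flow: Flow) -> Tuple[List[float], List[float]]:
    """Assumed global model: u = p0 + p1*x + p2*y, v = q0 + q1*x + q2*y.

    Plain least squares via normal equations; a robust estimator would
    down-weight independently moving objects, omitted here for brevity.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for y, row in enumerate(flow):
        for x, (u, v) in enumerate(row):
            basis = (1.0, float(x), float(y))
            for i in range(3):
                atu[i] += basis[i] * u
                atv[i] += basis[i] * v
                for j in range(3):
                    ata[i][j] += basis[i] * basis[j]
    return _solve3(ata, atu), _solve3(ata, atv)


def residual_motion_magnitude(flow: Flow) -> List[List[float]]:
    """Magnitude of flow minus the fitted global (camera) motion, per pixel."""
    p, q = fit_affine_global_motion(flow)
    mags = []
    for y, row in enumerate(flow):
        out = []
        for x, (u, v) in enumerate(row):
            du = u - (p[0] + p[1] * x + p[2] * y)
            dv = v - (q[0] + q[1] * x + q[2] * y)
            out.append(math.hypot(du, dv))
        mags.append(out)
    return mags


def stack_input(image: Image, flow: Flow) -> List[List[Tuple[float, float, float, float]]]:
    """Per-pixel (R, G, B, residual-motion magnitude) input for a CNN."""
    mags = residual_motion_magnitude(flow)
    return [[(*rgb, m) for rgb, m in zip(img_row, mag_row)]
            for img_row, mag_row in zip(image, mags)]


if __name__ == "__main__":
    h, w = 4, 5
    # Camera motion only: an affine flow field, so residual motion is ~0 everywhere.
    flow = [[(1.0 + 0.5 * x, -2.0 + 0.25 * y) for x in range(w)] for y in range(h)]
    # An independently moving object at (x=2, y=1) adds local motion.
    u, v = flow[1][2]
    flow[1][2] = (u + 3.0, v)
    image = [[(0.5, 0.5, 0.5)] * w for _ in range(h)]
    x4 = stack_input(image, flow)
    print(len(x4), len(x4[0]), len(x4[0][0]))  # 4 5 4
```

Pure camera motion leaves a near-zero residual map, while a locally moving object stands out. This matches the intuition that residual, not absolute, motion attracts attention. In practice the flow and global-motion estimation would come from a dedicated library and the maps would be normalized before being fed to the network.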


Citation

Souad Chaabouni, Jenny Benois-Pineau, and Ofer Hadar "Prediction of visual saliency in video with deep CNNs", Proc. SPIE 9971, Applications of Digital Image Processing XXXIX, 99711Q (28 September 2016); https://doi.org/10.1117/12.2238956
