Bottom-up spatio-temporal saliency detection identifies perceptually important regions of interest in video sequences. The center-surround model has proven useful for visual saliency detection. In this work, we explore using 3D FFT local spectra as features for saliency detection within the center-surround framework. We develop a spectral-location-based decomposition scheme that divides a 3D FFT cube into two components, one related to temporal changes and the other to spatial changes. Temporal saliency and spatial saliency are detected separately, using features derived from each spectral component, through a simple center-surround comparison method; the two detection results are then combined to yield a saliency map. We apply the same detection algorithm to the individual color channels (YIQ) and incorporate the results into the final saliency determination. The proposed technique is tested on the public CRCNS database. Both visual and numerical evaluations verify the promising performance of our technique.
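The abstract does not specify the exact comparison rule, so the following is only a minimal numpy sketch of the core idea: describe each spatio-temporal cube by its 3D FFT magnitude spectrum and score it by its distance to a surround estimate. Here the surround is approximated by the mean spectrum over all cubes, and the patch size and L2 distance are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def fft_center_surround_saliency(video, patch=8):
    """Toy center-surround saliency from 3D FFT local spectra.

    `video` is a (frames, height, width) array. Each spatio-temporal
    cube of size (frames, patch, patch) is described by its 3D FFT
    magnitude spectrum; a cube's saliency is the L2 distance between
    its spectrum and the mean spectrum of all cubes (a global
    stand-in for the surround).
    """
    t, h, w = video.shape
    gh, gw = h // patch, w // patch
    spectra = np.empty((gh, gw, t, patch, patch))
    for i in range(gh):
        for j in range(gw):
            cube = video[:, i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            spectra[i, j] = np.abs(np.fft.fftn(cube))
    surround = spectra.mean(axis=(0, 1))            # surround spectrum
    diff = spectra - surround
    sal = np.sqrt((diff ** 2).sum(axis=(2, 3, 4)))  # per-cube distance
    return sal / sal.max()                          # normalized map
```

A flickering region, for example, has strong temporal-frequency content that stands out against a static surround, so its cube receives the highest score.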
In this paper, we propose a method to extract depth from motion, texture, and intensity. We first analyze the depth map to extract a set of depth cues. Then, based on these depth cues, we process the color reference video, using its texture, motion, luminance, and chrominance content, to extract the depth map. Each channel of the YCbCr color space is processed separately. We tested this approach on video sequences with different monocular properties. Our simulation results show that the extracted depth maps generate 3D video with quality close to that of video rendered using the ground-truth depth map. We report objective results using 3VQM and subjective analysis via comparison of rendered images. Furthermore, we analyze the bitrate savings gained by eliminating the need for two video codecs, one for the reference color video and one for the depth map; in this case, only the depth cues are sent as side information with the color video.
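The abstract does not give the exact cue-extraction rules, so the sketch below is purely illustrative of the general idea: derive a motion cue, a texture cue, and a chrominance cue from the YCbCr channels and blend them into a pseudo-depth map. The frame-difference motion cue, gradient-based texture cue, warm/cool chrominance contrast, and the blending weights are all hypothetical choices, not the paper's method.

```python
import numpy as np

def depth_from_cues(y, cb, cr, prev_y, weights=(0.5, 0.3, 0.2)):
    """Illustrative monocular-cue fusion into a pseudo-depth map.

    - motion cue: absolute luma frame difference
    - texture cue: luma gradient magnitude (spatial detail)
    - chrominance cue: warm/cool contrast, Cr - Cb
    The weights are made-up values for demonstration only.
    """
    w_m, w_t, w_c = weights
    motion = np.abs(y - prev_y)        # temporal change per pixel
    gy, gx = np.gradient(y)
    texture = np.hypot(gy, gx)         # spatial detail per pixel
    chroma = cr - cb                   # warm colors assumed nearer

    def norm(c):
        rng = c.max() - c.min()
        return (c - c.min()) / rng if rng > 0 else np.zeros_like(c)

    depth = w_m * norm(motion) + w_t * norm(texture) + w_c * norm(chroma)
    return norm(depth)                 # pseudo-depth in [0, 1]
```

Processing each channel separately, as the abstract describes, corresponds here to the luma channel driving the motion and texture cues while the chroma channels drive the color-contrast cue.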
KEYWORDS: Video, Video coding, Quantization, Statistical analysis, Computer programming, Error analysis, Distortion, Video compression, Video processing, Standards development
We examine the impact of various encoding parameters on the distribution of the DCT coefficients in H.264-like video coders. We model the distribution of the frame DCT coefficients using the commonly used Laplacian and Cauchy distributions. We show that the resolution, the quantization levels, and the coding type have a significant impact on the accuracy of the Laplacian- and Cauchy-based models, whereas the transform kernel size (4×4 vs. 8×8) has little impact. Moreover, we show that for video sources with little temporal or spatial detail, such as flat regions, the distribution of the frame DCT coefficients resembles a Laplacian distribution; when the source exhibits more detail, such as texture and edges, it resembles a Cauchy distribution. This correlation between source detail and the two probability distributions can be exploited, through a classification-based approach, to further improve the estimation of the distribution of the frame DCT coefficients.
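A classification of this kind can be sketched by fitting both candidate distributions to a frame's block-DCT AC coefficients and keeping the better-fitting one. The sketch below uses a zero-mean Laplacian ML fit and a coarse grid search for the Cauchy scale; the 4×4 block size matches the abstract, but the selection rule (higher log-likelihood wins) is an illustrative assumption, not the paper's estimator.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def fit_laplace_loglik(x):
    """Log-likelihood under the ML zero-mean Laplacian: b = mean(|x|)."""
    b = np.mean(np.abs(x))
    return -len(x) * np.log(2 * b) - np.sum(np.abs(x)) / b

def fit_cauchy_loglik(x):
    """Best log-likelihood under a zero-location Cauchy, with the scale
    found by a coarse grid search (a shortcut, not an exact MLE)."""
    best = -np.inf
    for g in np.geomspace(1e-3, 1e3, 200):
        best = max(best, np.sum(np.log(g / (np.pi * (g**2 + x**2)))))
    return best

def preferred_model(frame, block=4):
    """Classify one frame's block-DCT AC coefficients as better
    modeled by a Laplacian or a Cauchy distribution."""
    h, w = frame.shape
    M = dct_matrix(block)
    coeffs = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            c = M @ frame[i:i+block, j:j+block] @ M.T
            coeffs.append(c.ravel()[1:])   # drop the DC coefficient
    x = np.concatenate(coeffs)
    return 'laplace' if fit_laplace_loglik(x) >= fit_cauchy_loglik(x) else 'cauchy'
```

In the classification-based scheme the abstract suggests, a decision like this would route flat-region frames to the Laplacian model and detailed frames to the Cauchy model before estimating the distribution parameters.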
The curvelet transform is a recently introduced non-adaptive multi-scale transform that has gained popularity in the image processing field. In this paper, we study the effect of customized tiling of frequency content in the curvelet transform. Specifically, we investigate the effect of the size of the coarsest level and its relationship to denoising performance. Based on the observed behavior, we introduce an algorithm to automatically choose the optimal number of decompositions. In denoising applications, it shows a clear advantage over the default curvelet decomposition. We also examine how denoising is affected by varying the number of divisions per scale.
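The selection principle can be illustrated without a full curvelet implementation (which requires specialized libraries such as CurveLab): denoise at several decomposition depths and keep the depth with the lowest error. The sketch below substitutes a simple 2D Haar transform for the curvelet transform and scores depths against a known clean image; in practice a risk estimate such as SURE would replace that oracle. All of this is an analogy to the paper's idea, not its algorithm.

```python
import numpy as np

def haar2(x):
    """One level of the 2D Haar transform (even side lengths assumed)."""
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)
    aa = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    ad = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    da = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    dd = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return aa, (ad, da, dd)

def ihaar2(aa, details):
    """Inverse of haar2."""
    ad, da, dd = details
    a = np.empty((aa.shape[0], aa.shape[1] * 2))
    a[:, 0::2] = (aa + ad) / np.sqrt(2)
    a[:, 1::2] = (aa - ad) / np.sqrt(2)
    d = np.empty_like(a)
    d[:, 0::2] = (da + dd) / np.sqrt(2)
    d[:, 1::2] = (da - dd) / np.sqrt(2)
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :] = (a + d) / np.sqrt(2)
    x[1::2, :] = (a - d) / np.sqrt(2)
    return x

def denoise(img, levels, thresh):
    """Multi-level Haar hard-threshold denoising."""
    stack, a = [], img
    for _ in range(levels):
        a, det = haar2(a)
        stack.append(tuple(np.where(np.abs(c) > thresh, c, 0) for c in det))
    for det in reversed(stack):
        a = ihaar2(a, det)
    return a

def best_levels(noisy, clean, max_levels, thresh):
    """Pick the decomposition depth with the lowest reconstruction error."""
    errs = [np.mean((denoise(noisy, L, thresh) - clean) ** 2)
            for L in range(1, max_levels + 1)]
    return int(np.argmin(errs)) + 1
```

The analogue of the paper's coarsest-level question is the stopping depth here: decomposing further shrinks the untouched coarse band, which changes how much low-frequency content escapes thresholding.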
In this paper we introduce a depth-adaptive approach for disocclusion removal in depth image-based rendering (DIBR). This approach extends the hierarchical hole-filling (HHF) method presented in an earlier work. Like HHF, the depth-adaptive approach yields synthesized 3D videos that are free of geometric distortions. Furthermore, the edges and texture around the disoccluded areas are sharpened and enhanced by adding a depth-adaptive preprocessing step before applying the hierarchical hole-filling. Subjective and objective results show a significant improvement in the quality of views synthesized with the depth-adaptive approach.
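The hierarchical hole-filling idea that this work builds on can be sketched as a pyramid pass: reduce the image by averaging only valid (non-hole) pixels until no holes remain, then expand back, taking each hole's value from the next coarser level. The sketch below uses plain 2×2 averaging and assumes power-of-two dimensions; the original HHF uses proper pyramid filters, and the depth-adaptive preprocessing step of this paper is not shown.

```python
import numpy as np

def hhf_fill(image, hole_mask):
    """Minimal hierarchical hole-filling sketch.

    `image` is a 2D array, `hole_mask` a boolean array marking the
    disoccluded pixels. Holes are filled from coarser pyramid levels
    built by averaging valid pixels only.
    """
    valid = (~hole_mask).astype(float)
    levels = [(image * valid, valid)]
    img, w = levels[0]
    while not np.all(w > 0) and min(img.shape) > 1:
        # reduce: 2x2 average over valid pixels only
        s = img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]
        c = w[0::2, 0::2] + w[1::2, 0::2] + w[0::2, 1::2] + w[1::2, 1::2]
        img = np.where(c > 0, s / np.maximum(c, 1), 0)
        w = (c > 0).astype(float)
        levels.append((img, w))
    # expand: fill holes at each finer level from the coarser estimate
    filled = levels[-1][0]
    for img, w in reversed(levels[:-1]):
        up = np.kron(filled, np.ones((2, 2)))[:img.shape[0], :img.shape[1]]
        filled = np.where(w > 0, img, up)
    return filled
```

Because every filled value is an average of real neighboring pixels rather than an extrapolation, the result stays free of the geometric distortions the abstract mentions, at the cost of some blur that the depth-adaptive preprocessing is designed to counteract.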
Conference Committee Involvement (4)
Video Surveillance and Transportation Imaging Applications 2015
10 February 2015 | San Francisco, California, United States
Video Surveillance and Transportation Imaging Applications 2014
3 February 2014 | San Francisco, California, United States
Video Surveillance and Transportation Imaging Applications
4 February 2013 | Burlingame, California, United States
Video Surveillance and Transportation Imaging Applications