It has recently been demonstrated that spatial resolution adaptation can be integrated within video compression to improve overall coding performance, by spatially down-sampling before encoding and super-resolving at the decoder. Significant improvements have been reported when convolutional neural networks (CNNs) were used to perform the resolution up-sampling. However, this approach incurs high complexity at the decoder due to its use of CNN-based super-resolution. In this paper, a novel framework is proposed which supports flexible allocation of complexity between the encoder and decoder. This approach employs a CNN model for video down-sampling at the encoder and uses a Lanczos3 filter to reconstruct full resolution at the decoder. The proposed method was integrated into the HEVC HM 16.20 software and evaluated on JVET UHD test sequences using the All Intra configuration. The experimental results demonstrate the potential of the proposed approach, with significant bitrate savings (more than 10%) over the original HEVC HM, coupled with reduced computational complexity at both the encoder (29%) and the decoder (10%).
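To illustrate the decoder-side asymmetry this abstract describes, the sketch below up-samples a decoded low-resolution frame with a Lanczos3 filter. Pillow's LANCZOS resampling implements a 3-lobe Lanczos kernel, so it stands in for the filter here; the 2x scale factor and the use of Pillow are assumptions for illustration, not details given in the abstract.

```python
# Minimal sketch of the decoder-side reconstruction step, assuming a 2x scale
# factor: the encoder runs a trained CNN down-sampler, while the decoder only
# needs this cheap fixed-filter up-sampling, shifting complexity to the encoder.
from PIL import Image

def upsample_lanczos3(decoded_frame: Image.Image, scale: int = 2) -> Image.Image:
    """Reconstruct full resolution from a decoded frame with a Lanczos3 filter."""
    w, h = decoded_frame.size
    # Pillow's LANCZOS resampling is a Lanczos filter with support (lobes) = 3.
    return decoded_frame.resize((w * scale, h * scale), Image.Resampling.LANCZOS)
```

Keeping the learned component at the encoder and a fixed filter at the decoder is what enables the complexity reductions reported above.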
Spatial resolution adaptation is a technique often employed in video compression to enhance coding efficiency. This approach encodes a lower-resolution version of the input video and reconstructs the original resolution during decoding. Instead of using conventional up-sampling filters, recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality. These approaches are usually trained to minimise pixel-based losses such as Mean Squared Error (MSE), even though such loss metrics correlate poorly with subjective opinions of quality. In this paper, a perceptually inspired super-resolution approach (M-SRGAN) is proposed for spatial up-sampling of compressed video using a modified CNN model, trained with a generative adversarial network (GAN) on compressed content using perceptual loss functions. The proposed method was integrated with HEVC HM 16.20 and evaluated under the JVET Common Test Conditions (UHD test sequences) using the Random Access configuration. The results show clear perceptual quality improvement over the original HM 16.20, with an average bitrate saving of 35.6% (Bjøntegaard Delta measurement) based on the perceptual quality metric VMAF.
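The abstract does not detail M-SRGAN's training objective, so the PyTorch sketch below only shows the general shape of such a perceptual GAN loss: a pixel term, a VGG-feature ("perceptual") term, and an adversarial term. The VGG layer choice (features[:35]) and the weights alpha, beta, gamma are illustrative assumptions, not the paper's values.

```python
# Hedged sketch of a perceptual generator loss for GAN-based super-resolution,
# of the kind the abstract describes. sr/hr are super-resolved and ground-truth
# high-resolution batches; disc_pred_sr is the discriminator's logit on sr.
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG-19 feature extractor used as the perceptual-loss feature space.
vgg_features = vgg19(weights=VGG19_Weights.DEFAULT).features[:35].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def generator_loss(sr, hr, disc_pred_sr, alpha=1.0, beta=0.006, gamma=1e-3):
    """Combined pixel + perceptual + adversarial loss (weights are assumptions)."""
    pixel = nn.functional.l1_loss(sr, hr)                                 # pixel fidelity
    percep = nn.functional.mse_loss(vgg_features(sr), vgg_features(hr))   # feature-space distance
    adv = nn.functional.binary_cross_entropy_with_logits(                 # encourage "real" verdicts
        disc_pred_sr, torch.ones_like(disc_pred_sr))
    return alpha * pixel + beta * percep + gamma * adv
```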
This paper presents a resolution adaptation framework for video compression, which dynamically applies spatial resampling by trading off spatial resolution against quantization. A learning-based Quantization-Resolution Optimization (QRO) module, trained on a large database of video content, determines the optimal spatial resolution among multiple options based on spatial and temporal features of the uncompressed video frames. To improve the quality of the up-scaled videos, a modified CNN-based single-image super-resolution method is employed at the decoder; this super-resolution model has been trained on compressed content from the same training database. The proposed resolution adaptation framework was integrated with the High Efficiency Video Coding (HEVC) reference software, HM 16.18, and tested on UHD content from several databases, including videos from the JVET (Joint Video Exploration Team) test set. Experimental results show that the proposed method offers significant overall bitrate savings across a wide range of bitrates compared with the original HEVC HM 16.18, with average BD-rate savings of 12% (based on PSNR) and 15% (based on VMAF), together with lower encoding complexity.
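To make the QRO decision step concrete, here is a hedged sketch: simple spatial and temporal activity features (in the spirit of ITU-T P.910 SI/TI) are extracted from the uncompressed frames and passed, together with the target QP, to a pre-trained classifier that selects a resolution. The specific features, the use of QP as an input, the candidate resolutions, and the scikit-learn-style qro_model are all assumptions; the paper's learned QRO module is not specified in this abstract.

```python
# Illustrative QRO-style resolution decision, under the assumptions stated above.
import numpy as np
from scipy import ndimage

def spatial_temporal_features(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) luma array. Returns [SI, TI], P.910-style features."""
    frames = frames.astype(np.float32)
    # SI: max over time of the std of the Sobel gradient magnitude.
    si = max(np.std(np.hypot(ndimage.sobel(f, 0), ndimage.sobel(f, 1)))
             for f in frames)
    # TI: max over time of the std of the frame difference.
    ti = max(np.std(frames[t] - frames[t - 1]) for t in range(1, len(frames)))
    return np.array([si, ti])

def choose_resolution(frames, qp, qro_model, options=(2160, 1080)):
    """Pick a spatial resolution (frame height) from content features and QP."""
    feats = np.append(spatial_temporal_features(frames), qp)
    return options[int(qro_model.predict(feats[None, :])[0])]
```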
This paper reports an empirical investigation into increasing the efficiency of subjective data collection by reducing the length of test sequences below the recommended 10 seconds. Twenty-four observers viewed four 10-second reference sequences, in addition to four truncated versions of each (7, 5, 3 and 1.5 seconds). Results indicated that, compared to the 10-second sequences, the ability of observers to identify compression artefacts was significantly reduced only for the 1.5-second sequences. These results indicate that, when using the DSCQS methodology, a significant gain in the efficiency of subjective data collection can be achieved by reducing the length of test sequences to as little as 3 seconds, without a significant impact on reliability.
This paper presents a parametric video compression framework which exploits both texture warping and dynamic texture synthesis. A perspective motion model is employed to warp static textures, and a dynamic texture model is used to synthesise time-varying textures. An artefact-based video quality metric (AVM) is proposed which detects spatial and temporal artefacts and assesses the reconstructed video quality. This metric is validated using both the VQEG database and subjective assessment, and shows competitive performance on both non-synthetic and synthetic video content. Moreover, a local Rate-Quality Optimisation (RQO) strategy is developed based on AVM in order to decide between waveform coding and texture warping/synthesis. The proposed method has been integrated into an H.264 video coding framework, with results offering significant bitrate savings for similar visual quality (based on both AVM and subjective scores).
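As a sketch of the static-texture warping component, the snippet below fits a perspective (homography) motion model to point correspondences and warps a decoded reference frame to predict a texture region, rather than waveform-coding it. The use of OpenCV and RANSAC-based fitting is an assumption for illustration; the paper's actual estimation procedure is not detailed in this abstract.

```python
# Hedged sketch of perspective-model texture warping, assuming OpenCV and
# externally supplied point matches between reference and current frames.
import cv2
import numpy as np

def warp_static_texture(ref_frame: np.ndarray, src_pts: np.ndarray,
                        dst_pts: np.ndarray) -> np.ndarray:
    """Predict the current frame's texture by warping the reference frame."""
    # Fit a 3x3 perspective (homography) motion model to the point matches.
    H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC)
    h, w = ref_frame.shape[:2]
    # Warp the reference frame into the current frame's coordinates.
    return cv2.warpPerspective(ref_frame, H, (w, h))
```

In such a scheme, an RQO decision of the kind described above would compare the warped prediction against waveform coding, using a metric like AVM, on a per-region basis.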