Videos uploaded to Meta's Family of Apps are transcoded into multiple bitstreams of various codec formats, resolutions, and quality levels to provide the best video quality across a wide variety of devices and connection bandwidth constraints. On Facebook alone, there are more than 4 billion video views per day; to handle video processing at this scale, we needed a solution that delivers the best possible video quality in the shortest encoding time, while being energy efficient, programmable, and scalable. In this paper, we present the Meta Scalable Video Processor (MSVP), which performs video processing at quality on par with software solutions but at a small fraction of the compute time and energy. Each MSVP ASIC offers a peak SIMO (Single Input, Multiple Output) transcoding performance of 4K at 15 fps in the highest-quality configuration, scaling up to 4K at 60 fps in the standard-quality configuration. This performance is achieved at ~10 W of PCIe module power. We achieved a throughput gain of ~9x for H.264 compared against libx264 software encoding, and a gain of ~50x for VP9 compared with the libvpx speed 2 preset. Key components of MSVP transcoding include video decoding, scaling, encoding, and quality metric computation. In this paper, we describe the MSVP ASIC architecture and the design of its individual components, and compare its perf/W versus quality against industry-standard software encoders.
This paper describes the FB-MOS metric, which measures video quality at scale in the Facebook ecosystem. Because the quality of uploaded UGC sources varies widely, FB-MOS consists of both a no-reference component to assess input (upload) quality and a full-reference component, based on SSIM, to assess the quality preserved through the transcoding and delivery pipeline. Note that the same video may be watched on a variety of devices (mobile, laptop, TV) under varying network conditions that cause quality fluctuations; moreover, the viewer can switch between in-line and full-screen views during the same viewing session. We show how the FB-MOS metric accounts for all this variation in viewing conditions while minimizing computation overhead. Validation of this metric on FB content yields an SROCC of 0.9147 on internally selected videos. The paper also discusses optimizations that reduce the metric's computational complexity and scale it in proportion to video popularity.
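As a rough illustration of the full-reference component and viewing-condition-weighted pooling described above, here is a minimal sketch. The global-statistics SSIM and the weighting scheme are simplified stand-ins (windowed SSIM and FB-MOS's actual pooling are more involved), and all function names are hypothetical.

```python
import numpy as np

def ssim_global(ref, dist, L=255.0):
    # Global-statistics SSIM: a simplified stand-in for the windowed
    # SSIM used by full-reference quality metrics.
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_x, mu_y = ref.mean(), dist.mean()
    var_x, var_y = ref.var(), dist.var()
    cov = ((ref - mu_x) * (dist - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

def session_score(frame_scores, weights=None):
    # Pool per-frame scores into one session score; weights could encode
    # viewing conditions (device, viewport) as the paper describes.
    scores = np.asarray(frame_scores, dtype=float)
    if weights is None:
        return float(scores.mean())
    w = np.asarray(weights, dtype=float)
    return float((scores * w).sum() / w.sum())
```

An identical reference/distorted pair scores 1.0, and any luminance shift pulls the score below 1.0, which is the behavior the pooling step relies on.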
This paper explores several encoder-side techniques aimed at improving the compression performance of encoding for
the draft JPEG XR standard. Though the syntax and decoding process are fixed by the standard, significant variation in
encoder design and some variation in decoder design are possible. For a variety of selected quality metrics, the paper
discusses techniques for achieving better compression performance according to each metric. As a basic reference
encoder and decoder for the discussion and modifications, the publicly available Microsoft HD Photo DPK (Device
Porting Kit) 1.0, on which the draft JPEG XR standard was based, was used. The quality metrics considered include
simple mathematical objective metrics (PSNR and L∞) as well as pseudo-perceptual metrics (single-scale SSIM and
multi-scale MS-SSIM).
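For reference, the two mathematical metrics named above are straightforward to compute. A minimal sketch using the standard definitions (not the DPK's implementation):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB; infinite for identical images.
    diff = np.asarray(ref, dtype=float) - np.asarray(test, dtype=float)
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0 else float(10.0 * np.log10(peak * peak / mse))

def l_inf(ref, test):
    # L-infinity norm: the largest absolute per-sample error.
    return int(np.max(np.abs(np.asarray(ref, dtype=int) - np.asarray(test, dtype=int))))
```

PSNR averages error energy over the whole image, while L∞ bounds the worst single sample, so an encoder tuned for one can behave quite differently under the other.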
This paper discusses cascaded multiple encoding/decoding cycles and their effect on image quality for lossy image
coding designs. Cascaded multiple encoding/decoding is an important operating scenario in professional editing
industries. In such scenarios, it is common for a single image to be edited by several people while the image is
compressed between editors for transit and archival. In these cases, it is important that decoding followed by re-encoding
introduce minimal (or no) distortion across generations. Many potential sources of distortion exist in a cascade of
decoding and re-encoding, especially when processes such as conversion between RGB and YUV color representations,
4:2:0 resampling, spatial shifting, resizing, and changes of the quantization process or coding format are considered.
This paper highlights various aspects of distortion introduced by
decoding and re-encoding, and remarks on the impact of these issues in the context of three still-image coding designs:
JPEG, JPEG 2000, and JPEG XR. JPEG XR is a draft standard under development in the JPEG committee based on
Microsoft technology known as HD Photo. The paper focuses particularly on the JPEG XR technology, and suggests
that the design of the draft JPEG XR standard has several quite good characteristics in regard to re-encoding robustness.
KEYWORDS: High dynamic range imaging, Computer programming, Image compression, Data conversion, RGB color model, Standards development, Image processing, Range imaging, Signal processing
High Dynamic Range (HDR) imaging support is one of the major features for the emerging draft JPEG XR standard.
JPEG XR is being standardized within the JPEG committee based on Microsoft technology known as HD Photo.
JPEG XR / HD Photo is primarily an integer-based coding technology design, accepting integer valued samples at the
encoder and producing integer valued samples at the decoder, with internal processing entirely in the integer space. Yet,
it can support compression of multiple HDR formats, including 16- and 32-bit float, 16-bit and 32-bit signed and
unsigned integer, and RGBE. Further, JPEG XR can enable lossless compression of some HDR formats such as 16-bit
signed and unsigned, 16-bit float and RGBE. This paper describes how HDR formats are handled in JPEG XR. It
examines in depth how these various HDR formats are converted to and from integer valued samples within the
JPEG XR codec, and the internal processing of these HDR formats. This paper describes how JPEG XR provides
flexible ways to compress HDR formats within the same codec framework as integer-valued formats, while maintaining
high compression efficiency and low computational complexity.
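One way a float format can be brought into an integer-only pipeline is to map each 16-bit half-float bit pattern (sign-magnitude layout) to a signed integer that is monotonic in the represented value, so an integer codec can process it and invert it losslessly. The sketch below illustrates the idea only; it is not the normative JPEG XR conversion, and the function names are hypothetical.

```python
def half_bits_to_codeword(h):
    # Map a 16-bit half-float bit pattern (sign bit + 15 magnitude bits)
    # to a signed integer monotonic in the represented value.
    # ~mag == -mag - 1, which keeps negative zero (0x8000) distinct
    # from positive zero and preserves ordering among negatives.
    sign = h >> 15
    mag = h & 0x7FFF
    return ~mag if sign else mag

def codeword_to_half_bits(c):
    # Exact inverse: recover the original 16-bit pattern bit for bit.
    return (0x8000 | ~c) if c < 0 else c
```

Because the mapping is a bijection on all 65,536 bit patterns, coding the integer codewords losslessly also codes the half-float samples losslessly.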
JPEG XR is a draft international standard undergoing standardization within the JPEG committee, based on a Microsoft
technology known as HD Photo. One of the key innovations in the draft JPEG XR standard is its integer-reversible
hierarchical lapped transform. The transform can provide both bit-exact lossless and lossy compression in the same
signal flow path. The transform requires only a small memory footprint while providing the compression benefits of a
larger block transform. The hierarchical nature of the transform naturally provides three levels of multi-resolution signal
representation. Its small dynamic range expansion, use of only integer arithmetic and its amenability to parallelized
implementation lead to reduced computational complexity. This paper provides an overview of the key ideas behind the
transform design in JPEG XR, and describes how the transform is constructed from simple building blocks.
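To illustrate the kind of building block involved, here is a single integer-reversible lifting butterfly (the classic S-transform). The actual JPEG XR transform composes 2x2 Hadamard and rotation stages into its hierarchical lapped structure; this sketch only shows why integer lifting is bit-exact invertible.

```python
def fwd_lift(a, b):
    # Reversible integer butterfly: difference first, then an
    # approximate average built from the difference (lifting step).
    h = a - b          # high-pass (difference)
    l = b + (h >> 1)   # low-pass (floor of the average)
    return l, h

def inv_lift(l, h):
    # Exact integer inverse: undo the lifting steps in reverse order.
    b = l - (h >> 1)
    a = b + h
    return a, b
```

Because each lifting step adds a value computed only from the other channel, it can be subtracted back exactly in integer arithmetic, so no rounding error accumulates and lossless operation falls out of the same signal path as lossy coding.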
KEYWORDS: Image compression, Quantization, Image quality, Computer programming, High dynamic range imaging, RGB color model, Digital photography, Digital imaging, Raster graphics, Image resolution
This paper introduces the HD Photo coding technology developed by Microsoft Corporation. The storage format for this
technology is now under consideration in the ITU-T/ISO/IEC JPEG committee as a candidate for standardization under
the name JPEG XR. The technology was developed to address end-to-end digital imaging application requirements,
particularly including the needs of digital photography. HD Photo includes features such as good compression capability,
high dynamic range support, high image quality capability, lossless coding support, full-format 4:4:4 color sampling,
simple thumbnail extraction, embedded bitstream scalability of resolution and fidelity, and degradation-free compressed domain
support of key manipulations such as cropping, flipping and rotation. HD Photo has been designed to optimize
image quality and compression efficiency while also enabling low-complexity encoding and decoding implementations.
To ensure low complexity for implementations, the design features have been incorporated in a way that not only
minimizes the computational requirements of the individual components (including consideration of such aspects as
memory footprint, cache effects, and parallelization opportunities) but results in a self-consistent design that maximizes
the commonality of functional processing components.
KEYWORDS: Quantization, Video, Computer programming, Video compression, Smoothing, Nonlinear filtering, Video coding, Digital signal processing, Control systems, Visualization
VC-1 is a state-of-the-art video compression format that provides highly competitive video quality, from very low
through very high bit rates, at reasonable computational complexity. The technology includes several innovations to
improve the rate-distortion performance envelope of the format and to provide additional tools for perceptual quality
improvement. This paper describes some of the key innovations of VC-1 that contribute to its quality and complexity
advantages. The focus of this paper is on those tools which are unique to VC-1, and differentiate it from other video
compression standards.
We investigate joint source-channel coding for transmission of video over time-varying channels. We assume that the channel state is known at the receiver, but that only a statistical description of the channel's time-varying nature is available at the transmitter. A multimode coder is proposed to efficiently quantize the input video and generate a quasi-fixed-length bit stream of unequal importance. We vary the error protection offered to individual bits by matching it both to their importance and to the channel noise statistics. Based on the channel state, the decoder makes the best estimate of the source vector from the received codeword. We present a design algorithm that optimizes the overall rate-distortion performance of the system. Simulation results show that the proposed system outperforms a reference scheme in which the multimode (source) codes and the channel codes are designed separately. Further, both multimode coding schemes provide substantial gains over fixed-length JSCC.
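The unequal-error-protection idea can be illustrated with a toy greedy allocator: give each extra unit of redundancy to the bit class whose expected distortion drops the most. The error model here (each extra repetition multiplying the residual error probability by the raw BER) and the greedy rule are deliberate simplifications for illustration, not the paper's optimization algorithm.

```python
def allocate_protection(importance, extra_units, ber=0.1):
    # Greedy unequal error protection. 'importance' weights the distortion
    # caused by an error in each bit class; every class starts with one
    # transmission, and each extra repetition is modeled (crudely) as
    # multiplying its residual error probability by 'ber'.
    reps = [1] * len(importance)
    for _ in range(extra_units):
        gains = [imp * (ber ** r - ber ** (r + 1))
                 for imp, r in zip(importance, reps)]
        i = gains.index(max(gains))  # class with the largest marginal gain
        reps[i] += 1
    return reps
```

With importance weights [10, 1] and three extra units, the important class ends up with more repetitions, mirroring how the transmitter matches protection to bit importance.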