This PDF file contains the front matter associated with SPIE Proceedings Volume 7073, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Subpixel scene registration is useful for certain image processing applications, since scene shifts are not necessarily integer. In this paper, we present an image registration approach based on the wavelet decomposition and the Fitts correlation algorithm. The original Fitts algorithm is ideal for small-scale translations; an image-based tracker that uses Fitts correlation for position measurement therefore requires additional modifications to the original algorithm to handle translations beyond this small-scale regime. We use a wavelet transform to preprocess the images before performing scene registration.
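As a rough sketch of how a correlation peak can be refined to subpixel precision (a generic parabolic-fit approach, not the authors' wavelet-domain Fitts implementation; the function name and signal shapes are illustrative):

```python
import numpy as np

def subpixel_shift_1d(ref, shifted):
    """Estimate a (possibly fractional) 1-D shift between two signals by
    locating the cross-correlation peak and refining it with a parabolic
    fit through the peak and its two neighbours."""
    n = len(ref)
    # circular cross-correlation via FFT
    corr = np.fft.ifft(np.fft.fft(shifted) * np.conj(np.fft.fft(ref))).real
    k = int(np.argmax(corr))                       # integer-pixel peak
    ym, y0, yp = corr[(k - 1) % n], corr[k], corr[(k + 1) % n]
    denom = ym - 2.0 * y0 + yp
    delta = 0.0 if denom == 0 else 0.5 * (ym - yp) / denom  # parabolic refinement
    shift = k + delta
    return shift if shift <= n / 2 else shift - n  # wrap to a signed shift
```

The parabolic fit is the simplest of several possible peak-interpolation rules; the choice matters mainly when the correlation surface is noisy or asymmetric.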
Classic correlation filters for detection and location estimation of a reference object require explicit information about the object to be recognized. In this work we assume that a reference signal is not available; however, we suppose that the target is placed at unknown coordinates in a noisy reference image. Optimal correlation filters with respect to signal-to-noise ratio and peak-to-output energy for detection and localization of a target embedded in an input scene are derived. Computer simulation results obtained with the proposed filters are presented and compared with those of common correlation filters.
In this paper, a new image denoising method based on the uHMT (universal Hidden Markov Tree) model in the wavelet domain is proposed. The MAP (maximum a posteriori) estimate is adopted to deal with ill-conditioned problems such as image denoising in the wavelet domain, and the uHMT model is used to construct a prior model for the MAP estimate. Using the conjugate gradient optimization method, the closest approximation to the true result is achieved. The results show that images restored by our method are better and sharper than those of other methods, both visually and quantitatively.
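The wavelet-domain prior idea can be illustrated with a much simpler stand-in than the uHMT model: a one-level Haar decomposition with soft thresholding of the detail band (the functions and the threshold here are illustrative, not the authors' method):

```python
import numpy as np

def haar_fwd(x):
    """One level of the orthonormal 1-D Haar transform."""
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation band
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail band
    return s, d

def haar_inv(s, d):
    x = np.empty(2 * len(s))
    x[0::2] = (s + d) / np.sqrt(2.0)
    x[1::2] = (s - d) / np.sqrt(2.0)
    return x

def denoise_haar(x, thresh):
    """Soft-threshold the Haar detail coefficients; a crude stand-in for
    a wavelet-domain prior in MAP estimation."""
    s, d = haar_fwd(x)
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
    return haar_inv(s, d)
```

Soft thresholding corresponds to a Laplacian prior on the detail coefficients; the uHMT model replaces this fixed prior with one whose parameters follow the coefficients' persistence across scales.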
Design and Implementation of Color Transformations
A new technique is described for color conversion of JPEG images. For each input block of each
component, the conversion of the 63 AC coefficients is performed in the transform domain instead of the
spatial domain. Only the DC coefficients of each input block of the color components are transformed to
the spatial domain and then processed through the traditional lookup table to create color-converted output
DC coefficients for each block. Given the converted DC value for each block, the remaining 63 AC
coefficients are then converted directly in the transform domain through scaling functions retrieved from a
table indexed only by the DC term. For n-dimensional to m-dimensional color space
conversion, n component blocks create m component blocks. An IDCT can then be applied to the m
component blocks to create spatial domain data or these output blocks can be quantized and entropy
encoded to create JPEG compressed data in the m-dimensional color space.
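A toy grayscale analogue of the scheme (the tone-curve LUT, the DC-to-level mapping, and the slope-based "scaling function" are all illustrative assumptions, not the paper's actual tables):

```python
import numpy as np

# Toy stand-in for a colour-conversion lookup table: a 1-D tone curve
# sampled at 256 input levels (a real table maps n colour components
# to m components).
LUT = 255.0 * (np.arange(256) / 255.0) ** 0.8

def convert_block(dc, ac):
    """Convert one 8x8 DCT block: send only the DC term through the
    lookup table, then scale the 63 AC terms by the local slope of the
    curve at that DC value (the 'scaling function' indexed by DC)."""
    level = int(np.clip(dc / 8.0 + 128, 0, 255))   # DC -> mean pixel level
    out_level = LUT[level]
    # slope of the curve around this level approximates the AC gain
    lo, hi = max(level - 1, 0), min(level + 1, 255)
    gain = (LUT[hi] - LUT[lo]) / float(hi - lo)
    out_dc = (out_level - 128.0) * 8.0             # back to a DC coefficient
    return out_dc, ac * gain
```

The point of the construction is that only one table lookup and one multiply per AC coefficient are needed, with no IDCT of the full block.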
This paper reviews a set of color spaces that allow reversible mapping between red-green-blue and luma-chroma
representations in integer arithmetic. The YCoCg transform and its reversible form YCoCg-R can improve coding gain
by over 0.5 dB with respect to the popular YCrCb transform, while achieving much lower computational complexity.
We also present extensions of the YCoCg transform for four-channel CMYK pixel data. Thanks to their reversibility
under integer arithmetic, these transforms are useful for both lossy and lossless compression. Versions of these
transforms are used in the HD Photo image coding technology (which is the basis for the upcoming JPEG XR standard)
and in recent editions of the H.264/MPEG-4 AVC video coding standard.
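The reversible YCoCg-R transform itself is small enough to show in full; the lifting steps below follow the published definition, with Python's arithmetic right shift serving as the truncating divide:

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: integer lifting steps, exactly invertible."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because each lifting step adds or subtracts the same shifted quantity in both directions, the truncation in `>> 1` cancels exactly and the round trip is lossless for any integer inputs.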
Converting images between color spaces is a computationally demanding task. The conversion is based on lookup tables that define an output color-space value at each node of a mesh covering the input space. If the input color value to be converted does not fall on a mesh node, the output is computed by interpolating the values at the surrounding nodes. For a three-dimensional input space such as RGB, tetrahedral and trilinear interpolation are used; if the input space is four-dimensional, quadrilinear interpolation is used. This paper discusses how to reduce the complexity of a lookup-table implementation by exploiting the relationships between input and output color-space components and using a moderate lookup-table expansion to achieve a significant speed advantage. For example, a CMYK-to-K conversion, commonly implemented using quadrilinear interpolation on a 9x9x9x9 mesh, can be reduced to bilinear interpolation provided the mesh is allowed to grow from 6,561 nodes to 684,288 nodes.
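Bilinear interpolation on a 2-D table, the operation the CMYK-to-K conversion is reduced to, can be sketched as follows (the table layout and indexing convention are illustrative):

```python
import numpy as np

def bilinear_lookup(table, x, y):
    """Interpolate a 2-D lookup table at fractional grid position (x, y).
    table[i, j] holds the output value at grid node (i, j)."""
    n_x, n_y = table.shape
    i = min(int(x), n_x - 2)          # lower-left node of the cell
    j = min(int(y), n_y - 2)
    fx, fy = x - i, y - j             # fractional position inside the cell
    return ((1 - fx) * (1 - fy) * table[i, j]
            + fx * (1 - fy) * table[i + 1, j]
            + (1 - fx) * fy * table[i, j + 1]
            + fx * fy * table[i + 1, j + 1])
```

The quadrilinear version needs 16 node reads and 15 multiply-adds per sample; collapsing two input dimensions into an expanded table cuts this to the 4 reads above, which is the source of the speedup.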
We study the problem of approximate computation of color transforms (with real and possibly irrational factors) using integer arithmetic. We show that the precision of such computations can be significantly improved if we allow input or output variables to be scaled by some constant. The problem of finding such a constant turns out to be related to the classic Diophantine approximation problem. We use this relation to explain how the best scaled approximations can be derived, and provide several examples of using this technique for the design of color transforms.
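A minimal sketch of the underlying idea, using Python's continued-fraction machinery (`Fraction.limit_denominator`); the brute-force scale search here is a naive stand-in for the paper's derivation:

```python
from fractions import Fraction
import math

def best_rational(value, max_den):
    """Best rational approximation p/q with q <= max_den (the classic
    Diophantine approximation problem, solved via continued fractions)."""
    return Fraction(value).limit_denominator(max_den)

def best_scaled(value, max_den, scales):
    """Try candidate scale factors s: approximating s*value and dividing
    the resulting error by s can beat the unscaled approximation."""
    best = None
    for s in scales:
        f = best_rational(s * value, max_den)
        err = abs(f / s - value)
        if best is None or err < best[2]:
            best = (s, f, err)
    return best
```

For a transform coefficient like sqrt(3)/2, `best_rational` returns the convergent with the smallest error among denominators up to the bound, and scanning a few scale factors can only improve on it.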
Many researchers are working on various aspects of color imaging. However, it is apparent from the published literature that producing high-quality color matching and reproduction remains difficult. While the steps needed to produce calibrated color images are fairly well developed, implementing high-quality color matching and reproduction continues to be a serious problem: it requires huge color-matching data sets or complicated procedures, and the complexity of creating an appropriate match is often very significant. To surmount this final barrier, this paper proposes a method to reduce the complexity of color matching and reproduction without sacrificing quality; the amount of data that must be collected is also significantly reduced. Color tables are used for typical color matching across many device types. Most device color spaces are nonlinear, and matching and characterization are done under changing conditions. Typical color image processing techniques use profiles consisting of sparse multi-dimensional lookup tables that interpolate between adjacent nodes to prepare an image for rendering. Due to the proliferation of low-cost color devices (digital color cameras, scanners, printers, etc.) in recent years, color calibration has become an important issue.
JPEG XR is a draft international standard undergoing standardization within the JPEG committee, based on a Microsoft
technology known as HD Photo. One of the key innovations in the draft JPEG XR standard is its integer-reversible
hierarchical lapped transform. The transform can provide both bit-exact lossless and lossy compression in the same
signal flow path. The transform requires only a small memory footprint while providing the compression benefits of a
larger block transform. The hierarchical nature of the transform naturally provides three levels of multi-resolution signal
representation. Its small dynamic range expansion, use of only integer arithmetic and its amenability to parallelized
implementation lead to reduced computational complexity. This paper provides an overview of the key ideas behind the
transform design in JPEG XR, and describes how the transform is constructed from simple building blocks.
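The flavour of integer-reversible lifting can be shown with the classic 2-point S-transform, a far simpler relative of the JPEG XR lapped transform (this is not the standard's actual transform):

```python
def s_transform_fwd(a, b):
    """2-point reversible transform via lifting: difference + truncated
    average. All operations are integer, so the mapping is exactly
    invertible despite the truncating shift."""
    h = a - b              # high-pass (difference)
    l = b + (h >> 1)       # low-pass (floor of the mean)
    return l, h

def s_transform_inv(l, h):
    b = l - (h >> 1)       # undo the update step
    a = b + h              # undo the predict step
    return a, b
```

Chaining such lifting pairs hierarchically is what yields the bit-exact lossless path and the small dynamic-range expansion described above.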
This paper presents an overview of the new Scalable Video Coding (SVC) amendment of H.264/AVC and the results of a performance evaluation for this new video coding specification. Whereas temporal scalability is already enabled by the existing H.264/AVC specification, the introduction of spatial and quality scalability requires new coding tools. Here, the layered structure of SVC and the main new coding tools are briefly described, and an overview of the newly defined SVC profiles and levels is provided. The second part of the paper describes a subjective evaluation carried out to test the efficiency of the SVC concept, consisting of a series of subjective quality tests backed up by objective PSNR measurements. The results show that the coding tools introduced in the scalable extension of H.264/AVC provide a reasonable degree of spatial and quality scalability at very low cost in terms of additional bit rate: SVC supports spatial and quality scalability with a bit-rate overhead of about 10% or less, and with visual quality indistinguishable from state-of-the-art single-layer coding.
We introduce an efficient mode selection method for the enhancement layers of spatial scalability in the SVC encoder by selectively performing SVC inter-layer residual coding. The proposed method analyzes the characteristics of the integer-transform coefficients of the difference between the residuals of the lower and upper spatial layers, and performs inter-layer residual prediction coding only when the SAD values of the inter-layer residuals exceed adaptive thresholds. By classifying the residuals according to the properties of the integer-transform coefficients using only the SAD of the inter-layer residual signals between the two layers, the SVC encoder can perform inter-layer residual coding selectively, reducing the total encoding time by 51.2% on average while maintaining rate-distortion performance with negligible quality degradation.
There has been increased momentum recently in the production of 3D content for cinema applications; for the most part, this has been limited to stereo content. There are also a variety of display technologies on the market that support 3DTV, each offering a different viewing experience and having different input requirements. More specifically, stereoscopic displays support stereo content and require glasses, while auto-stereoscopic displays avoid the need for glasses by rendering view-dependent stereo pairs for a multitude of viewing angles. To realize high quality auto-stereoscopic displays, multiple views of the video must either be provided as input to the display, or these views must be created locally at the display. The former approach has difficulties in that the production environment is typically limited to stereo, and transmission bandwidth for a large number of views is not likely to be available. This paper discusses an emerging 3D data format that enables the latter approach to be realized. A new framework for efficiently representing a
3D scene and enabling the reconstruction of an arbitrarily large number of views prior to rendering is introduced. Several design challenges are also highlighted through experimental results.
This paper presents a high-throughput, low-cost context-adaptive binary arithmetic coding (CABAC) codec for the H.264/AVC High and Main Profiles. We analyze the similarities between the CABAC encoding and decoding algorithms and propose an efficient pipeline architecture to accelerate their operations. A carefully designed binary arithmetic unit integrates the regular, bypass, and terminate engines for both the encoder and the decoder by taking advantage of hardware sharing. The proposed CABAC codec has been integrated in an H.264 hardware core and achieves real-time decoding for H.264/AVC High Profile HD level 4.1. The implemented design operates at 265 MHz with a 39.2k gate count in 0.18 μm silicon technology.
This paper offers an overall review of recent advances in the design of modern transforms for image and video coding applications. Transforms have been an integral part of signal coding from the beginning, but for most of that history the emphasis was on true floating-point transforms. Recently, with the proliferation of low-power handheld multimedia devices, a new vision of integer-only transforms that provide high performance at very low complexity has quickly gained ascendancy. We explore two key design approaches to creating integer transforms, focusing on a systematic, universal method based on decomposition into lifting steps and the use of (dyadic) rational coefficients. This method provides a wealth of solutions, many of which are already in use in leading media codecs today, such as H.264, HD Photo/JPEG XR, and scalable audio. We give early indications in this paper; a fuller treatment appears elsewhere.
We propose a fast macroblock mode decision scheme in H.264|MPEG-4 Part 10 Advanced Video Coding (AVC) for mobile video telephony applications. In video telephony, the face region around a speaker is generally considered the region of interest (ROI), while the background is less important and is regarded as non-ROI. Two issues must usually be considered: (1) mobile video telephony platforms are computationally limited, while AVC codecs are computationally expensive; and (2) the available channel bandwidths are very small, so compressed video streams are transmitted with degraded visual quality. In this paper, we address both issues: first, a fast macroblock mode decision scheme is contrived to alleviate the computational complexity of H.264|MPEG-4 Part 10 AVC; and second, ROI/non-ROI coding is incorporated to enhance subjective visual quality by encoding ROI data at higher quality and non-ROI data at lower quality.
Our proposed fast macroblock mode decision scheme consists of three parts: early skip mode detection, fast inter macroblock mode decision, and intra prediction skipping. The skip mode detection method decides whether or not to evaluate the remaining inter macroblock modes in P slices. The fast inter macroblock mode method reduces the candidate block modes using the motion-estimation SATD cost (SATD_MOTION) from 16x16 and 8x8 block motion estimation. The intra prediction skipping condition decides whether or not to perform 4x4 intra prediction in P slices, using the relation between the magnitude of the motion vectors of the current macroblock and the occurrence frequency of intra-predicted macroblocks. The experimental results show that the proposed scheme reduces the total encoding time by up to 51.88% with negligible PSNR drop and bit-rate increase.
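The early-termination idea can be sketched as a cascade of cheap distortion tests (the SAD metric, the thresholds, and the returned mode labels below are illustrative, not the paper's SATD-based criteria):

```python
import numpy as np

def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return int(np.abs(block.astype(int) - pred.astype(int)).sum())

def decide_mode(block, pred16, skip_thresh=512, split_thresh=2048):
    """Toy early-termination mode decision: cheap SAD tests prune the
    expensive mode search. Returns 'SKIP', 'INTER16x16' or 'SEARCH_ALL'."""
    cost = sad(block, pred16)
    if cost < skip_thresh:
        return "SKIP"          # prediction already good: skip residual coding
    if cost < split_thresh:
        return "INTER16x16"    # moderate residual: keep the large partition
    return "SEARCH_ALL"        # large residual: try smaller partitions/intra
```

The savings come from how often the first two branches fire: every macroblock they catch avoids the full rate-distortion search over all partitions and intra modes.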
In a packet erasure network environment, video streaming benefits from error control in two ways to achieve graceful degradation. The first approach is application-level (or link-level) forward error correction (FEC) to provide erasure protection. The second is error concealment at the decoder end to compensate for lost packets. A large amount of research has been done in both areas. More recently, network coding (NC) techniques have been proposed for efficient data multicast over networks, and our previous work showed that multicast video streaming benefits from NC through throughput improvement. In this work, an algebraic model is given to analyze the performance. By exploiting linear combinations of video packets along network nodes together with the SVC video format, the system achieves path diversity automatically and enables efficient video delivery to heterogeneous receivers over packet erasure channels. The application of network coding can protect video packets against erasures; however, the rank-deficiency problem of random linear network coding makes error concealment inefficient. Computer simulation shows that the proposed NC video multicast scheme allows heterogeneous receivers to receive according to their capacity constraints, but careful design is needed to improve video transmission performance when network coding is applied.
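A minimal illustration of linear network coding and the rank-deficiency failure mode, using GF(2) (XOR) coefficients rather than the larger fields typical of random linear network coding; all names are illustrative:

```python
import numpy as np

def encode(packets, coeffs):
    """Combine source packets with XOR using binary coefficients
    (linear network coding over GF(2))."""
    out = np.zeros_like(packets[0])
    for c, p in zip(coeffs, packets):
        if c:
            out ^= p
    return out

def decode(coeff_rows, coded, n):
    """Gaussian elimination over GF(2); returns the recovered packets,
    or None when the coefficient matrix is rank-deficient."""
    a = np.array([list(r) + list(c) for r, c in zip(coeff_rows, coded)],
                 dtype=np.uint8)
    row = 0
    for col in range(n):
        piv = next((r for r in range(row, len(a)) if a[r, col]), None)
        if piv is None:
            return None                    # rank deficiency: cannot decode
        a[[row, piv]] = a[[piv, row]]      # move pivot row into place
        for r in range(len(a)):
            if r != row and a[r, col]:
                a[r] ^= a[row]             # eliminate this column elsewhere
        row += 1
    return [a[i, n:] for i in range(n)]
```

The second branch of the test below shows the problem the abstract names: two coded packets with linearly dependent coefficient vectors carry no more information than one, and decoding fails.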
Distributed Video Coding (DVC) is a new paradigm in video coding, based on the Slepian-Wolf and Wyner-Ziv theorems. DVC offers a number of potential advantages: flexible partitioning of the complexity between the encoder and decoder, robustness to channel errors due to intrinsic joint source-channel coding, codec independent scalability, and multi-view coding without communications between the cameras. In this paper, we evaluate the performance of DVC in an error-prone wireless communication environment. We also present a hybrid spatial
and temporal error concealment approach for DVC. Finally, we perform a comparison with a state-of-the-art AVC/H.264 video coding scheme in the presence of transmission errors.
The bulk of the video content available today over the Internet and over mobile networks suffers from many
imperfections caused during acquisition and transmission. In the case of user-generated content, which is typically
produced with inexpensive equipment, these imperfections manifest in various ways through noise, temporal
flicker and blurring, just to name a few. Imperfections caused by compression noise and temporal flicker are
present in both studio-produced and user-generated video content transmitted at low bit-rates. In this paper,
we introduce an algorithm designed to reduce temporal flicker and noise in video sequences. The algorithm takes
advantage of the sparse nature of video signals in an appropriate transform domain that is chosen adaptively based
on local signal statistics. When the signal corresponds to a sparse representation in this transform domain, flicker
and noise, which are spread over the entire domain, can be reduced easily by enforcing sparsity. Our results show
that the proposed algorithm reduces flicker and noise significantly and enables better presentation of compressed
videos.
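Sparsity-enforced denoising in a fixed transform can be sketched as DCT hard thresholding (the paper chooses its transform adaptively from local signal statistics; this fixed-DCT version is only a stand-in):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2.0 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def denoise_frame(frame, thresh):
    """Sparsity-enforcing denoising sketch: transform, zero the small
    coefficients (noise spreads thinly over the whole domain), invert."""
    n = frame.shape[0]
    d = dct_matrix(n)
    coef = d @ frame @ d.T                 # 2-D separable DCT
    coef[np.abs(coef) < thresh] = 0.0      # hard threshold
    return d.T @ coef @ d
```

The key assumption is exactly the one the abstract states: the signal concentrates in a few large coefficients, while flicker and noise spread over all of them, so thresholding removes mostly noise.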
Video applications on handheld devices such as smart phones pose a significant challenge for achieving a high-quality user experience. Recent advances in processor and wireless networking technology are producing a new class of multimedia applications (e.g., video streaming) for mobile handheld devices. These devices are lightweight and modestly sized, and therefore have very limited resources: lower processing power, smaller display resolution, less memory, and limited battery life compared to desktop and laptop systems. Multimedia applications, on the other hand, have extensive processing requirements that make them extremely resource hungry on mobile devices. In addition, device-specific properties (e.g., the display screen) significantly influence the human perception of multimedia quality. In this paper we propose a saliency-based framework that exploits the structure in content creation as well as the human visual system to find the salient points in the incoming bitstream and adapt it to the target device, thus improving the quality of the adapted area around salient points. Our experimental results indicate that an adaptation process that is cognizant of video content and user preferences can produce video of better perceptual quality for mobile devices. Furthermore, we demonstrate how such a framework affects the user experience on a handheld device.
The visual saliency map represents the most attractive regions in video. Automatic saliency map determination is important in mobile video applications such as autofocusing during video capture. It is well known that motion plays a critical role in visual attention modeling. Motion in video consists of the camera's motion and the foreground target's motion; in determining the visual saliency map, we are concerned with the latter. To achieve this, we estimate the camera (global) motion and then separate the moving target from the background. Specifically, we propose a three-step procedure for visual saliency map computation: 1) motion vector (MV) field filtering, 2) background extraction, and 3) contrast map computation. In the first step, the mean value of the MV field is treated as the camera's motion; the MVs of the background can then be detected and eliminated, and the saliency map roughly determined. In the second step, we further remove noisy image blocks in the background to refine the saliency map. In the third step, a contrast map is computed and integrated with the result of foreground extraction. All computations required in our proposed algorithm are of low complexity, so the method can be used on mobile devices. The accuracy and robustness of the proposed algorithm are supported by experimental results.
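Step 1 of the procedure, treating the mean motion vector as the camera motion, can be sketched as follows (the block layout and the threshold are illustrative):

```python
import numpy as np

def foreground_mask(mv_field, thresh=2.0):
    """Treat the mean motion vector of the frame as the camera (global)
    motion, then flag blocks whose residual motion relative to that mean
    exceeds a threshold as foreground candidates.

    mv_field: array of shape (rows, cols, 2) with one MV per block."""
    global_mv = mv_field.reshape(-1, 2).mean(axis=0)   # camera motion estimate
    residual = np.linalg.norm(mv_field - global_mv, axis=-1)
    return residual > thresh
```

The mean is a crude global-motion estimator; it works when the background dominates the frame, which is the same assumption the abstract's first step makes.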
The scorebox plays an important role in understanding the content of sports videos. However, a tiny scorebox may make it difficult for viewers on small displays to grasp the game situation. In this paper, we propose a novel framework to extract the scorebox from sports video frames. We first extract candidates using accumulated intensity and edge information after a short learning period. Since various types of scoreboxes are inserted in sports videos, multiple attributes are needed for efficient extraction. Based on these attributes, the information gain is computed, and the top three attributes ranked by information gain are selected as a three-dimensional feature vector for a Support Vector Machine (SVM) to distinguish the scorebox from other candidates, such as logos and advertisement boards. The proposed method is tested on videos of various sports, and experimental results show its efficiency and robustness.
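The attribute-ranking step can be sketched with a standard information gain computation (the feature encoding and the SVM stage are omitted; names are illustrative):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting on a discrete feature."""
    h = entropy(labels)
    for v, n in zip(*np.unique(feature, return_counts=True)):
        h -= n / len(labels) * entropy(labels[feature == v])
    return h

def top_k_features(features, labels, k=3):
    """Rank candidate attributes by information gain and keep the best k
    (the paper keeps three, as input to an SVM)."""
    gains = [information_gain(f, labels) for f in features]
    return sorted(range(len(features)), key=lambda i: -gains[i])[:k]
```

A perfectly predictive attribute attains the full label entropy as its gain, while an attribute independent of the labels scores zero, which is what makes the ranking meaningful.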
Emerging communications trends point to streaming video as a new form of content delivery. These systems are
implemented over wired systems, such as cable or Ethernet, and over wireless networks, cell phones, and portable game
systems. These communications systems require sophisticated methods of compression and error-resilience encoding to
enable communications across band-limited and noisy delivery channels. Additionally, the transmitted video data must
be of high enough quality to ensure a satisfactory end-user experience. Traditionally, video compression makes use of
temporal and spatial coherence to reduce the information required to represent an image. In many communications
systems, the communications channel is characterized by a probabilistic model which describes the capacity or fidelity
of the channel. The implication is that information is lost or distorted in the channel, and requires concealment on the
receiving end. We demonstrate a generative model based transmission scheme to compress human face images in video,
which has the advantages of a potentially higher compression ratio, while maintaining robustness to errors and data
corruption. This is accomplished by training an offline face model and using the model to reconstruct face images on the
receiving end. We propose a sub-component AAM that models the appearance of facial sub-components individually, and show face reconstruction results under different types of video degradation using weighted and non-weighted versions of the sub-component AAM.
Mobile IPTV is a multimedia service based on wireless networks with interactivity and mobility. Under mobile IPTV
scenarios, people can watch various contents whenever they want and even deliver their request to service providers
through the network. However, frequent changes in the wireless channel bandwidth may degrade the quality of service. In this paper, we propose an objective video quality measure (VQM) for mobile IPTV services that focuses on jitter measurement. Jitter results from frame repetition during delays and is one of the most severe impairments in video transmission over mobile channels. We first use the YUV color space to compute the duration and occurrences of jitter as well as the motion activity. The VQM is then modeled as a combination of these three factors, fitted to the results of subjective assessment. Since the proposed VQM is a no-reference (NR) model, it can be applied in real-time applications. Experimental results show that the proposed VQM correlates highly with subjective evaluation.
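The jitter-based measure described above can be sketched as follows. This is an illustrative reconstruction, not the paper's fitted model: the frame-repetition threshold and the combination weights `w` are placeholder assumptions, and `jitter_vqm` simply takes a weighted sum of the three factors named in the abstract.

```python
def detect_jitter(frames, diff_threshold=1.0):
    """Flag frames that repeat their predecessor (jitter events).

    `frames` is a list of equally sized 2-D lists of luma (Y) samples.
    A frame whose mean absolute difference to the previous frame falls
    below `diff_threshold` is treated as a repetition.
    """
    events = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        mad = sum(abs(a - b) for ra, rb in zip(prev, cur)
                  for a, b in zip(ra, rb)) / (len(cur) * len(cur[0]))
        events.append(mad < diff_threshold)
    return events


def jitter_vqm(frames, w=(0.5, 0.3, 0.2)):
    """Toy no-reference score combining the three factors in the abstract:
    total jitter duration, number of jitter occurrences, and motion
    activity. The weights `w` are hypothetical, not the paper's values."""
    events = detect_jitter(frames)
    duration = sum(events) / max(len(events), 1)
    occurrences = sum(1 for i, e in enumerate(events)
                      if e and (i == 0 or not events[i - 1]))
    occ_rate = occurrences / max(len(events), 1)
    motion = 1.0 - duration  # crude proxy: non-repeated frames imply motion
    return w[0] * duration + w[1] * occ_rate + w[2] * motion
```

Because the score needs only the decoded frames (no reference video), it fits the real-time, no-reference setting the abstract targets.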
Nowadays, most video material is coded in a non-scalable format. Transmitting these single-layer video bitstreams can be problematic over connection links with limited capacity. To solve this problem, requantization transcoding is often used. The requantization transcoder applies coarser quantization in order
to reduce the amount of residual information in the compressed video bitstream. In this paper, we extend a requantization transcoder for H.264/AVC video bitstreams with a rate-control algorithm. A simple algorithm is proposed which limits the computational complexity. The bit allocation is based on the bit distribution in the original video bitstream. Using the bit budget and a linear model between rate and quantizer, the new quantizer is calculated. The target bit rate is attained with an average deviation lower than 6%, while the rate-distortion performance shows small improvements over transcoding without rate control.
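The two ingredients named above, a bit budget proportional to the original bit distribution and a linear rate-quantizer model, can be sketched as below. This is a minimal illustration under the common assumption R ≈ X/Q; the paper's actual model parameters and clipping range (here borrowed from H.264/AVC's 1..51 QP range) are assumptions.

```python
def allocate_bits(orig_frame_bits, target_total):
    """Distribute the target budget over frames in proportion to the bit
    distribution observed in the original (input) bitstream."""
    total = sum(orig_frame_bits)
    return [target_total * b / total for b in orig_frame_bits]


def new_quantizer(orig_bits, orig_q, target_bits, q_min=1, q_max=51):
    """Estimate the requantization step from a linear R ~ X / Q model.

    The model constant X is fitted from the incoming stream (orig_bits
    bits were produced at quantizer step orig_q); the new step is chosen
    so the predicted rate meets target_bits. Illustrative sketch only.
    """
    x = orig_bits * orig_q          # model constant from the input stream
    q = x / max(target_bits, 1)     # invert the linear rate model
    return max(q_min, min(q_max, round(q)))
```

Halving the bit budget doubles the predicted quantizer step, which is exactly the low-complexity behavior a transcoder wants: no rate-distortion search, just one model inversion per frame.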
This paper proposes a network-aware macroblock (MB) coding mode decision method, which is both error resilient and
coding efficient. This method differs from traditional mode decision methods since MB mode decisions are made by
simultaneously taking into account: i) their rate-distortion (RD) cost and also ii) their impact on error resilience by
considering feedback information from the underlying network regarding current error characteristics. By doing so, the
number of Intra-coded MBs can be varied to better suit, in a cost-efficient way, the current state of the network and,
therefore, further improve the decoded video quality for a given packet loss rate. The proposed approach outperforms a
network-aware version of the H.264/AVC reference software with cyclic MB Intra refresh, for typical test sequences
encoded at various bit rates and for several error conditions in terms of packet loss rate.
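A mode decision that jointly weighs RD cost and network feedback, as described above, can be sketched as a single Lagrangian cost comparison. The cost terms here (`d_enc`, `d_loss`, and the way `loss_rate` scales the loss term) are illustrative assumptions, not the paper's exact formulation.

```python
def choose_mb_mode(modes, loss_rate, lam):
    """Pick the MB mode minimizing an error-aware Lagrangian cost.

    Each candidate is (name, rate_bits, enc_distortion, loss_distortion),
    where loss_distortion models the expected extra error if the MB (or
    its reference) is lost; loss_rate comes from network feedback.
    """
    def cost(m):
        name, rate, d_enc, d_loss = m
        return d_enc + loss_rate * d_loss + lam * rate
    return min(modes, key=cost)[0]
```

With zero reported losses the cheaper Inter mode wins, but as the fed-back loss rate grows, the Intra mode's small loss-propagation term dominates and the share of Intra MBs rises, which is the adaptive behavior the abstract describes.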
Rate control is a very important tool for any kind of video coding and transmission. Even though there are many rate
control techniques designed for 2D video, not much work has been carried out in this area for 3D video. In this paper, a
novel rate control approach for 3D video based on color and depth maps is introduced. The aim of this rate control
algorithm is to keep the 3D video quality near constant. Since the 3D video is synthesized from color and depth maps
and the final quality of the 3D video is influenced more by the color sequence than by the depth maps, the qualities of both color and depth maps are first varied until the target 3D quality is achieved at minimal bit rate. Subsequently, the PSNR of both color and depth maps at the optimum point is maintained for the entire group of pictures (GOP). The bit-allocation problem is solved by introducing a Lagrangian optimization cost function. According to experimental results, the proposed rate control technique is capable of adaptively adjusting the bit rate allocated to the color and depth map sequences in order to maximize 3D video quality, measured as the PSNR of the synthesized left and right views using reconstructed color and depth map sequences.
Performance Evaluation in Image Coding Applications
Objective quality assessment of lossy image compression codecs is an important part of the recent call of the JPEG committee for Advanced Image Coding (AIC). The target of the AIC ad-hoc group is twofold: first, to receive state-of-the-art still image codecs and to propose suitable technology for standardization; and second, to study objective image quality metrics to evaluate the performance of such codecs. Even though the performance of an objective metric is defined by how well it predicts the outcome of a subjective assessment, one can also study the usefulness of a metric indirectly, in a non-traditional way, namely by measuring the subjective quality improvement of a codec that has been optimized for a specific objective metric. This approach is demonstrated here on the recently proposed HD Photo format [14] introduced by Microsoft and an SSIM-tuned [17] version of it by one of the authors. We compare these two implementations with JPEG [1] in two variations and with a visually and PSNR-optimized JPEG 2000 [13] implementation. To this end, we use subjective and objective tests based on multiscale SSIM and a new DCT-based metric.
The task of comparing the performance of different codecs is closely related to research in the field of objective quality metrics. Although several objective quality metrics have been proposed in the literature, the lack of standardization in the field of objective quality assessment, together with the lack of extensive and reliable comparisons of the performance of the different state-of-the-art metrics, often makes results obtained with objective metrics unreliable. In this paper
we aim at comparing the performance of three of the existing alternatives for compression of digital pictures, i.e. JPEG,
JPEG 2000, and JPEG XR compression, by using different objective Full Reference metrics and considering also
perceptual quality metrics which take into account the color information of the data under analysis.
Traditionally, the performance of an image compression codec is measured by its rate-distortion curve, where a distortion metric measures the fitness of the reconstructed image for a particular purpose. However, the metrics typically employed address the needs of professional photography only partially: images are usually not published as taken, but are post-processed to express the intent of the photographer. Compression codecs may visibly degrade the image through an interaction of the codec-specific loss with the editing tools. In this work, we present the idea of application-specific metrics and measure the robustness of state-of-the-art compression codecs under prototypical image manipulations.
Initiatives in Image Coding and Accessories and Applications
This paper discusses cascaded multiple encoding/decoding cycles and their effect on image quality for lossy image
coding designs. Cascaded multiple encoding/decoding is an important operating scenario in professional editing
industries. In such scenarios, it is common for a single image to be edited by several people while the image is
compressed between editors for transit and archival. In these cases, it is important that decoding followed by re-encoding introduces minimal (or no) distortion across generations. A significant number of potential sources of distortion
introduction exist in a cascade of decoding and re-encoding, especially if such processes as conversion between RGB
and YUV color representations, 4:2:0 resampling, etc., are considered (and operations like spatial shifting, resizing, and
changes of the quantization process or coding format). This paper highlights various aspects of distortion introduced by
decoding and re-encoding, and remarks on the impact of these issues in the context of three still-image coding designs:
JPEG, JPEG 2000, and JPEG XR. JPEG XR is a draft standard under development in the JPEG committee based on
Microsoft technology known as HD Photo. The paper focuses particularly on the JPEG XR technology, and suggests
that the design of the draft JPEG XR standard has several quite good characteristics in regard to re-encoding robustness.
The Joint Photographic Experts Group (JPEG) baseline standard remains a popular and pervasive standard for
continuous tone, still image coding. The "J" in JPEG acknowledges its two main parent organizations, ISO
(International Organization for Standardization) and the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector). Notwithstanding their joint efforts, both groups have subsequently (and separately) standardized
many improvements for still image coding. Recently, the ITU-T Study Group 16 completed the standardization for a
new entropy coder - called the Q15-coder, whose statistical model is from the original JPEG-1 standard. This new
standard, ITU-T Rec. T.851, can be used in lieu of the traditional Huffman (a form of variable length coding) entropy
coder, and complements the QM arithmetic coder, both originally standardized in JPEG as ITU-T T.81 | ISO/IEC 10918-1. In contrast to Huffman entropy coding, arithmetic coding makes no assumptions about an image's statistics,
but rather responds in real time. This paper will present a tutorial on arithmetic coding, provide a history of arithmetic
coding in JPEG, share the motivation for T.851, outline its changes, and provide comparison results with both the
baseline Huffman and the original QM-coder entropy coders. It will conclude with suggestions for future work.
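The adaptive behavior described above (no prior statistics, model updated in real time as symbols arrive) can be illustrated with a toy binary arithmetic coder. This sketch uses exact rational arithmetic for clarity; real coders such as the QM- and Q15-coders instead use finite-precision integer arithmetic with renormalization and a state-machine probability model.

```python
from fractions import Fraction

def encode(bits):
    """Adaptive binary arithmetic encoding with exact rationals.
    Symbol counts start at 1/1 (Laplace smoothing) and adapt after each
    symbol, so no statistics are assumed in advance."""
    low, width = Fraction(0), Fraction(1)
    c0 = c1 = 1
    for b in bits:
        p0 = Fraction(c0, c0 + c1)        # current estimate of P(bit = 0)
        if b == 0:
            width *= p0                   # keep lower sub-interval
            c0 += 1
        else:
            low += width * p0             # move to upper sub-interval
            width *= (1 - p0)
            c1 += 1
    return low + width / 2                # any point inside the interval

def decode(code, n):
    """Mirror the encoder's interval subdivision to recover n bits."""
    low, width = Fraction(0), Fraction(1)
    c0 = c1 = 1
    out = []
    for _ in range(n):
        p0 = Fraction(c0, c0 + c1)
        split = low + width * p0
        if code < split:
            out.append(0); width *= p0; c0 += 1
        else:
            out.append(1); low = split; width *= (1 - p0); c1 += 1
    return out
```

Because encoder and decoder update identical counts from the decoded history, no probability table ever needs to be transmitted; this is the key contrast with Huffman coding drawn in the abstract.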
This paper explores several encoder-side techniques aimed at improving the compression performance of encoding for
the draft JPEG XR standard. Though the syntax and decoding process are fixed by the standard, significant variation in
encoder design and some variation in decoder design are possible. For a variety of selected quality metrics, the paper
discusses techniques for achieving better compression performance according to each metric. As a basic reference
encoder and decoder for the discussion and modifications, the publicly available Microsoft HD Photo DPK (Device
Porting Kit) 1.0, on which the draft JPEG XR standard was based, was used. The quality metrics considered include
simple mathematical objective metrics (PSNR and L∞) as well as pseudo-perceptual metrics (single-scale and multi-scale
MSSIM).
High Dynamic Range (HDR) imaging support is one of the major features for the emerging draft JPEG XR standard.
JPEG XR is being standardized within the JPEG committee based on Microsoft technology known as HD Photo.
JPEG XR / HD Photo is primarily an integer-based coding technology design, accepting integer valued samples at the
encoder and producing integer valued samples at the decoder, with internal processing entirely in the integer space. Yet,
it can support compression of multiple HDR formats, including 16- and 32-bit float, 16-bit and 32-bit signed and
unsigned integer, and RGBE. Further, JPEG XR can enable lossless compression of some HDR formats such as 16-bit
signed and unsigned, 16-bit float and RGBE. This paper describes how HDR formats are handled in JPEG XR. It
examines in depth how these various HDR formats are converted to and from integer valued samples within the
JPEG XR codec, and the internal processing of these HDR formats. This paper describes how JPEG XR provides
flexible ways to compress HDR formats within the same codec framework as integer-valued formats, while maintaining high compression efficiency and low computational complexity.
In this paper, we describe split field coding, an approach for low complexity, error-resilient entropy coding which splits
code words into two fields: a variable length prefix and a fixed length suffix. Once a prefix has been decoded correctly,
then the associated fixed length suffix is error-resilient, with bit errors causing no loss of code word synchronization and
only a limited amount of distortion on the decoded value. When the fixed length suffixes are segregated to a separate
block, this approach becomes suitable for use with a variety of methods which provide varying protection to different
portions of the bitstream, such as unequal error protection or progressive ordering schemes. Split field coding is
demonstrated in the context of a wavelet-based image codec, with examples of various error resilience properties, and
comparisons to the rate-distortion and computational performance of JPEG 2000.
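The prefix/suffix split described above can be illustrated with a Rice-style code, where the variable-length field is a unary quotient and the fixed-length field is a k-bit remainder. This is a sketch of the general idea, not the paper's actual code tables.

```python
def split_field_encode(value, k):
    """Encode a non-negative integer as a unary variable-length prefix
    (the quotient) plus a k-bit fixed-length suffix (the remainder).
    Returns (prefix_bits, suffix_bits) as bit strings."""
    q, r = value >> k, value & ((1 << k) - 1)
    prefix = "1" * q + "0"                  # unary run, terminated by 0
    suffix = format(r, "0{}b".format(k))    # fixed-length, error-resilient
    return prefix, suffix

def split_field_decode(prefix, suffix, k):
    """Invert split_field_encode."""
    q = prefix.index("0")                   # count of leading 1s
    return (q << k) | int(suffix, 2)
```

The error-resilience property follows directly from the structure: a bit flip inside the fixed-length suffix perturbs only the remainder (a bounded numeric error) and cannot desynchronize the stream, whereas a flip in the variable-length prefix would, which is why segregating the suffixes into a separate, less-protected block makes sense.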
Remote visualization of volumetric data has gained importance over the past few years in order to realize the full potential of tele-radiology. Volume rendering is a computationally intensive process, often requiring hardware acceleration to achieve real time visualization. Hence a remote visualization model that is well-suited for high speed networks would be to transmit rendered images from the server (with dedicated hardware) based on view point requests from clients. In this regard, a compression scheme for the rendered images is vital for efficient utilization of the server-client bandwidth. Also, the complexity of the decompressor should be considered so that
a low-end client workstation can decode images at the desired frame rate. We present a scalable, low-complexity image coder that has good compression efficiency and high throughput.
Automatic classification of medical images is a part of our computerised medical imaging programme to support
the pathologists in their diagnosis. Hyperspectral data has found applications in medical imagery, and its use in the analysis of biopsy images is increasing significantly. In this paper, we present a histopathological analysis for the classification of colon biopsy samples into benign and malignant classes. The proposed study is based on a comparison between 3D spectral/spatial analysis and 2D spatial analysis. Wavelet textural features are used in both approaches for the classification of colon biopsy samples. Experimental results indicate that the incorporation of wavelet textural features with a support vector machine, in 2D spatial analysis, achieves the best classification accuracy.
In recent years, digital image processing techniques have become a very popular way of determining strains and full-field displacements in experimental mechanics, owing to advances in image processing and because the measurement process itself is simpler and non-intrusive compared with traditional sensor-based techniques. This paper presents a filtering technique which processes the polar components of the image
displacement fields. First, pyramidal gradient-based optical flow is calculated between blocks of each pair of frames of a speckle image sequence, compensating in the calculation for small rotations and shears of the image blocks. The polar components of the resulting motion vectors, phase and amplitude, are then extracted.
Each of the motion vector angle values is smoothed temporally using a Kalman filter that takes into account
previously calculated angles located at the same spatial position in the motion fields. A subsequent adaptive
spatial filter is used to process both the temporally smoothed angles and amplitudes of the motion field. Finally,
test results of the proposed method being applied to a speckle image sequence that illustrates plastic materials
being subjected to uniaxial stress and to artificial data sets are presented.
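The temporal Kalman smoothing of angle values at one spatial position can be sketched with a scalar constant-state filter. The noise parameters `q` and `r` are illustrative assumptions, not the paper's values, and the sketch assumes the angle sequence does not wrap around.

```python
def kalman_smooth_angles(angles, q=1e-3, r=1e-1):
    """Temporally smooth a sequence of motion-vector angles (radians)
    observed at the same spatial position across the motion fields.

    q: process-noise variance (how fast the true angle may drift).
    r: measurement-noise variance (how noisy each observed angle is).
    """
    x, p = angles[0], 1.0          # state estimate and its variance
    out = [x]
    for z in angles[1:]:
        p += q                     # predict: state assumed locally constant
        k = p / (p + r)            # Kalman gain
        x += k * (z - x)           # correct with the new measurement
        p *= (1 - k)
        out.append(x)
    return out
```

Each smoothed value depends only on previously seen angles at that position, matching the causal, per-position filtering described above; the subsequent adaptive spatial filter would then operate across positions.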
An advanced SRIE (super-resolution image enhancement) process has been developed for improving image
resolution, for enhancing image signal-to-noise ratio (SNR) and contrast, as well as for correcting image
blur and distortion caused by atmospheric turbulence. The SRIE process is composed of five sub-processes:
1) over-sampling process, 2) image registration process, 3) image averaging process, 4) high-pass filtering
process, and 5) histogram-equalization process. These five sub-processes are discussed in detail in this
paper. The performance of the SRIE process was tested using long-wave infrared (LWIR) imagery and charge-coupled device (CCD) camera video imagery.
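Of the five sub-processes, histogram equalization (sub-process 5) is simple enough to sketch directly; this standard CDF-remapping version is an illustration of the technique in general, not the SRIE implementation itself.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in
    [0, levels): map each level through the normalized cumulative
    histogram so the output spreads across the full dynamic range."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, acc = [0] * levels, 0
    for i, h in enumerate(hist):
        acc += h
        cdf[i] = acc
    cdf_min = next(c for c in cdf if c > 0)   # first occupied level
    n = len(pixels)
    lut = [round((c - cdf_min) / max(n - cdf_min, 1) * (levels - 1))
           for c in cdf]
    return [lut[p] for p in pixels]
```

A low-contrast image whose levels cluster in a narrow band (say 100-101) gets stretched to span 0-255, which is the contrast-enhancement role this step plays in the SRIE chain.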
Peripheral neuropathy can be caused by diabetes or AIDS or be a side-effect of chemotherapy. Fibered Fluorescence Microscopy (FFM) is a recently developed imaging modality using a fiber optic probe connected to a laser scanning unit. It allows for in-vivo scanning of small animal subjects by moving the probe along the tissue surface. In preclinical research, FFM enables non-invasive, longitudinal in vivo assessment of intra epidermal nerve fibre density in various
models for peripheral neuropathies. By moving the probe, FFM allows visualization of larger surfaces: during the movement, images are continuously captured, making it possible to acquire an area larger than the field of view of the probe. For analysis purposes, we need to obtain a single static image from the multiple overlapping frames. We introduce a mosaicing procedure for this kind of video sequence. Construction of mosaic images with sub-pixel alignment is indispensable and must be integrated into a globally consistent image alignment. An additional motivation for the mosaicing is the use of overlapping redundant information to improve the signal-to-noise ratio of the acquisition, because the individual frames tend to have both high noise levels and intensity inhomogeneities. For longitudinal analysis, mosaics captured at different times must be aligned as well. For alignment, global correlation-based matching is compared with interest-point matching. The use of algorithms running on multiple CPUs (parallel processor/cluster/grid) is imperative for use in a screening model.
In the mobile lens domain, glass lenses are often applied where high resolution is required, but a glass zoom lens must be combined with movable machinery and a voice-coil motor, which imposes space limits on miniaturized designs. With the development of high-level molding component technology, the liquid lens has become a focus of mobile phone and digital camera companies. A liquid lens assembled with a solid optical lens and a driving circuit can replace the original components, reducing the volume requirement to merely 50% of the original design. In addition, with its fast focus adjustment, low energy requirement, high durability, and low-cost manufacturing process, the liquid lens shows advantages in a competitive market. In the past, only scrape defects caused by external force had to be inspected for glass lenses. For the liquid lens, the state of four different structural layers must be inspected because of its different design and structure. In this paper, we apply machine vision and digital image processing technology to perform inspections in a particular layer according to the needs of users. According to our experimental results, the proposed algorithm can automatically remove the out-of-focus background, extract the region of interest, and efficiently find and analyze the defects in the particular layer. In the future, we will combine the algorithm with automatic-focus technology to implement interior inspection based on product inspection demands.
Although magnetic resonance imaging (MRI) is routinely used in clinical practice, long acquisition times limit its practical utility in many applications. To increase the data acquisition speed of MRI, parallel MRI (pMRI) techniques have recently been proposed. These techniques utilize multi-channel receiver arrays and are based on simultaneous acquisition of data from multiple receiver coils. Recently, a novel framework called Compressed Sensing (CS) was introduced. Since this new framework illustrates how signals can be
reconstructed from far fewer samples than the Nyquist theory suggests, it has the potential to significantly accelerate data acquisition in MRI. This paper illustrates that CS and pMRI techniques can be combined and that such joint processing yields results superior to those obtained from independent use of each technique.
A review of several recently-developed maximum likelihood template-based automatic target recognition (ATR) algorithms for extended targets in synthetic aperture radar (SAR) imagery data is presented. The algorithms are based on 'gradient' peaks, 'ceiling' peaks, edges, corners, shadows, and rectangular-fits. A weight-based Bayesian
maximum likelihood scheme to combine multiple template-based classifiers is presented. The feature weights are derived from prior recognition accuracies, i.e., confidence levels, achieved by the individual template-based classifiers. Applying target-independent feature weights instead of target-specific feature weights reduces the resulting ATR accuracy by only a small amount. Preliminary results indicate that (1) the ceiling peaks provide the most target-discriminating power, and (2) the inclusion of more target-discriminating features leads to higher classification accuracy. The Dempster-Shafer rule of combination is suggested as a potential alternative to the implemented Bayesian decision theory approach for resolving conflicting reports from multiple template-based classifiers.
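The accuracy-weighted combination of classifier reports can be sketched as below. This is a simple stand-in for the weight-based Bayesian scheme described above: the score format, the normalization of accuracies into weights, and the linear fusion rule are all illustrative assumptions.

```python
def combine_classifiers(reports, accuracies):
    """Fuse class scores from multiple template-based classifiers,
    weighting each classifier by its prior recognition accuracy
    (its confidence level). Returns the winning class label.

    reports:    {classifier_name: {class_label: score}}
    accuracies: {classifier_name: prior accuracy in [0, 1]}
    """
    total = sum(accuracies.values())
    fused = {}
    for name, scores in reports.items():
        w = accuracies[name] / total          # normalized feature weight
        for cls, s in scores.items():
            fused[cls] = fused.get(cls, 0.0) + w * s
    return max(fused, key=fused.get)
```

A historically accurate classifier (e.g. one using ceiling peaks, per finding (1)) thus dominates the fused decision even when a weaker classifier disagrees, which is exactly the conflict situation that motivates considering Dempster-Shafer combination as an alternative.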
Recently, there has been an increasing interest in using panoramic images in surveillance and target tracking
applications. With the wide availability of off-the-shelf web-based pan-tilt-zoom (PTZ) cameras and the advances of
CPUs and GPUs, object tracking using mosaicked images that cover a scene of 360° in near real-time has become a
reality. This paper presents a system that automatically constructs and maps full view panoramic mosaics to a cube-map
from images captured from an active PTZ camera with 1-25x optical zoom. A hierarchical approach is used in storing
and mosaicking multi-resolution images captured from a PTZ camera. Techniques based on scale-invariant local features
and probabilistic models for verification are used in the mosaicking process. Our algorithm is automatic and robust in
mapping each incoming image to one of the six faces of a cube with no prior knowledge of the scene structure. This
work can be easily integrated to a surveillance system that wishes to track moving objects in its 360° surrounding.
Pan-tilt-zoom (PTZ) cameras are frequently used in surveillance applications as they can observe a much larger region of
the environment than a fixed-lens camera while still providing high-resolution imagery. The pan, tilt, and zoom
parameters of a single camera may be simultaneously controlled by online users as well as automated surveillance
applications. To accurately register autonomously tracked objects to a world model, the surveillance system requires
accurate knowledge of camera parameters. Due to imprecision in the PTZ mechanism, these parameters cannot be
obtained from PTZ control commands but must be calculated directly from camera imagery. This paper describes the
efforts undertaken to implement a real-time calibration system for a stationary PTZ camera. The approach continuously
tracks distinctive image feature points from frame to frame, and from these correspondences, robustly calculates the
homography transformation between frames. Camera internal parameters are then calculated from these homographies.
The calculations are performed by a self-contained program that continually monitors images collected by the camera as it performs pan, tilt, and zoom operations. The accuracy of the calculated calibration parameters is compared to ground truth data. Problems encountered include inaccuracies under large orientation changes and long algorithm execution time.
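The homography estimation step at the heart of such a pipeline can be sketched with the standard direct linear transform (DLT). This is a generic textbook formulation, not the paper's exact robust estimator, and the point sets below are illustrative:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of matched points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of A with the
    # smallest singular value (the null space of A).
    _, _, vt = np.linalg.svd(np.asarray(A))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Check against a known homography (here a similarity transform).
H_true = np.array([[0.8, -0.6, 2.0],
                   [0.6, 0.8, -1.0],
                   [0.0, 0.0, 1.0]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [2, 3]], float)
p = (H_true @ np.c_[src, np.ones(len(src))].T).T
dst = p[:, :2] / p[:, 2:]
H = homography_dlt(src, dst)
assert np.allclose(H, H_true, atol=1e-6)
```

In practice the correspondences come from tracked feature points and the estimate is wrapped in an outlier-rejection loop such as RANSAC.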
Historically, due to their uniqueness and immutability, fingerprints have been used as evidence in criminal cases and in
security identification as well as in authorization verification applications. In this research, adaptive linear DWT
models are developed to describe the fingerprint features (DWT coefficients) to be identified. The proposed model
can be used to enhance the fingerprint characteristics identified from fingerprint images to improve recognition. This
adaptive model identification technique is then applied to degraded or incomplete fingerprint images to demonstrate
the efficacy of the technique under non-ideal conditions. The performance of the method is then compared to
previously published research by the authors on identification of degraded fingerprints using PCA- and ICA-based
features.
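The DWT coefficients such feature models are built on can be illustrated with a one-level 2D Haar decomposition (a generic sketch only; the paper's adaptive linear models and their training are not shown):

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar DWT: returns the (LL, LH, HL, HH) subbands.

    img: 2D array with even dimensions.
    """
    a = img.astype(float)
    # Rows: orthonormal average / difference of adjacent column pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    # Columns: the same on adjacent row pairs.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)
    return ll, lh, hl, hh

rng = np.random.default_rng(0)
img = rng.random((8, 8))
ll, lh, hl, hh = haar_dwt2(img)
# The orthonormal Haar transform preserves total energy (Parseval).
energy = sum(np.sum(s ** 2) for s in (ll, lh, hl, hh))
assert np.isclose(energy, np.sum(img ** 2))
```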
In the past decade the use of digital data has increased significantly. The advantages of digital data include easy editing; fast, cheap, cross-platform distribution; and compact storage. The most crucial disadvantages are unauthorized copying and copyright infringement, through which authors and license holders can suffer
considerable financial losses. Many inexpensive methods are readily available for editing digital data and, unlike analog information, the reproduction in the digital case is simple and robust. Hence, there is great interest in developing technology that helps to protect the integrity of a digital work and the copyrights of its owners. Watermarking, which is the embedding of a signal (known as the watermark) into the original digital data, is one method that has been proposed for the protection of digital media elements such as audio, video and images. In this article, we examine watermarking schemes for still images, based on selective quantization of the coefficients of a wavelet transformed image, i.e. sparse quantization-index modulation (QIM) watermarking. Different grouping schemes for the wavelet coefficients are evaluated and experimentally verified for robustness against several attacks. Wavelet tree-based grouping schemes yield a slightly improved performance over block-based
grouping schemes. Additionally, the impact of the deployment of error correction codes on the most promising configurations is examined. The utilization of BCH-codes (Bose, Ray-Chaudhuri, Hocquenghem) results in an improved robustness as long as the capacity of the error codes is not exceeded (cliff-effect).
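The QIM embedding the article evaluates can be illustrated on a bare coefficient vector (a minimal sketch with an assumed step size DELTA; the paper applies this to selected wavelet coefficients and adds BCH coding on top):

```python
import numpy as np

DELTA = 8.0  # quantization step: larger = more robust, more distortion

def qim_embed(coeffs, bits):
    """Embed one bit per coefficient with quantization-index modulation.

    Bit 0 quantizes onto the lattice {k*DELTA}, bit 1 onto the shifted
    lattice {k*DELTA + DELTA/2}.
    """
    shift = np.asarray(bits) * (DELTA / 2)
    return np.round((coeffs - shift) / DELTA) * DELTA + shift

def qim_extract(coeffs):
    """Recover each bit by choosing the nearer of the two lattices."""
    d0 = np.abs(coeffs - np.round(coeffs / DELTA) * DELTA)
    shifted = coeffs - DELTA / 2
    d1 = np.abs(shifted - np.round(shifted / DELTA) * DELTA)
    return (d1 < d0).astype(int)

rng = np.random.default_rng(1)
host = rng.normal(0, 50, 64)          # stand-in for wavelet coefficients
bits = rng.integers(0, 2, 64)
marked = qim_embed(host, bits)
# The watermark survives additive noise smaller than DELTA/4.
noisy = marked + rng.uniform(-DELTA / 5, DELTA / 5, 64)
assert np.array_equal(qim_extract(noisy), bits)
```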
The amount of information available about urban traffic from aerial video imagery is extremely high. Here we discuss the collection of such video imagery from a helicopter platform with a low-cost sensor, and the post-processing used to correct radial distortion in the data and to register it. The radial distortion correction is accomplished using a Harris model. The registration is implemented in a two-step process, using a globally applied polyprojective correction model followed by a fine-scale local displacement field adjustment. The resulting cleaned-up data is sufficiently well registered to allow subsequent straightforward vehicle tracking.
Various problems in different domains are related to the operation of the Human Visual System (HVS). This is notably the case when interest turns to the driver's visual perception and road safety in general. As we know, 90% of the information used by a driver is supplied by the visual system. That is why the Laboratoire Régional des Ponts et Chaussées d'Angers has developed a human visual signal capture system. This system is based on a CCD video camera calibrated for luminance and chrominance, the two physical quantities that make up the human visual signal. The first phase in the development is to adapt the spectral response of the optical system to the spectral characteristics of the human eye. Once the camera is adapted, the second stage is calibration, which consists of transforming the gray levels registered by the camera into values of luminance and chrominance. The innovative feature of this system is the ability to record a single image containing the entire set of information carried by the visual signal. The frame rate of 10 frames per second allows the camera to be carried inside a vehicle and to record images of road scenes exactly as the driver actually perceived them. After recording, vision algorithms can be applied to these images in order to reproduce the physiological processes which take place on the retina or in the brain. These tools can then be used to evaluate the visibility levels of roadway infrastructure or public lighting, or the saliency of objects.
The reconstruction of turbulence-affected images has been an active research topic in the field of astronomical imaging, and many approaches have been proposed in the literature. Recently, researchers have extended these methods to the recovery of long-path terrestrial natural-scene surveillance imagery, which is affected even more strongly by air turbulence. Some approaches from astronomical imaging also work well on the latter problem. However, although these methods have involved statistics, such as a statistical model of atmospheric turbulence or the probability distribution of photons forming an image, they have not taken account of the statistical properties of natural scenes observed in long-path horizontal imagery. Recent research by others has made use of the fact that a real-world image generally has a sparse distribution of its derivatives. In this paper, we investigate algorithms with such a constraint imposed during the restoration of turbulence-affected images. We propose an iterative, blind deconvolution algorithm that follows a registration-and-averaging method to remove anisoplanatic warping in a time sequence of degraded images. The use of a sparse prior helps to reduce noise, produce sharper edges and remove unwanted artifacts in the estimated image, because it pushes only a small number of pixels to have non-zero (or large) derivatives. We test the new algorithm with simulated and natural data, and experiments show that it performs well.
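The effect of a sparse-derivative prior can be illustrated in the much simpler setting of denoising (identity blur, known degradation — not the paper's full blind deconvolution): gradient descent on a data term plus a Charbonnier-smoothed L1 penalty on image derivatives pushes most derivatives toward zero. All constants below are illustrative:

```python
import numpy as np

EPS = 0.01  # Charbonnier smoothing so the L1-like prior is differentiable
LAM = 0.5   # weight of the sparse-derivative prior

def objective(x, y):
    """Data term ||x - y||^2 plus an approximately-L1 derivative penalty."""
    dx = np.diff(x, axis=0, append=x[-1:])
    dy = np.diff(x, axis=1, append=x[:, -1:])
    return np.sum((x - y) ** 2) + LAM * np.sum(
        np.sqrt(dx ** 2 + EPS) + np.sqrt(dy ** 2 + EPS))

def denoise(y, steps=300, lr=0.02):
    """Plain gradient descent on the convex objective, starting at y."""
    x = y.copy()
    for _ in range(steps):
        dx = np.diff(x, axis=0, append=x[-1:])
        dy = np.diff(x, axis=1, append=x[:, -1:])
        px = dx / np.sqrt(dx ** 2 + EPS)   # derivative of the prior
        py = dy / np.sqrt(dy ** 2 + EPS)
        # Negative divergence of the flux field = gradient of the prior.
        div = (px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1))
        x -= lr * (2 * (x - y) - LAM * div)
    return x

rng = np.random.default_rng(8)
clean = np.zeros((16, 16)); clean[:, 8:] = 1.0   # one sharp edge
y = clean + rng.normal(0, 0.1, clean.shape)
x_hat = denoise(y)
# Descent strictly lowers the objective relative to the noisy start.
assert objective(x_hat, y) < objective(y, y)
```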
This paper presents a new method for unsupervised video segmentation based on mean shift clustering in the spatio-temporal domain. The main novelty of the proposed approach is a dynamic temporal adaptation of clusters, thanks to which the segmentation evolves quickly and smoothly over time. The method consists of a short initialization phase and an update phase, and it significantly reduces the computational load of mean shift clustering: in the update phase, only the positions of a relatively small number of cluster centers are updated, and new frames are segmented based on the segmentation of previous frames. The method segments video in real time and tracks video objects effectively.
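The core mean shift update that such clustering relies on can be sketched in plain NumPy (a toy version clustering 2-D feature points rather than spatio-temporal video data; bandwidth and blob positions are illustrative):

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30):
    """Move every point to the Gaussian-weighted mean of its neighbourhood;
    the points converge onto the modes (cluster centers) of the density."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i, m in enumerate(modes):
            w = np.exp(-np.sum((points - m) ** 2, axis=1)
                       / (2 * bandwidth ** 2))
            modes[i] = (w[:, None] * points).sum(0) / w.sum()
    return modes

rng = np.random.default_rng(2)
# Two well-separated blobs in feature space.
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
modes = mean_shift(pts)
# Each point's mode lands near its blob center, giving the clustering.
labels = (modes[:, 0] > 2.5).astype(int)
assert np.array_equal(labels, np.r_[np.zeros(20, int), np.ones(20, int)])
```

The paper's speedup comes from iterating such updates only on a small set of persistent cluster centers rather than on every pixel of every frame.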
This paper introduces a new approach for 3D shape recovery based on Discrete Wavelet Transform (DWT) and Principal
Component Analysis (PCA). Contrary to computing focus quality locally by summing all values in a 2D or 3D window
obtained after applying a focus measure, a vector consisting of seven neighboring pixels is populated for each pixel in
the image volume. Each vector in the sequence is decomposed by using DWT and then PCA is applied on the energies of
detailed coefficients to transform the data into eigenspace. The first feature, as it contains maximum variation, is
employed to compute the depth. Though DWT and PCA are both computationally expensive transformations, the
reduced data elements and algorithm iterations make the proposed method efficient. The new approach was tested and its performance compared with other methods using synthetic and real image sequences. The evaluation is gauged on the basis of the unimodality, monotonicity and resolution of the focus curve. Two other global statistical metrics, Root Mean Square Error (RMSE) and correlation, have also been applied to the synthetic image sequence.
Experimental results demonstrate the effectiveness and the robustness of the new method.
This paper presents the use of a Genetic Algorithm as a search method for the focus measure in Shape From Focus (SFF). Previous methods compute the focus value for each pixel locally by summing all values within a small window. This summation is a good approximation of focus quality, but not an optimal one. The Genetic Algorithm is used as a fine-tuning process in which a measure of best focus is used as the fitness function corresponding to the motion parameter values that make up each gene. The experimental results show that the proposed method performs better than previous algorithms such as the Sum of the Modified Laplacian (SML), Grey Level Variance (GLV) and Tenenbaum focus measures. The results are compared using root mean square error (RMSE) and correlation. The experiments are conducted using a simulated cone, a real cone and a TFT-LCD color filter to evaluate the performance of the proposed algorithm.
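The Sum-of-Modified-Laplacian baseline named above, combined with windowed aggregation and a per-pixel argmax over the focus stack, can be sketched as follows (a simplified SFF baseline, not the Genetic Algorithm refinement itself; the synthetic blur model is illustrative):

```python
import numpy as np

def sml(img):
    """Sum-of-Modified-Laplacian focus measure, per pixel."""
    a = img.astype(float)
    lx = np.abs(2 * a - np.roll(a, 1, 1) - np.roll(a, -1, 1))
    ly = np.abs(2 * a - np.roll(a, 1, 0) - np.roll(a, -1, 0))
    return lx + ly

def window_sum(fm, r=2):
    """Aggregate the focus measure over a (2r+1) x (2r+1) window."""
    for ax in (0, 1):
        fm = sum(np.roll(fm, s, axis=ax) for s in range(-r, r + 1))
    return fm

def depth_from_focus(stack):
    """Depth = index of maximal focus for each pixel of the image stack."""
    fm = np.stack([window_sum(sml(f)) for f in stack])
    return fm.argmax(axis=0)

rng = np.random.default_rng(3)
tex = rng.random((32, 32))
def blur(a):  # crude defocus stand-in: 5-point average
    return (a + np.roll(a, 1, 0) + np.roll(a, -1, 0)
              + np.roll(a, 1, 1) + np.roll(a, -1, 1)) / 5
# Frame 2 is in focus; the others are progressively blurred.
stack = [blur(blur(tex)), blur(tex), tex, blur(tex), blur(blur(tex))]
depth = depth_from_focus(stack)
# The vast majority of pixels should vote for the sharp frame (index 2).
assert (depth == 2).mean() > 0.9
```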
In this paper, we propose a shape recovery method for measuring protrusions on the LCD color filter in the TFT-LCD manufacturing process. We use a 3-D focus measure operator to find the focused points, and then find the lens step that maximizes the sum of the focus measure. In order to reduce the computational complexity, we apply a successive focus measure update algorithm. The 3-D shape of the object can be easily estimated from the best-focused points. Experiments are conducted on both synthetic and real images to evaluate the performance of the proposed algorithms. The experimental results show that our new method is faster than the previous method.
Resizing an image is an important technique in image processing. When increasing the size of an image, some details are smeared or blurred by common interpolation techniques, such as bilinear interpolation: edges do not appear as sharp as in the original image. In addition, at high magnification factors, blocking effects start to appear. In this paper, we present an approach that performs interpolation in the direction of the edges, rather than just the horizontal and vertical directions. Wavelet preprocessing is used to extract edge direction information before performing interpolation in multiple directions.
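The bilinear baseline whose edge blurring the paper addresses can be sketched as:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Classic separable bilinear interpolation of a 2D image."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)      # source coordinates
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    a = img[np.ix_(y0, x0)]; b = img[np.ix_(y0, x1)]
    c = img[np.ix_(y1, x0)]; d = img[np.ix_(y1, x1)]
    # Weighted mix of the four surrounding samples.
    return (a * (1 - wy) * (1 - wx) + b * (1 - wy) * wx
            + c * wy * (1 - wx) + d * wy * wx)

img = np.array([[0., 10.], [20., 30.]])
up = bilinear_resize(img, 3, 3)
assert np.allclose(up, [[0, 5, 10], [10, 15, 20], [20, 25, 30]])
```

The averaging along the two axes is exactly what smears diagonal edges; the paper's approach instead interpolates along the wavelet-estimated edge direction.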
In the medical field, many image segmentation methods have been proposed. However, there are few multiscale segmentation methods that can segment a medical image so that the various components within it are separated at multiple resolutions or scales. In this paper, we present a new algorithm for multiscale segmentation of high-resolution computed tomography (HRCT) images. With this new segmentation technique, we demonstrate that it is possible to segment HRCT images into their various components at multiple scales, hence separating the information available in the HRCT image. We show that the HRCT image can be segmented such that we obtain separate images for bones, tissues, lungs and the anatomical structures within the lungs. The processing is done in the frequency domain using the Discrete Cosine Transform (DCT).
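The DCT-domain band separation underlying such multiscale processing can be illustrated by splitting an image into complementary low- and high-frequency components (a simplified sketch; the cutoff and any mapping of bands to anatomical structures are illustrative):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows = frequencies)."""
    k = np.arange(n)[:, None]
    M = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    M *= np.sqrt(2.0 / n)
    M[0] /= np.sqrt(2)
    return M

def dct_band_split(img, cutoff):
    """Split a square image into low- and high-frequency parts in DCT space."""
    M = dct_matrix(img.shape[0])
    C = M @ img @ M.T                     # 2D DCT
    n = len(C)
    mask = np.add.outer(np.arange(n), np.arange(n)) < cutoff
    low = M.T @ (C * mask) @ M            # inverse DCT of each band
    high = M.T @ (C * ~mask) @ M
    return low, high

rng = np.random.default_rng(4)
img = rng.random((16, 16))
low, high = dct_band_split(img, cutoff=6)
# The two bands are complementary: they sum back to the original image.
assert np.allclose(low + high, img)
```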
Estimation of surface roughness is an important parameter for many applications, including optics, polymers and semiconductors. In this paper, we propose to estimate surface roughness using one of the passive optical 3D shape recovery methods, namely shape from focus. Three-dimensional shape recovery from one or multiple observations is a challenging problem of computer vision. The objective of shape from focus is to calculate the depth map. That depth map can further be used in techniques and algorithms leading to the recovery of the three-dimensional structure of an object, which is required in many high-level vision applications; the same depth map can also be used for surface roughness estimation. One requirement of researchers is to quickly compare the samples being fabricated on the basis of various measures, including surface roughness. However, the high cost involved in the estimation of surface roughness limits its extensive and exhaustive usage. Therefore, we propose an inexpensive and fast method based on Shape From Focus (SFF). We use two microscopic test objects, a coin and a TFT-LCD cell, for estimating the surface roughness.
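Once a depth map is available from SFF, standard roughness statistics are cheap to compute. A minimal sketch using the common Ra (average) and Rq (RMS) definitions, with the reference plane approximated here by the mean depth:

```python
import numpy as np

def roughness(depth):
    """Ra (mean absolute deviation) and Rq (RMS deviation) of a depth map,
    measured about the mean depth used as the reference plane."""
    dev = depth - depth.mean()
    return np.abs(dev).mean(), np.sqrt((dev ** 2).mean())

# A sawtooth-like synthetic surface with known unit deviations.
depth = np.tile([1.0, 3.0], (4, 4))
ra, rq = roughness(depth)
assert np.isclose(ra, 1.0) and np.isclose(rq, 1.0)
```

A production version would fit and subtract a best-fit plane (least squares) rather than the plain mean before measuring deviations.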
The objective of 3D shape recovery using focus is to estimate the depth map of a scene or object based on the best-focused points seen through the camera lens. In Shape From Focus (SFF), the measure of focus, sharpness, is the crucial part of the final 3D shape estimation. Conventional methods compute sharpness by applying a focus measure operator to each 2D frame of the image sequence. However, such methods do not reflect the accurate focus levels in an image, because the focus levels of curved objects require information from neighboring pixels in adjacent frames too. To address this issue, we propose a new method based on focus adjustment, which takes the values of the neighboring pixels from the adjacent image frames that have the same initial depth as the center pixel and then re-adjusts the center value accordingly. Experimental results
show that the proposed technique generates better shape and takes less computation time in comparison to previous SFF
methods based on Focused Image Surface (FIS) and dynamic programming.
In this paper we propose an algorithm for reducing the complexity of the motion estimation module in standard video compression applications. In several video coding standards, such as H.264/AVC, motion estimation is the most time-consuming subsystem. Therefore, recent research has focused on the development of novel algorithms that save computations with minimal effect on video distortion. Since real-world video sequences usually exhibit a wide range of motion content, from uniform to random, adaptive algorithms have emerged as the most robust general-purpose solutions.
In this paper a simple, computationally efficient and robust multi-pattern motion estimation scheme based on the nature of error surfaces is proposed. A combination of spatial and temporal predictors is used for multiple initial search center prediction, determination of the magnitude of motion, and search pattern selection. The multiple initial predictors help to identify absolute zero-motion blocks and the true location of the global minimum based on the characteristics of error surfaces. Hence the final predictive search center selected is closer to the global minimum. This results in a smaller number of search steps to reach the minimum location and increases the computation speed. Further computational speedup is obtained by employing a half-stop technique and a threshold for the minimum distortion point. The computational complexity of the proposed algorithm is drastically decreased (average speedup ~43%), whereas the image quality measured in terms of PSNR (~0.20 dB loss with respect to Full Search) remains close to that of the Full Search algorithm.
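The predictive search-center idea can be sketched with a toy SAD block matcher (illustrative only: the paper's multiple search patterns, half-stop technique and thresholds are omitted):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return np.abs(a.astype(int) - b.astype(int)).sum()

def block_match(cur, ref, y, x, bs=8, rad=4, pred=(0, 0)):
    """Find the best motion vector in a +-rad window centred on a
    predicted vector; a good predictor allows a much smaller window."""
    blk = cur[y:y + bs, x:x + bs]
    best, best_mv = None, (0, 0)
    for dy in range(pred[0] - rad, pred[0] + rad + 1):
        for dx in range(pred[1] - rad, pred[1] + rad + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= ref.shape[0] - bs and 0 <= xx <= ref.shape[1] - bs:
                cost = sad(blk, ref[yy:yy + bs, xx:xx + bs])
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv

rng = np.random.default_rng(5)
ref = rng.integers(0, 256, (32, 32))
cur = np.roll(ref, (2, 3), axis=(0, 1))   # global motion of (2, 3)
# Full small-window search finds the true vector...
assert block_match(cur, ref, 8, 8) == (-2, -3)
# ...and with a good predicted centre, a radius-1 search suffices.
assert block_match(cur, ref, 8, 8, rad=1, pred=(-2, -3)) == (-2, -3)
```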
Common restoration techniques perform signal processing using a single observed image. In this paper we show that the accuracy of restoration can be significantly increased if at least three observed degraded images obtained from a microscanning camera are used. It is assumed that the degraded images contain information about an original image, linear degradation and illumination functions, and additive sensor noise. Using spatial information from the camera, a set of equations and an objective function are formed. By solving the system of equations with the help of an iterative algorithm, the original image can be recovered. Computer simulation results are presented and discussed.
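The benefit of multiple observations is easy to see in the degenerate case of pure additive noise (no blur or illumination differences), where a multi-frame estimator reduces to simple frame averaging; this is a sketch of that special case, not the paper's iterative solver:

```python
import numpy as np

rng = np.random.default_rng(6)
clean = rng.random((16, 16)) * 100
# Three observations of the same scene with independent sensor noise.
obs = [clean + rng.normal(0, 5, clean.shape) for _ in range(3)]
fused = np.mean(obs, axis=0)

mse_single = np.mean((obs[0] - clean) ** 2)
mse_fused = np.mean((fused - clean) ** 2)
# Averaging K frames cuts the noise variance by roughly a factor of K.
assert mse_fused < mse_single / 2
```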
In the modern information society, networks are getting faster, costs are getting lower, and displays are getting clearer.
Today, just about anyone can easily use precise, dynamic, image distribution systems in their everyday life. Now, the
question is how to give the benefits of network systems to the local community, as well as to each individual. This study
was designed to use communication with realistic sensations to examine the effectiveness of remote individual
counseling intervention in reducing depression, anxiety and stress in child-rearing mothers. Three child-rearing mothers
residing in the city of Osaka each received one session of remote counseling intervention. The results showed an
alleviation of stress related to child-rearing, i.e., the reduction in state anxiety, depression and subjective stress related to
child-rearing. Moreover, an experimental demonstration employed a HyperMirror system capable of presenting visual
and auditory images similar to reality, in order to provide the counselees with realistic sensations. Although the voice communication environment was poor, the remote counseling allowed for the communication of sensory information, i.e., physical closeness ("skinship") that conveyed assurance and peace of mind, and of auditory information, i.e., whispering voices through which signals of affection were transmitted; the realistic sensation contributed to a reduction in
stress levels. The positive effects of the intervention were confirmed through a pre- and post-intervention study. The results suggest the need to conduct future studies to confirm the mid- and long-term improvements produced by the intervention, as well as the need to improve the voice transmission environment.
The vectorial shearing interferometer includes a pair of wedge prisms as a shearing system. Perfect alignment of the shearing system is crucial for the optimal detection and analysis of asymmetrical wave fronts. This paper describes a recognition algorithm for optical misalignment detection and prism orientation based on the intensity pattern obtained in the calibration process. The key to the present algorithm is the comparison of a reference intensity pattern against a sheared interferogram that depends on the wedge prism position. First, an optimum phase-only filter is obtained from a set of reference images with the objective of discriminating between different phase changes. Then, the optimal filter is used in a digital correlator, which results in a simple and robust calibration system.
A new rotation-invariant computational filter is presented. The filter was applied to test images of 256 by 256 pixels consisting of a centered white Arial letter on a black background; the complete alphabet is represented in these images. Each image is rotated in one-degree steps through a full 360 degrees; hence, 360 images are generated for each letter of the alphabet. To achieve rotational invariance, translational invariance is first applied, and then a 256 by 256 binary mask of concentric circular rings, three pixels in thickness and separation, is used. The sum of the information in the circular rings represents the signature of the image. The average of the signatures of the 360 images of a selected letter is the filter used to compute the phase correlation with all alphabet letters and their rotated images. The confidence level is calculated as the mean value with two standard errors (2SE) of the 360 correlation values for each letter. The confidence level shows that this system discriminates efficiently between letters.
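The ring-mask signature construction can be sketched as follows (a simplified version using contiguous rings rather than rings separated by gaps, with np.rot90 standing in for the one-degree rotations):

```python
import numpy as np

def ring_signature(img, width=3):
    """Sum image intensity over concentric rings about the centre;
    the resulting 1D vector is unchanged by rotation of the image."""
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    ring = (r // width).astype(int)          # ring index per pixel
    return np.bincount(ring.ravel(), weights=img.ravel())

rng = np.random.default_rng(7)
img = rng.random((64, 64))
sig = ring_signature(img)
# A 90-degree rotation only permutes pixels within each ring,
# so the signature is identical.
assert np.allclose(ring_signature(np.rot90(img)), sig)
```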
In this work we use nonlinear composite filters for object recognition, even in the presence of rotation, scale and noise distortions. We generated 936 images of the letters E, F, H, P and B, scaled from 70% to 130% and rotated through 360°. The maximum number of images supported by these filters was determined by a numerical experiment, generating filters with different numbers of images each. With images at 13 scales, and each scale at 72 different angles, tests were performed on two kinds of filters: one in which all scales were present and more angles were added to increase the number of images, and another in which all angles were present and more scales were added. Considering a system confidence level of at least 80%, the maximum number of images allowed by the filter is around 216. In one type of filter the letter is rotated through 360°. We found a "rotation problem": circles were introduced in the Fourier plane, in other words first-order Bessel functions were introduced into the image spectrum, which creates complications when working with images that also have circles in their spectrum. Because of this we propose a segmented filter that breaks the circular symmetry. Nonlinear composite filters can recognize the target in the presence of distortions.
Classical correlation-based methods for pattern recognition are very sensitive to geometrical distortions of objects to be
recognized. Besides, most captured images are corrupted by noise. In this work we use novel nonlinear composite filters
for distortion-invariant pattern recognition. The filters are designed with an iterative algorithm to reject background
noise and to achieve a desired discrimination capability. The recognition performance of the proposed filters is compared
with that of linear composite filters in terms of noise robustness and discrimination capability. Computer simulation
results are provided and discussed.
This paper describes an image recognition system designed to inspect the quality standards of electronic assemblies. The essence of the present algorithm is the location, in the input image, of electronic components that violate the acceptance requirements for the manufacture of printed circuit board assemblies adopted by the Association Connecting Electronics Industries (IPC). To this end, image processing modules based on a nonlinear composite filter are employed to discriminate between the electronic components that meet the acceptance condition and those that are in a defect condition. The proposed recognition system is based on a nonlinear composite filter obtained from a training set of reference images. The filter is then used in a digital correlator, which results in a simple and robust inspection system.
This work presents the development and use of vectorial signature filters, obtained by applying properties of the scale and Fourier transforms, for image recognition. The filters were applied to different input scenes consisting of the 26 letters of the alphabet. Each letter is a 256 × 256 pixel image of a centered white Arial letter on a black background. Each image was rotated through 360 degrees in increments of 1° and scaled from 70% to 130% in increments of 0.5%. In order to build a new invariant digital correlation system, two one-dimensional vectors are obtained after applying different mathematical transformations to the target as well as to the input scene. To recognize a target, signatures are compared by calculating the Euclidean distance between the target and the input scene; confidence levels are then obtained. The results demonstrate that this system discriminates well between letters.
An encoding system was implemented purely in software for sending SVC video multicast to mobile devices. The SVC encoder is a multi-layer, multi-threaded parallel-GOP encoder. It is capable of running in real time on a dual-core Intel® Xeon® 5160 platform in Scalable Baseline profile (H.264/AVC Baseline QVGA@25fps at 250 kbps to SVC Scalable Baseline VGA@25fps at 1 Mbps). Real-time video encoding is accomplished with aggressive Single Instruction Multiple Data (SIMD) assembly code optimizations and advanced algorithms.
The use of digital holography for plankton detection is discussed, namely for evaluating the shapes, orientation, and 3-D location of plankton particles. Algorithms are considered for improving the quality of the reconstructed plankton holographic image; they are based on pre-processing the digital hologram before reconstruction.
An algorithm for recording and reconstructing digital holographic video is described. We show that it is possible to reconstruct either a fixed plane of the volume containing plankton or a plane that moves together with a plankton specimen.
Experimental results and their discussion are presented.
Wavefront coding is a good approach to extending the depth of field (DOF) of an imaging optical system with an identical exit pupil. A phase mask introduced in the exit pupil of the optical system makes the MTF of the system defocus-independent; as a result, the DOF of the system is extended without a loss of resolution. How far can this technique extend the DOF? Experimental and simulation results in several papers have shown that a wavefront coding imaging system can extend the DOF by an order of magnitude. However, no theoretical analysis of the DOF extension achievable with wavefront coding has been published to date. In this paper, a special condition under which the MTF of a wavefront coding system is defocus-independent is found by investigating the OTF of a wavefront coding imaging system with a cubic phase mask. Based on this condition, an expression for the DOF of a wavefront coding imaging system is obtained. The conclusion is verified by simulation results.
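For reference, the classic one-dimensional cubic-phase result from the wavefront-coding literature (Dowski and Cathey) illustrates why the MTF becomes approximately defocus-independent. This is the standard stationary-phase approximation, not necessarily the specific condition derived in this paper:

```latex
% Rectangularly separable 1-D pupil with a cubic phase mask, in normalized
% pupil coordinates, with defocus entering as a quadratic phase of strength \psi:
P_\psi(x) = \exp\!\big(j\alpha x^{3} + j\psi x^{2}\big), \qquad |x| \le 1 .
% Stationary-phase approximation of the OTF for large \alpha:
H(u,\psi) \approx \sqrt{\frac{\pi}{12\,|\alpha u|}}\;
\exp\!\Big(j\,\frac{\alpha u^{3}}{4}\Big)\,
\exp\!\Big(-j\,\frac{\psi^{2} u}{3\alpha}\Big),
% so the MTF |H(u,\psi)| is approximately independent of the defocus \psi
% whenever \alpha \gg \psi^{2}.
```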
In the field of machine vision, determination of tridimensional information from images is of great importance. One of the most widely used approaches is based on stereo vision. The shape-from-stereo method computes a depth map from a stereo pair of images. Using several images from different perspectives, a tridimensional representation of a test scene can be obtained. The key step in this process is stereo matching, which determines the position change of a pixel in one image of the stereo pair with respect to the other image. In this paper a stereo matching algorithm based on dynamic programming is proposed for 3-D shape recovery from textured images. The proposed technique uses the information of a small two-dimensional neighborhood to solve the correspondence problem.
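A minimal scanline dynamic-programming matcher in the spirit described above might look as follows. The SAD cost over a small 2-D neighborhood, the linear smoothness penalty, and all parameter values are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def window_cost(left, right, y, x, d, w=1):
    """SAD matching cost over a (2w+1)^2 neighborhood for disparity d."""
    h, width = left.shape
    y0, y1 = max(y - w, 0), min(y + w + 1, h)
    x0, x1 = max(x - w, 0), min(x + w + 1, width)
    xr0, xr1 = x0 - d, x1 - d
    if xr0 < 0 or xr1 > width:
        return np.inf  # window falls outside the right image
    return float(np.abs(left[y0:y1, x0:x1] - right[y0:y1, xr0:xr1]).sum())

def scanline_dp(left, right, max_disp=4, smooth=1.0):
    """Per-scanline DP: minimize matching cost plus a disparity-jump penalty."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        cost = np.full((w, max_disp + 1), np.inf)
        back = np.zeros((w, max_disp + 1), dtype=int)
        for d in range(max_disp + 1):
            cost[0, d] = window_cost(left, right, y, 0, d)
        for x in range(1, w):
            for d in range(max_disp + 1):
                c = window_cost(left, right, y, x, d)
                if not np.isfinite(c):
                    continue
                prev = cost[x - 1] + smooth * np.abs(np.arange(max_disp + 1) - d)
                back[x, d] = int(np.argmin(prev))
                cost[x, d] = c + prev[back[x, d]]
        d = int(np.argmin(cost[-1]))  # best final disparity, then backtrack
        for x in range(w - 1, -1, -1):
            disp[y, x] = d
            d = back[x, d]
    return disp

# Synthetic textured pair: the right image is the left shifted by 2 pixels,
# so the ground-truth disparity in the interior is 2.
rng = np.random.default_rng(0)
left = rng.random((10, 16))
right = np.roll(left, -2, axis=1)
disp = scanline_dp(left, right, max_disp=4)
```

On this synthetic pair the interior columns recover the true disparity; real matchers add occlusion handling and inter-scanline consistency on top of this core recurrence.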
In this paper, results of a study on fish age estimation, an application of digital image processing, are presented through the analysis of otolith images. First, a kind of Artificial Neural Network (ANN), the Pulse-Coupled Neural Network (PCNN), is proposed and used to identify the different summer and winter year-ring patterns. Second, a well-founded approach using morphological features is put forward to automatically detect the nucleus within the otolith images. Finally, a morphological method is used to deduce the fish's age. The results of this paper may be significant for fishery research, and the methods can be applied to other biological feature identification tasks.
An invariant digital correlation system using a nonlinear filter is presented. Invariance to position, rotation, and scale of the target is achieved via the Fourier transform, polar mapping, and the Scale transform, respectively. We analyzed the performance of this filter for different values of the nonlinearity k according to the peak-to-correlation energy (PCE) metric. We experimentally found the best k value for rotation and scale, along with the confidence levels of the filters. The filter was applied to the complete set of alphabet letters, where each letter is a problem image of 256 × 256 pixels. The results are presented and show better performance than linear filters.
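The PCE metric and a k-law nonlinear correlation can be sketched as follows. The value k = 0.3 and the random test images are hypothetical, chosen only to illustrate the computation; the paper sweeps k to find the best value.

```python
import numpy as np

def pce(correlation_plane):
    """Peak-to-correlation energy: squared peak over total plane energy."""
    c = np.abs(correlation_plane)
    return float(c.max() ** 2 / np.sum(c ** 2))

def k_law_correlation(target, scene, k=0.3):
    """k-law nonlinear correlation: spectral magnitudes raised to the power k,
    phases preserved, then an inverse FFT of the cross-spectrum."""
    T, S = np.fft.fft2(target), np.fft.fft2(scene)
    Tk = np.abs(T) ** k * np.exp(1j * np.angle(T))
    Sk = np.abs(S) ** k * np.exp(1j * np.angle(S))
    return np.fft.ifft2(Sk * np.conj(Tk))

# A matching scene yields a sharp peak (high PCE); a non-matching one does not.
rng = np.random.default_rng(1)
target = rng.random((64, 64))
same = pce(k_law_correlation(target, target))
other = pce(k_law_correlation(target, rng.random((64, 64))))
```

Higher PCE means the correlation energy is concentrated in the peak, which is why it serves as the discrimination metric when tuning k.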
This paper proposes an algorithm for generating a video signature based on an ordinal measure. Current methods which
use a measure of temporal ordinal rank are robust to many transformations but can only detect the entire query video, not
a segment of the query, while methods which use local features may be more robust to certain transformations but less
robust to excessive noise. The proposed algorithm incorporates region-based spatial information while maintaining a
strong robustness to noise, different resolutions, illumination shifts and video file formats. In our method, a frame is first
divided into blocks. For each pixel in a block, a slice (a binary image computed based on the comparison between the
greyscale intensity of each pixel in the frame and the reference pixel) is generated. The slices of all the pixels in a block
are then added component-wise to obtain a metaslice for the block. In order to compute the distance between any two
frames, the Euclidean distance between corresponding metaslices of the two frames is computed to obtain the
metadistance between two blocks. Summing the metadistances over all blocks and normalizing gives the final measure
of distance between the two frames. To improve the speed of the algorithm, keyframes are first downsized and pixel
intensity values are represented by the average of a small block. A table of frame differences between two sets of
keyframes from two video sequences is constructed and then converted to a similarity matrix using a threshold. The
longest chain of consecutive similar keyframes is found and this produces the best matching video sequence between the
two videos. This algorithm is capable of taking into account differences between videos at various scales and is useful
for finding duplicate or modified copies of a query video in a database. Preliminary experimental results are encouraging
and demonstrate the potential of the proposed algorithm.
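The slice/metaslice computation described above can be sketched as follows. The block size, frame sizes, and the per-block normalization are illustrative assumptions and may differ from the paper's exact choices.

```python
import numpy as np

def metaslice(frame, block):
    """Sum of binary 'slices': for each reference pixel in the block, a slice
    marks which pixels of the whole frame are brighter than that reference."""
    m = np.zeros_like(frame, dtype=float)
    for ref in block.ravel():
        m += (frame > ref)  # one binary slice per reference pixel
    return m

def frame_distance(f1, f2, bs=4):
    """Average over blocks of the Euclidean distance between corresponding
    metaslices of the two frames (the per-block 'metadistance')."""
    h, w = f1.shape
    total, nblocks = 0.0, 0
    for y in range(0, h, bs):
        for x in range(0, w, bs):
            m1 = metaslice(f1, f1[y:y + bs, x:x + bs])
            m2 = metaslice(f2, f2[y:y + bs, x:x + bs])
            total += np.linalg.norm(m1 - m2)
            nblocks += 1
    return total / nblocks

# Hypothetical greyscale frames.
rng = np.random.default_rng(0)
f1 = rng.random((8, 8))
f2 = rng.random((8, 8))
```

Because the slices are ordinal comparisons, adding a constant illumination shift to a frame leaves every metaslice unchanged, which is the source of the method's robustness to illumination shifts.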
For 3-D profilometry, it is necessary to locate the in-focus region of each image and to reconstruct the best 3-D profile. A series of images is collected on-the-fly, and the contrast and intensity indices of each region of each image are calculated during the scanning procedure. The proposed method reconstructs the 3-D shape from a moving platform. It was applied in some preliminary experiments, which show that large-scale 3-D profile reconstruction can be realized.
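A contrast index of the kind mentioned above can be sketched as follows. The variance-based focus measure and the mean-blur stand-in for defocus are illustrative assumptions, not the paper's exact indices.

```python
import numpy as np

def contrast_index(region):
    """Simple focus measure: variance of the local intensity. An in-focus
    region of a textured surface shows higher contrast than a blurred one."""
    return float(np.var(region))

def best_focus(stack, y0, y1, x0, x1):
    """Pick the image in a focal stack whose region is most in focus."""
    return int(np.argmax([contrast_index(img[y0:y1, x0:x1]) for img in stack]))

# Hypothetical focal stack: a sharp textured image and a "defocused" copy
# (a crude 3x3 mean blur, which lowers the contrast index).
rng = np.random.default_rng(2)
sharp = rng.random((32, 32))
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

idx = best_focus([blurred, sharp], 4, 28, 4, 28)
```

Evaluating the index per region, rather than per frame, is what lets a scanning system assign each surface patch its own best-focus position and so build the 3-D profile.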