Automated cellular nuclei segmentation is often an important step in digital pathology and other analyses such as computer-aided diagnosis. Most existing machine learning methods for microscopy image analysis require postprocessing, such as the watershed transform or connected component analysis, to obtain instance segmentation from semantic segmentation results. This postprocessing becomes prohibitively expensive computationally, especially for 3D microscopy volumes. UNet Transformers for Instance Segmentation (UNETRIS) is proposed to eliminate the postprocessing steps needed for nuclei instance segmentation in 3D microscopy images. UNETRIS extends UNETR, which uses a transformer as the encoder in the successful “U-shaped” encoder-decoder design of U-Net, with additional transformers that separate individual nuclei instances directly during inference, without the need for expensive postprocessing. UNETRIS does not require, but can use, manual ground truth annotations for training. UNETRIS was tested on a variety of microscopy volumes collected from multiple regions of organ tissues.
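For context, a minimal sketch (not part of UNETRIS) of the kind of postprocessing the abstract refers to: a binary semantic mask is turned into instance labels by connected-component labeling or a distance-transform watershed. Array and parameter names are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed
from skimage.feature import peak_local_max

def instances_from_semantic(mask: np.ndarray) -> np.ndarray:
    """mask: boolean 3D array (D, H, W) of nuclei foreground."""
    # Plain connected-component labeling merges touching nuclei into one instance.
    cc_labels, num_cc = ndi.label(mask)

    # A distance-transform watershed splits touching nuclei, but is costly
    # on large 3D volumes.
    distance = ndi.distance_transform_edt(mask)
    peaks = peak_local_max(distance, labels=cc_labels, min_distance=3)
    markers = np.zeros_like(cc_labels)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-distance, markers, mask=mask)
```

It is this kind of per-volume postprocessing, which becomes expensive on large 3D volumes, that UNETRIS is designed to avoid.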
KEYWORDS: Microscopy, Luminescence, Optical spheres, 3D image processing, Image segmentation, Surgery, Data modeling, 3D modeling, Network architectures, Binary data
Purpose: Fluorescence microscopy visualizes three-dimensional subcellular structures in tissue, with two-photon microscopy achieving deeper tissue penetration. Nuclei detection, which is essential for analyzing tissue for clinical and research purposes, remains a challenging problem due to the spatial variability of nuclei. Recent advancements in deep learning have enabled the analysis of fluorescence microscopy data to localize and segment nuclei. However, these localization or segmentation techniques require additional steps to extract characteristics of nuclei. We develop a 3D convolutional neural network, called Sphere Estimation Network (SphEsNet), to extract characteristics of nuclei without any postprocessing steps.
Approach: To simultaneously estimate the center locations of nuclei and their sizes, SphEsNet is composed of two branches to localize nuclei center coordinates and to estimate their radii. Synthetic microscopy volumes automatically generated using a spatially constrained cycle-consistent adversarial network are used for training the network because manually generating 3D real ground truth volumes would be extremely tedious.
Results: Three SphEsNet models, chosen according to nucleus size, were trained and tested on five real fluorescence microscopy data sets from rat kidney and mouse intestine. Our method can successfully detect nuclei of various sizes in multiple locations. In addition, our method was compared with other techniques and outperformed them in terms of object-level precision, recall, and F1 score, achieving an F1 score of 89.90%.
Conclusions: SphEsNet can simultaneously localize nuclei and estimate their size without additional steps. SphEsNet can be potentially used to extract more information from nuclei in fluorescence microscopy images.
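A minimal PyTorch sketch, under assumptions, of a two-branch 3D network in the spirit of the approach above: a shared convolutional trunk followed by one head that predicts a nuclei-center likelihood map and one head that predicts a per-voxel radius estimate. Layer sizes, names, and activations are illustrative, not the published SphEsNet architecture.

```python
import torch
import torch.nn as nn

class TwoBranchSphereNet(nn.Module):
    def __init__(self, in_channels: int = 1, features: int = 16):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(in_channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(features, features, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Branch 1: per-voxel probability of being a nucleus center.
        self.center_head = nn.Sequential(nn.Conv3d(features, 1, 1), nn.Sigmoid())
        # Branch 2: per-voxel radius estimate (in voxels), kept non-negative.
        self.radius_head = nn.Sequential(nn.Conv3d(features, 1, 1), nn.Softplus())

    def forward(self, volume: torch.Tensor):
        shared = self.trunk(volume)          # (N, F, D, H, W)
        return self.center_head(shared), self.radius_head(shared)

# Example usage:
# centers, radii = TwoBranchSphereNet()(torch.randn(1, 1, 32, 64, 64))
```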
Microscopy image analysis can provide substantial information for clinical studies and for understanding biological structures. Two-photon microscopy is a type of fluorescence microscopy that can image deep into tissue with near-infrared excitation light. We are interested in methods that can detect and characterize nuclei in 3D fluorescence microscopy image volumes. In general, several challenges exist for counting nuclei in 3D image volumes. These include “crowding” and touching of nuclei, overlapping of nuclei, and variations in the shape and size of the nuclei. In this paper, a 3D nuclei counter using two different generative adversarial networks (GANs) is proposed and evaluated. Synthetic data that resemble real microscopy images are generated with a GAN and used to train another 3D GAN that counts the number of nuclei. Our approach is evaluated with respect to the number of ground truth nuclei and compared with common ways of counting used in biological research. Fluorescence microscopy 3D image volumes of rat kidneys are used to test our 3D nuclei counter. The accuracy results of the proposed nuclei counter are compared with ImageJ’s 3D object counter (JACoP) and the 3D watershed. Both the counting accuracy and the object-based evaluation show that the proposed technique is successful for counting nuclei in 3D.
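A toy sketch, for illustration only, of the kind of baseline counting the proposed GAN-based counter is compared against: thresholding a 3D volume, counting connected components, and reporting a simple counting accuracy against a known ground truth count. Function names and the threshold are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi

def count_components(volume: np.ndarray, threshold: float) -> int:
    # Count 3D connected components above an intensity threshold.
    _, num_objects = ndi.label(volume > threshold)
    return num_objects

def counting_accuracy(estimated: int, ground_truth: int) -> float:
    # 1.0 means a perfect count; lower values indicate over/under-counting.
    return 1.0 - abs(estimated - ground_truth) / ground_truth
```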
Biomedical imaging, when combined with digital image analysis, is capable of quantitative morphological and physiological characterization of biological structures. Recent fluorescence microscopy techniques can collect hundreds of focal plane images from deeper tissue volumes, thus enabling characterization of three-dimensional (3-D) biological structures at subcellular resolution. Automatic analysis methods are required to obtain quantitative, objective, and reproducible measurements of biological quantities. However, these images typically contain many artifacts, such as poor edge details, nonuniform brightness, and distortions that vary along different axes, all of which complicate automatic image analysis. Another challenge is due to “multitarget labeling,” in which a single probe labels multiple biological entities in acquired images. We present a “jelly filling” method for segmentation of 3-D biological images containing multitarget labeling. Intuitively, our iterative segmentation method is based on filling disjoint tubule regions of an image with a jelly-like fluid. This helps in the detection of components that are “floating” within a labeled jelly. Experimental results show that our proposed method is effective in segmenting important biological quantities.
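A greatly simplified illustration of the filling intuition, not the authors' iterative jelly-filling method: fill the interiors of labeled tubule regions and treat foreground objects that lie strictly inside the filled interiors as "floating" components. The mask names are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi

def floating_components(tubule_mask: np.ndarray, object_mask: np.ndarray):
    """tubule_mask, object_mask: boolean arrays from a multitarget-labeled image."""
    filled = ndi.binary_fill_holes(tubule_mask)    # tubule walls plus interior ("jelly")
    interior = filled & ~tubule_mask               # interior only
    labels, n = ndi.label(object_mask & interior)  # objects floating in the interior
    return labels, n
```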
KEYWORDS: Image segmentation, Microscopy, Image analysis, Two photon excitation microscopy, Digital filtering, Error analysis, 3D image enhancement, 3D modeling
Fluorescence microscopy is used to image multiple subcellular structures in living cells which are not readily
observed using conventional optical microscopy. Moreover, two-photon microscopy is widely used to image
structures deeper in tissue. Recent advancement in fluorescence microscopy has enabled the generation of large
data sets of images at different depths, times, and spectral channels. Thus, automatic object segmentation is
necessary since manual segmentation would be inefficient and biased. However, automatic segmentation is still
a challenging problem because regions of interest may lack well-defined boundaries and may have non-uniform pixel
intensities. This paper describes a method for segmenting tubular structures in fluorescence microscopy images
of rat kidney and liver samples using adaptive histogram equalization, foreground/background segmentation,
steerable filters to capture directional tendencies, and connected-component analysis. The results from several
data sets demonstrate that our method can segment tubular boundaries successfully. Moreover, our method has
better performance when compared to other popular image segmentation methods when using ground truth data
obtained via manual segmentation.
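A hedged sketch of a pipeline in the spirit of the one described above, with a Frangi vesselness filter standing in for steerable filters to capture directional tendencies; thresholds and sizes are illustrative parameters, not the paper's settings.

```python
import numpy as np
from skimage import exposure, filters, measure, morphology

def segment_tubular(image: np.ndarray) -> np.ndarray:
    """image: 2D grayscale section, float in [0, 1]."""
    equalized = exposure.equalize_adapthist(image)               # adaptive histogram equalization
    foreground = equalized > filters.threshold_otsu(equalized)   # foreground/background split
    ridges = filters.frangi(equalized, black_ridges=False) > 1e-4  # directional (ridge) response
    candidate = foreground | ridges
    cleaned = morphology.remove_small_objects(candidate, min_size=64)
    return measure.label(cleaned)                                # connected-component analysis
```

The ridge-response threshold and minimum object size would need tuning per data set.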
Segmentation is a fundamental step in quantifying characteristics such as the volume, shape, and orientation of cells and/or tissue. However, quantification of these characteristics still poses a challenge due to the unique properties of microscopy volumes. This paper proposes a 2D segmentation method that combines adaptive and global thresholding, potentials, z-direction refinement, branch pruning, end point matching, and boundary fitting to delineate tubular objects in microscopy volumes. Experimental results demonstrate that the proposed method achieves better performance than an active-contour-based scheme.
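A minimal sketch of the first step named above, combining a global (Otsu) threshold with a local adaptive threshold so that both coarse foreground and locally varying structures are retained; the block size and the way the two masks are combined are illustrative choices.

```python
import numpy as np
from skimage import filters

def combined_threshold(image: np.ndarray, block_size: int = 51) -> np.ndarray:
    # Global threshold captures overall foreground; local threshold adapts to
    # slowly varying background intensity.
    global_mask = image > filters.threshold_otsu(image)
    local_mask = image > filters.threshold_local(image, block_size=block_size)
    return global_mask & local_mask
```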
Optical microscopy poses many challenges for digital image analysis. One particular challenge includes correction
of image artifacts due to respiratory motion from specimens imaged in vivo. We describe a non-rigid registration
method using B-splines to correct these motion artifacts. Current attempts at non-rigid medical image
registration have typically involved only a single pair of images. Extending these techniques to an entire series
of images, possibly comprising hundreds of images, is presented in this paper. Our method involves creating a
uniform grid of control points across each image in a stack. Each control point is manipulated by optimizing a
cost function consisting of two parts: a term to determine image similarity, and a term to evaluate deformation
grid smoothness. This process is repeated for all images in the stack. Results are evaluated using block motion
estimation and other visualization techniques.
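A simplified numerical sketch, under assumptions, of the two-term cost described above: a coarse grid of control-point displacements defines a dense deformation (a cubic-spline upsampling stands in for the B-spline basis), and the cost adds an image-similarity term to a smoothness penalty on the control grid. Names, the SSD similarity, and the second-difference smoothness measure are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi

def registration_cost(control, fixed, moving, weight=0.1):
    """control: (2, gy, gx) control-point displacements; fixed, moving: 2D images."""
    # Upsample the coarse control grid to a dense per-pixel displacement field.
    dense = np.stack([
        ndi.zoom(control[k], np.array(fixed.shape) / np.array(control[k].shape), order=3)
        for k in range(2)
    ])
    rows, cols = np.mgrid[0:fixed.shape[0], 0:fixed.shape[1]].astype(float)
    warped = ndi.map_coordinates(moving, [rows + dense[0], cols + dense[1]], order=1)
    similarity = np.mean((warped - fixed) ** 2)                     # image similarity term
    smoothness = sum(np.sum(np.diff(control[k], n=2, axis=a) ** 2)  # grid smoothness term
                     for k in range(2) for a in (0, 1))
    return similarity + weight * smoothness
```

In a full registration, a cost of this form would be minimized for each image in the stack with a standard optimizer.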
Multi-photon microscopy has provided biologists with unprecedented opportunities for high resolution imaging deep
into tissues. Unfortunately deep tissue multi-photon microscopy images are in general noisy since they are acquired at
low photon counts. To aid in the analysis and segmentation of such images it is sometimes necessary to initially enhance
the acquired images. One way to enhance an image is to find the maximum a posteriori (MAP) estimate of each pixel
comprising an image, which is achieved by finding a constrained least squares estimate of the unknown distribution. In
arriving at this distribution, it is assumed that the noise is Poisson distributed, that the true but unknown pixel values
follow a probability mass function over a finite set of non-negative values, and that, since the observed data also take
finite values because of the low photon counts, the sum of the probabilities of the observed pixel values (obtained from
the histogram of the acquired pixel values) is less than one. Experimental results demonstrate that it is possible to
closely estimate the unknown probability mass function under these assumptions.
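A hedged numerical sketch, under assumptions, of the constrained least-squares idea: the unknown probability mass function over a finite set of candidate true intensities is fit to the observed low-photon-count histogram, with each observation modeled as Poisson distributed about its true value and the probabilities constrained to be non-negative and to sum to at most one. The mixing-matrix formulation and the SLSQP solver are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def estimate_pmf(observed_hist, true_values, observed_values):
    """observed_hist: normalized histogram over observed_values;
    true_values: candidate true (noise-free) intensities."""
    # A[k, j] = probability of observing value k given true value j (Poisson noise).
    A = poisson.pmf(np.array(observed_values)[:, None], np.array(true_values)[None, :])
    objective = lambda p: np.sum((A @ p - observed_hist) ** 2)        # constrained least squares
    constraints = [{"type": "ineq", "fun": lambda p: 1.0 - p.sum()}]  # probabilities sum to <= 1
    bounds = [(0.0, 1.0)] * len(true_values)                          # non-negative probabilities
    p0 = np.full(len(true_values), 1.0 / len(true_values))
    result = minimize(objective, p0, bounds=bounds, constraints=constraints, method="SLSQP")
    return result.x
```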
KEYWORDS: Computer programming, Video, Error control coding, Error analysis, Video coding, Video compression, Visualization, Distortion, Visual communications, Data transmission
Compressed video is very sensitive to channel errors. A few bit losses can stop the entire decoding process.
Therefore, protecting compressed video is always necessary for reliable visual communications. In recent years,
Wyner-Ziv lossy coding has been used for error resilience and has achieved improvement over conventional
techniques. In our previous work, we proposed an unequal error protection algorithm for protecting data elements
in a video stream using a Wyner-Ziv codec. We also presented an improved method by adapting the parity
data rates of protected video information to the video content. In this paper, we describe a feedback aided error
resilience technique, based on Wyner-Ziv coding. By utilizing feedback regarding current channel packet-loss
rates, a turbo coder can adaptively adjust the amount of parity bits needed for correcting corrupted slices at the
decoder. This results in an efficient usage of the data rate budget for Wyner-Ziv coding while maintaining good
quality decoded video when the data has been corrupted by transmission errors.
Compressed video is very sensitive to channel errors. A few bit losses can derail the entire decoding process. Thus, protecting compressed video is imperative for reliable visual communications. Since different elements in a compressed video stream vary in their impact on the quality of the decoded video, unequal error protection can be used to provide efficient protection. This paper describes an unequal error protection method for protecting data elements in a video stream via a Wyner--Ziv encoder that consists of a coarse quantizer and a Turbo coder based lossless Slepian--Wolf encoder. Data elements that significantly impact the visual quality of the decoded video, such as modes and motion vectors as used by H.264, are given more parity bits than coarsely quantized transform coefficients. This results in better quality of the decoded video, when the transmitted sequence is corrupted by transmission errors, than is obtained with equal error protection.
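A toy illustration of the unequal-error-protection principle only; the codec described above uses a coarse quantizer and a turbo-coder-based Slepian-Wolf encoder, whereas this sketch uses a simple repetition code over a binary symmetric channel to show that spending more parity on the important class (e.g., modes and motion vectors) leaves far fewer residual errors than in the lightly protected class.

```python
import numpy as np

rng = np.random.default_rng(0)

def bsc(bits, flip_prob):
    # Binary symmetric channel: flip each bit independently with probability flip_prob.
    return bits ^ (rng.random(bits.shape) < flip_prob)

def repeat_decode(bits, repeat, flip_prob):
    # Repetition code of rate 1/repeat with majority-vote decoding.
    received = bsc(np.repeat(bits, repeat).reshape(-1, repeat), flip_prob)
    return (received.sum(axis=1) * 2 > repeat).astype(bits.dtype)

flip_prob = 0.05
important = rng.integers(0, 2, 1000)        # e.g., modes and motion vectors
less_important = rng.integers(0, 2, 1000)   # e.g., coarsely quantized coefficients
err_important = np.mean(repeat_decode(important, 5, flip_prob) != important)
err_less = np.mean(repeat_decode(less_important, 1, flip_prob) != less_important)
print(err_important, err_less)  # the heavily protected class sees far fewer residual errors
```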
Due to the ease with which digital data can be manipulated and due to the ongoing advancements that have brought us closer to pervasive computing, the secure delivery of video and images has become a challenging problem. Despite the advantages and opportunities that digital video provides, illegal copying and distribution, as well as plagiarism, of digital audio, images, and video are still ongoing. In this paper we describe two techniques for securing H.264 coded video streams. The first technique, SEH264Algorithm1, groups the data into the following blocks: (1) a block that contains the sequence parameter set and the picture parameter set, (2) a block containing a compressed intra coded frame, (3) a block containing the slice header of a P slice, all the headers of the macroblocks within the same P slice, and all the luma and chroma DC coefficients belonging to all the macroblocks within the same slice, (4) a block containing all the AC coefficients, and (5) a block containing all the motion vectors. The first three blocks are encrypted, whereas the last two are not. The second method, SEH264Algorithm2, relies on the use of multiple slices per coded frame. The algorithm searches the compressed video sequence for start codes (0x000001) and then encrypts the next N bits of data.
A new embedded, rate-scalable and resolution-scalable, lossless image compression technique based on Highly Scalable Set Partitioning in Hierarchical Trees (HS-SPIHT) is developed. The new method, called Resolution Scalable-SPIHT (RS-SPIHT), encodes the regions in the next higher resolution that are most likely to be significant along with the current resolution. The advantage of doing so is twofold: (1) the resulting data stream is a rate-scalable, spatially scalable bit stream, that is, at any given data rate the next higher image resolution can be achieved, which is not possible using SPIHT, and (2) the largest recoverable resolution at a particular data rate has a higher quality than could previously be reached using HS-SPIHT.
Due to the ease with which digital data can be manipulated and due to the ongoing advancements that have brought us closer to pervasive computing, the secure delivery of video and images has become a challenging problem. Despite the advantages and opportunities that digital video provides, illegal copying and distribution, as well as plagiarism, of digital audio, images, and video are still ongoing. In this paper we describe a technique for securing digital images based on combining image compression and encryption. The underlying idea is to combine encryption with rate scalable image compression such as EZW or SPIHT.
EZW or SPIHT compresses an image in such a way that the current coding state of a wavelet coefficient depends on the current coding of the coefficient’s parent and on the prior coding states of the coefficient and its parent. A consequence of this is that, within a bit stream of length N, if we obscure the first M leading bits and leave the remaining N-M bits unchanged, then the trailing N-M bits reveal “very little information” about the original image. Thus, by exploiting this inter-data dependency we can selectively encrypt part of the data stream and hence reduce the computational burden and bandwidth requirement for transmitting images securely.
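A minimal sketch of the selective-encryption idea described above, assuming the embedded (EZW/SPIHT-style) bit stream is available as bytes: only the leading M bytes are encrypted, here by XOR with a SHAKE-256 keystream standing in for any stream cipher, and the trailing bytes are left untouched.

```python
import hashlib

def selectively_encrypt(bitstream: bytes, key: bytes, leading_bytes: int) -> bytes:
    # Encrypt only the leading portion of the embedded bit stream.
    keystream = hashlib.shake_256(key).digest(leading_bytes)
    head = bytes(b ^ k for b, k in zip(bitstream[:leading_bytes], keystream))
    return head + bitstream[leading_bytes:]
```

Because XOR with the same keystream is its own inverse, the same function also decrypts.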
KEYWORDS: Distortion, Microchannel plates, Quantization, Video coding, Video, Signal to noise ratio, Electroluminescence, Computer programming, Data modeling, Error analysis
Leaky prediction layered video coding (LPLC) incorporates a scaled
version of the enhancement layer in the motion compensated prediction (MCP) loop, by using a leaky factor between 0 and 1, to
balance coding efficiency against error resilience performance. In this paper, we address the theoretical analysis of LPLC using two different approaches: one based on rate distortion theory and the other on quantization noise modeling. In both approaches, an alternative block diagram of LPLC is first developed, which significantly simplifies the analysis. We consider two scenarios of LPLC, with and without prediction drift in the enhancement layer, and obtain two sets of closed-form rate distortion functions, one for each scenario. Both closed-form expressions are evaluated and shown to conform with the operational results.
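For context, a standard way to write the leaky motion-compensated prediction reference that LPLC builds on (a common formulation, not reproduced from this paper):

```latex
% Leaky MCP reference with leaky factor \alpha \in [0, 1]:
% \alpha = 0 predicts from the base-layer reconstruction only (most error resilient),
% \alpha = 1 predicts from the full enhancement reconstruction (most efficient, drift-prone).
P_n = \hat{x}^{B}_{n-1} + \alpha \left( \hat{x}^{E}_{n-1} - \hat{x}^{B}_{n-1} \right),
\qquad 0 \le \alpha \le 1 .
```

When the enhancement layer is corrupted, the resulting reference mismatch is attenuated by roughly a factor of the leaky factor per predicted frame, which is why its choice trades coding efficiency against error resilience.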
Low complexity video encoding shifts the computational complexity
from the encoder to the decoder and is intended for applications with scarce resources at the encoder. The Wyner-Ziv and Slepian-Wolf theorems provide the theoretical basis for low complexity video encoding. In this paper, we propose a low complexity video encoding method that uses B-frame direct modes. We extend the direct-mode idea, originally developed for encoding B frames, and design new B-frame direct modes. Motion vectors for B frames are obtained at the decoder and transmitted back to the encoder over a feedback channel, so no motion estimation is needed at the encoder to encode any B frame. Experimental results, implemented by modifying the ITU-T H.26L software, show that our approach obtains rate distortion performance competitive with that of conventional high complexity video encoding.
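For background, the standard H.264 temporal direct-mode scaling on which such direct modes build (recalled here for context, not necessarily the new modes proposed in the paper): both B-frame motion vectors are derived from the co-located motion vector by temporal-distance scaling, so no motion search is needed for the B frame.

```latex
% Temporal direct mode: scale the co-located motion vector mv_{col} from the list-1
% reference by the temporal distances t_b (B frame to its list-0 reference) and
% t_d (between the two reference frames).
mv_{L0} = \frac{t_b}{t_d}\, mv_{col}, \qquad
mv_{L1} = \frac{t_b - t_d}{t_d}\, mv_{col}
```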
Leaky prediction layered video coding (LPLC) partially includes the enhancement layer in the motion compensated prediction loop, using a leaky factor between 0 and 1, to balance coding efficiency and error resilience performance. In this paper, rate distortion functions are derived for LPLC from rate distortion theory. Closed form expressions are obtained for two scenarios of LPLC, one where the enhancement layer stays intact and the other where the enhancement layer suffers from data rate truncation. The rate distortion performance of LPLC is then evaluated with respect to different choices of the leaky factor, demonstrating that the theoretical analysis conforms well with the operational results.
Two types of scalability exist in current scalable video streaming schemes: (1) nested scalability, in which different representations (i.e., descriptions) of each frame are generated using layered scalable coding and must be decoded in a fixed sequential order, and (2) parallel scalability, which is used in multiple description coding (MDC), where different descriptions are mutually refinable and independently decodable. In this paper, we present a general framework that includes both scalabilities and demonstrate the similarity between leaky prediction based layered coding and an MDC scheme that uses motion compensation. Based on this framework, we introduce nested scalability into each description of the MDC stream and propose a fine granularity scalability (FGS) based MDC approach. We also develop a scalable video coding structure characterized by dual-leaky prediction to balance the trade-off between coding efficiency and the error resilience performance of the coded bit stream.
Color embedded image compression is investigated by means of a set of core experiments that seek to evaluate the advantages of various color transformations, spatial orientation trees and the use of monochrome embedded coding schemes such as EZW and SPIHT. In order to take advantage of the interdependencies of the color components for a given color space, two new spatial orientation trees that relate frequency bands and color components are investigated.