This PDF file contains the front matter associated with SPIE Proceedings Volume 13034, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Due to evolving chip architectures that support artificial intelligence (AI) on the edge, producing models that achieve top performance on small devices with limited resources has become a priority. The challenge is to construct superior deep neural networks by finding viable solutions in the face of memory limitations, computational bottlenecks, latency requirements, and power restrictions. Past research has primarily focused on improving models by optimizing only a subset of the latency, memory, and power consumption aspects. We propose that possible solutions can be found using genetic algorithms by posing the above variables as part of a multi-reward optimization problem. Further, few methods have efficiently incorporated the device itself in the training process. In this paper, we construct an initial population of viable network layers, with their respective weights serving as genes for the proposed genetic algorithm. We then track these layers' viability through generations by constructing neural networks and monitoring an amalgamated score, which we term influence, representing multiple metrics. To facilitate device-specific optimization, all network layers are constructed while maintaining the requirement that portability to a target device remains possible; failure to meet this requirement results in removal from the population. Furthermore, upon construction, networks are also validated to ensure portability prior to evaluation. The proposed genetic algorithm is utilized as a neural architecture search (NAS) strategy in which we optimize constructed networks' latency, accuracy, and memory performance on the target device. For this work, the ultra-low-power MAX78002 platform is utilized to define layer and network constraints; results on the CIFAR10 computer vision dataset are presented.
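The amalgamated influence score described above can be sketched as a weighted multi-reward scalar plus a viability filter. The weights, budget normalizers, and helper names below are illustrative assumptions, not the paper's actual formulation:

```python
def influence(accuracy, latency_ms, memory_kb,
              w_acc=1.0, w_lat=0.3, w_mem=0.2,
              lat_budget_ms=50.0, mem_budget_kb=432.0):
    """Amalgamate accuracy, latency, and memory into one reward
    (higher is better). Weights and budgets are assumed values;
    the memory budget loosely echoes the MAX78002's on-chip weight
    storage and is used purely for illustration."""
    return (w_acc * accuracy
            - w_lat * latency_ms / lat_budget_ms
            - w_mem * memory_kb / mem_budget_kb)


def select_viable(population, fits_device):
    """Drop candidates that cannot be ported to the target device,
    mirroring the removal-from-population rule in the text."""
    return [net for net in population if fits_device(net)]
```

Under these assumed weights, a candidate at 90% accuracy, 25 ms latency, and 216 kB of weights would score 0.9 - 0.15 - 0.1 = 0.65.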
In this paper, we propose a deep-learning-based template matching technique to match pairs of wide field-of-view and narrow field-of-view infrared images. The deep learning network has a structure similar to the Atrous Spatial Pyramid Pooling (ASPP) module, and both the wide and narrow field-of-view images are input to the same network, so the network weights are shared. Our experiments used the Galaxy S20 (Qualcomm Snapdragon 865) platform and show that the trained network has higher matching accuracy than other template matching techniques and is fast enough to be used in real time.
Recent advancements in volumetric displays have opened doors to immersive, glass-free holographic experiences in our everyday environments. This paper introduces Holoportal, a real-time, low-latency system that captures, processes, and displays 3D video of two physically separated individuals as if they are conversing face-to-face in the same location. The evolution of work in multi-view immersive video communication from a Space-Time-Flow (STF) media technology to real-time Holoportal communication is also discussed. Multiple cameras at each location capture subjects from various angles, with wireless synchronization for precise video-frame alignment. Through this technology we envision a future where any living space can transform into a Holoportal with a wireless network of cameras placed on various objects, including TVs, speakers, and refrigerators.
This paper tackles the problem of mixed Gaussian and impulsive noise suppression in color images. The proposed method comprises two essential steps. First, we detect impulsive noise through an approach based on the concept of digital paths exploring the local pixel neighborhood. Each pixel is assigned the cost of a path connecting the boundary of a local processing window with its center. When the central pixel exhibits a high minimum path cost, it is identified as an impulse. To achieve this, we use a thresholding procedure for detecting corrupted pixels. Analyzing the distribution of minimum path costs, we employ the k-means technique to classify pixels into three distinct categories: those nearly undistorted, those corrupted by Gaussian noise, and those affected by impulsive noise. Subsequently, we employ the Laplace interpolation technique to restore the impulsive pixels, a fast and effective method yielding satisfactory denoising results. In the second step, we address the residual Gaussian noise using the Non-Local Means method, which selectively considers pixels from the local window that have not been flagged as impulsive. The experimental results confirm that our proposed hybrid method consistently yields superior outcomes compared to state-of-the-art denoising techniques. Moreover, its computational complexity remains low, rendering it suitable for real-time applications.
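The three-way classification of minimum path costs can be illustrated with a minimal one-dimensional k-means; this is a simplified stand-in for the clustering step described above, with cluster labels ordered by increasing mean cost (0 = nearly undistorted, 1 = Gaussian, 2 = impulsive):

```python
def kmeans_1d(costs, k=3, iters=50):
    """Cluster scalar minimum-path costs into k groups with plain
    k-means, then relabel so lower-cost clusters get lower labels."""
    lo, hi = min(costs), max(costs)
    # Spread initial centers evenly across the observed cost range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for c in costs:
            j = min(range(k), key=lambda i: abs(c - centers[i]))
            groups[j].append(c)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    # Relabel clusters in order of increasing center value.
    order = sorted(range(k), key=lambda i: centers[i])
    rank = {orig: pos for pos, orig in enumerate(order)}
    return [rank[min(range(k), key=lambda i: abs(c - centers[i]))]
            for c in costs]
```

A production detector would of course cluster the full path-cost distribution of an image, not six toy values.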
Assessing smile genuineness from video sequences is a vital topic concerned with recognizing facial expressions and linking them with the underlying emotional states. A number of techniques have been proposed that are underpinned by handcrafted features, as well as techniques that rely on deep learning to learn useful features. As both of these approaches have certain benefits and limitations, in this work we propose to combine the features learned by a long short-term memory network with features handcrafted to capture the dynamics of facial action units. The results of our experiments indicate that the proposed solution is more effective than the baseline techniques and allows for assessing smile genuineness from video sequences in real time.
Recent diffusion-based generative models employ methods such as one-shot fine-tuning an image diffusion model for video generation. However, this leads to long video generation times and suboptimal efficiency. To resolve this long generation time, zero-shot text-to-video models eliminate the fine-tuning method entirely and can generate novel videos from a text prompt alone. While the zero-shot generation method greatly reduces generation time, many models rely on inefficient cross-frame attention processors, hindering the diffusion model’s utilization for real-time video generation. We address this issue by introducing more efficient attention processors to a video diffusion model. Specifically, we use attention processors (i.e. xFormers, FlashAttention, and HyperAttention) that are highly optimized for efficiency and hardware parallelization. We then apply these processors to a video generator and test with both older diffusion models such as Stable Diffusion 1.5 and newer, high-quality models such as Stable Diffusion XL. Our results show that using efficient attention processors alone can reduce generation time by around 25%, while not resulting in any change in video quality. Combined with the use of higher quality models, this use of efficient attention processors in zero-shot generation presents a substantial efficiency and quality increase, greatly expanding the video diffusion model’s application to real-time video generation.
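For reference, the computation that processors such as xFormers, FlashAttention, and HyperAttention accelerate is plain scaled dot-product attention. A naive pure-Python version, which materializes the full score matrix (exactly the memory traffic the optimized processors avoid through tiling and kernel fusion), looks like:

```python
import math

def naive_attention(Q, K, V):
    """Reference scaled dot-product attention over lists of vectors.
    Output rows are convex combinations of the rows of V."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Full score row against every key (the O(n^2) memory cost).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)                      # stabilized softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Efficient processors compute the same function, so swapping them in changes generation time but not output quality, consistent with the results reported above.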
Modern wafer inspection systems in Integrated Circuit (IC) manufacturing utilize deep neural networks. The training of such networks requires the availability of a very large number of defective or faulty die patterns on a wafer, called wafer maps. The number of defective wafer maps on a production line is often limited. In order to have a very large number of defective wafer maps for the training of deep neural networks, generative models can be utilized to generate realistic synthesized defective wafer maps. This paper compares the following three generative models that are commonly used for generating synthesized images: Generative Adversarial Network (GAN), Variational Auto-Encoder (VAE), and CycleGAN which is a variant of GAN. The comparison is carried out based on the public domain wafer map dataset WM‐811K. The quality aspect of the generated wafer map images is evaluated by computing the five metrics of peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), inception score (IS), Fréchet inception distance (FID), and kernel inception distance (KID). Furthermore, the computational efficiency of these generative networks is examined in terms of their deployment in a real-time inspection system.
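Of the five metrics, PSNR is the simplest to state; a minimal sketch over images flattened to lists of pixel intensities (the 255 peak assumes 8-bit maps):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    images, each given as a flat list of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

SSIM, IS, FID, and KID all require more machinery (local statistics or Inception-network embeddings) and are best taken from an established library rather than re-implemented.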
With the advent of deep learning, there has been an ever-growing list of applications to which Deep Convolutional Neural Networks (DCNNs) can be applied. The field of Multi-Task Learning (MTL) attempts to provide optimizations to many-task systems, improving performance through optimization algorithms and structural changes to these networks. However, we have found that current MTL optimization algorithms often impose burdensome computation overheads, require meticulously labeled datasets, and do not adapt to tasks with significantly different loss distributions. We propose a new MTL optimization algorithm: Batch Swapping with Multiple Optimizers (BSMO). We utilize single-task labeled data to train on a multi-task hard parameter sharing (HPS) network by swapping tasks at the batch level. This dramatically increases the flexibility and scalability of training on an HPS network by allowing for per-task datasets and augmentation pipelines. We demonstrate the efficacy of BSMO versus current SOTA algorithms by benchmarking across contemporary benchmarks and networks.
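The batch-level task swapping idea can be sketched as a round-robin loop over per-task batches, each task keeping its own optimizer. This skeleton is an assumed reconstruction for illustration (the `shared_step` callback and its signature are hypothetical, not the authors' implementation):

```python
def train_bsmo(shared_step, task_batches, optimizers, epochs=1):
    """Interleave single-task batches through one shared HPS network,
    each task using its own optimizer state.

    shared_step(task, batch, opt) is assumed to run forward/backward
    for that task's head and return a scalar loss.
    """
    history = []
    for _ in range(epochs):
        # Swap tasks at the batch level: one batch per task in turn.
        for batch_idx in range(max(len(b) for b in task_batches.values())):
            for task, batches in task_batches.items():
                if batch_idx < len(batches):
                    loss = shared_step(task, batches[batch_idx],
                                       optimizers[task])
                    history.append((task, loss))
    return history
```

Because each task draws from its own dataset and augmentation pipeline, no jointly labeled multi-task dataset is needed, which is the flexibility the abstract emphasizes.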
Lower resolutions and a lack of distinguishing features in large satellite imagery datasets make identification tasks challenging for traditional image classification models. Vision Transformers (ViT) address these issues by creating deeper spatial relationships between image features. Self-attention mechanisms are applied to better understand not only what features correspond to which classification profile, but how the features correspond to each other within each separate category. These models, integral to computer vision machine learning systems, depend on extensive datasets and rigorous training to develop highly accurate yet computationally demanding systems. Deploying such models in the field can present significant challenges on resource-constrained devices. This paper introduces a novel approach to address these constraints by optimizing an efficient Vision Transformer (TinEVit) for real-time satellite image classification that is compatible with the STMicroelectronics AI integration tool, X-CUBE-AI.
Recent years have witnessed great progress in the development of deep neural networks (DNNs), which has led to growing interest in deploying DNNs in resource-constrained environments such as network-edge and edge-cloud environments. To address objectives of efficient DNN inference, numerous approaches as well as specialized platforms have been designed for inference acceleration. The flexibility and diverse capabilities offered by these approaches and platforms result in large design spaces with complex trade-offs for DNN deployment. Relevant objectives involved in these trade-offs include inference accuracy, latency, throughput, memory requirements, and energy consumption. Tools that can effectively assist designers in deriving efficient DNN configurations for specific deployment scenarios are therefore needed. In this work, we present a design space exploration framework for this purpose. In the proposed framework, DNNs are represented as dataflow graphs using a lightweight-dataflow-based modeling tool, and schedules (strategies for managing processing resources across different DNN tasks) are modeled in a formal, abstract form using dataflow methods as well. The dataflow-based application and schedule representations are integrated systematically with a multiobjective particle swarm optimization (PSO) strategy, which enables efficient evaluation of implementation trade-offs and derivation of Pareto fronts involving alternative deployment configurations. Experimental results using different DNN architectures demonstrate the effectiveness of our proposed framework in exploring design spaces for DNN deployment.
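Deriving a Pareto front from evaluated configurations reduces to filtering out dominated points. A minimal sketch, assuming every objective has been expressed so that larger is better (e.g. accuracy and negated latency); the PSO search itself is omitted:

```python
def pareto_front(points):
    """Return the non-dominated subset of objective tuples.

    A point p is dominated if some q is at least as good in every
    objective and strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            all(q[i] >= p[i] for i in range(len(p))) and
            any(q[i] > p[i] for i in range(len(p)))
            for q in points)
        if not dominated:
            front.append(p)
    return front
```

This quadratic filter is fine for the candidate sets a swarm produces per iteration; specialized non-dominated sorting is only needed at much larger scales.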
This paper showcases the integration of several technologies to develop an Unmanned Traffic Management System that enables the centralized coordination of unmanned ground and aerial vehicles. By addressing the need for safe and efficient autonomous vehicle operations, this system contributes to improved safety and reliability in various applications, from civilian to military contexts. Furthermore, the exploration of dynamic vision-based drone detection methods adds valuable insights into the field of real-time image processing and deep learning. In that perspective, a more in-depth computer vision development is presented. The system's core components include the Swarmie, an unmanned ground vehicle (UGV) guided through a wireless mesh network using radio-frequency (RF) markers. Simultaneously, an unmanned aerial vehicle (UAV) is controlled by an IoT cloud platform that sends coordinates to an embedded system. The integration of wireless communication and navigation markers attests to the importance of circuitry and microcontrollers in developing RF markers to enhance navigation. One of the primary objectives of this research is the development of a dynamic vision-based drone detection system for sense-and-avoid actions. Two different methods are explored for drone detection. The first method utilizes the Viola-Jones algorithm. The second method involves the You Only Look Once (YOLO) Real-Time Object Detection algorithm. The performance of these methods is evaluated, providing insights into the effectiveness of each approach in real-time drone detection.
Surgical image and video applications using endoscopic datasets have been actively investigated to develop advanced surgical assistant systems. These applications are particularly crucial for understanding surgical scenes during procedures. Specifically, segmentation techniques allow for identifying anatomical structures and surgical instruments, while quality control methods refine surgical techniques, and action recognition aids in discerning surgical steps. A significant improvement in performance across different downstream tasks has been achieved due to the advancements in deep neural networks and the expansive training datasets available. However, the exploration of surgical action recognition remains limited. Existing methods face challenges in real-world settings, mainly due to the lack of adaptability in a dynamic imaging environment. In this study, we present a framework for surgical action recognition in endoscopic datasets by leveraging video-masked autoencoders (VideoMAE), which has shown promise in video analysis even with minimal datasets. Additionally, we incorporate a temporal data augmentation technique to represent diverse imaging conditions and resolve the issue of using single-source data with low quality. For our experiments, we utilize VideoMAE v2 pre-trained on Unlabeled Hybrid datasets and fine-tune the model on the CholecT45 dataset for validation. Our proposed method shows the effectiveness of using the VideoMAE structure with focal loss, particularly for action recognition tasks in surgical scenarios.
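Focal loss, mentioned above, down-weights easy examples so that rare or ambiguous surgical actions contribute more to the gradient. A minimal binary form for one prediction (the `alpha` and `gamma` defaults are the values commonly used in the literature, not necessarily this paper's settings):

```python
import math

def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p is the predicted probability of the positive class; target is
    0 or 1. The (1 - p_t)^gamma factor shrinks the loss of confident,
    correct predictions.
    """
    pt = p if target == 1 else 1.0 - p
    a = alpha if target == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * math.log(pt)
```

With gamma = 0 this reduces to weighted cross-entropy; raising gamma progressively silences well-classified frames.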
Image-based Large Language Models (LLMs) are AI models that can understand captured images and generate textual content based on the analysis of images or visual data. Incorporating LLMs for assessing water quality, pressure, and environmental conditions can help analyze historical data and predict potential risks and threats in underwater environments. This can improve the intervention of autonomous underwater vehicles (AUVs) and remotely operated vehicles (ROVs) during emergencies where the visual data must be interpreted to make informed decisions. While LLMs are primarily associated with processing and generating text, they can be integrated with images through a process known as multimodal learning, where text and images are combined for tasks that involve both modalities. Implementing such frameworks is challenging when deployed on low-power microcontrollers primarily used in monitoring systems. This research proposes evaluating multimodal tokens to enable edge computing in bio-inspired robots to monitor the underwater environment. This can help break down large real-time videos into tokens of text-based instructions associated with the description of images. The mini-robots will transmit the collected “tokens” to the nearest AUV or ROV, where the image-based LLM will be deployed. We propose to evaluate this image-based LLM in our NVIDIA Jetson Nano-based AUV. In the proposed architecture, the mini-robots can move along the length of the water column to capture images of the underwater environment. Our proposed model is evaluated to generate texts for boat and fish images. This proposed framework with integrated image-based tokens can significantly reduce the response time and data traffic in underwater real-time monitoring systems.
Seagrass ecosystems play a vital role in maintaining marine biodiversity and ecological balance, making their monitoring and management essential. This study proposes a novel approach for clustering of seagrass images into three distinct age categories: young, medium, and old, using deep learning and unsupervised machine learning techniques. VGG-16 convolutional neural networks (CNN) are employed for feature extraction from the seagrass images, followed by K-means clustering to categorize the image samples into the specified age groups. The implemented methodology begins with the collection and annotation of a diverse seagrass image dataset, including samples from various locations and conditions. Images are first pre-processed to ensure consistent size and quality. To enable real-time capabilities, an optimized VGG-16 CNN is then fine-tuned on the annotated dataset to learn discriminative features that capture age-related characteristics of the seagrass leaves within the constraints of real-time image processing. After feature extraction, the K-means clustering algorithm is applied to group the images into young, medium, and old categories based on the learned features. The clustering results are evaluated using quantitative metrics such as the silhouette score and Davies-Bouldin index, demonstrating the effectiveness of the proposed method in capturing age-related patterns in seagrass imagery. This research contributes to the field of seagrass monitoring by providing an automated and real-time approach to classifying seagrass images into age categories which can facilitate more accurate assessments of seagrass health and growth dynamics. A real-time capability would equip decision-makers with a valuable tool for immediate responses and support the sustainable management of seagrass ecosystems in various marine environments.
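The silhouette score used for evaluation can be sketched for scalar features as follows; a real evaluation would run on the high-dimensional VGG-16 feature vectors with Euclidean distance, so the 1-D absolute distance here is a deliberate simplification:

```python
def silhouette_score(points, labels):
    """Mean silhouette over scalar feature points: (b - a) / max(a, b),
    where a is the mean intra-cluster distance and b the mean distance
    to the nearest other cluster. Ranges from -1 to 1; higher means
    tighter, better-separated clusters."""
    idx_by_label = {}
    for i, l in enumerate(labels):
        idx_by_label.setdefault(l, []).append(i)
    scores = []
    for i in range(len(points)):
        same = [j for j in idx_by_label[labels[i]] if j != i]
        if not same:
            scores.append(0.0)  # singleton cluster convention
            continue
        a = sum(abs(points[i] - points[j]) for j in same) / len(same)
        b = min(sum(abs(points[i] - points[j]) for j in grp) / len(grp)
                for l, grp in idx_by_label.items() if l != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Two tight, well-separated clusters score close to 1, which is the pattern a good young/medium/old partition should show.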
In the global agricultural landscape, dairy cattle are of paramount economic importance because they produce essential products like milk, butter, and cheese. Ensuring their well-being and sustaining production necessitate effective feed management. Traditional methods for assessing feed quality are labor-intensive and destructive, posing risks of resource wastage and production interruptions. This study addresses this challenge by introducing a novel approach to classify feed materials and Total Mixed Rations (TMR) for dairy cattle. Utilizing RGB images and a dual-branch neural network based on the VGG16 architecture, the model achieved 86.72% accuracy in feed categorization. This automates real-time feed analysis, offering high precision, and lays the foundation for further advancements in precision animal production through deep learning in practical agricultural contexts.
In the context of the advancing digital landscape, there is a discernible demand for robust and defensible methodologies in addressing the challenges in multi-class image classification. The evolution of intelligent systems mandates swift evaluations of environmental variables to facilitate decision-making within an authorized workflow. Recognizing the imperative role of ensemble models, this paper undertakes an exploration into the efficacy of layered Convolutional Neural Network (CNN) architectures for the nuanced task of multi-class image classification, specifically applied to traffic signage recognition in the dynamic context of a moving vehicle. The research methodology employs a YOLO (You Only Look Once) model to establish a comprehensive training and testing dataset. Subsequently, a stratified approach is adopted, leveraging layered CNN architectures to categorize clusters of objects and, ultimately, extrapolate the pertinent speed limit values. Our endeavor aims to elucidate the procedural framework for integrating CNN models, providing insights into their accuracy within the application domain.
In lensfree microscopy, the sample is placed close to the image sensor without any imaging lenses in between. This configuration provides the benefits of low cost and compact hardware assemblies as well as an ultra-large field of view and a high space-bandwidth product. Image focusing and reconstruction are performed computationally, relying on algorithms such as pixel superresolution and the angular spectrum method of propagation. We present recent progress on improving the resolution to characterize nanoscale materials, protein sensing, ultrafine air pollution monitoring, and high resolution incoherent (fluorescent) imaging.
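The angular spectrum method of propagation mentioned above can be written, under the usual scalar-diffraction assumptions, as:

```latex
% Angular spectrum propagation of a field U(x, y, 0) over distance z:
% decompose into plane waves with the 2-D Fourier transform, apply the
% free-space transfer function, and inverse-transform.
U(x, y, z) = \mathcal{F}^{-1}\!\left\{
  \mathcal{F}\{U(x, y, 0)\}\,
  e^{\, i z \sqrt{k^2 - k_x^2 - k_y^2}}
\right\},
\qquad k = \frac{2\pi n}{\lambda}
```

Here \(\mathcal{F}\) is the 2-D spatial Fourier transform over \((x, y)\), \(n\) is the refractive index of the medium, and spatial frequencies with \(k_x^2 + k_y^2 > k^2\) correspond to evanescent components that decay rather than propagate.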
The capacity to track eyeball movements beneath closed eyelids holds significant promise across commercial, security, and medical domains. Our work presents a simple, effective, non-invasive method for closed-eye eyeball motion detection using videos. This method relies on detecting the temporal variations in eyelid shadows cast by the eyeball bulge in the subject's video following face alignment and video registration. The key points used for face alignment and video registration are the detected facial landmarks. The eye movement signals derived using the presented technique closely correlate with simultaneously captured electrooculography (EOG) signals. We showcase the potential of fusing the eyeball movement signals thus obtained with data acquired from ultra-wideband (UWB) or millimeter-wave (mmWave) Doppler sensors. This fusion, supported by machine learning-based algorithms, enables the classification of sleep stages in a smart sleep chair that is designed to enhance and extend good quality sleep.
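The reported correlation between the video-derived eye-movement signal and the simultaneously captured EOG trace can be quantified with a plain Pearson coefficient; a minimal sketch over two equal-length sample sequences:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two signals, e.g. a
    video-derived eye-movement trace and an EOG recording sampled at
    matching instants. Returns a value in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

In practice the two signals would first be resampled to a common rate and time-aligned before computing the coefficient.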
Neural networks continue to be vulnerable to adversarial attacks. In addressing this, two primary defensive strategies have emerged based on network composition: those targeting individual networks and those grounded in ensemble-based strategies. While merging both strategies is ideal, on edge devices, a combined defense that scales with ensemble size could result in significant inference latency increases. Many of the ensemble-based approaches in the literature offer robust protection while necessitating large ensemble sizes. To address the challenge of deploying ensemble-based adversarial defenses on edge devices, this work introduces the Categorized Ensemble networks (CAEN) training methodology. CAEN's foundation lies in two observations: 1. Under adversarial conditions, models frequently confuse conceptually contrastive classes with each other, and 2. Assigning soft label values to contrastive class pairs enhances network resilience against adversarial attacks. Building on these insights, CAEN initially identifies contrastive classes under Projected Gradient Descent (PGD) based attacks through a confusion matrix. It then formulates the problem of pairing contrastive classes across ensemble members as an Integer Linear Program (ILP). Following this, CAEN applies soft label assignments to identified contrastive class pairs during the ensemble training process. By averaging the outputs of the independently trained ensemble members, a CAEN ensemble is formed. CAEN training surpasses current state-of-the-art robust ensemble training techniques, achieving an average 1.11X/1.57X improvement in robust accuracy against white-box and black-box attacks. Additionally, by limiting ensemble members to just two networks, CAEN training produces ensembles that offer robust protection while reducing runtime FLOPs by 16% compared to SOTA, making CAEN ensembles suitable for deployment on edge devices.
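The soft label assignment for a contrastive class pair can be sketched as a softened one-hot training target; the `epsilon` split below is an assumed hyperparameter for illustration, not CAEN's published value:

```python
def soft_labels(num_classes, true_class, paired_class, epsilon=0.1):
    """One-hot target softened toward the contrastive partner class:
    the true class keeps 1 - epsilon of the probability mass and its
    confusable partner (found via the confusion matrix / ILP pairing)
    receives epsilon."""
    y = [0.0] * num_classes
    y[true_class] = 1.0 - epsilon
    y[paired_class] = epsilon
    return y
```

Training against such targets discourages the network from committing fully between the two classes an attacker most easily swaps, which is the resilience mechanism the second observation describes.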
The Transportation Security Administration (TSA) is responsible for air travel safety within the United States but faces a significant challenge. Recent studies show an alarming 80% failure rate in threat detection at most screening locations, primarily due to heavy reliance on human judgment. With more than 50,000 TSA officers screening over 2 million passengers daily, it is essential to address this issue promptly; the urgency is underscored by a 42% increase in complaints related to TSA screening over the past year, according to the US Department of Transportation's monthly air travel consumer report. These complaints underscore the pressing need for improved threat detection procedures in airport security. In response to these critical concerns, we present a novel and efficient neural network classification algorithm as a potential solution, specifically designed to mitigate the identified shortcomings in the TSA's threat detection capabilities. By reducing the overall complexity of larger models through the application of advanced layers and a carefully configured structure, we achieve a solution that maximizes efficiency without compromising accuracy. This research endeavors to bridge the gap between the demands of contemporary threat detection and the practical limitations of airport security procedures. By introducing a tailored solution, we aim to significantly enhance the effectiveness of threat detection, thereby contributing to the overall safety and security of air travel. This work represents a pivotal step in addressing the critical issues associated with the TSA's current screening methods and underscores the potential of advanced technology to bolster the reliability of threat detection systems.
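The abstract does not specify which "advanced layers" reduce model complexity; depthwise separable convolutions (popularized by MobileNet) are one common choice for efficient classifiers, and the sketch below illustrates the kind of parameter savings such layers provide. The channel and kernel sizes are illustrative assumptions.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, followed by a
    1 x 1 pointwise convolution that mixes channels."""
    return c_in * k * k + c_in * c_out

# A 3x3 layer mapping 128 -> 128 channels:
dense = standard_conv_params(128, 128, 3)            # 147456 weights
separable = depthwise_separable_params(128, 128, 3)  # 17536 weights
```

For this configuration the separable factorization uses roughly 8x fewer weights than the dense convolution, which is the kind of complexity reduction an efficient screening classifier would rely on.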
Modern digital color cameras depend on Color Filter Arrays (CFA) for capturing color information. The majority of commercial CFAs are designed by hand with different physical and application-specific considerations. The available machine learning (ML)-based CFA learning architectures disregard the constraints of a physical camera device. This study aims to develop an alternative approach that jointly learns binary Color Filter Arrays (CFA) in a deep learning-based filtering-demosaicing pipeline. The proposed approach provides higher reconstruction performance than the compared hand-designed filters while learning physically applicable CFAs. This paper includes the learned binary CFAs for various color configurations and training data sizes, their analysis with common reconstruction metrics, and a short discussion of future work.
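The filtering half of such a pipeline can be sketched as follows. The paper's architecture is not given here, so this is a minimal illustration under assumptions: per-pixel channel scores stand in for the learnable CFA parameters, binarization is a hard argmax (during training, a straight-through estimator is one common way to pass gradients through such a non-differentiable step), and the image sizes are arbitrary.

```python
import numpy as np

def binarize_cfa(logits):
    """Hard one-hot CFA: each pixel selects the color channel with the
    highest score, yielding a physically realizable binary filter."""
    idx = np.argmax(logits, axis=-1)      # (H, W) channel index map
    return np.eye(logits.shape[-1])[idx]  # (H, W, C) one-hot mask

def apply_cfa(image, cfa):
    """Mosaic the image: each pixel keeps exactly one color sample,
    as a single-sensor camera would capture it."""
    return np.sum(image * cfa, axis=-1)   # (H, W) mosaiced plane

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4, 3))   # stand-in for learned parameters
cfa = binarize_cfa(scores)
mosaic = apply_cfa(rng.random((4, 4, 3)), cfa)
```

The binary one-hot constraint is what makes the learned CFA "physically applicable": every sensor site measures exactly one color, and the demosaicing network downstream reconstructs the rest.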
In the rapidly evolving realm of machine learning, the integration of the Open Neural Network Exchange (ONNX) has become increasingly significant, particularly in image processing applications. This study conducts a comprehensive examination of the role of ONNX in enhancing image processing efficiency. Utilizing a diverse range of peer-reviewed articles, conference papers, and technical reports, we quantitatively evaluate ONNX's adoption, impact, and innovation trajectory within the field. Our findings reveal a consistent rise in ONNX's use for various image processing tasks, attributable to its versatility in integrating with multiple machine learning frameworks and harnessing hardware-specific optimizations. A notable observation from our study is the positive relationship between ONNX implementation and reduced image processing times, evident in applications like real-time object detection and high-resolution image synthesis. Our analysis also highlights the growing collaborations between academic and industrial sectors in advancing ONNX capabilities, underlining its pivotal role in future imaging solutions. In summary, this paper emphasizes ONNX's transformative influence in the field of image processing. The ongoing developments and active community engagement point towards a promising future for more rapid and efficient image processing methods.