Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303501 (2024) https://doi.org/10.1117/12.3037153
This PDF file contains the front matter associated with SPIE Proceedings Volume 13035, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303503 (2024) https://doi.org/10.1117/12.3013119
Machine tools (MT) are critical to modern manufacturing. They allow precision manufacturing of complex components at high volumes. MTs are large capital investments that require maintenance and monitoring to ensure they remain in good working condition. To best achieve reliability and high performance, it is necessary to implement condition monitoring, fault detection, and predictive maintenance. One solution is to utilize data-driven methods such as neural networks. One issue with any data-driven method is that it requires large quantities of labeled data. This is especially difficult for fault detection applications, as faults tend to be rare and the resulting datasets tend to be very imbalanced. One emerging technology that can help solve this issue is the digital twin (DT). DTs provide a solution for data collection, modeling, simulation, and smart services. One way DTs can be used is to generate synthetic data for various data-driven methods. This data can be validated on a test bench to ensure its accuracy before implementation in production. Synthetic data generated from the DT model can be used to create a dataset for various condition monitoring DT services. This study used simulation software to generate synthetic data, which was then used to implement a fault detection algorithm for preload loss monitoring. This method has been demonstrated to be effective at identifying the current operating conditions of the system. It shows promise to improve reliability and performance in MTs, and could be adapted to condition monitoring in other systems such as vehicles, buildings, and power generation.
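The abstract above describes using digital-twin-generated synthetic data to counter class imbalance in fault detection. The paper's own tooling is not shown; as a rough, self-contained sketch of the underlying idea, minority-class (fault) samples can be padded with synthetic variants before training. The feature values and noise scale below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor features: 1000 healthy samples, only 10 faulty ones
healthy = rng.normal(0.0, 1.0, size=(1000, 4))
faulty = rng.normal(3.0, 1.0, size=(10, 4))

def balance_with_synthetic(minority, target_count, noise_scale=0.1, rng=rng):
    """Pad the minority class with jittered copies, standing in for
    simulator-generated (digital twin) fault samples."""
    needed = target_count - len(minority)
    idx = rng.integers(0, len(minority), size=needed)
    synthetic = minority[idx] + rng.normal(0.0, noise_scale, size=(needed, minority.shape[1]))
    return np.vstack([minority, synthetic])

faulty_balanced = balance_with_synthetic(faulty, len(healthy))
print(faulty_balanced.shape)  # (1000, 4)
```

In practice the synthetic samples would come from the validated DT simulation rather than jittered copies, but the balancing step looks the same.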
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303504 (2024) https://doi.org/10.1117/12.3013877
This research presents an in-depth investigation into the application of Convolutional Neural Networks (CNN) for acoustic remote sensing on multi-rotor UAVs, with a specific focus on detecting large vehicles on the ground. We used a multi-rotor UAV equipped with a custom audio recorder, calibrated microphones, and uniquely designed microphone mounts for data collection. We explored optimal features for training our CNN, experimented with different normalization techniques, and examined their synergy with various activation functions. The study further explores the fine-tuning of model parameters to enhance detection performance and reliability. The outcome was a CNN model, trained with a combination of both real-world and synthetic data, demonstrating a proficient capability in target detection.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303505 (2024) https://doi.org/10.1117/12.3014093
Vision-based object detection remains an active research area in both civilian and military domains. While the state-of-the-art relies on deep learning techniques, these demand large multi-context datasets. Given the rarity of open-access datasets for military applications, alternative methods for data collection and training dataset creation are essential. This paper presents a novel vehicle signature acquisition based on indoor 3D-scanning of miniature military vehicles. By using 3D projections of the scanned vehicles as well as off-the-shelf computer-aided design models, relevant image signatures are generated showing the vehicle from different perspectives. The resulting context-independent signatures are enhanced with data augmentation techniques and used for object detection model training. The trained models are evaluated by means of aerial test sequences showing real vehicles and situations. Results are compared to state-of-the-art methodologies. Our method is shown to be a suitable indoor solution for training a vehicle detector for real situations.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303506 (2024) https://doi.org/10.1117/12.3014722
In Machine Learning (ML)-based autonomous technology research (ATR), it is crucial to have large and reliable data sets to train deep learning-based classifiers and implement object detection methods. For air-to-ground ATR, the gold standard, obtained by limited and expensive controlled field collections, is measured data. However, carefully curated research data intended to test or isolate specific qualities of object detection (low-light, heavy shadow, cloud cover, obscurations, and other operational use cases) is still difficult to obtain. For advanced research problems, synthetic data generated in simulated environments meets both quantity and quality requirements. Most synthetic data is generated in a software simulated environment using various rendering techniques, limited by available computational resources. Among the many types of synthetic data is scale model data, generated by 3D printing and imaging the same 3D Computer-Aided Design (CAD) models at a reduced scale (1:285 or 1:125) on a turntable in controlled environmental conditions. We present a workflow for the rapid generation of ATR training data customized to isolate and identify features of interest in advanced research problems. Publicly accessible data is available upon request to the lead author.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303507 (2024) https://doi.org/10.1117/12.3013502
It is becoming more common for search and track algorithms to need to account for observations that can arise from both radio frequency (RF) and electro-optical infrared (EO/IR) measurements in the same scenario. Development of novel algorithms for search and track applications requires measured or synthetically generated data, and frequently only considers one or the other. Historically, the synthetic data generation processes for RF and EO/IR developed independently of one another and did not share a common sense of “truth” about the environment or the objects within the simulation. This lack of a common framework with a consistent environment and platform representation between the two sensing modalities can lead to errors in the algorithm development process. For example, if the RF data assumed one set of atmospheric conditions while the EO/IR assumed a different set of conditions, the RF modality could over- or under-perform compared to the EO/IR. To address this issue, Georgia Tech Research Institute (GTRI) has developed the General High-fidelity Omni-Spectrum Toolbox (GHOST) as a plug-and-play simulation architecture to generate high-fidelity EO/IR and RF synthetic data for search and track algorithm development. Additionally, because GHOST is plug-and-play, it can potentially provide synthetic or measured results to developmental algorithms without needing to change the algorithm’s interface. This paper presents the efforts GTRI has put into extending GHOST into the RF domain and presents sample results from search and track algorithm development. It also presents a look forward into how GHOST is being adapted to accommodate measured data alongside synthetic data for improved algorithm development.
Jeffrey Kerley, Derek T. Anderson, Andrew R. Buck, Brendan Alvey
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303508 (2024) https://doi.org/10.1117/12.3013460
Enabling abstraction within a programming language has benefits. However, the associated complexity of such abstractions often poses a steep learning curve for users. While user interfaces or visual scripting can help alleviate this to some extent, they often lack readability and reproducibility, especially as complexity grows. Herein, we explore the use of Large Language Models (LLMs) as an intermediary between the nuanced, syntactical programming language and the natural (human) way of describing the world. Our formal language LSCENE is a way to procedurally generate realistic synthetic scenes in the Unreal Engine. This tool is useful because artificial intelligence (AI) typically requires large volumes of labeled data with variety. To generate such data for training and evaluating AI, we employ an LLM to interpret and sample LSCENEs that are compatible with user input. Through this approach, we demonstrate a reduction in abstract complexity, elimination of syntax complexity, and the ability to tackle complex tasks in LSCENE using natural language. To illustrate our findings, we present three experiments with quantitative results focused on spatial reasoning, along with a more intricate qualitative example of automatically generating an environment for a specific biome.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303509 (2024) https://doi.org/10.1117/12.3013375
Automatic object detection is increasingly important in the military domain, with potential applications including target identification, threat assessment, and strategic decision-making processes. Deep learning has become the standard methodology for developing object detectors, but obtaining the necessary large set of training images can be challenging due to the restricted nature of military data. Moreover, for meaningful deployment of an object detection model, it needs to work in various environments and conditions, in which prior data acquisition might not be possible. The use of simulated data for model development can be an alternative for real images and recent work has shown the potential for training a military vehicle detector using simulated data. Nevertheless, fine-grained classification of detected military vehicles, using training on simulated data, remains an open challenge.
In this study, we develop an object detector for 15 vehicle classes, containing similar-appearing types, such as multiple battle tanks and howitzers. We show that combining few real data samples with a large amount of simulated data (12,000 images) leads to a significant improvement in comparison with using one of these sources individually. Adding just two samples per class improves the mAP to 55.9 [±2.6], compared to 33.8 [±0.7] when only simulated data is used. Further improvements are achieved by adding more real samples and using Grounding DINO, a foundation model pretrained on vast amounts of data (mAP = 90.1 [±0.5]). In addition, we investigate the effect of simulation variation, which we find is important even when more real samples are available.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350A (2024) https://doi.org/10.1117/12.3011890
Optical whole sky imaging (WSI) is a valuable tool for atmospheric intelligence across a diverse array of applications including solar radiation prediction and microenvironment characterization. In this work, we introduce standalone algorithms and software to render clouds of different sizes, shapes, and base heights with the goal of developing datasets suitable for machine learning applications such as cloud position and base height estimation, and resultant ground shadow prediction. Three-dimensional voxel cloud textures are generated with thresholded fractal noise and rendered with two-step ray tracing. We compare real and synthetic imagery for fisheye camera views and predict what two-camera stereo pairs might look like when the WSI cameras become operational at the Multipurpose Sensor Array (MSA) in White Sands, NM.
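The pipeline above thresholds 3D fractal noise to obtain voxel cloud textures. A minimal numpy sketch of that first step, using nearest-neighbour-upsampled random grids as a crude stand-in for a proper value/Perlin noise generator (grid size, octave count, and threshold are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(42)

def fractal_noise_3d(size, octaves=4):
    """Sum octaves of nearest-neighbour-upsampled random grids,
    halving the amplitude at each finer octave."""
    noise = np.zeros((size, size, size))
    for o in range(octaves):
        res = 2 ** (o + 1)            # coarse grid resolution for this octave
        coarse = rng.random((res, res, res))
        reps = size // res
        fine = coarse.repeat(reps, 0).repeat(reps, 1).repeat(reps, 2)
        noise += fine / (2 ** o)      # halve amplitude each octave
    return noise / noise.max()        # normalize to [0, 1]

def voxel_cloud(size=32, threshold=0.6):
    """Binary voxel occupancy: 1 where the noise exceeds the threshold."""
    return (fractal_noise_3d(size) > threshold).astype(np.uint8)

cloud = voxel_cloud()
print(cloud.shape)  # (32, 32, 32)
```

The resulting occupancy grid would then feed a ray tracer for rendering; real cloud generators use smoothly interpolated noise rather than the blocky upsampling used here for brevity.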
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350B (2024) https://doi.org/10.1117/12.3013911
Machine learning algorithms are capable of processing image-based scenes, detecting and recognizing embedded targets. This has been demonstrated by data scientists and computer vision engineers, but performant algorithms must be robustly trained to successfully complete such a complex task. This typically requires a large set of training data on which the algorithm can base statistical predictions. Electro-optical infrared (EO/IR) remote sensing applications necessitate a substantial image database with suitable variation for adept learning to occur. For human detection/recognition applications, diversity in clothing ensembles, pose, season, times of day, sensor platform perspectives, scene backgrounds, and weather conditions can be included in training image sets to ensure sufficient input variety. However, acquiring such a diverse image set from measured sources can be a challenge, especially in thermal infrared wavebands (e.g., MWIR and LWIR). Alternatively, generating synthetic imagery with appropriate features is possible and has been shown to perform well, but a careful methodology must be followed if robust training is to be accomplished. In this work, MuSES and CoTherm are used to generate synthetic EO/IR remote sensing imagery of various human dismounts with a range of clothing, poses and environmental factors. The performance of a YOLO (“you only look once”) deep learning algorithm is studied, and sensitivity conclusions are discussed.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350D (2024) https://doi.org/10.1117/12.3012393
In this paper, we propose to enhance action recognition accuracy by leveraging synthetic data and domain adaptation. Specifically, we achieve this through the creation of a synthetic dataset mimicking the Multi-View Extended Video with Activities (MEVA) dataset and the introduction of a multi-modal model for domain adaptation. This synthetic-to-real adaptation approach improves recognition accuracy by leveraging the synthetic data to enhance model generalization. Firstly, we focus on creating and utilizing synthetic datasets generated through a high-fidelity physically-based rendering system. The sensor simulation incorporates domain randomization and photo-realistic rendering to reduce the domain gap between the synthetic and real data, effectively addressing the persistent challenges of real data scarcity in action recognition.
Complementing the synthetic dataset generation, we leverage the multi-modal models in the synthetic-to-real adaptation experiments that utilize RGB images and skeleton features. Our experiments show that even relatively straightforward techniques, such as synthetic data pre-training, provide improvements to the models. Our work highlights the effectiveness of the approach and its practical applications across various domains, including surveillance systems, threat identification, and disaster response.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350E (2024) https://doi.org/10.1117/12.3013318
Effectively recognizing human actions from varying viewpoints is crucial for successful collaboration between humans and robots. Deep learning approaches have achieved promising performance in action recognition given sufficient well-annotated data from the real world. However, collecting and annotating real-world videos can be challenging, particularly for rare or violent actions. Synthetic data, on the other hand, can be easily obtained from simulators with fine-grained annotations and various modalities. To learn domain-invariant feature representations, we propose a novel method to distill the pseudo labels from the strong mesh-based action recognition model into a lightweight I3D model. In this way, the model can leverage robust 3D representations and maintain real-time inference speed. We empirically evaluate our model on the Mixamo→Kinetics dataset. The proposed model achieves state-of-the-art performance compared to the existing video domain adaptation methods.
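The distillation step described above, transferring pseudo labels from a strong teacher into a smaller student, is commonly implemented as a KL divergence between temperature-softened output distributions. A generic sketch of that objective (not the authors' exact loss; the logits and temperature below are invented for illustration):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the usual objective for distilling soft pseudo labels."""
    p = softmax(teacher_logits, T)   # soft pseudo labels from the teacher
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[4.0, 1.0, 0.5]])
student_good = np.array([[3.8, 1.1, 0.4]])
student_bad = np.array([[0.5, 4.0, 1.0]])
print(distillation_loss(student_good, teacher) < distillation_loss(student_bad, teacher))  # True
```

In a real pipeline the teacher would be the mesh-based model and the student the I3D network, with gradients flowing only into the student.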
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350F (2024) https://doi.org/10.1117/12.3013530
In this work, we explore the possibility of using synthetically generated data for video-based gesture recognition with large pre-trained models. We consider whether these models have sufficiently robust and expressive representation spaces to enable “training-free” classification. Specifically, we utilize various state-of-the-art video encoders to extract features for use in k-nearest neighbors classification, where the training data points are derived from synthetic videos only. We compare these results with another training-free approach: zero-shot classification using text descriptions of each gesture. In our experiments with the RoCoG-v2 dataset, we find that using synthetic training videos yields significantly lower classification accuracy on real test videos compared to using a relatively small number of real training videos. We also observe that video backbones that were fine-tuned on classification tasks serve as superior feature extractors, and that the choice of fine-tuning data has a substantial impact on k-nearest neighbors performance. Lastly, we find that zero-shot text-based classification performs poorly on the gesture recognition task, as gestures are not easily described through natural language.
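The "training-free" classification described above reduces to a nearest-neighbour lookup in the encoder's feature space. A minimal sketch with cosine similarity on toy feature vectors (the bank and query below are invented; in the paper the features would come from a pre-trained video encoder applied to synthetic gesture videos):

```python
import numpy as np

def knn_classify(query, bank_feats, bank_labels, k=3):
    """Label a query feature by majority vote over its k nearest
    (cosine-similarity) neighbours in a labeled feature bank."""
    q = query / np.linalg.norm(query)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    sims = b @ q                       # cosine similarity to every bank entry
    top = np.argsort(-sims)[:k]        # indices of the k most similar entries
    values, counts = np.unique(bank_labels[top], return_counts=True)
    return values[np.argmax(counts)]   # majority vote

# Toy bank: two gesture classes in a 4-D feature space
bank = np.array([[1, 0, 0, 0], [0.9, 0.1, 0, 0],
                 [0, 0, 1, 0], [0, 0.1, 0.9, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.95, 0.05, 0.0, 0.0]), bank, labels))  # 0
```

No gradient training is involved, which is exactly what makes the approach "training-free": all the learning lives in the frozen encoder.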
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350G (2024) https://doi.org/10.1117/12.3013436
We present an application of synthetic datasets to a pose estimation problem called “Microwave Dish Mensuration”. Dish mensuration is the task of determining a microwave dish pointing angle from photogrammetry. Pose estimation presents a difficult case for machine learning, as it is onerous to collect a measured dataset capturing all possible configurations of an object or collection of objects; however, the ease of generating synthetic data may make the pose estimation problem tractable. Dish Mensuration has an additional benefit of having a well-known geometric invariance: a circular outline of a microwave dish, when rotated in 3D space, projects to an ellipse, and from the parameters of the ellipse, the 3D rotation relative to the sensor can be inferred. It is hoped that this geometric invariance will help the synthetic training regime generalize to measured data, and moreover, present a path forward to generalized models trained on synthetic datasets. For this research, we generated a dataset of 86,400 images of 5 different Microwave Dish models taken at 6 different times of day, generating both rendered image chips and component masks, facilitating pose estimation. We discuss the methods for generating the synthetic dataset, difficulties associated with generating sufficient variance, and a method for performing dish mensuration with a Deep Learning regression model. We conclude by addressing next steps and ways to further generalize into more pose estimation problems.
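The geometric invariance the authors exploit can be stated compactly: under (approximately) orthographic projection, a circle tilted by angle θ projects to an ellipse whose semi-minor axis equals the semi-major axis scaled by cos θ. A sketch of the resulting closed-form tilt recovery (a deliberate simplification of full 3D pose estimation, which must also recover the rotation axis and handle perspective effects):

```python
import math

def tilt_from_ellipse(semi_major, semi_minor):
    """Recover the dish tilt angle (radians) from the projected ellipse:
    under orthographic projection, a circle tilted by theta satisfies
    semi_minor = semi_major * cos(theta)."""
    ratio = max(0.0, min(1.0, semi_minor / semi_major))  # clamp against noise
    return math.acos(ratio)

# A dish whose circular rim projects to a 2:1 ellipse is tilted 60 degrees
theta = tilt_from_ellipse(2.0, 1.0)
print(round(math.degrees(theta), 1))  # 60.0
```

A learned regression model, as in the paper, can absorb the perspective and segmentation noise that this closed form ignores, while the invariance constrains what it must learn.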
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350I (2024) https://doi.org/10.1117/12.3013547
In this paper, we propose a novel approach for real-time human action recognition (HAR) on resource-constrained UAVs. Our approach tackles the limited availability of labeled UAV video data (compared to ground-based datasets) by incorporating synthetic data augmentation to improve the performance of a lightweight action recognition model. This combined strategy offers a robust and efficient solution for UAV-based HAR. We evaluate our method on the RoCoG-v2 and UAV-Human datasets, showing a notable increase in top-1 accuracy across all scenarios on RoCoG: 9.1% improvement when training with synthetic data only, 6.9% with real data only, and the highest improvement of 11.8% with a combined approach. Additionally, using an X3D backbone further improves accuracy on the UAV-Human dataset by 5.5%. Our models deployed on a Qualcomm Robotics RB5 platform achieve real-time predictions at approximately 10 frames per second (fps) and demonstrate a superior trade-off between performance and inference rate on both low-power edge devices and high-end desktops.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350J (2024) https://doi.org/10.1117/12.3009866
The accumulation of falling snow is a complex physical process that involves a variety of environmental factors. While much past work has been done on the rendering of accumulated snow for gaming applications, scientific simulation of snow accumulation has been limited to large-scale mountain ranges and watersheds. These large-scale simulations are not relevant for simulations of autonomous ground vehicle (AGV) performance, for which the relevant length scales are a few meters to a few hundred meters. In this work, we present a physics-based simulation of the accumulation of falling snow that is implemented using smoothed-particle hydrodynamics (SPH) to represent snow mass elements. SPH has been used in past work to simulate not only fluids but also deformable and continuous media ranging from concrete to fabric to soil. In this work we show that SPH can be parametrized to have material properties that reasonably approximate the bulk properties of accumulated snow. We present several example simulations in which SPH has been used to calculate the accumulation of fallen snow in an off-road scene. Finally, we show how the SPH simulation output can be combined with a rendering simulation to create realistic synthetic images.
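SPH, as used above, estimates continuous fields by summing kernel-weighted contributions from nearby particles. A minimal density-estimation sketch with the standard poly6 smoothing kernel (the particle count, mass, and smoothing length below are illustrative, not the paper's snow parameters):

```python
import numpy as np

def poly6(r, h):
    """Standard poly6 smoothing kernel from SPH fluid simulation:
    W(r, h) = 315/(64*pi*h^9) * (h^2 - r^2)^3 for r < h, else 0."""
    w = np.zeros_like(r)
    mask = r < h
    w[mask] = (315.0 / (64.0 * np.pi * h**9)) * (h**2 - r[mask]**2) ** 3
    return w

def sph_density(positions, mass, h):
    """Density at each particle: rho_i = sum_j m * W(|x_i - x_j|, h)."""
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)          # pairwise distances
    return mass * poly6(r, h).sum(axis=1)

rng = np.random.default_rng(1)
snow = rng.random((50, 3)) * 0.2   # 50 snow particles in a 0.2 m cube
rho = sph_density(snow, mass=0.01, h=0.1)
print(rho.shape)  # (50,)
```

A full snow simulation adds pressure, cohesion, and plasticity forces on top of this density estimate; the O(n²) pairwise loop here would also be replaced by a spatial hash for realistic particle counts.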
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350K (2024) https://doi.org/10.1117/12.3014543
Semantic segmentation of 2D images is a critical capability for Unmanned Ground Vehicle (UGV) navigation. A significant amount of work has been performed in data collection for road-rated civilian UGVs, but Army applications are more challenging, requiring algorithms to identify a wider range of terrain and conditions. Acquiring sufficient off-road data is challenging, time intensive, and expensive due to the vast amount of variation in factors, such as off-road terrain, lighting conditions, and weather, that are not present in on-road applications. Simulators can rapidly synthesize imagery appropriate to target environments that can be used to re-train models for environments with sparse datasets. Here we show that synthetic off-road data generated in simulation improved the performance of a scene segmentation algorithm deployed on a UGV. We discuss solutions to optimize the generation of synthetic data, as well as mixing with real data, for autonomous navigation in rough terrain.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350L (2024) https://doi.org/10.1117/12.3013736
Training deep neural network-based military vehicle detectors poses particular challenges due to the scarcity of relevant images and limited access to vehicles in this domain, particularly in the infrared spectrum. To address these issues, a novel drone-based bi-modal vehicle acquisition method is proposed, capturing 72 key images from different view angles of a vehicle in a fast and automated way. By overlaying vehicle patches with relevant background images and utilizing data augmentation techniques, synthetic training images are obtained. This study introduces the use of AI-generated synthetic background images compared to real video footage. Several models were trained and their performance compared in real-world situations. Results demonstrate that the combination of data augmentation, context-specific background samples, and synthetic background images significantly improves model precision while maintaining Mean Average Precision, highlighting the potential of utilizing Generative AI (Stable Diffusion) and drones to generate training datasets for object detectors in challenging domains.
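The overlay step described above, compositing captured vehicle patches onto background imagery, can be sketched as a masked paste. The array shapes and pixel values below are invented for illustration; a real pipeline would also blend edges and randomize scale, position, and rotation:

```python
import numpy as np

def composite(background, patch, mask, top, left):
    """Paste a vehicle patch onto a background using a binary mask,
    the core operation of overlay-style synthetic image generation."""
    out = background.copy()
    h, w = patch.shape[:2]
    region = out[top:top + h, left:left + w]   # view into the output image
    region[mask > 0] = patch[mask > 0]         # copy only masked patch pixels
    return out

bg = np.zeros((64, 64, 3), dtype=np.uint8)            # dummy background
vehicle = np.full((16, 16, 3), 200, dtype=np.uint8)   # dummy vehicle patch
mask = np.ones((16, 16), dtype=np.uint8)              # full-coverage mask
img = composite(bg, vehicle, mask, top=10, left=20)
print(img[12, 22], img[0, 0])  # [200 200 200] [0 0 0]
```

With the drone-acquired 72-view patch set, the same paste runs once per view and background combination, and the known paste coordinates directly yield the bounding-box labels.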
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350M (2024) https://doi.org/10.1117/12.3014145
Robust weed recognition relies on curating large-scale, diverse datasets, which are, however, practically difficult to come by. Deep generative modeling has received widespread attention in synthesizing visually realistic images beneficial for wide-ranging applications. This study investigates the efficacy of state-of-the-art deep learning-based diffusion models as an image augmentation technique for synthesizing weed images towards enhanced weed detection performance. A 10-weed-class dataset was created as a testbed for image generation and weed detection tasks. A ControlNet-added Stable Diffusion model was trained to generate weed images with broad intra-class variations of targeted weed species and diverse backgrounds to adapt to changing field conditions. The quality of generated images was assessed using metrics including the Fréchet Inception Distance and Inception Score. The generated images had an average FID score of 0.98 and an IS score of 3.63. YOLOv8l was trained for weed detection. Combining the generated with real images yielded consistent improvements (1.2-1.4% mAP@50:95) in weed detection, compared to modeling using only real images. Further research is needed to exploit controllable diffusion models for generating high-fidelity, diverse weed images and enhancing multi-class weed detection.
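As a point of reference, the Fréchet Inception Distance mentioned above compares the mean and covariance of feature embeddings from two image sets. A minimal sketch on toy feature vectors (in practice the features come from an Inception network, not random draws) might look like:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats_real, feats_fake):
    """Frechet Inception Distance between two sets of feature vectors."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    s = _sqrtm_psd(c1)
    # trace(sqrtm(c1 @ c2)) computed via the symmetric form sqrtm(s @ c2 @ s)
    covmean = _sqrtm_psd(s @ c2 @ s)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 8))
b = rng.normal(size=(500, 8))
print(fid(a, a))  # ~0 for identical sets; larger values mean greater mismatch
```

Lower FID indicates generated features closer to the real distribution; the Inception Score instead measures the sharpness and diversity of the generator's class predictions.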
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350N (2024) https://doi.org/10.1117/12.3023596
For certain objects, panchromatic or 3-band (RGB) imagery may be insufficient to achieve accurate object identification; thus, additional bandwidths within the infrared (IR) spectrum may be needed to exploit unique spectral characteristics for improving object detection. Many of the existing generative modeling techniques are applied solely to the visible wavelengths. A need exists to fully explore the application of generative modeling techniques to multispectral imagery (MSI) and specifically the IR bands. Generative models used for data augmentation for object detection must have sufficient fidelity to avoid generating data that are out of distribution with respect to actual measured data, or that contain systemic bias or artifacts. This work demonstrates the utility of a conditionally generative, multi-scale vision transformer that learns the spatial and spectral structures and the interactions between them in order to accurately synthesize near-infrared (NIR) and short-wave infrared (SWIR) data from RGB. This synthesis is performed over a diverse set of target objects observed over multiple seasons, at multiple look angles, over varying terrains, with images sampled globally from multiple satellites. For both training and inference, the model is provided no contextual information or metadata as input. Compared to using RGB alone, the average precision (AP) of an off-the-shelf object detection model (YOLOv5) trained with the additional synthesized IR data improves by up to 48% on a target class that is difficult for an analyst to identify. In conjunction with RGB data, using synthetic instead of true IR data for object detection provides higher AP values over all target classes.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350O (2024) https://doi.org/10.1117/12.3021572
Deep Neural Networks (DNNs) have emerged as a powerful tool for human action recognition, yet their reliance on vast amounts of high-quality labeled data poses significant challenges. A promising alternative is to train the network on generated synthetic data. However, existing synthetic data generation pipelines require complex simulation environments. Our novel solution bypasses this requirement by employing Generative Adversarial Networks (GANs) to generate synthetic data from only a small existing real-world dataset.
Our training pipeline extracts the motion from each training video and augments it across various subject appearances within the training set. This approach increases the diversity in both motion and subject representations, thus significantly enhancing the model's performance. A rigorous evaluation of the model's performance is presented under diverse scenarios, including ground and aerial views. Moreover, an insightful analysis of critical factors influencing human action recognition performance, such as gesture motion diversity and subject appearance, is presented.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350P (2024) https://doi.org/10.1117/12.3014453
We report synthesis of an open source test data set and work in progress to expand it. Sophisticated data mining and Machine Learning (ML) techniques can discover statistical associations among variables that may or may not reflect actual causal dependencies. In many applications, systems must discriminate between associations that are mere coincidences and those that are at least plausibly causal. Further, a graph of causal relationships may be complex, with fan-in, fan-out, transitive, and various combinations of these dependencies. To test a system’s power to filter out non-causal associations and untangle the causal web, suitable synthetic data is needed. We report the development, in Wolfram Mathematica, of code that synthesizes data with subtle, complex, causal dependencies among some but not all of the generated observable variables. We implement several simple dissipative chaotic flows. Four (4) are autonomous, six (6) are driven. Among the resulting ten (10) observable state vectors, there are forty-five (45) potential pairwise (1:1) relationships, of which four (4) are strong, five (5) are moderate, and three (3) are weak, for a total of twelve (12) that are actually causal; any others are mere statistical artifacts that a tool under test should reject. Each system’s observables are corrupted by additive Gaussian noise. Each system’s hidden dynamics are disturbed by a normal Wiener process. The levels of these stochastic components are parameterized to make problem difficulty tunable. A set of generated data and code for generating more will be released openly online.
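The paper's generator is written in Wolfram Mathematica; purely as an illustration of the same recipe, a Python sketch of one dissipative chaotic flow (the Lorenz system, chosen here as a stand-in) with a Wiener-process disturbance on the hidden state and additive Gaussian noise on the observables could read:

```python
import numpy as np

def simulate_lorenz(n=2000, dt=0.01, proc_sigma=0.05, obs_sigma=0.1, seed=0):
    """Euler-Maruyama integration of the Lorenz flow. The Wiener-process
    disturbance perturbs the hidden dynamics; Gaussian noise corrupts the
    observables. Both levels are tunable, as in the paper's generator."""
    rng = np.random.default_rng(seed)
    s, r, b = 10.0, 28.0, 8.0 / 3.0
    x = np.array([1.0, 1.0, 1.0])
    traj = np.empty((n, 3))
    for i in range(n):
        drift = np.array([s * (x[1] - x[0]),
                          x[0] * (r - x[2]) - x[1],
                          x[0] * x[1] - b * x[2]])
        # Wiener increments scale with sqrt(dt)
        x = x + drift * dt + proc_sigma * np.sqrt(dt) * rng.normal(size=3)
        traj[i] = x
    # Additive Gaussian noise on the observed variables only
    return traj + obs_sigma * rng.normal(size=traj.shape)

obs = simulate_lorenz()
print(obs.shape)  # (2000, 3)
```

Coupling several such flows, with some driving others, would reproduce the fan-in/fan-out causal web the abstract describes.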
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350Q (2024) https://doi.org/10.1117/12.3015079
In this research, we present a depth prediction model designed for a range of applications, moving beyond the traditional scope of assisted and autonomous driving systems. Our model emphasizes absolute accuracy over relative accuracy, tackling the challenge of performance deterioration at extended ranges.
To bolster our novel design, we employed the AirSim Unreal Engine simulator to develop a tailored dataset, capturing various scene locations. This approach aids in mitigating model overfitting to nuances such as textures and colors. With over 2.7 million images from diverse scene locations under different environmental conditions, our dataset provided a rich variety of perspectives and distances for training. We further enriched the dataset with images from 14 RGB and depth sensor pairs, strategically placed at varied pitch and yaw angles on a drone, enhancing the model’s adaptability. Notably, our reliance on simulation data aligns our model closely with real-world scenarios.
At the core of our model are features like the overlap patch embedding block, an optimized self-attention mechanism, and a Mixed-Feed Forward Network. Together, they facilitate improved depth prediction, even at considerable distances. Empirical evaluations show consistent performance across a broad depth range, with a Mean Absolute Percent Error (MAPE) of 5-10% maintained up to 1900 meters. However, performance decreases beyond this range, signaling opportunities for future enhancements.
Regarding real-world results, due to the lack of available supervision, real data was analyzed qualitatively. Preliminary observations suggest that the outcomes appear reasonable and align well with expectations, although quantitative validations remain a direction for future research.
Our research provides statistical evidence and visual illustrations of our model’s capabilities in depth prediction. The combination of our approach and insights from the simulation data suggests potential for further advancements in the field of depth prediction.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350R (2024) https://doi.org/10.1117/12.3016426
Current standard practices in computational military simulation, especially the simulation of historical battles, result in fundamental epistemic error that significantly reduces its evidentiary power, the usefulness of any generated synthetic data for machine learning systems, and its capacity to develop meaningful and general results which might be applied to contemporary affairs. This paper lays out this criticism by analogizing military simulation to the numerical approximation of dynamical systems, via which we demonstrate the limitations associated with attempting to model a single battle. We end with a discourse on the nature of the results that should be expected from high quality computational military simulation, and its role in military doctrine, from a Clausewitzian perspective.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350S (2024) https://doi.org/10.1117/12.3012357
The ability to see through walls is a crucial need for special operations and security forces. Our previous research has demonstrated that a centimeter wave (CMW) imaging system operating at around 5 GHz in the WiFi band offers a low-power solution with good range and penetration capabilities. However, the accuracy of the existing system in scene reconstruction was limited due to computational complexity. In this work, we aim to leverage deep learning (DL) based algorithms to design a scene reconstruction approach with significantly improved accuracy. We utilize the high-fidelity electromagnetic (EM) simulation tool SABR (Shooting and Bouncing Ray) for RF (radio frequency) simulations across different scenes and sensor setups. The backbone of our approach is an encoder-decoder neural network. To accommodate the sparse distribution of transmitter and receiver locations in 3D space, we recognize that a transformer with position encoding is better suited as our building block than convolution blocks, whose receptive field is the neighborhood grid. Additionally, recognizing the sparse nature of point clouds, our decoder integrates sparse tensors and convolutions via the Minkowski Engine. This design not only makes the model memory-efficient but also supports higher-resolution reconstructions and deeper architectures. WiFi 3D scene reconstruction using DL techniques is a relatively unexplored problem, and we demonstrate that we are able to reconstruct scenes with resolution close to the Rayleigh limit. Our approach has great potential to enable scene reconstruction behind obstacles on low-SWaP hardware. It has wide applications, from battlefield, security, and surveillance uses such as detecting and locating threats, to search and rescue missions for people trapped or injured under rubble or debris, and even medical settings for remote diagnosis and treatment.
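The position-encoding idea for sparsely placed sensors can be illustrated with a standard sinusoidal encoding of continuous 3D coordinates, which lets a transformer attend over arbitrary sensor positions without a fixed grid. The dimensions, frequencies, and coordinates below are hypothetical, not the paper's actual design:

```python
import numpy as np

def positional_encoding_3d(coords, dims_per_axis=8):
    """Sinusoidal encoding of continuous 3D positions: each axis is mapped
    through sin/cos at geometrically spaced frequencies, producing a fixed
    feature vector per sensor regardless of where it sits in space."""
    freqs = 2.0 ** np.arange(dims_per_axis // 2)          # geometric frequencies
    parts = []
    for axis in range(3):
        phase = coords[:, axis:axis + 1] * freqs          # (N, dims_per_axis/2)
        parts += [np.sin(phase), np.cos(phase)]
    return np.concatenate(parts, axis=1)                  # (N, 3 * dims_per_axis)

# Two hypothetical transmitter/receiver positions in meters
sensors = np.array([[0.0, 1.5, -2.0], [3.0, 0.5, 4.0]])
enc = positional_encoding_3d(sensors)
print(enc.shape)  # (2, 24)
```

These encodings would be added to (or concatenated with) each sensor's signal embedding before the transformer encoder.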
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350T (2024) https://doi.org/10.1117/12.3014061
Inspired by Learning Theory, cognitive science, psychological descriptions of experience and memory, unsupervised labeling, and computer vision, Terry Traylor, a retired military information and artificial intelligence professional, borrows techniques from both the social and natural sciences to identify processes that enable experimental AI learning from cybersecurity videos. Specifically, he uses mixed-methods theory-development techniques from qualitative science to study students learning cybersecurity processes, and from that study develops a biologically inspired synthetic framework to bootstrap machine learning and other generalized synthetic learning processes.
Using the learning cybersecurity tradecraft from videos case, he exposes processes and challenges associated with handling multi-modal information that enables generalized synthetic learning. Special attention is paid to Sensory AI challenges, synthetic perception, and multi-modal processing. The session will expose attendees to a synthetic structure for multi-signal/multi-modal learning, a proposed language for synthetic experience memory structures, and a biologically-inspired structure for the multi-modal learning problem.
Integrated Machine Learning and Synthesis Pipelines
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350U (2024) https://doi.org/10.1117/12.3013814
Training state-of-the-art image classifiers and object detectors remains an extremely data-intensive process to this day. This is because inherently data-hungry deep supervised networks are the traditional framework of choice. The significant data needs in turn impose strict requirements on the data acquisition, curation, and labelling stages that typically precede the learning process. This poses a particularly significant challenge for military and defense applications where the availability of high-quality labeled data is often limited. What is needed are methods that can effectively learn from sparse amounts of labeled, real-world data. In this paper, we propose a novel framework that incorporates a synthetic data generator into a supervised learning pipeline in order to enable end-to-end co-optimization of the discriminability and realism of the synthetic data, as well as the performance of the supervised engine. We demonstrate, via extensive empirical validation on image classification and object detection tasks, that the proposed framework is capable of learning from a small fraction of the real-world data required to train traditional, standalone supervised engines, while matching or even outperforming its off-the-shelf counterparts.
Andrii Soloviov, Derek T. Anderson, Andrew R. Buck, Brendan Alvey
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350V (2024) https://doi.org/10.1117/12.3013110
Advancements in deep learning have revolutionized the artificial intelligence (AI) landscape. However, despite considerable performance enhancements, their reliance on data and the intrinsic opacity of these models remains a challenge, hindering our ability to understand the reasons behind their failures. This paper introduces a headless open-source framework, coined MizSIM, built on the Unreal Engine (UE) to generate high-volume, high-variety synthetic datasets for AI training and evaluation. Through the manipulation of agent and environment parameters, MizSIM can provide detailed performance analysis and failure diagnosis. Leveraging UE’s open-source distribution, cost-effective assets, and high-quality graphics, along with tools like AirSim and the Robot Operating System (ROS), MizSIM ensures user-friendly design and seamless data extraction. In this article, we demonstrate two MizSIM workflows: one for a single-life computer vision task and the other to evaluate an object detector across hundreds of simulated lives. The overarching aim is to establish a closed-loop environment to enhance AI effectiveness and transparency.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350W (2024) https://doi.org/10.1117/12.3026764
Synthetically generated imagery holds the promise of being a panacea for the challenges of real-world datasets. Yet deep learning models trained on synthetic data are frequently observed to underperform those trained on real measured imagery. In this study we present analyses and illustrations of several statistical metrics, measures, and visualization tools based on the distance and similarity between real and synthetic empirical distributions in the latent feature embedding space, which provide a quantitative understanding of the image-domain distribution discrepancies hampering the generation of performant simulated datasets. We also demonstrate the practical application of these tools and techniques in a novel study comparing latent-space embedding distributions of real imagery, pristine synthetic imagery, and synthetic imagery modified by physics-based degradation models. The results may assist deep learning practitioners and synthetic imagery modelers in evaluating latent-space distributional dissimilarity and improving model performance when using simulation tools to generate synthetic training data.
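One simple, widely used distance between empirical embedding distributions (not necessarily among the metrics used in this study) is the kernel Maximum Mean Discrepancy. A toy sketch on stand-in embedding vectors:

```python
import numpy as np

def mmd_rbf(x, y, gamma=0.5):
    """Squared Maximum Mean Discrepancy with an RBF kernel: zero when the
    two samples coincide, growing as their distributions drift apart."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(200, 16))    # stand-in for real embeddings
synth = rng.normal(0.5, 1.0, size=(200, 16))   # mean-shifted "synthetic" set
print(mmd_rbf(real, real) < mmd_rbf(real, synth))  # True: shifted set is farther
```

In practice `real` and `synth` would be feature vectors extracted from a trained backbone, and the kernel bandwidth `gamma` would be tuned to the embedding scale.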
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350X (2024) https://doi.org/10.1117/12.3012808
Computer vision (CV) algorithms have improved tremendously with the application of neural network-based approaches. For instance, Convolutional Neural Networks (CNNs) achieve state of the art performance on Infrared (IR) detection and identification (e.g., classification) problems. Training such algorithms, however, requires a tremendous quantity of labeled data, which are less available in the IR domain than for “natural imagery”, and scarcer still for CV-related tasks. Recent work has demonstrated that synthetic data generation techniques provide a cheap and attractive alternative to collecting real data, despite a “realism gap” that exists between synthetic and real IR data.
In this work, we train deep models on a combination of real and synthetic IR data, and we evaluate model performance on real IR data. We focus on the tasks of vehicle and person detection, object identification, and vehicle parts segmentation. We find that for both detection and object identification, training on a combination of real and synthetic data performs better than training only on real data. This classification improvement demonstrates an advantage to using synthetic data for computer vision. Furthermore, we believe that the utility of synthetic data – when combined with real data – will only increase as the realism gap closes.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350Y (2024) https://doi.org/10.1117/12.3014274
Geospatial intelligence is a subject with many opportunities for machine automation. Object detection is one desirable application. However, a lack of high-volume relevant datasets can make this task difficult. To combat this issue, we introduced a spin-set augmentation technique to generate synthetic training data. We used these synthetic datasets to augment the training of an object detection deep network, focusing on visible band imagery. We have continued our efforts by further testing this method on long-wave infrared imagery, including results from YOLO, SSD, and Faster R-CNN algorithms. We also introduce another synthetic augmentation technique which involves generating physics-based fully-rendered images of 3D synthetic scenery and targets and compared the rendered image performance to that of spin-sets. This paper analyzes both the spin-set and rendered image augmentation techniques in terms of object detection performance, complexity, generalizability, and explainability.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350Z (2024) https://doi.org/10.1117/12.3015657
Synthetic data are frequently used to supplement a small set of real images and create a dataset with diverse features, but this may not improve the equivariance of a computer vision model. Our work answers the following questions: First, what metrics are useful for measuring a domain gap between real and synthetic data distributions? Second, is there an effective method for bridging an observed domain gap? We explore these questions by presenting a pathological case where the inclusion of synthetic data did not improve model performance, then presenting measurements of the difference between the real and synthetic distributions in the image space, latent space, and model prediction space. We find that pixel-level augmentation effectively reduces the observed domain gap and improves the model's F1 score to 0.95, compared to 0.43 for unaugmented data. We also observe that an increase in the average cross entropy of the latent space feature vectors is positively correlated with increased model equivariance and the closing of the domain gap. The results are explained using a framework of model regularization effects.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303510 (2024) https://doi.org/10.1117/12.3013516
Robust and resilient machine learning is critical to leading the world in cutting-edge technology for defense, but to achieve it, we need large amounts of representative data. Unfortunately, collecting and labeling real world data can be expensive and time-consuming. Computer generated data, often referred to as synthetic data, has made it possible to exponentially increase the amount of labeled data available with methods of creation such as generative models. Despite this growing trend to dedicate money and resources to produce synthetic data via simulated environments, it remains undetermined if training algorithms on synthetic data is an advantage for mission critical object detection tasks. In this paper, we propose a unique data quality metric that will support or counter the hypothesis that synthetic data is a viable alternative to using real world data. This data quality metric will determine the viability of using “digital twins” to generate more controllable and diverse synthetic images to overcome the lack of training data that hinders targeting related algorithms such as Automated Target Recognition (ATR) and Battle Damage Assessment (BDA).
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303511 (2024) https://doi.org/10.1117/12.3013441
Collecting and annotating real-world data for the development of object detection models is a time-consuming and expensive process. In the military domain in particular, data collection can also be dangerous or infeasible. Training models on synthetic data may provide a solution for cases where access to real-world training data is restricted. However, bridging the reality gap between synthetic and real data remains a challenge. Existing methods usually build on top of baseline Convolutional Neural Network (CNN) models that have been shown to perform well when trained on real data, but have limited ability to perform well when trained on synthetic data. For example, some architectures allow for fine-tuning with the expectation of large quantities of training data and are prone to overfitting on synthetic data. Related work usually ignores various best practices from object detection on real data, e.g. by training on synthetic data from a single environment with relatively little variation. In this paper we propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data. Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images. Based on the state of the art, we incorporate data augmentation methods and a Transformer backbone. Besides reaching relatively strong performance without any specialized synthetic data transfer methods, we show that our methods improve the state of the art on synthetic data trained object detection for the RarePlanes and DGTA-VisDrone datasets, and reach near-perfect performance on an in-house vehicle detection dataset.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303512 (2024) https://doi.org/10.1117/12.3012275
Machine learning algorithms require datasets that are both massive and varied to train and generalize effectively. However, preparing real-world semantically labeled datasets is a very time-consuming and cumbersome task, and training with low-volume datasets can lead to compromised performance and poor generalization. This performance and generalization gap due to limited quantities of real-world data could be reduced with the help of synthetic datasets that are generated with real-world features in mind. In this work, a combination of synthetic and real-world datasets is used to demonstrate and assess the performance of simulated-to-real-world transfer learning algorithms, where training is done on synthetic datasets and testing on real-world datasets. The performance is further evaluated with a mixture of real and synthetic datasets. Two simulators are used in this work to generate synthetic images. The first was the Mississippi State University Autonomous Vehicle Simulator (MAVS), a high-fidelity physics-based simulator for Autonomous Ground Vehicles (AGVs) in off-road terrain. The MAVS has been used to study machine learning in a variety of applications using both camera and lidar data. In addition to MAVS, Unreal Engine version 4 (UE4) was used to generate images. Finally, images with a variety of synthetic scene fidelities, along with real-world images, were used to train the neural network and evaluate the effectiveness of low-fidelity synthetic data; the network performed very well, with excellent confidence scores for object detection.
Justin T. Carrillo, Barbara N. Pilate, Andrew C. Trautz, Matthew D. Bray, Jonathan D. Sherburn, Madeline S. Karr, Orie M. Cecil, Matthew W. Farthing
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303513 (2024) https://doi.org/10.1117/12.3013428
The increasing deployment of AI in critical sectors necessitates advancements in explainable AI (XAI) to ensure transparency and trustworthiness of AI decisions. This paper introduces a novel methodology that leverages the Virtual Environmental Simulation for Physics-based Analysis (VESPA) framework in conjunction with Randomized Input Sampling for Explanation (RISE) to provide enhanced explainability for AI models, particularly in complex simulated environments. VESPA, known for its high-fidelity, physics-based simulations across diverse conditions, generates a vast dataset encompassing various sensor configurations, environmental factors, and material responses. This dataset serves as the foundation for applying RISE, a model-agnostic approach that generates pixel-level importance maps by probing the AI model with masked versions of the input images. Through this integration, we offer a systematic way to visualize and understand the influence of different environmental elements on AI decisions. Our approach not only sheds light on the "black box" of AI decision-making processes but also provides a scalable framework for evaluating AI models' robustness and reliability under a wide array of simulated scenarios.
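The RISE procedure described above can be sketched in a few lines: coarse random binary masks are upsampled to image size, applied to the input, and the model's scores on the masked inputs are accumulated into a pixel-level importance map. The sketch below is a minimal illustration of that principle, not the paper's implementation; the function name, mask grid size, and toy scoring function are our own assumptions.

```python
import numpy as np

def rise_saliency(model_score, image, n_masks=400, cell=4, p_keep=0.5, seed=0):
    """Model-agnostic RISE sketch: probe model_score with randomly masked
    images and accumulate each score into the pixels the mask kept."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # Coarse Bernoulli mask, upsampled to image size (nearest neighbour).
        coarse = (rng.random((cell, cell)) < p_keep).astype(float)
        mask = np.kron(coarse, np.ones((h // cell, w // cell)))
        score = model_score(image * mask)
        saliency += score * mask
    # Normalize by the expected number of times each pixel was kept.
    return saliency / (n_masks * p_keep)
```

A toy model whose score depends only on one pixel should yield a saliency map that peaks at that pixel, which is a convenient sanity check.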
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303514 (2024) https://doi.org/10.1117/12.3013688
Searching millions of overhead images to identify objects of interest to national security missions presents a beneficial use case for AI models that assist human analysts. However, training AI models for target recognition typically requires large amounts of data with thousands of labeled examples. Labeling is expensive and, more importantly, in some cases sufficient examples do not exist to create an AI detection model, suggesting a need for synthetic data. We investigated multiple configurations for model training, including various mixes of real and synthetic data, domain adaptation, and fine-tuning of models. Creating the best synthetic data via physics-based simulation methods proved time consuming and still left a domain gap between synthetic and real data. Attempts to bridge this gap with domain adaptation suffered from model-induced artifacts and still required fine-tuning with some real data to yield an improvement. While AI-generated data provides less realism, it can be effective for creating a closed-loop system between data generation and model development. Our results show that it is usually possible to use synthetic data to improve the performance of AI models compared to those trained solely on real data. However, the performance improvement from adding additional real data is significantly higher than from adding a similar number of synthetically generated samples.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303515 (2024) https://doi.org/10.1117/12.3015268
Reproducing realistic cloud clutter in simulated imagery is a useful tool for analyzing both sensor and algorithm performance. Additionally, as machine learning becomes more prevalent, realistic cloud imagery will be important in training algorithms to reject clouds as clutter. We briefly describe the theory of multiple scattering and then discuss the volumetric Monte Carlo method for physically accurate rendering of clouds. We then describe a simple method that allows radiance data to be precomputed under certain assumptions. The resulting dataset can be leveraged to accelerate rendering from any viewpoint. We conclude with a few results.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303516 (2024) https://doi.org/10.1117/12.3012971
Training data for sporadically occurring events or anomalous targets is always an issue, leading to imbalance or under-representation in machine learning tasks. Synthetic data can aid in several different ways, such as by generating suitable numbers of training images; consequently, this data is often pre-labeled as part of the synthetic image generation process. The quality of this synthetic data can be questioned with respect to its efficacy for the task, but there is often an opportunity to examine how that quality affects the ultimate objective. For an overhead imagery task, does one need a complete, physically accurate simulation of all the optical, atmospheric, and sensor properties, or does a "quick-and-dirty" visible-wavelength simulation suffice? For all intents and purposes, this surely depends on the task at hand.
When it is impossible or impractical to collect real labeled data, simulated data may be the only option. In active research being conducted by the Digital Imaging and Remote Sensing laboratory in the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology, researchers are focusing on the estimation of the volume of condensed water vapor plumes that are generated from mechanical draft cooling towers, at a variety of facilities, using various modalities of remote sensing data from different imaging platforms. Prior research has supported the use of machine learning for plume segmentation and multi-view geometry techniques for three-dimensional reconstruction and subsequent volume estimation. To this point, real imagery has been collected from the ground and small unmanned aircraft systems with the end goal of exploring other potential collection platforms.
This research focuses on training a U-Net model to mask and segment these condensed water vapor plumes from other objects in the scene. The U-Net model and segmentation process have previously been applied successfully to real, low-altitude imagery; this research focuses on applying the model to simulated imagery. Several aspects of the simulation are of interest: how physically accurate do the scattering properties of the plume data need to be? How critical is an understanding of in situ meteorological conditions? How dependent is the process on temporal and geographic variety in the data? How important are scene clutter and background type? The synthetic data used in this study was generated using the Digital Imaging and Remote Sensing Image Generation (DIRSIG) simulation environment and used to derive the inference and segmentation model to be tested on real imagery. While the trained artificial intelligence model performed consistently well when evaluated with synthetic imagery, the accuracy seen on the synthetic dataset did not translate into comparable results when evaluated with real imagery. However, the successes seen with the synthetic imagery, along with instances of success on real imagery, indicate that this binary classification and subsequent volume estimation can feasibly be accomplished with high levels of accuracy in the future.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303517 (2024) https://doi.org/10.1117/12.3013381
Space domain awareness has gained traction in recent years, encompassing the charting and cataloging of space objects, anticipating orbital paths, and keeping track of re-entering objects. Radar techniques can be used to monitor the fast-growing population of satellites, but so far this is mainly used for detection and tracking. For the characterization of a satellite’s capabilities, more detailed information, such as inverse synthetic-aperture radar (ISAR) imaging, is needed. Deep learning has become the preferred method for automated image analysis in various applications. Development of deep learning models typically requires large amounts of training data, but recent studies have shown that synthetic data can be used as an alternative in combination with domain adaptation techniques to overcome the domain gap between synthetic and real data.
In this study, we present a deep learning-based methodology for automated segmentation of the satellite’s bus and solar panels in ISAR images. We first train a segmentation model using thousands of fast simulated ISAR images and then fine-tune the model using a domain adaptation technique that requires only a few samples of the target domain. As a proof of concept, we use a small set of high-fidelity simulated ISAR images closely resembling real ISAR images as the target domain. Our proof of concept demonstrates that this domain adaptation technique effectively bridges the domain gap between the training and target radar image domains. Consequently, fast simulated (low-fidelity) synthetic datasets prove invaluable for training segmentation models for ISAR images, especially when combined with domain adaptation techniques.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303518 (2024) https://doi.org/10.1117/12.3010174
Aperture photometry is a critical method for estimating the visual magnitudes of stars and satellites, essential in Space Domain Awareness (SDA) for tasks like collision avoidance. Traditional methods have fixed aperture shapes, limiting accuracy and adaptability. We introduce a novel approach that defines pixel-specific regions for the aperture and annulus, significantly improving accuracy. Nevertheless, conventional aperture photometry is constrained by predefined equations, leading to errors and sensitivity to image conditions. To overcome these limitations, we propose a learned photometry pipeline that combines aperture photometry with machine learning. Our approach demonstrates remarkable effectiveness for both stars and satellites across diverse image conditions. We rigorously tested it on three datasets, including a custom synthetic dataset and real imagery. Our results showcase outstanding performance, with a 1.44% error in star visual magnitude estimation and a 0.64% error in satellite visual magnitude estimation.
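The classical relation underlying aperture photometry is simple: sum the background-subtracted counts inside the aperture region and convert the resulting flux to a magnitude via m = ZP - 2.5 log10(flux). The sketch below illustrates this with pixel-specific (arbitrary-shape) aperture and annulus regions, in the spirit of the approach above; the function name and zero-point value are illustrative assumptions, not the paper's pipeline.

```python
import math

def instrumental_magnitude(pixels, aperture, annulus, zero_point=25.0):
    """Classical aperture photometry sketch: estimate the per-pixel sky
    level as the median of the annulus region, subtract it from each
    aperture pixel, and convert the summed flux to a magnitude.
    `pixels` maps (row, col) -> counts; `aperture` and `annulus` are
    iterables of pixel coordinates (arbitrary, pixel-specific regions
    rather than fixed circular shapes)."""
    ann = sorted(pixels[p] for p in annulus)
    background = ann[len(ann) // 2]             # median sky level per pixel
    flux = sum(pixels[p] - background for p in aperture)
    return zero_point - 2.5 * math.log10(flux)  # standard magnitude relation
```

With a single source pixel of 110 counts over a uniform sky of 10 counts, the background-subtracted flux is 100 and the magnitude comes out exactly 5 below the zero point.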
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303519 (2024) https://doi.org/10.1117/12.3013481
Emerging communication technologies, such as millimeter-wave (mmWave) and massive antenna arrays, facilitate highly directional and long-range communication through beamforming and Multiple-Input Multiple-Output (MIMO) techniques. Recent AI advancements hold significant promise for enhancing Radio Frequency (RF) tracking capabilities, enabling the detection, localization, and tracking of highly directional signals through coordinated swarms. However, these advancements also bring new challenges, such as the need for comprehensive training datasets that account for various environmental factors affecting RF signal propagation, including diverse weather conditions, buildings, and terrains.
This paper introduces a new simulation platform specifically for evaluating the performance of RF tracking methods and, more importantly, generating comprehensive signal map training datasets for reinforcement learning-based RF tracking algorithms. Leveraging the MATLAB RF signal simulation toolbox, the simulator can model RF signal propagation and swarm mobility, accounting for diverse factors such as free-space loss (due to propagation distance), diffraction loss (due to obstacle obstruction), and environmental variables like terrain, buildings, and weather conditions (e.g., sunny, cloudy, and foggy). Additionally, the platform can simulate the trajectories of different types of moving transmitters and receivers (such as robots, drones, and vehicles). Furthermore, the simulator offers users and developers the flexibility to incorporate their own mobility models into the simulation environment, including control mobility models and data-driven models (e.g., transformers), enabling the training of reinforcement learning agents for RF tracking in complex scenarios generated by the platform.
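Of the channel effects listed above, free-space loss has a simple closed form, the Friis relation FSPL = 20 log10(4 pi d f / c). A one-line sketch like the following is a useful sanity check when validating the output of such a simulator; the constant and function names are our own, not part of the MATLAB toolbox.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def free_space_loss_db(distance_m, freq_hz):
    """Friis free-space path loss in dB for a given link distance and
    carrier frequency: FSPL = 20 * log10(4 * pi * d * f / c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)
```

At 2.4 GHz the loss at 1 m is roughly 40 dB, and doubling the distance adds about 6 dB, both standard reference values for checking an implementation.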
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351A (2024) https://doi.org/10.1117/12.3016167
A Generative Adversarial Network (GAN) was used to produce Raman spectra of Influenza A virus in culture, which were then used to train a virus detection classification model. Dimensionality reduction plotting using t-Distributed Stochastic Neighbor Embedding (t-SNE) demonstrated overlap between the real and synthetic spectra but not complete blending, which can be attributed to subtle differences between the real and synthetic data. Nevertheless, the real and synthetic spectra exhibited similar Raman peak patterns. Moreover, the inclusion of synthetic spectra in the training set increased the virus classification accuracy from 83.5% to 91.5%. This indicates that the GAN was able to synthesize spectra closely related to virus-positive spectra yet distinctly different from virus-negative spectra, which appear visually similar. We conclude that the synthetic spectra produced by the GAN were similar to the real data but not an exact replacement.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351B (2024) https://doi.org/10.1117/12.3013507
Automatic Modulation Recognition (AMR) is an important part of spectrum management. Existing work and datasets focus on variety in the transmitted modulations and apply only rudimentary channel effects. We propose a new dataset for AMR tasks that focuses on only a few common modulations but introduces large variation in the propagation channel. Simple scenarios with rural and urban areas are randomly generated using Simplex noise, and a receiver/transmitter pair is placed in each scenario. The 3GPP model is combined with the propagation vector from the scenario generator to simulate a signal propagating across the generated terrain. This dataset brings more realism to the AMR task and will allow machine learning models to adapt to changing environments.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351C (2024) https://doi.org/10.1117/12.3013122
Recently, there has been growing interest in utilizing wireless signals for human gesture recognition and activity recognition. At the same time, the scarcity and limited diversity of radar echo signature datasets for human gestures and activities is well recognized. This work demonstrates a framework for synthetically generating a vast and diverse set of radar echo signatures starting from a small set of optical motion capture (MoCap) trajectories. The captured trajectories are perturbed using a pool of composable spatial and temporal transformation functions assembled by a data augmentation pipeline builder. The transformed trajectories, combined with a simple radar cross-section (RCS) modeling process, are used to simulate radar channel impulse response (CIR) signals. Features extracted from this synthetic dataset show a strong correlation with features obtained from simultaneously collected real radar data. Furthermore, we demonstrate that the synthetically generated radar echo signals can improve the performance of ML-based wireless gesture and activity recognition systems, especially where the availability of real data is limited.
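The idea of a pool of composable spatial and temporal transforms assembled by a pipeline builder can be sketched as follows. This is a minimal illustration under our own assumptions (a trajectory as a list of (t, x, y, z) samples, and two example transforms), not the framework's actual API.

```python
import math

def time_scale(factor):
    # Temporal transform: stretch or compress the sample timestamps.
    return lambda traj: [(t * factor, x, y, z) for (t, x, y, z) in traj]

def rotate_z(theta):
    # Spatial transform: rotate the trajectory about the vertical axis.
    c, s = math.cos(theta), math.sin(theta)
    return lambda traj: [(t, c * x - s * y, s * x + c * y, z)
                         for (t, x, y, z) in traj]

def build_pipeline(*transforms):
    """Compose transforms left-to-right, as a data augmentation pipeline
    builder might assemble them from a pool of perturbation functions."""
    def pipeline(traj):
        for f in transforms:
            traj = f(traj)
        return traj
    return pipeline
```

Each call to `build_pipeline` with a different sample of transforms and parameters yields a new perturbed copy of a captured trajectory, which is how a small MoCap set can be expanded into a large, diverse synthetic one.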
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351D (2024) https://doi.org/10.1117/12.3015083
The reconstruction of a watertight surface mesh from point clouds is a difficult problem. Constructing a watertight model from a polygonal mesh is just as difficult, since these models can contain many issues, such as intersecting surfaces and non-manifold geometry. We first describe a complete repair process for a single CAD object, resulting in a repaired static model. Next, we implement a novel workflow that can repair local issues on almost any model, allowing global repair methods to be applied to local areas of the model. This workflow can be applied to an assembly of CAD objects to retain articulations in the final repaired dynamic model. We introduce methods from Topological Data Analysis (TDA) to show that topological features can be used to define robust mesh metrics, to characterize and assess mesh quality, and to implement fully automated watertight repair of CAD meshes.
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351E (2024) https://doi.org/10.1117/12.3013826
Acquiring representative data samples is pivotal to the process of creating machine learning models. However, gathering real-world imagery often presents challenges related to privacy concerns, regulatory constraints, financial resources, and accessibility limitations. Synthetic imagery offers an opportunity to augment real-world computer vision datasets while bypassing these obstacles. Yet, a fundamental challenge in working with synthetic imagery is ensuring that the generated data closely resembles its real-world counterpart. Further, it can be difficult to generate synthetic imagery with the features and quality required to train well-generalized computer vision models. This research paper introduces and evaluates our custom-built Replicant framework, a novel synthetic data generation framework integrated into Booz Allen's Vision AI Stack. In developing this service, we created a framework to produce synthetic imagery that closely resembles a real-world maritime dataset and that can be used to develop domain-specific synthetic data for other applications. We utilize this data to train object detection models and demonstrate how synthetic data benefits model performance. Additionally, we employ similarity metrics, including perceptual hashing (pHash), Optimal Transport Dataset Distance (OTDD), and Fréchet Inception Distance (FID), to assess the likeness of these real and synthetic datasets. Finally, we explore the applicability and effectiveness of explainable AI (XAI) techniques, such as Eigen Class Activation Mapping (Eigen CAM) and Shapley Additive Explanation (SHAP), to gain insights into the performance of our deep learning models and the utility of our synthetic data. Our findings underscore the vast potential of synthetic data to benefit deep learning model performance while overcoming challenges associated with real-world data acquisition.
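The perceptual-hashing idea behind metrics like pHash can be illustrated with its simpler average-hash cousin: threshold each pixel of a small grayscale thumbnail against the image mean to get a bit string, so near-duplicate images yield hashes with small Hamming distance. This is a sketch of the principle only, not the DCT-based pHash algorithm the paper uses.

```python
def average_hash(pixels):
    """Average-hash sketch: one bit per pixel of a small grayscale
    thumbnail (given as a list of rows), set when the pixel is at or
    above the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p >= mean else 0 for p in flat)

def hamming(h1, h2):
    # Number of differing bits between two hashes; small distance
    # indicates perceptually similar images.
    return sum(a != b for a, b in zip(h1, h2))
```

Lightly perturbed copies of an image hash to the same bits, while a structurally different image is maximally distant, which is what makes such hashes usable as a real-versus-synthetic similarity signal.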
Proceedings Volume Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130351F (2024) https://doi.org/10.1117/12.3013989
For facial recognition, textured three-dimensional (3D) meshes offer critical depth information that enhances identification across multiple perspectives. However, 3D facial recognition is often hindered by data limitations, collection environments, and domain shifts. Therefore, we propose a method to synthesize textured 3D facial meshes from existing two-dimensional (2D) face images. Our method demonstrates improved pose invariance by synthesizing faces and leveraging combinations of synthetic and real 3D facial data to improve facial recognition performance. Figure 1 provides an example of synthesized textured meshes from the FaceScape dataset, which includes 3D faces, textures, and corresponding 2D images. By synthesizing 3D geometry and occluded/non-occluded textures, this method leverages pose-invariant features from textured 3D meshes using 2D imagery for complex facial recognition tasks. We implement a 2D-to-3D domain adaptation scheme that enables AdaFace, a leading 2D recognition framework, to discriminate 3D facial features learned from PointNet++. This strengthens off-pose identification, highlighting the importance of data synthesis in expanding capabilities. Our proposed method improves pose invariance by leveraging denoising diffusion probabilistic models (DDPMs) conditioned on 2D representations to construct 3D textured meshes. This approach presents a robust alternative to existing methodologies, emphasizing the advantages of 2D networks for inferring 3D features for enhanced recognition. Leveraging DDPMs and domain adaptation broadens the landscape for image recognition systems, demonstrating the value of diverse, even synthetic, data structures for improving recognition performance under challenging conditions. The results demonstrate enhanced modeling to bridge the gap between 3D meshes and images.