Training deep neural network based military vehicle detectors poses particular challenges because relevant images are scarce and access to vehicles in this domain is limited, particularly in the infrared spectrum. To address these issues, a novel drone-based bi-modal vehicle acquisition method is proposed that captures 72 key images of a vehicle from different view angles in a fast and automated way. Synthetic training images are obtained by overlaying vehicle patches on relevant background images and applying data augmentation techniques. This study introduces the use of AI-generated synthetic background images and compares them with real video footage. Several models were trained and their performance was compared in real-world situations. Results demonstrate that the combination of data augmentation, context-specific background samples, and synthetic background images significantly improves model precision while maintaining Mean Average Precision, highlighting the potential of generative AI (Stable Diffusion) and drones for generating training datasets for object detectors in challenging domains.
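The patch-overlay step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `composite_patch`, the random placement, and the horizontal flip used as the only augmentation are all assumptions made for illustration. The sketch assumes a vehicle patch stored as an RGBA array whose alpha channel marks vehicle pixels, and a three-channel background that is at least as large as the patch.

```python
import numpy as np


def composite_patch(background, patch_rgba, rng, flip_prob=0.5):
    """Paste an RGBA vehicle patch onto a background (hypothetical helper).

    Returns the composited uint8 image and the bounding box
    (x_min, y_min, x_max, y_max) of the pasted patch.
    """
    bg = background.astype(np.float32).copy()
    patch = patch_rgba.astype(np.float32)

    # Assumed augmentation: random horizontal flip of the patch.
    if rng.random() < flip_prob:
        patch = patch[:, ::-1, :]

    ph, pw = patch.shape[:2]
    bh, bw = bg.shape[:2]
    if ph > bh or pw > bw:
        raise ValueError("patch must fit inside the background")

    # Assumed placement: uniform random position that keeps the patch in frame.
    y0 = int(rng.integers(0, bh - ph + 1))
    x0 = int(rng.integers(0, bw - pw + 1))

    # Alpha blending: vehicle pixels replace the background where alpha is high.
    alpha = patch[:, :, 3:4] / 255.0
    region = bg[y0:y0 + ph, x0:x0 + pw, :]
    bg[y0:y0 + ph, x0:x0 + pw, :] = alpha * patch[:, :, :3] + (1.0 - alpha) * region

    image = np.clip(np.rint(bg), 0, 255).astype(np.uint8)
    bbox = (x0, y0, x0 + pw, y0 + ph)
    return image, bbox


# Usage with placeholder data: a black background and a fully opaque white patch.
rng = np.random.default_rng(0)
background = np.zeros((64, 64, 3), dtype=np.uint8)
patch = np.full((16, 24, 4), 255, dtype=np.uint8)
image, bbox = composite_patch(background, patch, rng)
```

The returned bounding box can serve directly as the detection label for the synthetic image. A single-channel infrared patch could be handled the same way by replicating or adapting the channel dimension, which this sketch does not cover.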