16 August 2024 YOLOv9-SMN: YOLOv9 with spatial multifusion network
Zizhuang Liu
Proceedings Volume 13230, Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024); 1323028 (2024) https://doi.org/10.1117/12.3035607
Event: Third International Conference on Machine Vision, Automatic Identification and Detection, 2024, Kunming, China
Abstract
YOLOv9 introduced Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), which significantly improved its efficiency, accuracy, and adaptability compared with YOLOv8. Its SPPELAN module notably enhances the network's feature extraction capability. However, SPPELAN relies on multi-level max pooling layers, which may discard some detail information, especially at larger pooling kernel sizes, so that smaller targets or fine details can be missed. To address this, we propose the Spatial Multi-Fusion Net, which segments the channels and then blends the image features extracted from the different channel groups using max pooling blocks with different kernel sizes and convolutional blocks of varying depths. This allows the model to capture features at different levels of abstraction and thus to collect features of objects of different sizes. Integrating the Spatial Multi-Fusion Net into YOLOv9 further improves its performance on the COCO object detection task, with all metrics improving while only a small number of parameters are added.
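The channel-splitting idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's Spatial Multi-Fusion Net: it omits the convolutional blocks entirely, and the function names (`max_pool_same`, `split_channels`, `multi_kernel_fusion`) and the kernel sizes 1, 3, and 5 are assumptions chosen for illustration only. It shows how a single small target is kept sharp by a small pooling kernel and smeared by a large one, which is the motivation for mixing kernel sizes across channel groups.

```python
from typing import List

Grid = List[List[float]]   # one channel, H x W
FeatureMap = List[Grid]    # C x H x W


def max_pool_same(channel: Grid, k: int) -> Grid:
    """Stride-1 max pooling with an odd kernel size k.

    The window is clipped at the borders, so the output keeps the H x W shape.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    return [
        [
            max(
                channel[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def split_channels(fmap: FeatureMap, groups: int) -> List[FeatureMap]:
    """Split the channel axis into equally sized, contiguous groups."""
    if len(fmap) % groups != 0:
        raise ValueError("channel count must be divisible by the group count")
    size = len(fmap) // groups
    return [fmap[g * size:(g + 1) * size] for g in range(groups)]


def multi_kernel_fusion(fmap: FeatureMap, kernel_sizes: List[int]) -> FeatureMap:
    """Pool each channel group with its own kernel size, then concatenate.

    Hypothetical stand-in for the fusion step; the real module also uses
    convolutional blocks of varying depths, which are left out here.
    """
    fused: FeatureMap = []
    for group, k in zip(split_channels(fmap, len(kernel_sizes)), kernel_sizes):
        fused.extend(max_pool_same(ch, k) for ch in group)
    return fused


# A 5 x 5 channel containing one "small target" pixel at the centre.
point = [[0.0] * 5 for _ in range(5)]
point[2][2] = 1.0

fused = multi_kernel_fusion([point, point, point], [1, 3, 5])
# Number of activated pixels per output channel.
print([sum(map(sum, ch)) for ch in fused])  # [1.0, 9.0, 25.0]
```

With kernel size 1 the target stays a single pixel, while kernel size 5 spreads it over the whole 5 x 5 map, so the concatenated output carries both the precise location and the coarser context.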
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Zizhuang Liu "YOLOv9-SMN: YOLOv9 with spatial multifusion network", Proc. SPIE 13230, Third International Conference on Machine Vision, Automatic Identification, and Detection (MVAID 2024), 1323028 (16 August 2024); https://doi.org/10.1117/12.3035607
KEYWORDS
Feature extraction
Object detection
Convolution
Feature fusion
Image processing
Image segmentation
Data modeling