3 April 2024 Efficient single- and multi-DNN inference using TensorRT framework
Vyacheslav Zhdanovskiy, Lev Teplyakov, Philipp Belyaev
Proceedings Volume 13072, Sixteenth International Conference on Machine Vision (ICMV 2023); 1307215 (2024) https://doi.org/10.1117/12.3023487
Event: Sixteenth International Conference on Machine Vision (ICMV 2023), 2023, Yerevan, Armenia
Abstract
In recent years, there has been significant growth of interest in real-world systems based on deep neural networks (DNNs). These systems typically incorporate multiple DNNs running simultaneously. In this paper we propose a novel approach to multi-DNN execution on a single GPU using multiple CUDA contexts and TensorRT, a state-of-the-art DNN inference framework. We show that it can lead to more efficient scheduling of multiple DNNs, especially when a lightweight DNN and a heavy DNN are inferred together. Compared to the baseline, our approach can provide an almost 7x increase in the throughput of the lightweight DNN at the cost of a negligible drop in the throughput of the heavy DNN. Moreover, we compare two ways of improving the throughput of a single DNN by processing multiple images together: standard batching and implicit batching, in which multiple images are processed simultaneously using several TensorRT execution contexts. We show that while standard batching outperforms implicit batching at larger batch sizes, implicit batching can provide up to 43% more throughput for a smaller DNN at a smaller batch size.
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Vyacheslav Zhdanovskiy, Lev Teplyakov, and Philipp Belyaev "Efficient single- and multi-DNN inference using TensorRT framework", Proc. SPIE 13072, Sixteenth International Conference on Machine Vision (ICMV 2023), 1307215 (3 April 2024); https://doi.org/10.1117/12.3023487