RT-Swap: Addressing GPU Memory Bottlenecks for Real-Time Multi-DNN Inference

Woosung Kang, Jinkyu Lee, Youngmoon Lee, Sangeun Oh, Kilho Lee, Hoon Sung Chwa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The increasing complexity and memory demands of Deep Neural Networks (DNNs) for real-time systems pose significant new challenges, one of which is the GPU memory capacity bottleneck: the limited physical memory inside GPUs impedes the deployment of sophisticated DNN models. This paper presents, to the best of our knowledge, the first study that addresses the GPU memory bottleneck while simultaneously ensuring the timely inference of multiple DNN tasks. We propose RT-Swap, a real-time memory management framework that enables transparent and efficient swap scheduling of memory objects, using the relatively larger CPU memory to extend the available GPU memory capacity without compromising timing guarantees. We have implemented RT-Swap on top of representative machine-learning frameworks and demonstrate its effectiveness: it makes at least 72% more DNN task sets schedulable than existing approaches, even when the task sets demand up to 96.2% more memory than the GPU's physical capacity.

Original language: English
Title of host publication: Proceedings - 2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium, RTAS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 373-385
Number of pages: 13
ISBN (Electronic): 9798350358414
State: Published - 2024
Event: 30th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2024 - Hong Kong, China
Duration: 13 May 2024 → 16 May 2024

Publication series

Name: Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS
ISSN (Print): 1545-3421

Conference

Conference: 30th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2024
Country/Territory: China
City: Hong Kong
Period: 13/05/24 → 16/05/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • Deep Neural Network
  • Memory
  • Real-Time Scheduling
