Combining DNN partitioning and early exit
Published in EdgeSys '22: Proceedings of the 5th International Workshop on Edge Systems, Analytics and Networking, 2022
Recommended citation: Ebrahimi, Maryam; da Silva Veith, Alexandre; Gabel, Moshe; de Lara, Eyal. "Combining DNN partitioning and early exit." In Proceedings of the 5th International Workshop on Edge Systems, Analytics and Networking (EdgeSys '22), 2022.
Abstract
DNN inference is time-consuming and resource-hungry. Partitioning and early exit are two techniques for running DNNs efficiently at the edge: partitioning balances the computational load across multiple servers, while early exit lets the inference process terminate sooner and save time. These two techniques are usually treated as separate steps with limited flexibility. This work combines partitioning and early exit and proposes a performance model that estimates both inference latency and accuracy. We use this performance model to select the best partitioned/early-exit DNN based on deployment information and user preferences. Our experiments show that flexibility in the number and position of partitioning points, and in their placement on the available devices, plays an important role in determining the best configuration. In the future, we plan to turn this work into a “one-click” system that trains and optimizes models for edge computing.
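To make the kind of search the abstract describes concrete, here is a minimal, self-contained Python sketch: enumerate candidate (partition point, early-exit point) pairs, estimate each configuration's latency and accuracy from simple per-layer profiles, and return the most accurate configuration that fits a user-supplied latency budget. This is an illustration, not the paper's actual performance model; all layer costs, activation sizes, bandwidth, and exit accuracies are hypothetical placeholders, and the two-tier edge/cloud setup is an assumption (the paper considers multiple partitioning points and devices).

```python
# Toy sketch of a joint partition/early-exit search. All profiles below are
# hypothetical placeholders for a 6-layer DNN on a two-tier edge/cloud setup;
# the paper's performance model is more general (multiple partition points
# and placement across several available devices).

# Hypothetical per-layer compute cost (ms) on each tier.
EDGE_MS  = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]   # slow edge device
CLOUD_MS = [1.0, 1.5, 2.0,  2.5,  3.0,  3.5]   # fast remote server

# Hypothetical size (KB) of each layer's output activation, i.e. what must
# cross the network if we cut the DNN after that layer.
ACT_KB = [512, 256, 128, 64, 32, 16]
BW_KB_PER_MS = 12.5                            # assumed ~100 Mbps link

# Hypothetical accuracy of an early-exit classifier attached after layer e.
EXIT_ACC = {2: 0.81, 4: 0.88, 5: 0.92}

def estimate(p, e):
    """Latency (ms) and accuracy when layers 0..p-1 run on the edge,
    layers p..e run in the cloud, and inference exits after layer e.
    p == e + 1 keeps the whole executed prefix on the edge device."""
    edge = sum(EDGE_MS[:p])
    cloud = sum(CLOUD_MS[p:e + 1])
    transfer = ACT_KB[p - 1] / BW_KB_PER_MS if p <= e else 0.0
    return edge + transfer + cloud, EXIT_ACC[e]

# Enumerate every (partition, exit) pair and keep the most accurate
# configuration that meets a user-specified latency budget.
BUDGET_MS = 40.0
candidates = [(*estimate(p, e), p, e)
              for e in EXIT_ACC for p in range(1, e + 2)]
feasible = [c for c in candidates if c[0] <= BUDGET_MS]
lat, acc, p, e = max(feasible, key=lambda c: (c[1], -c[0]))
print(f"first {p} layers on edge, exit after layer {e}: "
      f"{lat:.1f} ms, accuracy {acc:.2f}")
```

A real deployment would replace the hardcoded tables with measured per-layer and per-link profiles, and the fixed budget with the user's stated preference (for example, maximize accuracy under a latency bound, or minimize latency under an accuracy floor).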