The computation and storage requirements of Deep Neural Networks (DNNs) make them challenging to deploy on edge devices, which often have limited resources. Conversely, offloading DNN inference to cloud servers incurs high communication overhead. Partitioning and early exiting are attractive solutions for reducing computational cost and improving inference speed. However, existing work typically addresses these approaches separately and/or ignores common communication intricacies of edge networks, such as (de)serialization and data transmission overheads. We present PORTEND, a novel performance model that jointly optimizes partitioning, early exiting, and placement across a multi-tier network. PORTEND's joint approach outperforms state-of-the-art solutions in edge computing setups, reducing DNN inference latency by 29%.