Offloading Algorithms for Maximizing Inference Accuracy on Edge Device in an Edge Intelligence System

Bibliographic Details
Published in	IEEE transactions on parallel and distributed systems Vol. 34; no. 7; pp. 1 - 15
Main Authors	Fresa, Andrea; Champati, Jaya Prakash
Format	Journal Article
Language	English
Published	New York IEEE 01.07.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Accuracy; Algorithms; Approximation algorithms; Artificial neural networks; Computational modeling; Constraints; Costs; Data models; Dynamic programming; Edge computing; Image classification; Inference; Inference algorithms; Integer programming; Linear programming; Machine learning; Maximization; Optimization; Polynomials; Scheduling; Servers
ISSN	1045-9219 1558-2183
DOI	10.1109/TPDS.2023.3267458

Summary:	With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference on the data samples collected at the EDs, we study the problem of offloading inference jobs by considering the following novel aspects: 1) in contrast to a typical computational job, the processing time of an inference job depends on the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs) for resource-constrained devices provide the choice of scaling down the model size by trading off the inference accuracy. Considering that multiple ML models are available at the ED, and a powerful ML model is available at the ES, we formulate an Integer Linear Programming (ILP) problem with the objective of maximizing the total inference accuracy of $n$ data samples at the ED subject to a time constraint $T$ on the makespan. Noting that the problem is NP-hard, we propose an approximation algorithm, Accuracy Maximization using LP-Relaxation and Rounding (AMR$^2$), and prove that it results in a makespan of at most $2T$ and achieves a total accuracy that is lower than the optimal total accuracy by only a small constant, implying that AMR$^2$ is asymptotically optimal. Further, for the case of identical data samples, we propose Accuracy Maximization using Dynamic Programming (AMDP), an optimal pseudo-polynomial time algorithm.
Furthermore, we extend AMR$^2$ to the case of multiple ESs, where each ES is equipped with a powerful ML model. As a proof of concept, we implemented AMR$^2$ on a Raspberry Pi, equipped with MobileNets, that is connected to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR$^2$ for image classification.
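The identical-samples case mentioned in the summary can be illustrated with a small dynamic program. The sketch below is not the authors' AMDP; it is a minimal illustration under assumptions that the summary does not state: processing times are positive integers, the ED and the ES work in parallel so the makespan is the larger of the two loads, and the ES time per sample already includes any communication delay. The function name `max_total_accuracy` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def max_total_accuracy(
    n: int,
    T: int,
    ed_models: List[Tuple[int, float]],
    es_model: Tuple[int, float],
) -> Optional[float]:
    """Illustrative DP for n identical samples (assumed model, not AMDP).

    ed_models: (integer time per sample, accuracy) for each model on the ED.
    es_model:  (integer time per sample, accuracy) for the model on the ES.
    Returns the best total accuracy with both loads at most T, or None
    if no assignment of all n samples meets the time constraint.
    """
    neg_inf = float("-inf")
    # best[i][t]: max total accuracy of i samples processed on the ED
    # using total ED time at most t (-inf if i samples do not fit).
    best = [[neg_inf] * (T + 1) for _ in range(n + 1)]
    for t in range(T + 1):
        best[0][t] = 0.0
    for i in range(1, n + 1):
        for t in range(T + 1):
            for time, acc in ed_models:
                if time <= t and best[i - 1][t - time] > neg_inf:
                    candidate = best[i - 1][t - time] + acc
                    if candidate > best[i][t]:
                        best[i][t] = candidate

    es_time, es_acc = es_model
    result = neg_inf
    # Try every number k of offloaded samples; the rest stay on the ED.
    for k in range(n + 1):
        if k * es_time <= T and best[n - k][T] > neg_inf:
            result = max(result, k * es_acc + best[n - k][T])
    return None if result == neg_inf else result
```

The table has (n + 1)(T + 1) entries and each entry scans all ED models, so the running time grows with the numeric value of T rather than with its bit length, which is what makes a procedure of this kind pseudo-polynomial.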