Highly-efficient Semantic Offloading Framework for AI Inference
Project N-epitomizer
Enabling Semantic Offloading for Neural Network Inferences
Summary
Offloading neural network inferences from resource-constrained mobile devices to an edge server over wireless networks is becoming more crucial as neural networks grow heavier. To this end, recent studies have tried to make this offloading process more efficient. However, the fundamental question of how to extract and offload the minimal amount of information necessary to preserve inference accuracy has remained unanswered. We call such ideal offloading semantic offloading and propose N-epitomizer, a new offloading framework that enables semantic offloading, thus achieving more reliable and timely inferences even in highly fluctuating or low-bandwidth wireless networks. To realize N-epitomizer, we design an autoencoder-based scalable encoder trained to extract the most informative data and to scale its output size to meet the latency and accuracy requirements of inferences over a network. We also accelerate N-epitomizer by exploiting lightweight knowledge distillation for the encoder design and decoder slimming for the decoder design, significantly reducing its overall computation time. Our evaluation shows that N-epitomizer achieves exceptionally high compression for images without compromising inference accuracy: 21x, 77x, and 192x higher than JPEG compression, and 20x, 55x, and 86x higher than the state-of-the-art DNN-aware image compression GRACE, for semantic segmentation, depth estimation, and classification, respectively. Our results show N-epitomizer's strong potential as the first semantic offloading system to guarantee end-to-end latency even under highly varying cellular networks.
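The scalable-encoder idea above can be illustrated with a minimal sketch. This is a hypothetical toy model, not the authors' implementation: a linear autoencoder with untrained random weights whose latent vector is truncated at run time, so that fewer transmitted values trade reconstruction fidelity for a smaller payload, mimicking how N-epitomizer scales the encoder output to network conditions. All names (`encode`, `decode`, dimensions) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN = 64       # flattened input size (e.g., an image patch)
D_LATENT = 16   # maximum latent dimension

# Random, untrained weights stand in for a trained autoencoder.
W_enc = rng.standard_normal((D_LATENT, D_IN)) / np.sqrt(D_IN)
W_dec = rng.standard_normal((D_IN, D_LATENT)) / np.sqrt(D_LATENT)

def encode(x, keep):
    """Encode x, keeping only the first `keep` latent dimensions."""
    z = W_enc @ x
    return z[:keep]          # only `keep` values are transmitted

def decode(z_received):
    """Zero-pad the received latent to full size, then decode."""
    z = np.zeros(D_LATENT)
    z[: z_received.shape[0]] = z_received
    return W_dec @ z

x = rng.standard_normal(D_IN)
for keep in (4, 8, 16):      # fewer kept dims => smaller transmission
    x_hat = decode(encode(x, keep))
    print(keep, float(np.linalg.norm(x - x_hat)))
```

In the actual system, the encoder would be trained end-to-end so that the retained dimensions carry the information most relevant to the downstream DNN task, rather than being an arbitrary prefix of a random projection as in this sketch.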
N-epitomizer: Design Overview
Performance
The figure below shows that N-epitomizer outperforms other codecs and state-of-the-art approaches for semantic segmentation and image classification without compromising DNN accuracy. This result confirms that N-epitomizer can extract essential information tailored to different DNN models and minimize the transmission volume for offloading DNN inferences.
Comparison of bits per pixel (BPP) between N-epitomizer and other codecs for semantic segmentation and image classification
Publications
[IEEE/ACM TNET'25] N-epitomizer: A Semantic Offloading Framework leveraging Essential Information for Timely Neural Network Inferences [paper]
Wooseung Nam, Sungyong Lee, Jinsung Lee, Huijeong Choi, Sangtae Ha, and Kyunghan Lee*, "N-epitomizer: A Semantic Offloading Framework leveraging Essential Information for Timely Neural Network Inferences," IEEE/ACM Transactions on Networking (IF: 4.0), vol. 33, no. 3, pp. 1041-1055, Jun. 2025.
[IEEE MASS'23] N-epitomizer: Enabling Semantic Offloading for Neural Network Inferences [paper]
Sungyong Lee, Wooseung Nam, Jinsung Lee, Sangtae Ha, and Kyunghan Lee*, "N-epitomizer: Enabling Semantic Offloading for Neural Network Inferences," IEEE MASS (invited), Toronto, Canada, 2023.
Members