Deep learning on smartphones, smartwatches, and IoT devices is possible, but often slow and power hungry. At the University of Ljubljana we believe this is partly due to unrealistic demands for high computational accuracy. We therefore develop techniques for imprecise, yet "good enough" deep learning that runs faster and consumes less energy than the standard approach. In this presentation, aimed primarily at mobile computing practitioners, we show how, using our tools, a deep learning model can be dynamically approximated to run on a smartphone with 15% less energy and no loss of accuracy.
[DSC Adria 23] Veljko Pejovic Lightweight Deep Learning on Edge Devices.pptx
1. Lightweight Deep Learning
on Edge Devices
Veljko Pejović (veljko.pejovic@fri.uni-lj.si)
Faculty of Computer and Information Science
University of Ljubljana, Slovenia
Computer Science Department,
Lancaster University, UK
2. AI Should Live on the Edge
Privacy and availability
“4 in 10 consumers opt not to use the [AI-powered
voice assistant] services because they are worried
about their data”
The Voice Consumer Index (VCI)
Vixen Labs, 2021
“AI requires a high-bandwidth, low-latency network.
It is important to ensure the service wrap and
technology stack are consistent for all regions”
What are the infrastructure requirements for artificial intelligence?
Terry Storrar, Leaseweb, 2021
3. AI Struggles on the Edge
Latency, memory, energy
• Limited resources vs
increasing model requirements
Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis
of deep neural network models for practical applications.
arXiv preprint arXiv:1605.07678.
• Heterogeneous devices and
latency/energy burden
Wang, H., Kim, B., Xie, J., & Han, Z.
How is energy consumed in smartphone deep learning apps?
Executing locally vs. remotely. In IEEE GLOBECOM 2019
4. Next Generation Hardware Won’t Help
Mobiles will lag
• Breakdown of Dennard scaling
• Packing more transistors in the
same area will dissipate more power
• Multicore needs space
• More energy for computation and cooling
[Hennessy & Patterson, Turing Award Lecture 2019]
6. Opportunities for Approximate Mobile Computing (AMC)
• Computed result quality exceeds the limits of human perception or attention
• Computed result quality exceeds a user’s interest/need
• Preserving resources is more important than high result quality
• Inputs and/or the computation are inherently noisy
• Inputs are inherently “easy” to process
7. Bringing AMC to the Masses
Programming support for context-aware approximation
• All developers should be able to approximate
• Mobile developers are not data scientists
• Approximation should be dynamic
8. Mobiprox
Supporting approximate deep learning on mobiles
• Implement support for approximate
tensor operations on Android
M. Fabjančič, O. Machidon, H. Sharif, Y. Zhao, S. Misailović, V. Pejović
Mobiprox: Supporting Dynamic Approximate Computing on Mobiles
arXiv:2303.11291 (2023)
9. Mobiprox
Supporting approximate deep learning on mobiles
[Plot: speedup vs. QoS loss for candidate approximation configurations]
• Implement support for approximate tensor operations on Android
• Uncover the Pareto front of configurations (layer-wise approximations) that yield the optimal speedup vs. inference accuracy trade-off
• Devise dynamic adaptation algorithms for navigating the Pareto front
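To make the Pareto-front idea concrete, here is a minimal sketch of how a set of non-dominated configurations could be selected from measured (speedup, QoS loss) pairs. The configuration names, dictionary fields, and numbers are illustrative placeholders, not Mobiprox's actual data structures or measurements.

```python
def pareto_front(configs):
    """Keep only configurations not dominated by any other.

    Config a dominates b when a is at least as fast AND loses at most
    as much QoS, and is strictly better in at least one dimension.
    """
    front = []
    for a in configs:
        dominated = any(
            b["speedup"] >= a["speedup"] and b["qos_loss"] <= a["qos_loss"]
            and (b["speedup"] > a["speedup"] or b["qos_loss"] < a["qos_loss"])
            for b in configs
        )
        if not dominated:
            front.append(a)
    return sorted(front, key=lambda c: c["speedup"])


# Hypothetical measurements for four layer-wise approximation configs:
configs = [
    {"name": "baseline", "speedup": 1.0, "qos_loss": 0.0},
    {"name": "perf-2",   "speedup": 1.4, "qos_loss": 0.5},
    {"name": "perf-3",   "speedup": 1.6, "qos_loss": 2.1},
    {"name": "samp-2",   "speedup": 1.3, "qos_loss": 1.9},  # dominated by perf-2
]
front = pareto_front(configs)  # baseline, perf-2, perf-3
```

The runtime adaptation logic then only ever has to choose among the surviving front configurations, since every other configuration is strictly worse in both dimensions.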
10. Mobiprox
Supporting approximate deep learning on mobiles
• Approximations:
• Filter sampling, perforated convolutions, quantization
• Implementation: extended CLBlast library
• Tuning:
• On a GPU-enabled cluster * **
• On an Android device
[Figures: row perforation and column perforation; filter sampling]
* Sharif et al., ApproxTuner: A Compiler and Runtime System for Adaptive Approximations. PPoPP, 2021
** Sharif et al. ApproxHPVM: a portable compiler IR for accuracy-aware optimizations. OOPSLA, 2019
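As an illustration of one of the listed approximations, the following is a hedged, pure-Python sketch of row perforation: the convolution output is computed only at every `stride`-th row, and skipped rows are filled from the nearest computed row. This is a toy model of the semantics; the actual Mobiprox implementation lives inside extended CLBlast GPU kernels.

```python
def conv_row(image, kernel, i):
    """One output row of a 'valid' 2D correlation at row i."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    return [
        sum(image[i + u][j + v] * kernel[u][v]
            for u in range(kh) for v in range(kw))
        for j in range(ow)
    ]

def conv2d_row_perforated(image, kernel, stride=2):
    """Compute rows 0, stride, 2*stride, ...; copy the rest."""
    kh = len(kernel)
    oh = len(image) - kh + 1
    out = [None] * oh
    for i in range(0, oh, stride):
        out[i] = conv_row(image, kernel, i)       # computed rows
    for i in range(oh):
        if out[i] is None:                        # skipped rows: reuse
            out[i] = list(out[i - (i % stride)])  # nearest computed row
    return out

# 4x4 all-ones image, 2x2 all-ones kernel: every computed output is 4.0,
# and the copied rows happen to match the exact result here.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 2 for _ in range(2)]
result = conv2d_row_perforated(image, kernel, stride=2)
```

With `stride=2`, roughly half of the multiply-accumulate work is skipped, which is the source of the speedup; on real feature maps the copied rows introduce the (usually small) QoS loss that tuning must keep in check.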
12. Dynamic Approximation Adaptation
Context-aware, need-driven, business-oriented adaptation
• Arbitrary adaptation strategies can be implemented
• “More accurate human activity recognition model when a user is exercising”
• “Higher approximation level when battery falls under 15%”
• Our pick: “Minimize energy usage without sacrificing the inference accuracy”
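The chosen strategy ("minimize energy without sacrificing accuracy") can be sketched as confidence-based fallback: try the most aggressively approximated Pareto configuration first, and only re-run with a less approximate one when the network's confidence in its prediction is low. The function names, config ordering, and threshold below are illustrative assumptions, not the actual Mobiprox API.

```python
def classify_adaptive(sample, configs, run_inference, threshold=0.8):
    """Confidence-based adaptation sketch.

    configs: Pareto configurations ordered most- to least-approximate.
    run_inference(sample, config) -> list of class probabilities.
    """
    for config in configs:
        probs = run_inference(sample, config)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:   # confident enough: stop early,
            return best, config        # saving the costlier runs
    return best, config                # least-approximate fallback


# Toy usage with a stand-in inference function:
def fake_infer(sample, config):
    return {"aggressive": [0.5, 0.5], "exact": [0.1, 0.9]}[config]

label, used = classify_adaptive(None, ["aggressive", "exact"], fake_infer)
```

Because most inputs are "easy" and handled confidently by the cheap configuration, the expensive fallback runs rarely, which is where the energy savings come from.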
14. Evaluation
Human activity recognition
• 21 volunteers, on-body UDOO boards,
six prescribed activities
• Slight accuracy drop:
from 65% to 63% (−2 pp)
• Significant energy savings:
from 245 mAh to 209 mAh (−15%)
• Certain classes are more robust
to approximation than others
Average accuracy vs. average energy consumption across all users: non-approximated network vs. confidence-based adaptation
15. Evaluation
Spoken keyword recognition
• HONK model built on the Google Speech Commands (SC) dataset
• Mix 160 unheard utterances from Google SC with noise levels from realistic environments
• Confidence-based adaptation
• 15% less energy, 0% accuracy loss
16. Acknowledgements
The Team Resources
• Octavian Machidon
• Alina Machidon
• Davor Sluga
• Matevž Fabjančič
• Timotej Knez
• Janez Božič
• Tine Fajfar
• Jani Asprov
“Bringing Resource Efficiency to Smartphones with Approximate
Computing”
(ARRS project No.: N2-0136)
“Context-Aware On-Device Approximate Computing”
(ARRS project No.: J2-3047)
“Computer Structures and Systems”
(ARRS core funding No. P2-0098.
M. Fabjančič et al. Mobiprox: Supporting Dynamic Approximate Computing on Mobiles. arXiv:2303.11291, 2023.
A. Machidon and V. Pejović. Enabling Resource-Efficient Edge Intelligence with Compressive Sensing-Based Deep Learning. ACM Computing Frontiers, May 2022.
A. Machidon and V. Pejović. Deep Learning Techniques for Compressive Sensing-Based Reconstruction and Inference: A Ubiquitous Systems Perspective. Artificial Intelligence Review, 2022.
T. Knez, O. Machidon, and V. Pejović. Self-Adaptive Approximate Mobile Deep Learning. Electronics, 2021.
V. Pejović. Towards Approximate Mobile Computing. ACM GetMobile Magazine, Vol. 22(5), December 2018.
17. Thank you!
Veljko Pejović (veljko.pejovic@fri.uni-lj.si)
University of Ljubljana, Slovenia
Lancaster University, UK
Code available at https://gitlab.fri.uni-lj.si/lrk