A Peek into Google’s Edge TPU
Koan-Sin Tan

freedom@computer.org

April 18th, 2019

Hsinchu Coding Serfs Meeting
Who Am I?
• An old programmer who learned to use “open source” stuff on a VAX-11/780 running 4.3BSD, before the term “open source” was coined

• TensorFlow contributor

• Search for “Koan-Sin” at https://github.com/tensorflow/tensorflow/releases

• PRs: https://github.com/tensorflow/tensorflow/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Afreedomtan+

• Contributing to TensorFlow is quite easy. There are many typos :-)

• Interested in running NNs on edge devices, so I learned TFLite

• label_image for TFLite
Google Edge TPU
https://coral.withgoogle.com/products/
• Announced at Google Next 2018 (July 2018)

• Available to general developers right before TensorFlow Dev Summit 2019 (March 2019)

• USB: Coral USB Accelerator

• Dev Board: Coral Dev Board

• More are coming, e.g., a PCIe Accelerator and a SOM

• Supported framework: TFLite
• Updates released on April 11th, 2019

• Compiler: removed the restriction to specific model architectures

• New TensorFlow Lite C++ API

• Updated Python API, mainly for multiple Edge TPUs

• Updated Mendel OS and Mendel Development Tool (MDT)

• Environmental Sensor Board, https://coral.withgoogle.com/products/environmental/

https://developers.googleblog.com/2019/04/updates-from-coral-new-compiler-and.html

https://coral.withgoogle.com/news/updates-04-2019/
A biology hobbyist on the Edge TPU team?
https://en.wikipedia.org/wiki/Coral https://en.wikipedia.org/wiki/Charles_Darwin
https://en.wikipedia.org/wiki/HMS_Beagle https://en.wikipedia.org/wiki/Gregor_Mendel
Coral USB Accelerator
• USB 3.1 (Gen 1) port and cable (SuperSpeed, 5 Gb/s transfer speed)

• MobileNet V1 1.0 224 quantized: ~4.3 MiB; at 5 Gb/s, transferring it takes roughly $4.3 \times 10^6 \times 8 / (5 \times 10^9) \approx 6.9\,\mathrm{ms}$

• Recommended operating conditions
◦ Default operating frequency: max ambient temperature 35°C
◦ Maximum operating frequency: max ambient temperature 25°C

• https://coral.withgoogle.com/tutorials/accelerator-datasheet/
• https://coral.withgoogle.com/tutorials/accelerator/

• Software environment
◦ Linux computer with a USB port
◦ Debian 6.0 or higher, or any derivative thereof (such as Ubuntu 10.0+)
◦ System architecture of either x86_64, or ARM64 with the ARMv8 instruction set

• Some caveats
◦ USB 2.0 hurts performance
◦ With newer Ubuntu releases, you have to modify the installation script
◦ Actually, ARMv7 also works
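To sanity-check the transfer estimate on this slide, and the “USB 2.0 hurts” caveat, here is a back-of-the-envelope sketch computing the ideal time to push a ~4.3 MB model over each link. It assumes raw signaling rates (5 Gb/s for USB 3.1 Gen 1, 480 Mb/s for USB 2.0 High Speed) and ignores encoding and protocol overhead, so real transfers will be slower.

```python
def transfer_time_ms(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal time (ms) to send size_bytes over a link, ignoring all protocol overhead."""
    return size_bytes * 8 / link_bits_per_s * 1e3

# ~4.3 MB model (MobileNet V1 1.0 224 quantized, size from the slide)
usb3 = transfer_time_ms(4.3e6, 5e9)    # USB 3.1 Gen 1 raw rate
usb2 = transfer_time_ms(4.3e6, 480e6)  # USB 2.0 High Speed raw rate
print(f"USB 3: {usb3:.2f} ms, USB 2: {usb2:.1f} ms")  # prints: USB 3: 6.88 ms, USB 2: 71.7 ms
```

At USB 2.0 speeds the same model takes roughly ten times longer to move, which is consistent with the caveat above.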
Performance Setting for USB Accelerator
Coral Dev Board
• Edge TPU Module (SOM)
◦ NXP i.MX 8M SoC (quad-core Cortex-A53, plus Cortex-M4F)
◦ Google Edge TPU ML accelerator coprocessor
◦ Cryptographic coprocessor
◦ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5GHz)
◦ Bluetooth 4.1
◦ 8GB eMMC
◦ 1GB LPDDR4
• USB connections
◦ USB Type-C power port (5V DC)
◦ USB 3.0 Type-C OTG port
◦ USB 3.0 Type-A host port
◦ USB 2.0 Micro-B serial console port
• Audio connections
◦ 3.5mm audio jack (CTIA compliant)
◦ Digital PDM microphone (x2)
◦ 2.54mm 4-pin terminal for stereo speakers
• Video connections
◦ HDMI 2.0a (full size)
◦ 39-pin FFC connector for MIPI DSI display (4-lane)
◦ 24-pin FFC connector for MIPI CSI-2 camera (4-lane)
• MicroSD card slot
• Gigabit Ethernet port
• 40-pin GPIO expansion header
• Supports Mendel Linux (a Debian derivative)
https://coral.withgoogle.com/tutorials/devboard-datasheet/
https://www.blog.google/products/google-cloud/bringing-intelligence-to-the-edge-with-cloud-iot/
Mendel Linux?
• https://pypi.org/project/mendel-development-tool/

• https://coral.googlesource.com/mdt.git

◦ 404 several weeks ago

◦ now it’s there

• Actually, there is a lot more information at https://coral.googlesource.com/; let’s look at it later
Mendel Linux
• It’s Debian-based, so apt tools can tell us many things

• And take a look at /etc/apt/sources.list. Yup, it’s there

• https://packages.cloud.google.com/apt/dists/mendel-bsp-enterprise-beaker/main

• https://packages.cloud.google.com/apt/dists/mendel-beaker/main
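The dists URLs above follow apt's usual repository layout, `<uri>/dists/<suite>/<component>`. The sketch below derives such URLs from one-line `deb` entries in a sources.list. The board's actual sources.list lines are not shown on the slides, so the sample entries in the test are hypothetical, built only to match the URLs above.

```python
import re
from typing import List, Tuple

def parse_deb_lines(text: str) -> List[Tuple[str, str, List[str]]]:
    """Parse one-line-style 'deb' entries into (uri, suite, components)."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("deb "):
            continue  # skips deb-src and blank lines
        line = re.sub(r"\[[^\]]*\]\s*", "", line)  # drop [option=value] blocks
        parts = line.split()
        if len(parts) >= 3:
            entries.append((parts[1], parts[2], parts[3:]))
    return entries

def dists_urls(entries: List[Tuple[str, str, List[str]]]) -> List[str]:
    """Build <uri>/dists/<suite>/<component> for every component of every entry."""
    return [f"{uri.rstrip('/')}/dists/{suite}/{comp}"
            for uri, suite, comps in entries for comp in comps]
```

On the board, something like `dists_urls(parse_deb_lines(open("/etc/apt/sources.list").read()))` would list the repositories apt is configured to use; entries under /etc/apt/sources.list.d/ would need the same treatment.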
Mendel Linux
• https://packages.cloud.google.com/apt/dists/

mendel-animal
mendel-beaker
mendel-bsp-enterprise-animal
mendel-bsp-enterprise-beaker
mendel-bsp-enterprise-chef
mendel-bsp-enterprise-unstable
mendel-chef
mendel-chef-unstable
mendel-core-animal
mendel-core-beaker
mendel-core-chef
mendel-core-unstable
mendel-unstable
mendel-upstream-stretch
Performance?
https://coral.withgoogle.com/tutorials/edgetpu-faq/
Let’s start from the first demo
• USB getting started guide:

• https://coral.withgoogle.com/tutorials/accelerator/
• Engines: BasicEngine -> {ClassificationEngine, DetectionEngine}, plus ImprintingEngine

• The Python BasicEngine is a single line:

• from edgetpu.swig.edgetpu_cpp_wrapper import BasicEngine
• SWIG: yes, the 20+-year-old SWIG

• _edgetpu_cpp_wrapper.so
(Class diagram: ClassificationEngine, DetectionEngine, BasicEngine, ImprintingEngine)
ClassificationEngine
◦ ClassifyWithImage(img, threshold=0.1, top_k=3, resample=Image.NEAREST)
◦ ClassifyWithInputTensor(input_tensor, threshold=0.0, top_k=3)
◦ __dict__
◦ …
BasicEngine
◦ RunInference(input)
◦ get_input_tensor_shape()
◦ get_all_output_tensors_sizes()
◦ get_num_of_output_tensors()
◦ get_output_tensor_size()
◦ required_input_array_size()
◦ total_output_array_size()
◦ model_path()
◦ get_raw_output()
◦ get_inference_time()
◦ device_path()
◦ __dict__
◦ …
What’s in the Engines
• BasicEngine

◦ input- and output-related

• ClassificationEngine

◦ still I/O related

◦ classification-specific: resizing the input image and deciding what to output
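The “what to output” part of ClassificationEngine boils down to thresholding and top-k selection over the score vector (ClassifyWithInputTensor takes `threshold` and `top_k`). Here is a pure-Python sketch of that kind of post-processing. It is not the library's code: keeping scores strictly above the threshold and the label-lookup helper `with_labels` are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

def top_k_with_threshold(scores: Sequence[float], threshold: float = 0.0,
                         top_k: int = 3) -> List[Tuple[int, float]]:
    """Up to top_k (class_index, score) pairs with score > threshold, best first."""
    candidates = ((i, s) for i, s in enumerate(scores) if s > threshold)
    return heapq.nlargest(top_k, candidates, key=lambda pair: pair[1])

def with_labels(results: List[Tuple[int, float]],
                labels: Dict[int, str]) -> List[Tuple[str, float]]:
    """Hypothetical helper: map class indices to label strings (index if missing)."""
    return [(labels.get(i, str(i)), s) for i, s in results]
```

`heapq.nlargest` avoids sorting all 1001 ImageNet scores when only a handful are wanted.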
performance!
• There is no existing way to reproduce those numbers

• classify_image.py uses ClassificationEngine.ClassifyWithImage()

• ClassifyWithImage() -> ClassifyWithInputTensor() -> RunInference(), so its timing also includes:

◦ preprocessing: image resize time

◦ post-processing: top_k and finding labels/classes

• BasicEngine.get_inference_time() returns something I cannot understand

• So I modified label_image.py (and the object detection script) for TFLite to time inference

◦ the results are quite close
https://github.com/freedomtan/edge_tpu_python_scripts
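Since ClassifyWithImage() folds image resizing and top_k/label lookup into the measured time, one way to time only the accelerator path is to wrap RunInference() by itself, after a few warm-up calls. A minimal sketch of such a harness; the commented usage with the legacy edgetpu API is hypothetical (model path and `input_array` are placeholders).

```python
import statistics
import time
from typing import Callable, Dict, List

def time_calls_ms(fn: Callable[[], object], warmup: int = 5,
                  iterations: int = 50) -> List[float]:
    """Call fn repeatedly and return the wall-clock time of each timed call in ms."""
    for _ in range(warmup):  # discard early calls that may include one-off setup costs
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return samples

def summarize(samples: List[float]) -> Dict[str, float]:
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
    }

# Hypothetical usage (placeholders, not from the slides):
#   engine = ClassificationEngine("mobilenet_v1_1.0_224_quant_edgetpu.tflite")
#   print(summarize(time_calls_ms(lambda: engine.RunInference(input_array))))
```

Reporting the median alongside the mean makes occasional USB or scheduler hiccups easier to spot.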
numbers in a git repo
• numbers and scripts
Inference time (ms), regular vs. Edge TPU-compiled model
model                                       .tflite   _edgetpu.tflite
inception_v1_224_quant                       412.79      4.00
inception_v4_299_quant                      3328.34    100.33
mobilenet_ssd_v1_coco_quant_postprocess      391.34     14.83
mobilenet_ssd_v2_coco_quant_postprocess      355.48     16.92
mobilenet_ssd_v2_face_quant_postprocess      369.02      7.78
mobilenet_v1_1.0_224_quant                   184.99      2.22
mobilenet_v2_1.0_224_quant                   160.94      2.56
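Turning the table above into speedup ratios makes the spread easier to see. The numbers are copied from the table; the plain .tflite runs presumably execute on the host CPU (the Edge TPU only runs compiled models), but the slide does not say which CPU.

```python
# (plain .tflite ms, _edgetpu.tflite ms), copied from the table above
TIMES_MS = {
    "inception_v1_224_quant": (412.79, 4.00),
    "inception_v4_299_quant": (3328.34, 100.33),
    "mobilenet_ssd_v1_coco_quant_postprocess": (391.34, 14.83),
    "mobilenet_ssd_v2_coco_quant_postprocess": (355.48, 16.92),
    "mobilenet_ssd_v2_face_quant_postprocess": (369.02, 7.78),
    "mobilenet_v1_1.0_224_quant": (184.99, 2.22),
    "mobilenet_v2_1.0_224_quant": (160.94, 2.56),
}

def speedups(times):
    """Ratio of plain-model time to Edge TPU-compiled time for each model."""
    return {name: cpu / tpu for name, (cpu, tpu) in times.items()}

for name, ratio in sorted(speedups(TIMES_MS).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:42s} {ratio:6.1f}x")
```

The ratios range from about 103x (inception_v1) down to about 21x (SSD MobileNet V2 COCO), with inception_v4 at about 33x, so the benefit varies a lot by model.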
• benchmarks/basic_engine_benchmarks.py
• benchmarks/classification_benchmarks.py
• benchmarks/detection_benchmarks.py
• benchmarks/imprinting_benchmarks.py
• benchmarks/multiple_tpus_performance_analysis.py
• benchmarks/reference/basic_engine_reference_aarch64.csv
• benchmarks/reference/basic_engine_reference_rp3b+.csv
• benchmarks/reference/basic_engine_reference_rp3b.csv
• benchmarks/reference/basic_engine_reference_x86_64.csv
• benchmarks/reference/classification_reference_aarch64.csv
• benchmarks/reference/classification_reference_rp3b+.csv
• benchmarks/reference/classification_reference_rp3b.csv
• benchmarks/reference/classification_reference_x86_64.csv
• benchmarks/reference/detection_reference_aarch64.csv
• benchmarks/reference/detection_reference_rp3b+.csv
• benchmarks/reference/detection_reference_rp3b.csv
• benchmarks/reference/detection_reference_x86_64.csv
• benchmarks/reference/imprinting_reference_aarch64.csv
• benchmarks/reference/imprinting_reference_rp3b+.csv
• benchmarks/reference/imprinting_reference_rp3b.csv
• benchmarks/reference/imprinting_reference_x86_64.csv
https://coral.googlesource.com/edgetpu/+/refs/heads/release-chef
Comparing with NCS 2
Inference time (ms)
model                   Coral Edge TPU   NCS 2 (fp16)   iPhone Xs Max (Neural Engine accelerated, fp16)
MobileNet V1 1.0/224         2.74           12.11          1.74
MobileNet V2 1.0/224         2.87           14.87          2.15
Inception V3                43.27           52.25          8.65
ResNet 50                   42.41           33.1           6.91
SqueezeNet 1.1               1.90            3.99          1.75
MobileNet V1 0.25/128        1.11            4.08          1.16
SSD MobileNet V1 COCO       10.05           23.53          n/a
SSD MobileNet V2 COCO       12.48           39.11          n/a
MobileNet V1/V2 and SSD MobileNet V1/V2 run quite well on the Edge TPU
• Edge TPU: my scripts, https://github.com/freedomtan/edge_tpu_python_scripts
• NCS 2: ./benchmark_app -d MYRIAD -niter 50 -nireq 10 ..
• iPhone Xs Max: my CoreML benchmark, https://github.com/freedomtan/coremlbenchmark
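Dividing the NCS 2 column by the Edge TPU column, both copied from the comparison table, shows where the Edge TPU's advantage is large and where it vanishes. A ratio above 1 means the Edge TPU is faster.

```python
# (Edge TPU ms, NCS 2 fp16 ms), copied from the comparison table above
LATENCY_MS = {
    "MobileNet V1 1.0/224": (2.74, 12.11),
    "MobileNet V2 1.0/224": (2.87, 14.87),
    "Inception V3": (43.27, 52.25),
    "ResNet 50": (42.41, 33.1),
    "SqueezeNet 1.1": (1.90, 3.99),
    "MobileNet V1 0.25/128": (1.11, 4.08),
    "SSD MobileNet V1 COCO": (10.05, 23.53),
    "SSD MobileNet V2 COCO": (12.48, 39.11),
}

def ncs2_over_edgetpu(table):
    """NCS 2 time divided by Edge TPU time; > 1 means the Edge TPU is faster."""
    return {model: ncs2 / tpu for model, (tpu, ncs2) in table.items()}

for model, ratio in ncs2_over_edgetpu(LATENCY_MS).items():
    print(f"{model:24s} {ratio:5.2f}")
```

The MobileNet-based models come out roughly 3x to 5x faster on the Edge TPU, Inception V3 only about 1.2x, and ResNet 50 (about 0.78) is the one model where NCS 2 wins.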
Mobilenet V1 on Edge TPU and NCS2 (chart: inference time in ms for MobileNet V1 0.25/0.5/0.75/1.0 on each device)
inference time (ms)        128x128   160x160   192x192   224x224
ncs2 mobilenet_v1_0.25       3.83      3.95      4.06      4.4
ncs2 mobilenet_v1_0.5        4.98      4.86      5.51      6.51
ncs2 mobilenet_v1_0.75       6.04      6.67      7.93      9.4
ncs2 mobilenet_v1_1.0        7.43      8.68     10.13     12.2
coral mobilenet_v1_0.25      1.07      1.24      1.30      1.47
coral mobilenet_v1_0.5       1.16      1.40      1.53      1.95
coral mobilenet_v1_0.75      1.29      1.70      1.80      2.16
coral mobilenet_v1_1.0       1.50      1.95      2.15      2.85
Edge TPU’s canned model
• It’s said that the Edge TPU supports TFLite

• Well, it does not run regular TFLite models directly
https://www.tensorflow.org/lite/images/convert/workflow.svg
https://coral.withgoogle.com/docs/edgetpu/models-intro/
Edge TPU’s canned model
• What do they mean by a “single custom op”?
“The compiler creates a single custom op for all Edge TPU compatible ops; anything else stays the same”
https://coral.withgoogle.com/docs/edgetpu/models-intro/
Figure: graphs of the compiled models
◦ MobileNet V1: input (1×224×224×3) -> edgetpu-custom-op -> Softmax (1×1001)
◦ SSD MobileNet V1: normalized_input_image_tensor (1×300×300×3) -> edgetpu-custom-op -> TFLite_Detection_PostProcess (inputs include 1×1917×91 and 1917×4 tensors) -> outputs TFLite_Detection_PostProcess, TFLite_Detection_PostProcess:1, TFLite_Detection_PostProcess:2, TFLite_Detection_PostProcess:3 (1×10×4, 1×10, 1×10, 1)
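A toy model of the “single custom op” idea, consistent with the two graphs above: a run of Edge TPU-compatible ops collapses into one `edgetpu-custom-op`, while ops such as Softmax or TFLite_Detection_PostProcess stay as ordinary TFLite ops. This is a simplified sketch, not the compiler's algorithm: it assumes a linear chain of op names, that mapping stops at the first unsupported op, and a hypothetical supported-op set.

```python
from typing import List, Sequence, Set

def partition_for_edgetpu(ops: Sequence[str], supported: Set[str]) -> List[str]:
    """Collapse the leading run of supported ops into one 'edgetpu-custom-op'.

    Simplified model: ops form a linear chain; the first unsupported op and
    everything after it are left as regular TFLite ops (run on the CPU).
    """
    n = 0
    while n < len(ops) and ops[n] in supported:
        n += 1
    head = ["edgetpu-custom-op"] if n > 0 else []
    return head + list(ops[n:])

# Hypothetical op list and supported set, chosen to mirror the MobileNet graph above
MOBILENET_OPS = ["CONV_2D", "DEPTHWISE_CONV_2D", "AVERAGE_POOL_2D", "RESHAPE", "SOFTMAX"]
SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "AVERAGE_POOL_2D", "RESHAPE"}
print(partition_for_edgetpu(MOBILENET_OPS, SUPPORTED))  # ['edgetpu-custom-op', 'SOFTMAX']
```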
Beyond Python
• _edgetpu_cpp_wrapper.so

◦ the TensorFlow Lite runtime and others

◦ let’s take a look at _wrap_new_BasicEngine: aiy::BasicEngine::BasicEngine()
◦ aiy::BasicEngine::RunInference() -> aiy::BasicEngine::RunInferenceHelper() -> tflite::Interpreter::Invoke()
◦ unresolved symbol: edgetpu::EdgeTpuManager::GetSingleton()

• libedgetpu.so

◦ OpenSSL, the Edge TPU context, and communication with the Edge TPU via USB or PCIe

◦ edgetpu::EdgeTpuManager::GetSingleton()
◦ platforms::darwinn::tflite::EdgeTpuManagerDirect::GetSingleton()
Edge TPU C++ API
• Released on April 11th, 2019

◦ binaries for x86_64, aarch64, and armeabi-v7a

◦ a simple header file

◦ two simple examples

◦ some docs at https://coral.withgoogle.com/docs/edgetpu/api-cpp/

• Native build on the Dev Board

◦ the Dev Board is a quad-core Cortex-A53 board, so surely we can build code on it

◦ a small aarch64 patch: https://github.com/tensorflow/tensorflow/commit/5520a9d82e5, https://github.com/tensorflow/tensorflow/pull/16175

◦ https://github.com/freedomtan/edgetpu-native: label_image for TFLite, ported
Edge TPU C++ API
• class EdgeTpuManager
◦ static EdgeTpuManager* GetSingleton();
◦ 3 different overloads of std::unique_ptr<EdgeTpuContext> NewEdgeTpuContext()
◦ std::vector<DeviceEnumerationRecord> EnumerateEdgeTpu()
◦ TfLiteStatus SetVerbosity(int verbosity)
◦ std::string Version()
https://github.com/freedomtan/edgetpu-native/blob/label_image/libedgetpu/edgetpu.h#L110-L158
• Let’s take a look at the ‘-v’ logs

◦ https://drive.google.com/drive/folders/1-MhGIgWHuhbKM6XrhPqyuLJDzoLD1t2g?usp=sharing

◦ in short, the USB Accelerator seems to have more overhead
Figure: imprinting classifier graph: input (1×224×224×3) -> edgetpu-custom-op -> 1×1×1×1024 -> L2Normalization -> 1×1×1×1024 -> Conv2D (weights 5×1×1×1024, bias 5) -> 1×1×1×5 -> Reshape -> 1×5 -> Softmax -> Output (1×5)
Imprinting Engine
• Yes, let’s check what it is

• The Imprinting Engine implements a low-shot learning technique called ‘Imprinted Weights’ [1][2]

• It can be used to retrain classifiers on-device (on either the USB Accelerator or the Dev Board); no back-propagation or gradients are involved.

• Why?

◦ Transfer learning happens on-device, at near-real-time speed.

◦ You don't need to recompile the model.

• Limitations

◦ Training data size is limited to a maximum of 200 images per class.

◦ It is most suitable for datasets with small intra-class variation.

◦ The last fully-connected layer runs on the CPU, not the Edge TPU, so it will be slightly less efficient than running a model fully compiled for the Edge TPU.

• If you are interested in it, check the paper and aiy::learn::imprinting::ImprintingEngine::Train(unsigned char const*, int, int)

[1] https://coral.withgoogle.com/docs/edgetpu/retrain-classification-ondevice/

[2] https://arxiv.org/abs/1712.07136
Figure: embedding extractor graph (input 1×224×224×3, edgetpu-custom-op, AvgPool; 1×1×1×1024 embedding output)
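The classifier graph above (L2Normalization, then a 1×1 Conv2D over the 1024-d embedding) amounts to a dot product between a normalized embedding and one weight vector per class. In the imprinted-weights paper [2], a new class's weight vector is the normalized mean of the normalized embeddings of its examples. A minimal pure-Python sketch of that idea, assuming zero bias and omitting the final Softmax (it does not change the argmax); the toy 3-d vectors stand in for real 1024-d embeddings. This is not the ImprintingEngine's code.

```python
import math
from typing import Dict, List, Sequence, Tuple

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def imprint_weights(embeddings_by_class: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Per class: normalize each embedding, average them, then re-normalize the mean."""
    weights = {}
    for label, embs in embeddings_by_class.items():
        normed = [l2_normalize(e) for e in embs]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        weights[label] = l2_normalize(mean)
    return weights

def classify(embedding: Sequence[float],
             weights: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Cosine-similarity scores (the 1x1 conv on a normalized input), plus the best label."""
    q = l2_normalize(embedding)
    scores = {label: sum(a * b for a, b in zip(q, w)) for label, w in weights.items()}
    return max(scores, key=scores.get), scores
```

Because training only averages embeddings, adding a class needs a forward pass per image and no gradients, which is why it fits on-device.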
PCIe device?
• It’s Linux

◦ `uname -a`: Linux hopeful-nexus 4.9.51-imx #1 SMP PREEMPT Thu Jan 31 01:58:26 UTC 2019 aarch64 GNU/Linux

• There is /proc/config.gz

◦ $ zcat /proc/config.gz | grep -i edge
◦ CONFIG_SND_GOOGLE_EDGETPU_CARD=y
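For scripting the same check, here is a small Python equivalent of `zcat /proc/config.gz | grep -i edge`. /proc/config.gz only exists when the kernel is built with CONFIG_IKCONFIG_PROC; the filtering is split out so it can be exercised without that file.

```python
import gzip
import re
from typing import Iterable, List

def grep_lines(pattern: str, lines: Iterable[str]) -> List[str]:
    """Case-insensitive regex filter, like `grep -i pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]

def grep_kernel_config(pattern: str, path: str = "/proc/config.gz") -> List[str]:
    """Same as `zcat /proc/config.gz | grep -i pattern`."""
    with gzip.open(path, "rt") as f:
        return grep_lines(pattern, f)
```

On the Dev Board, `grep_kernel_config("edge")` should return the CONFIG_SND_GOOGLE_EDGETPU_CARD line shown above.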
PCIe Device
• The apex driver is in gasket
• https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/gasket
• It was already upstreamed (to staging) last year
• https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/staging/gasket/apex_driver.c
Global Unichip Corp
USB Vendor id 0x1a6e = “Global Unichip Corp”
PCI Vendor id 0x1ac1 = “Global Unichip Corp”
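With the vendor IDs above, a quick way to see whether a Global Unichip device is attached is to scan sysfs. USB devices expose `idVendor` (for example `1a6e`) and PCI devices expose `vendor` (for example `0x1ac1`). This is a sketch only: it assumes the USB Accelerator is still enumerated under 0x1a6e, but it may re-enumerate under a different vendor ID once its firmware is loaded, so check `lsusb` as well.

```python
import glob
import os
from typing import List

GLOBAL_UNICHIP_USB = 0x1A6E
GLOBAL_UNICHIP_PCI = 0x1AC1

def parse_vendor_id(text: str) -> int:
    # int(..., 16) accepts both "1a6e" (USB idVendor) and "0x1ac1" (PCI vendor)
    return int(text.strip(), 16)

def find_devices(pattern: str, vendor_id: int) -> List[str]:
    """Return sysfs device directories whose vendor file (matched by pattern) equals vendor_id."""
    matches = []
    for vendor_file in glob.glob(pattern):
        try:
            with open(vendor_file) as f:
                if parse_vendor_id(f.read()) == vendor_id:
                    matches.append(os.path.dirname(vendor_file))
        except (OSError, ValueError):
            continue  # unreadable or malformed entries are skipped
    return matches

if __name__ == "__main__":
    print("USB:", find_devices("/sys/bus/usb/devices/*/idVendor", GLOBAL_UNICHIP_USB))
    print("PCI:", find_devices("/sys/bus/pci/devices/*/vendor", GLOBAL_UNICHIP_PCI))
```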
USB Accelerator opened
https://twitter.com/generuso/status/1111733195244998656
MCU on USB Accelerator
https://www.seeedstudio.com/Coral-USB-Accelerator-p-2899.html
Power Consumption of the USB Accelerator
• 4.94 V × 0.18 A ≈ 0.9 W

◦ while running MobileNet-SSD
https://twitter.com/exsiva/status/1108692847719407616
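The power figure is just P = V × I from the measured USB voltage and current. Multiplying by a per-inference time gives energy per inference. The 15 ms used below is a hypothetical round number for illustration, not a measurement from this tweet, and the measured current covers the whole USB device (Edge TPU plus its MCU), not the accelerator core alone.

```python
def power_w(volts: float, amps: float) -> float:
    """Electrical power P = V * I, in watts."""
    return volts * amps

def energy_mj(watts: float, seconds_per_inference: float) -> float:
    """Energy per inference in millijoules, assuming constant power draw."""
    return watts * seconds_per_inference * 1e3

p = power_w(4.94, 0.18)           # values from the slide: 0.8892 W
e = energy_mj(p, 0.015)           # hypothetical 15 ms per inference
print(f"{p:.2f} W, {e:.1f} mJ per inference")  # prints: 0.89 W, 13.3 mJ per inference
```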
Architecture of Edge TPU?
• Nope, I didn’t read it. Just FYR

• https://patents.google.com/patent/US20190050717A1/
Concluding Remarks
• The Edge TPU is quite good for small models that you can convert to canned ones

◦ quantized UINT8

• Not so good for some common larger models, e.g., Inception V3 and ResNet 50

• Your USB connection and CPU could be bottlenecks

• On-device re-training looks promising

• NCS 2 supports many more models for now

• How about NVIDIA Jetson Nano? Dunno, let’s wait and see. I don’t believe GPUs will win in the long run.
questions?
~ $99.00

 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Último (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

A Peek into Google's Edge TPU

  • 1. A Peek into Google’s Edge TPU
    Koan-Sin Tan, freedom@computer.org
    April 18th, 2019, Hsinchu Coding Serfs Meeting
  • 2. Who Am I?
    • An old programmer who learned to use “open source” software on a VAX-11/780 running 4.3BSD before the term “open source” was coined
    • TensorFlow contributor
      • Search for “Koan-Sin” at https://github.com/tensorflow/tensorflow/releases
      • PRs: https://github.com/tensorflow/tensorflow/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Afreedomtan+
    • Contributing to TensorFlow is quite easy; there are many typos :-)
    • Interested in running NNs on edge devices, so I learned TFLite
      • label_image for TFLite
  • 4. Google Edge TPU
    • Announced at Google Next 2018 (July 2018)
    • Available to general developers right before TensorFlow Dev Summit 2019 (March 2019)
      • USB: Coral USB Accelerator
      • Dev Board: Coral Dev Board
      • More are coming, e.g., a PCIe accelerator and a SOM
    • Supported framework: TFLite
    • https://coral.withgoogle.com/products/
  • 5. Updates released on April 11th, 2019
    • Compiler: removed the restriction to specific model architectures
    • New TensorFlow Lite C++ API
    • Updated Python API, mainly for using multiple Edge TPUs
    • Updated Mendel OS and Mendel Development Tool (MDT)
    • Environmental Sensor Board, https://coral.withgoogle.com/products/environmental/
    • https://developers.googleblog.com/2019/04/updates-from-coral-new-compiler-and.html
    • https://coral.withgoogle.com/news/updates-04-2019/
  • 6. A biology hobbyist on the Edge TPU team?
    • https://en.wikipedia.org/wiki/Coral
    • https://en.wikipedia.org/wiki/Charles_Darwin
    • https://en.wikipedia.org/wiki/HMS_Beagle
    • https://en.wikipedia.org/wiki/Gregor_Mendel
  • 7. Coral USB Accelerator
    • USB 3.1 (Gen 1) port and cable (SuperSpeed, 5 Gb/s transfer speed)
    • MobileNet V1 1.0 224 quantized: ~4.3 MiB; at 5 Gb/s, transferring it takes roughly 4.3 × 10^6 × 8 / (5 × 10^9) s ≈ 6.9 ms
    • Recommended operating conditions:
        Operating frequency   Max ambient temperature
        Default               35°C
        Maximum               25°C
    • Software environment
      • Linux computer with a USB port
      • Debian 6.0 or higher, or any derivative thereof (such as Ubuntu 10.0+)
      • System architecture of either x86_64 or ARM64 with the ARMv8 instruction set
    • Some caveats
      • USB 2.0 hurts performance
      • With newer Ubuntu releases, you have to modify the installation script
      • Actually, ARMv7 also works
    • https://coral.withgoogle.com/tutorials/accelerator-datasheet/
    • https://coral.withgoogle.com/tutorials/accelerator/
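A quick way to sanity-check the link-speed arithmetic is to compute it. This is a minimal sketch that divides payload bits by the raw signaling rate, ignoring USB protocol overhead and line encoding (so real transfers will be slower); the 480 Mb/s USB 2.0 rate is added here for comparison and is not from the slide.

```python
def transfer_time_ms(size_bytes: float, link_bits_per_s: float) -> float:
    """Idealized transfer time in ms: payload bits / raw link rate.

    Ignores USB protocol overhead and line encoding, so real transfers are slower.
    """
    return size_bytes * 8 / link_bits_per_s * 1e3


MODEL_BYTES = 4.3e6  # MobileNet V1 1.0 224 quantized, ~4.3 MB (from the slide)

for name, rate in [("USB 3.1 Gen 1 (5 Gb/s)", 5e9), ("USB 2.0 (480 Mb/s)", 480e6)]:
    print(f"{name}: {transfer_time_ms(MODEL_BYTES, rate):.2f} ms")
```

This prints 6.88 ms for USB 3.1 Gen 1 and 71.67 ms for USB 2.0, which is one concrete reason why USB 2.0 hurts.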
  • 10. Coral Dev Board
    • Edge TPU Module (SOM)
      ◦ NXP i.MX 8M SOC (quad-core Cortex-A53, plus Cortex-M4F)
      ◦ Google Edge TPU ML accelerator coprocessor
      ◦ Cryptographic coprocessor
      ◦ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5GHz)
      ◦ Bluetooth 4.1
      ◦ 8GB eMMC
      ◦ 1GB LPDDR4
    • USB connections
      ◦ USB Type-C power port (5V DC)
      ◦ USB 3.0 Type-C OTG port
      ◦ USB 3.0 Type-A host port
      ◦ USB 2.0 Micro-B serial console port
    • Audio connections
      ◦ 3.5mm audio jack (CTIA compliant)
      ◦ Digital PDM microphone (x2)
      ◦ 2.54mm 4-pin terminal for stereo speakers
    • Video connections
      ◦ HDMI 2.0a (full size)
      ◦ 39-pin FFC connector for MIPI DSI display (4-lane)
      ◦ 24-pin FFC connector for MIPI CSI-2 camera (4-lane)
    • MicroSD card slot
    • Gigabit Ethernet port
    • 40-pin GPIO expansion header
    • Supports Mendel Linux (a derivative of Debian)
    • https://coral.withgoogle.com/tutorials/devboard-datasheet/
    • https://www.blog.google/products/google-cloud/bringing-intelligence-to-the-edge-with-cloud-iot/
  • 11. Mendel Linux?
    • https://pypi.org/project/mendel-development-tool/
    • https://coral.googlesource.com/mdt.git
      • returned 404 several weeks ago
      • now it’s there
    • Actually, there is a lot more information at https://coral.googlesource.com/; let’s look at it later
  • 12. Mendel Linux
    • It’s Debian-based, so the apt tools can tell us many things
    • Take a look at /etc/apt/sources.list. Yup, the repositories are there:
      • https://packages.cloud.google.com/apt/dists/mendel-bsp-enterprise-beaker/main
      • https://packages.cloud.google.com/apt/dists/mendel-beaker/main
  • 15. Let’s start from the first demo
    • USB getting started guide: https://coral.withgoogle.com/tutorials/accelerator/
    • Engines: BasicEngine -> {ClassificationEngine, DetectionEngine}, plus ImprintingEngine
    • The Python BasicEngine is a single line:
      • from edgetpu.swig.edgetpu_cpp_wrapper import BasicEngine
    • SWIG: yes, the 20+-year-old SWIG
    • _edgetpu_cpp_wrapper.so
  • 16. What’s in the Engines
    • BasicEngine: input- and output-related methods
      • RunInference(input), get_input_tensor_shape(), get_all_output_tensors_sizes(), get_num_of_output_tensors(), get_output_tensor_size(), required_input_array_size(), total_output_array_size(), model_path(), get_raw_output(), get_inference_time(), device_path()
    • ClassificationEngine: still I/O-related, plus classification-specific work (resizing the input image and deciding what to output)
      • ClassifyWithImage(img, threshold=0.1, top_k=3, resample=Image.NEAREST)
      • ClassifyWithInputTensor(input_tensor, threshold=0.0, top_k=3)
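The classification-specific "what to output" step boils down to filtering by `threshold` and keeping the `top_k` best scores. The following is a hypothetical pure-Python sketch of that post-processing over a plain list of scores, not the library's actual implementation; `classify_scores` is an illustrative name, and whether the real code uses `>` or `>=` against the threshold is an assumption.

```python
from typing import List, Sequence, Tuple


def classify_scores(scores: Sequence[float], threshold: float = 0.1,
                    top_k: int = 3) -> List[Tuple[int, float]]:
    """Hypothetical post-processing: keep (class_id, score) pairs whose score
    exceeds threshold, sorted best first, truncated to top_k entries."""
    candidates = [(i, s) for i, s in enumerate(scores) if s > threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


raw = [0.02, 0.61, 0.05, 0.20, 0.12]
print(classify_scores(raw))  # [(1, 0.61), (3, 0.2), (4, 0.12)]
```

Mapping each class id to a human-readable label would be a separate lookup into the labels file.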
  • 17. Performance!
    • No existing way to reproduce those numbers
    • classify_image.py uses ClassificationEngine.ClassifyWithImage()
      • ClassifyWithImage() -> ClassifyWithInputTensor() -> RunInference()
      • preprocessing: image resize time
      • post-processing: top_k and finding labels/classes
    • BasicEngine.get_inference_time() returns something I cannot understand
    • So I modified label_image.py (and object_detection) for TFLite
      • the results are quite close
    • https://github.com/freedomtan/edge_tpu_python_scripts
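To keep image resizing and top_k/label lookup out of the measurement, one option is to time only the inference call itself. This is a minimal sketch assuming the input has already been preprocessed; `fake_inference` is a hypothetical stand-in for the call being measured (for example `RunInference()` from the previous slide), and the warm-up count and median statistic are illustrative choices.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> List[float]:
    """Return per-call wall-clock times in milliseconds for fn()."""
    for _ in range(warmup):
        fn()  # early calls may include one-time setup costs, so discard them
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return times


def fake_inference() -> int:
    # Hypothetical stand-in: replace with the real inference call on an
    # already-resized input tensor so preprocessing is not timed.
    return sum(i * i for i in range(10_000))


samples = time_calls(fake_inference)
print(f"median {statistics.median(samples):.3f} ms over {len(samples)} runs")
```

Reporting the median rather than the mean makes the number less sensitive to occasional scheduling hiccups on the host.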
  • 18. Numbers in a git repo
    • Numbers and scripts, inference time (ms):
        Model                                     .tflite   _edgetpu.tflite
        inception_v1_224_quant                     412.79     4.00
        inception_v4_299_quant                    3328.34   100.33
        mobilenet_ssd_v1_coco_quant_postprocess    391.34    14.83
        mobilenet_ssd_v2_coco_quant_postprocess    355.48    16.92
        mobilenet_ssd_v2_face_quant_postprocess    369.02     7.78
        mobilenet_v1_1.0_224_quant                 184.99     2.22
        mobilenet_v2_1.0_224_quant                 160.94     2.56
    • Benchmark scripts and reference numbers added to the edgetpu repo:
      • benchmarks/{basic_engine,classification,detection,imprinting}_benchmarks.py
      • benchmarks/multiple_tpus_performance_analysis.py
      • benchmarks/reference/{basic_engine,classification,detection,imprinting}_reference_{aarch64,rp3b+,rp3b,x86_64}.csv
      • https://coral.googlesource.com/edgetpu/+/refs/heads/release-chef
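The table is easier to read as speedup ratios. This small sketch copies the numbers from the table above and divides the plain `.tflite` time by the `_edgetpu.tflite` time; the table does not say which processor ran the plain models, so the ratio is only "Edge TPU version vs. non-Edge TPU version" on the author's setup.

```python
# Inference times (ms) from the table: (.tflite, _edgetpu.tflite)
TIMES_MS = {
    "inception_v1_224_quant": (412.79, 4.00),
    "inception_v4_299_quant": (3328.34, 100.33),
    "mobilenet_ssd_v1_coco_quant_postprocess": (391.34, 14.83),
    "mobilenet_ssd_v2_coco_quant_postprocess": (355.48, 16.92),
    "mobilenet_ssd_v2_face_quant_postprocess": (369.02, 7.78),
    "mobilenet_v1_1.0_224_quant": (184.99, 2.22),
    "mobilenet_v2_1.0_224_quant": (160.94, 2.56),
}


def speedups(times):
    """Ratio of plain-model time to Edge TPU-model time for each entry."""
    return {name: plain / edgetpu for name, (plain, edgetpu) in times.items()}


for name, ratio in sorted(speedups(TIMES_MS).items(), key=lambda kv: -kv[1]):
    print(f"{name:42s} {ratio:6.1f}x")
```

The small classifiers gain roughly 60x to 100x, while Inception V4 gains about 33x and the COCO SSD models about 21x to 26x.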
  • 19. Comparing with NCS 2, inference time (ms)
        Model                    Coral: Edge TPU   NCS 2 (fp16)   iPhone Xs Max (Neural Engine accelerated, fp16)
        MobileNet V1 1.0/224      2.74             12.11           1.74
        MobileNet V2 1.0/224      2.87             14.87           2.15
        Inception V3             43.27             52.25           8.65
        ResNet 50                42.41             33.1            6.91
        SqueezeNet 1.1            1.90              3.99           1.75
        MobileNet V1 0.25/128     1.11              4.08           1.16
        SSD MobileNet V1 COCO    10.05             23.53           -
        SSD MobileNet V2 COCO    12.48             39.11           -
    • MobileNet V1/V2 and SSD MobileNet V1/V2 are quite good on the Edge TPU
    • Edge TPU: my scripts, https://github.com/freedomtan/edge_tpu_python_scripts
    • NCS 2: ./benchmark_app -d MYRIAD -niter 50 -nireq 10 ..
    • iPhone Xs Max: my CoreML benchmark, https://github.com/freedomtan/coremlbenchmark
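A per-model ratio shows where each accelerator wins. This sketch uses the Edge TPU and NCS 2 columns from the table above; a ratio greater than 1 means the Edge TPU was faster on that model.

```python
MODELS = ["MobileNet V1 1.0/224", "MobileNet V2 1.0/224", "Inception V3", "ResNet 50",
          "SqueezeNet 1.1", "MobileNet V1 0.25/128", "SSD MobileNet V1 COCO",
          "SSD MobileNet V2 COCO"]
EDGE_TPU_MS = [2.74, 2.87, 43.27, 42.41, 1.90, 1.11, 10.05, 12.48]
NCS2_MS = [12.11, 14.87, 52.25, 33.1, 3.99, 4.08, 23.53, 39.11]


def ncs2_over_edgetpu(models, edge, ncs2):
    """NCS 2 time divided by Edge TPU time; > 1 means the Edge TPU was faster."""
    return {m: n / e for m, e, n in zip(models, edge, ncs2)}


ratios = ncs2_over_edgetpu(MODELS, EDGE_TPU_MS, NCS2_MS)
for model, r in ratios.items():
    faster = "Edge TPU" if r > 1 else "NCS 2"
    print(f"{model:24s} {r:5.2f}  faster: {faster}")
```

The Edge TPU leads by roughly 3x to 4x on the MobileNet-family models, only about 1.2x on Inception V3, and falls behind NCS 2 on ResNet 50.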
  • 20. MobileNet V1 on Edge TPU and NCS 2, inference time (ms) (bar chart of the same data on the slide)
        Model                       128x128   160x160   192x192   224x224
        ncs2 mobilenet_v1_0.25        3.83      3.95      4.06      4.4
        ncs2 mobilenet_v1_0.5         4.98      4.86      5.51      6.51
        ncs2 mobilenet_v1_0.75        6.04      6.67      7.93      9.4
        ncs2 mobilenet_v1_1.0         7.43      8.68     10.13     12.2
        coral mobilenet_v1_0.25       1.07      1.24      1.30      1.47
        coral mobilenet_v1_0.5        1.16      1.40      1.53      1.95
        coral mobilenet_v1_0.75       1.29      1.70      1.80      2.16
        coral mobilenet_v1_1.0        1.50      1.95      2.15      2.85
  • 21. Edge TPU’s canned model
    • It’s said that the Edge TPU supports TFLite
    • Well, it does not run TFLite models directly
    • https://www.tensorflow.org/lite/images/convert/workflow.svg
    • https://coral.withgoogle.com/docs/edgetpu/models-intro/
  • 22. Edge TPU’s canned model
    • What do you mean by a single custom op?
      • "The compiler creates a single custom op for all Edge TPU compatible ops; anything else stays the same" (https://coral.withgoogle.com/docs/edgetpu/models-intro/)
    • MobileNet V1: input (1×224×224×3) -> edgetpu-custom-op -> 1×1001 -> Softmax
    • SSD MobileNet V1: normalized_input_image_tensor (1×300×300×3) -> edgetpu-custom-op -> TFLite_Detection_PostProcess (inputs include the 1×1917×91 class predictions and 1917×4 anchors)
      • outputs: TFLite_Detection_PostProcess (1×10×4), TFLite_Detection_PostProcess:1 (1×10), TFLite_Detection_PostProcess:2 (1×10), TFLite_Detection_PostProcess:3 (1)
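The "single custom op" idea can be illustrated on a simplified model that is just a linear list of op names. This sketch assumes the compiler partitions the graph only once: the leading run of supported ops collapses into one `edgetpu-custom-op`, and the first unsupported op plus everything after it stay as ordinary TFLite ops on the CPU. The `partition` function and the supported-op set in the example are hypothetical, not the real compiler's logic or op list.

```python
from typing import List, Set


def partition(ops: List[str], supported: Set[str]) -> List[str]:
    """Collapse the leading run of supported ops into one custom op.

    Assumption: a single partition, so the first unsupported op and every
    op after it are left unchanged (they would run on the CPU).
    """
    n = 0
    while n < len(ops) and ops[n] in supported:
        n += 1
    head = ["edgetpu-custom-op"] if n else []
    return head + ops[n:]


# Hypothetical op list and supported set, chosen to mirror the MobileNet V1
# graph above, where Softmax stays outside the custom op.
mobilenet_ops = ["Conv2D", "DepthwiseConv2D", "AveragePool2D", "Reshape", "Softmax"]
supported_ops = {"Conv2D", "DepthwiseConv2D", "AveragePool2D", "Reshape"}
print(partition(mobilenet_ops, supported_ops))  # ['edgetpu-custom-op', 'Softmax']
```

Under this assumption, one unsupported op early in a model would push almost the whole network back onto the CPU.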
  • 23. Beyond Python
    • _edgetpu_cpp_wrapper.so
      • contains the TensorFlow Lite runtime and other things
      • let’s take a look at _wrap_new_BasicEngine: aiy::BasicEngine::BasicEngine()
      • aiy::BasicEngine::RunInference() -> aiy::BasicEngine::RunInferenceHelper() -> tflite::Interpreter::Invoke()
      • unresolved symbol: edgetpu::EdgeTpuManager::GetSingleton()
    • libedgetpu.so
      • OpenSSL, the Edge TPU context, communicating with the Edge TPU via USB or PCIe
      • edgetpu::EdgeTpuManager::GetSingleton()
      • platforms::darwinn::tflite::EdgeTpuManagerDirect::GetSingleton()
  • 24. Edge TPU C++ API
    • Released on April 11th, 2019
      • binaries for x86_64, aarch64, and armeabi-v7a
      • a simple header file
      • two simple examples
      • some docs at https://coral.withgoogle.com/docs/edgetpu/api-cpp/
    • Native build on the Dev Board
      • the Dev Board is a quad-core Cortex-A53 board, so surely we can build code on it
      • a small aarch64 patch: https://github.com/tensorflow/tensorflow/commit/5520a9d82e5, https://github.com/tensorflow/tensorflow/pull/16175
      • https://github.com/freedomtan/edgetpu-native, with label_image for TFLite ported
  • 25. Edge TPU C++ API
    • class EdgeTpuManager
      • static EdgeTpuManager* GetSingleton();
      • 3 different overloads of std::unique_ptr<EdgeTpuContext> NewEdgeTpuContext()
      • std::vector<DeviceEnumerationRecord> EnumerateEdgeTpu()
      • TfLiteStatus SetVerbosity(int verbosity)
      • std::string Version()
    • Let’s take a look at the ‘-v’ logs: https://drive.google.com/drive/folders/1-MhGIgWHuhbKM6XrhPqyuLJDzoLD1t2g?usp=sharing
      • in short, the USB ones seem to have more overhead
    • https://github.com/freedomtan/edgetpu-native/blob/label_image/libedgetpu/edgetpu.h#L110-L158
  • 26. Imprinting Engine
    • Yes, let’s check what it is
    • The Imprinting Engine implements a low-shot learning technique called ‘Imprinted Weights’ [1][2]
    • It can be used to retrain classifiers on-device (on either the USB Accelerator or the Dev Board); no back-propagation gradient is involved.
    • Why?
      • Transfer learning happens on-device, at near-real-time speed.
      • You don't need to recompile the model.
    • Limitations
      • Training data size is limited to a max of 200 images per class.
      • It is most suitable only for datasets that have small intra-class variation.
      • The last fully-connected layer runs on the CPU, not the Edge TPU, so it will be slightly less efficient than running a model pre-compiled for the Edge TPU.
    • If you are interested in it, check the paper and aiy::learn::imprinting::ImprintingEngine::Train(unsigned char const*, int, int)
    • Graphs on the slide:
      • Embedding extractor: input (1×224×224×3) -> edgetpu-custom-op -> AvgPool (1×1×1×1024)
      • Retrained model: input (1×224×224×3) -> edgetpu-custom-op (1×1×1×1024) -> L2Normalization (1×1×1×1024) -> Conv2D (weights 5×1×1×1024, bias 5) (1×1×1×5) -> Reshape (1×5) -> Softmax (1×5) -> Output
    • [1] https://coral.withgoogle.com/docs/edgetpu/retrain-classification-ondevice/
    • [2] https://arxiv.org/abs/1712.07136
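The retrained graph (L2Normalization, then a 1×1 Conv2D acting as a fully-connected layer, then Softmax) suggests how imprinting works without gradients. This is a minimal pure-Python sketch of the imprinted-weights idea as described in the paper: each class weight is the re-normalized mean of that class's L2-normalized embeddings, and classification is cosine similarity followed by softmax. It is not the ImprintingEngine's code; the function names, the toy 3-d embeddings (standing in for the 1024-d AvgPool output), and the `scale` parameter (the paper multiplies logits by a scale factor) are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def imprint_weights(embeddings: Dict[int, List[List[float]]]) -> Dict[int, List[float]]:
    """Imprint one weight vector per class: the re-normalized mean of the
    class's L2-normalized embeddings. No back-propagation is involved."""
    weights = {}
    for label, vecs in embeddings.items():
        normed = [l2_normalize(v) for v in vecs]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        weights[label] = l2_normalize(mean)
    return weights


def classify(weights: Dict[int, List[float]], embedding: Sequence[float],
             scale: float = 1.0) -> Dict[int, float]:
    """L2-normalize, dot with each class weight (the 1x1 Conv2D), then softmax.
    `scale` is a hypothetical knob that sharpens the softmax."""
    q = l2_normalize(embedding)
    logits = {lbl: scale * sum(a * b for a, b in zip(w, q)) for lbl, w in weights.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {lbl: math.exp(z - top) for lbl, z in logits.items()}
    total = sum(exps.values())
    return {lbl: e / total for lbl, e in exps.items()}


w = imprint_weights({
    0: [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    1: [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
})
print(classify(w, [1.0, 0.05, 0.05], scale=10.0))
```

Adding a new class is just one more `imprint_weights` entry, which matches the "no recompilation" point: only the CPU-side weights change.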
  • 27. PCIe device?
    • It’s Linux
      • `uname -a`: Linux hopeful-nexus 4.9.51-imx #1 SMP PREEMPT Thu Jan 31 01:58:26 UTC 2019 aarch64 GNU/Linux
      • there is /proc/config.gz
      • $ zcat /proc/config.gz | grep -i edge
        CONFIG_SND_GOOGLE_EDGETPU_CARD=y
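For scripting the same check, here is a small Python equivalent of `zcat /proc/config.gz | grep -i edge`. It is a sketch using the standard `gzip` module; note that it does a case-insensitive substring match, whereas `grep` interprets the pattern as a regular expression.

```python
import gzip
from typing import List


def grep_kernel_config(pattern: str, path: str = "/proc/config.gz") -> List[str]:
    """Case-insensitive substring search over a gzip-compressed kernel config,
    roughly like `zcat /proc/config.gz | grep -i <pattern>`."""
    needle = pattern.lower()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if needle in line.lower()]
```

On the Dev Board, `grep_kernel_config("edge")` should return the same `CONFIG_SND_GOOGLE_EDGETPU_CARD=y` line shown above.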
  • 28. PCIe device
    • The apex driver is in gasket
      • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/gasket
    • It was already upstreamed last year
      • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/staging/gasket/apex_driver.c
  • 29. Global Unichip Corp
    • USB vendor ID 0x1a6e = “Global Unichip Corp”
    • PCI vendor ID 0x1ac1 = “Global Unichip Corp”
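Those vendor IDs are enough to spot the devices from sysfs. This is a minimal sketch assuming the standard Linux sysfs layout, where USB devices expose `idVendor` (four hex digits, no `0x`) and PCI devices expose `vendor` (with a `0x` prefix); `find_guc_devices` is a hypothetical helper, and it only checks the two IDs listed on the slide. The `sysfs` parameter exists so the function can be pointed at a test tree.

```python
import glob
import os
from typing import List

GUC_USB_VENDOR = "1a6e"    # USB idVendor format: hex digits without 0x
GUC_PCI_VENDOR = "0x1ac1"  # PCI vendor format: hex with 0x prefix


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip().lower()


def find_guc_devices(sysfs: str = "/sys") -> List[str]:
    """Return sysfs device directories whose vendor ID is Global Unichip Corp."""
    hits = []
    for p in glob.glob(os.path.join(sysfs, "bus/usb/devices/*/idVendor")):
        if _read(p) == GUC_USB_VENDOR:
            hits.append(os.path.dirname(p))
    for p in glob.glob(os.path.join(sysfs, "bus/pci/devices/*/vendor")):
        if _read(p) == GUC_PCI_VENDOR:
            hits.append(os.path.dirname(p))
    return sorted(hits)


print(find_guc_devices())
```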
  • 31. MCU on the USB Accelerator
    • https://www.seeedstudio.com/Coral-USB-Accelerator-p-2899.html
  • 32. Power consumption of the USB Accelerator
    • 4.94 V × 0.18 A ≈ 0.9 W while running MobileNet-SSD
    • https://twitter.com/exsiva/status/1108692847719407616
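Power alone says little without latency, so a rough energy-per-inference figure can be useful. This sketch computes P = V × I from the meter readings and multiplies by a latency (W × ms = mJ). Pairing the measured power with the 14.83 ms Edge TPU latency of mobilenet_ssd_v1_coco_quant_postprocess from the numbers table is an assumption: the power reading came from a different setup, and the exact SSD variant it ran is not stated.

```python
def power_w(volts: float, amps: float) -> float:
    """Electrical power P = V * I, in watts."""
    return volts * amps


def energy_per_inference_mj(watts: float, latency_ms: float) -> float:
    """Rough energy per inference: W * ms = mJ."""
    return watts * latency_ms


p = power_w(4.94, 0.18)  # USB meter readings from the slide
print(f"{p:.3f} W")  # 0.889 W
# Assumption: combine that power with the 14.83 ms SSD MobileNet V1 Edge TPU latency.
print(f"{energy_per_inference_mj(p, 14.83):.2f} mJ per inference")  # 13.19 mJ
```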
  • 33. Architecture of the Edge TPU?
    • Nope, I didn’t read it. Just FYR
    • https://patents.google.com/patent/US20190050717A1/
  • 34. Concluding Remarks
    • The Edge TPU is quite good for small models that you can convert to canned ones
      • quantized UINT8
    • It is not so good for some common larger models, e.g., Inception V3 and ResNet 50
    • Your USB port and CPU could be bottlenecks
    • On-device retraining looks promising
    • NCS 2 supports many more models for now
    • How about NVIDIA Jetson Nano? Dunno, let’s wait and see. I don’t believe GPUs will win in the long run.