8. AutoML mobile models: mnasnet_0.5_224.tflite
Edge TPU Compiler version 1.0.249710469
INFO: Initialized TensorFlow Lite runtime.
Invalid model: mnasnet_0.5_224.tflite
Model not quantized
An unquantized model is no good!
=> quantization-aware training
Quantization and Training of Neural Networks for Efficient
Integer-Arithmetic-Only Inference
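One way to check for this condition on the host (a minimal sketch of mine using the TFLite C++ API, not part of the Edge TPU compiler) is to look at the tensor types of a built interpreter:

#include "tensorflow/lite/interpreter.h"

// Sketch: a fully integer-quantized model exposes uint8 (or, in newer
// toolchains, int8) tensors instead of float32.
bool IsQuantized(const tflite::Interpreter& interpreter) {
  const TfLiteTensor* input = interpreter.tensor(interpreter.inputs()[0]);
  return input->type == kTfLiteUInt8 || input->type == kTfLiteInt8;
}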
11. Parameter data caching
Quote:
The Edge TPU has roughly 8 MB of SRAM that
can cache the model's parameter data.
However, a small amount of the RAM is first reserved for the
model's inference executable, so the parameter data uses
whatever space remains after that.
14. Continued
Naturally, saving the parameter data on the Edge
TPU RAM enables faster inferencing speed
compared to fetching the parameter data from
external memory.
=> "external memory" is probably the host-side system memory
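A back-of-the-envelope check (my own sketch; the 8 MB constant comes from the quote above, and the file name is a placeholder) of whether a compiled model's parameter data might fit in the on-chip cache:

#include <cstdint>
#include <cstdio>
#include <filesystem>

int main() {
  // ~8 MB of Edge TPU SRAM, per the quote above; part of it is taken
  // by the inference executable before parameter data is cached.
  constexpr std::uintmax_t kSramBytes = 8 * 1024 * 1024;
  // The file size is an upper bound on the parameter data size.
  const std::uintmax_t size =
      std::filesystem::file_size("model_edgetpu.tflite");
  std::printf("%ju of %ju bytes -> %s\n", size, kSramBytes,
              size <= kSramBytes ? "may fit on-chip"
                                 : "will spill to host memory");
}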
25. InterpreterBuilder::operator()
for (int subgraph_index = 0; subgraph_index < subgraphs->Length();
     ++subgraph_index) {
  const tflite::SubGraph* subgraph = (*subgraphs)[subgraph_index];
  tflite::Subgraph* modified_subgraph =
      (*interpreter)->subgraph(subgraph_index);
  auto operators = subgraph->operators();
  auto tensors = subgraph->tensors();
  // ... (omitted) ...
  // Finally setup nodes and tensors
  if (ParseNodes(operators, modified_subgraph) != kTfLiteOk)
    return cleanup_and_error();
  if (ParseTensors(buffers, tensors, modified_subgraph) != kTfLiteOk)
    return cleanup_and_error();
Splits the graph into SubGraphs
Parses the nodes within each SubGraph
26. InterpreterBuilder::ParseNodes
TfLiteStatus InterpreterBuilder::ParseNodes(
    const flatbuffers::Vector<flatbuffers::Offset<Operator>>* operators,
    Subgraph* subgraph) {
  // ... (omitted) ...
  for (int i = 0; i < operators->Length(); ++i) {
    const auto* op = operators->Get(i);
    // ... (omitted) ...
    if (op->custom_options()) {
      subgraph->AddNodeWithParameters(
          FlatBufferIntArrayToVector(op->inputs()),
          FlatBufferIntArrayToVector(op->outputs()),
          reinterpret_cast<const char*>(op->custom_options()->data()),
          op->custom_options()->size(), nullptr, registration);
    } else {
      // ... (rest omitted) ...
This is the path taken when the op is a custom op.
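To see the result from the outside, one can walk the built interpreter's nodes and print the registration each node received (a sketch of mine using the public Interpreter API; an already-built interpreter is assumed):

#include <cstdio>
#include "tensorflow/lite/interpreter.h"

// Sketch: custom ops (such as the Edge TPU one) report a custom_name
// in their registration; builtin ops do not.
void DumpNodes(const tflite::Interpreter& interpreter) {
  for (int i = 0; i < static_cast<int>(interpreter.nodes_size()); ++i) {
    const auto* node_and_reg = interpreter.node_and_registration(i);
    const TfLiteRegistration& reg = node_and_reg->second;
    std::printf("node %d: %s\n", i,
                reg.custom_name ? reg.custom_name : "(builtin)");
  }
}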
27. tensorflow/lite/schema/schema_v0.fbs
// An operator takes tensors as inputs and outputs. The type of operation being
// performed is determined by an index into the list of valid OperatorCodes,
// while the specifics of each operations is configured using builtin_options
// or custom_options.
table Operator {
  // Index into the operator_codes array. Using an integer here avoids
  // complicate map lookups.
  opcode_index:int;
  inputs:[int];
  outputs:[int];
  builtin_options:BuiltinOptions;
  custom_options:[ubyte];
}
When the op is a custom op, custom_options is set!
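As an illustration (my own sketch, using the FlatBuffers API generated from this schema), the custom_options bytes can be read straight out of a model buffer:

#include <cstdio>
#include "tensorflow/lite/schema/schema_generated.h"

// Sketch: walk every operator and report which ones carry
// custom_options (i.e. which ones are custom ops).
void DumpCustomOptions(const void* model_data) {
  const tflite::Model* model = tflite::GetModel(model_data);
  for (const tflite::SubGraph* sg : *model->subgraphs()) {
    for (const tflite::Operator* op : *sg->operators()) {
      if (op->custom_options()) {
        std::printf("custom op with %u option bytes\n",
                    op->custom_options()->size());
      }
    }
  }
}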
30. Subgraph::OpInit
void* OpInit(const TfLiteRegistration& op_reg,
             const char* buffer, size_t length) {
  if (op_reg.init == nullptr) return nullptr;
  return op_reg.init(context_, buffer, length);
}
edgetpu.h
// Returns pointer to an instance of TfLiteRegistration to handle
// EdgeTPU custom ops, to be used with
// tflite::ops::builtin::BuiltinOpResolver::AddCustom
TfLiteRegistration* RegisterCustomOp();
For the Edge TPU, the op is the TfLiteRegistration obtained via RegisterCustomOp()
31. TfLiteRegistration (part 1)
// Initializes the op from serialized data.
// If a built-in op:
// `buffer` is the op's params data (TfLiteLSTMParams*).
// `length` is zero.
// If custom op:
// `buffer` is the op's `custom_options`.
// `length` is the size of the buffer.
//
// Returns a type-punned (i.e. void*) opaque data (e.g. a primitive pointer
// or an instance of a struct).
// The returned pointer will be stored with the node in the `user_data` field,
// accessible within prepare and invoke functions below.
// NOTE: if the data is already in the desired format, simply implement this
// function to return `nullptr` and implement the free function to be a no-op.
void* (*init)(TfLiteContext* context, const char* buffer, size_t length);
32. TfLiteRegistration (part 2)
// The pointer `buffer` is the data previously returned by an init invocation.
void (*free)(TfLiteContext* context, void* buffer);
// prepare is called when the inputs this node depends on have been resized.
// context->ResizeTensor() can be called to request output tensors to be
// resized.
//
// Returns kTfLiteOk on success.
TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node);
// Execute the node (should read node->inputs and output to node->outputs).
// Returns kTfLiteOk on success.
TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node);
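Putting the four callbacks together, a minimal custom op could look like the following skeleton (my own do-nothing sketch; the real Edge TPU op naturally does far more in invoke):

#include "tensorflow/lite/c/c_api_internal.h"  // TfLiteRegistration
                                               // (common.h in newer trees)
namespace {
void* MyInit(TfLiteContext* context, const char* buffer, size_t length) {
  // For a custom op, `buffer`/`length` carry the custom_options bytes.
  return nullptr;  // no per-node state in this sketch
}
void MyFree(TfLiteContext* context, void* buffer) {}
TfLiteStatus MyPrepare(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;  // a real op would resize output tensors here
}
TfLiteStatus MyInvoke(TfLiteContext* context, TfLiteNode* node) {
  return kTfLiteOk;  // a real op reads node->inputs, writes node->outputs
}
}  // namespace

TfLiteRegistration* RegisterMyCustomOp() {
  static TfLiteRegistration reg = {MyInit, MyFree, MyPrepare, MyInvoke};
  return &reg;
}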
33. std::unique_ptr<tflite::Interpreter> BuildEdgeTpuInterpreter(
    const tflite::FlatBufferModel& model,
    edgetpu::EdgeTpuContext* edgetpu_context) {
  tflite::ops::builtin::BuiltinOpResolver resolver;
  resolver.AddCustom(edgetpu::kCustomOp, edgetpu::RegisterCustomOp());
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(model, resolver)(&interpreter) != kTfLiteOk) {
    std::cerr << "Failed to build interpreter." << std::endl;
  }
  // Bind given context with interpreter.
  interpreter->SetExternalContext(kTfLiteEdgeTpuContext, edgetpu_context);
  interpreter->SetNumThreads(1);
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    std::cerr << "Failed to allocate tensors." << std::endl;
  }
  return interpreter;
}
https://coral.googlesource.com/edgetpu-native/+/refs/heads/release-chef/edgetpu/cpp/examples/utils.cc#181
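For context, a usage sketch for this helper (my own, following the Coral example code; the model file name is a placeholder and error handling is omitted):

#include <memory>
#include "edgetpu.h"
#include "tensorflow/lite/model.h"

void RunOnce() {
  auto model =
      tflite::FlatBufferModel::BuildFromFile("model_edgetpu.tflite");
  // Open the Edge TPU device and bind it to the interpreter.
  std::shared_ptr<edgetpu::EdgeTpuContext> context =
      edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice();
  auto interpreter = BuildEdgeTpuInterpreter(*model, context.get());
  // Fill interpreter->typed_input_tensor<uint8_t>(0) with input, then:
  interpreter->Invoke();
}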
37. Model successfully compiled but not all operations are
supported by the Edge TPU. A percentage of the model will
instead run on the CPU, which is slower. If possible, consider
updating your model to use only operations supported by the
Edge TPU.
For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 63
Number of operations that will run on CPU: 1
38. detect_edgetpu.log
DEPTHWISE_CONV_2D  13  Mapped to Edge TPU
RESHAPE            13  Mapped to Edge TPU
LOGISTIC            1  Mapped to Edge TPU
CUSTOM              1  Operation is working on an unsupported data type
CONCATENATION       2  Mapped to Edge TPU
CONV_2D            34  Mapped to Edge TPU
Currently, the Edge TPU compiler cannot partition the model more than once, so
as soon as an unsupported operation occurs, that operation and everything after it
executes on the CPU, even if supported operations occur later.
46. Summary
The Google Edge TPU:
・uses TensorFlow Lite's custom op mechanism
・the custom op is edgetpu_custom_op
・the data passed to edgetpu_custom_op appears to be
in FlatBuffers format