To effectively support deep learning at LinkedIn, we first need to address data processing issues. Most of the datasets used by our ML algorithms (e.g., Photon-ML, LinkedIn's large-scale personalization engine) are in Avro format. Each record in an Avro dataset is essentially a sparse vector that can be easily consumed by most modern classifiers. However, the format cannot be directly used by TensorFlow, the leading deep learning package. The main blocker is that the sparse vector is not in the same format as a Tensor.
Many companies have vast amounts of ML data in a similar sparse vector format, while the Tensor format is still relatively new to them. Avro2TF bridges this gap by providing a scalable, Spark-based transformation and extension mechanism that efficiently converts the data into TFRecords ready to be consumed by TensorFlow. With this technology, engineers can improve their productivity by focusing on model building rather than data processing.
In this talk, we will go over the data processing issues common to many machine learning pipelines and how we solve them. We will then take a deep dive into the open-sourced tool, Avro2TF: how it works, its technical architecture, and its usage.
1. Avro2TF: A Data Processing
Engine for TensorFlow
Xuhong Zhang, Wensheng Sun, Chenya Zhang, Yiming Ma
AI Computing Foundation Team
2. Tensor Data Preparation: Avro2TF
- A data preprocessing component under TensorFlowIn.
- Read raw user input data in any format supported by Spark.
- Generate tensor metadata (e.g., shape, cardinality, dtype).
- Generate Avro- or TFRecord-based training data.
- Make your training data ready to be consumed by TF!
7. Tensor Data Preparation: Avro2TF
Deep dive into the user config.
- Example of a tensorizeIn config.
(Annotations on the example config: columnConfig applies to NTV features only; one field holds an array of word tokens; Id conversion is applied if needed.)
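A minimal sketch of what a tensorizeIn config might look like, written as a Python dict for readability (the real config is a JSON file; field names here are illustrative and should be checked against the Avro2TF README):

```python
# Illustrative tensorizeIn config sketch -- field names are assumptions,
# not the authoritative Avro2TF schema.
tensorize_in_config = {
    "features": [
        {
            # An NTV (name-term-value) sparse feature; columnConfig
            # applies to NTV features only.
            "inputFeatureInfo": {
                "columnExpr": "userFeatures",
                "columnConfig": {},  # placeholder for NTV-only options
            },
            "outputTensorInfo": {
                "name": "user_features",
                "dtype": "sparseVector",
                "shape": [-1],  # 1D array of any length
            },
        },
        {
            # An array of word tokens that needs String-to-Id conversion.
            "inputFeatureInfo": {"columnExpr": "queryTokens"},
            "outputTensorInfo": {
                "name": "query_tokens",
                "dtype": "long",
                "shape": [-1],
            },
        },
    ],
    "labels": [
        {
            "inputFeatureInfo": {"columnExpr": "label"},
            "outputTensorInfo": {"name": "response", "dtype": "float", "shape": []},
        }
    ],
}
```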
8. Tensor Data Preparation: Avro2TF
Deep dive into the user config.
Shape:
- []: a scalar;
- [-1]: a 1D array of any length;
- [6]: a 1D array of length 6;
- [2, 3]: a matrix with 2 rows and 3 columns.
(Where is “shape” used?)
Notice:
We don’t do any reshaping here. The shape is just the shape of your raw data.
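The shape rules above can be sketched as a small checker (an illustrative helper, not part of Avro2TF):

```python
def matches_shape(data, shape):
    """Check whether raw nested-list data matches a shape spec:
    [] = scalar, [-1] = any length in that dimension, [n] = exact length.
    The rule applies recursively, one dimension per shape entry."""
    if not shape:                       # [] -> scalar
        return not isinstance(data, list)
    if not isinstance(data, list):
        return False
    dim, rest = shape[0], shape[1:]
    if dim != -1 and len(data) != dim:
        return False
    return all(matches_shape(x, rest) for x in data)

print(matches_shape(3.0, []))                          # True: scalar
print(matches_shape([1, 2, 3], [-1]))                  # True: any-length 1D
print(matches_shape([1, 2, 3, 4, 5, 6], [6]))          # True: length 6
print(matches_shape([[1, 2, 3], [4, 5, 6]], [2, 3]))   # True: 2x3 matrix
```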
9. Tensor Data Preparation: Avro2TF
Deep dive into the user config.
- For categorical/sparse features, we require them to be represented in NTV
(name-term-value) format.
- We also support the following primitive types:
int, long, float, double, String, bytes (for multimedia data such as image, audio, and video), boolean,
enum, Array[NTV].
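For concreteness, one record of an NTV feature can be sketched as plain Python dicts (the real data lives in Avro records; the feature names are made up):

```python
# One categorical/sparse feature in NTV (name-term-value) form.
ntv_feature = [
    {"name": "title",   "term": "engineer",      "value": 1.0},
    {"name": "skill",   "term": "deep_learning", "value": 1.0},
    {"name": "company", "term": "linkedin",      "value": 0.5},
]

# Each unique (name, term) pair identifies one dimension of the sparse
# vector; `value` is the weight at that dimension.
dimensions = [(e["name"], e["term"]) for e in ntv_feature]
```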
10. Tensor Data Preparation: Avro2TF
Deep dive into the tensorize process.
Feature Mapping Table Generation.
- Requirement:
String-based indices → numerical Id-based indices.
- Usage:
1) The mapping table is limited (built only from the training data); Ids start from 0.
2) If a feature is not in the table, it maps to the same Id as the UNK (unknown) token (= cardinality).
3) Users can provide their own mapping table.
4) Different trainings can use the same mapping table.
Talk later!
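The rules above can be sketched in a few lines of Python (an illustrative toy, not the actual Spark job):

```python
def build_mapping_table(training_rows):
    """Assign each unique (name, term) pair seen in the training data a
    numerical Id, starting from 0."""
    table = {}
    for row in training_rows:
        for entry in row:
            key = (entry["name"], entry["term"])
            if key not in table:
                table[key] = len(table)
    return table

def lookup(table, name, term):
    """Unseen features map to the UNK Id, which equals the cardinality
    (one past the largest Id assigned from the training data)."""
    return table.get((name, term), len(table))

train = [
    [{"name": "skill", "term": "spark",      "value": 1.0}],
    [{"name": "skill", "term": "tensorflow", "value": 1.0}],
]
table = build_mapping_table(train)
print(lookup(table, "skill", "spark"))   # 0
print(lookup(table, "skill", "scala"))   # 2 -> UNK Id = cardinality
```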
11. Tensor Data Preparation: Avro2TF
Deep dive into the tensorize process.
Feature Mapping Table Generation.
- Feature in NTV format: each row = name + term.
- Feature as a list of Strings: each row = a single word.
12. Tensor Data Preparation: Avro2TF
Deep dive into the tensor metadata.
Cardinality Computation:
- sparseVector: the number of unique name + term pairs across all records.
- String: the number of unique Strings across all records.
- long or int: the maximum long/int value.
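The three cardinality rules can be sketched as follows (an illustrative helper mirroring the rules above, not the actual Spark implementation):

```python
def cardinality(values, dtype):
    """Compute tensor cardinality per dtype, following the slide's rules."""
    if dtype == "sparseVector":
        # unique (name, term) pairs across all records
        return len({(e["name"], e["term"]) for rec in values for e in rec})
    if dtype == "String":
        # unique strings across all records
        return len(set(values))
    if dtype in ("int", "long"):
        # maximum value observed
        return max(values)
    raise ValueError(f"no cardinality rule for dtype {dtype!r}")

print(cardinality(["a", "b", "a"], "String"))  # 2
print(cardinality([3, 7, 5], "long"))          # 7
```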
13. Tensor Data Preparation: Avro2TF
Deep dive into the output tensor.
- For categorical/sparse features, we provide a special data type: sparseVector.
Indices: all the Ids are put into an array.
Values: all the values from the NTV tuples are put into a parallel array.
- We also support the following output tensor data type:
int, long, float, double, String, boolean, bytes, sparseVector.
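A sketch of how NTV entries become a sparseVector's parallel indices/values arrays (illustrative toy code; the mapping table and record here are made up):

```python
def to_sparse_vector(ntv_entries, mapping_table):
    """The Id of each NTV entry goes into `indices`, and its value into the
    parallel `values` array. Unknown (name, term) pairs get the UNK Id,
    which equals the mapping table's size (= cardinality)."""
    indices = [mapping_table.get((e["name"], e["term"]), len(mapping_table))
               for e in ntv_entries]
    values = [e["value"] for e in ntv_entries]
    return {"indices": indices, "values": values}

table = {("skill", "spark"): 0, ("skill", "tensorflow"): 1}
record = [
    {"name": "skill", "term": "tensorflow", "value": 1.0},
    {"name": "skill", "term": "scala",      "value": 0.5},  # unseen -> UNK
]
print(to_sparse_vector(record, table))
# {'indices': [1, 2], 'values': [1.0, 0.5]}
```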