SlideShare una empresa de Scribd logo
1 de 14
Avro2TF: A Data Processing
Engine for TensorFlow
Xuhong Zhang, Wensheng Sun, Chenya Zhang, Yiming Ma
AI Computing Foundation Team
Tensor Data Preparation: Avro2TF
- A data preprocessing component under TensorFlowIn.
- Read raw user input data with any format supported by Spark.
Generate tensor metadata (e.g.shape, cardinality, dtype).
Generate Avro- or TFRecord-based training data.
Make your training data ready to be consumed by TF!
2
TensorFlowIn Overview
3
Model Development Pipeline
Avro2TF Overview
6
Tensorize
Text
Feature
Categorical
Feature
Tensor Data Preparation: Avro2TF
Deep dive into the user config.
- Example of a tensorizeIn config.
7
columnConfig for NTV only
Array of word tokens
If need Id conversion
Tensor Data Preparation: Avro2TF
Deep dive into the user config.
Shape.
- []: scalar;
- [-1] : 1D array of any length;
- [6]: 1D array of length 6;
- [2, 3]: a matrix with 2 rows and 3 columns.
(Where is “shape” used?)
Notice:
We don’t do any reshaping here. The shape is just the shape of your raw data.
8
Tensor Data Preparation: Avro2TF
Deep dive into the user config.
- For categorical/sparse features, we require them to be represented in NTV
(name-term-value) format.
9
- We also support the following primitive types:
int, long, float, double, String, bytes (for multimedia data such as image, audio, and video), boolean,
enum, Array[NTV].
Tensor Data Preparation: Avro2TF
Deep dive into the tensorize process.
Feature Mapping Table Generation.
- Requirement:
String-based indices Numerical Id-based indices
- Usage:
1) Mapping table is limited (only based on training data). [Starting from 0]
2) If not available maps to same Id with UNK (unknown) token. [= Cardinality]
3) User can provide their own mapping table.
4) Different training can use the same mapping table.
10
Talk later!
Tensor Data Preparation: Avro2TF
Deep dive into the tensorize process.
Feature Mapping Table Generation.
- Feature: NTV format. - Feature: A list of String.
Each row = name + term Each row = single word
11
Tensor Data Preparation: Avro2TF
Deep dive into the tensor metadata.
Cardinality Computation:
- sparseVector: the number of unique name + term across all records .
- String: the number of unique Strings across all records.
- long or int: the maximum long/int value.
12
Tensor Data Preparation: Avro2TF
Deep dive into the output tensor.
- For categorical/sparse features, we provide a special data type: sparseVector.
Indices: All the ids will be put into an array.
Values: All the values in the NTV tuples will be put into an array.
- We also support the following output tensor data type:
int, long, float, double, String, boolean, bytes, sparseVector.
13
14
More Examples on Supported Data Types
Tensor Metadata
Avro2TF

Más contenido relacionado

La actualidad más candente

Base64 Encoder And Decoder Transformer With Mule ESB
Base64 Encoder And Decoder Transformer With Mule ESBBase64 Encoder And Decoder Transformer With Mule ESB
Base64 Encoder And Decoder Transformer With Mule ESBJitendra Bafna
 
Python command line_14_12_2020
Python command line_14_12_2020Python command line_14_12_2020
Python command line_14_12_2020Sugnan M
 
Tools for reading papers
Tools for reading papersTools for reading papers
Tools for reading papersJack Fox
 
Understanding the components of standard template library
Understanding the components of standard template libraryUnderstanding the components of standard template library
Understanding the components of standard template libraryRahul Sharma
 
12. standard library introduction
12. standard library introduction12. standard library introduction
12. standard library introductionVahid Heidari
 
Introduction to programming in MATLAB
Introduction to programming in MATLABIntroduction to programming in MATLAB
Introduction to programming in MATLABmustafa_92
 
32sql server
32sql server32sql server
32sql serverSireesh K
 
Python advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structuresPython advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structuresJohn(Qiang) Zhang
 
R data-structures-3
R data-structures-3R data-structures-3
R data-structures-3Victor Ordu
 
(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1Serhii Havrylov
 
Python advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithmPython advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithmJohn(Qiang) Zhang
 

La actualidad más candente (20)

Base64 Encoder And Decoder Transformer With Mule ESB
Base64 Encoder And Decoder Transformer With Mule ESBBase64 Encoder And Decoder Transformer With Mule ESB
Base64 Encoder And Decoder Transformer With Mule ESB
 
Standard Library Functions
Standard Library FunctionsStandard Library Functions
Standard Library Functions
 
Network Security Lec4
Network Security Lec4Network Security Lec4
Network Security Lec4
 
2CPP16 - STL
2CPP16 - STL2CPP16 - STL
2CPP16 - STL
 
Python command line_14_12_2020
Python command line_14_12_2020Python command line_14_12_2020
Python command line_14_12_2020
 
Msc 1
Msc 1Msc 1
Msc 1
 
Tools for reading papers
Tools for reading papersTools for reading papers
Tools for reading papers
 
Link List
Link ListLink List
Link List
 
Understanding the components of standard template library
Understanding the components of standard template libraryUnderstanding the components of standard template library
Understanding the components of standard template library
 
30csharp
30csharp30csharp
30csharp
 
12. standard library introduction
12. standard library introduction12. standard library introduction
12. standard library introduction
 
Editors l21 l24
Editors l21 l24Editors l21 l24
Editors l21 l24
 
Lecture1
Lecture1Lecture1
Lecture1
 
Introduction to programming in MATLAB
Introduction to programming in MATLABIntroduction to programming in MATLAB
Introduction to programming in MATLAB
 
32sql server
32sql server32sql server
32sql server
 
Python advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structuresPython advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structures
 
R data-structures-3
R data-structures-3R data-structures-3
R data-structures-3
 
Functions class11 cbse_notes
Functions class11 cbse_notesFunctions class11 cbse_notes
Functions class11 cbse_notes
 
(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1(Kpi summer school 2015) theano tutorial part1
(Kpi summer school 2015) theano tutorial part1
 
Python advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithmPython advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithm
 

Similar a Avro2 tf: a data processing engine for tensorflow

Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01Lalit009kumar
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
 
Introduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep LearningIntroduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep Learningali alemi
 
Think Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseThink Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseRachel Warren
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Chris Fregly
 
Mod01 tns e overview
Mod01 tns e overviewMod01 tns e overview
Mod01 tns e overviewPeter Haase
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!Linaro
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
OSA Con 2022: Streaming Data Made Easy
OSA Con 2022:  Streaming Data Made EasyOSA Con 2022:  Streaming Data Made Easy
OSA Con 2022: Streaming Data Made EasyTimothy Spann
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...Altinity Ltd
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
 

Similar a Avro2 tf: a data processing engine for tensorflow (20)

Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01Sqlmaterial 120414024230-phpapp01
Sqlmaterial 120414024230-phpapp01
 
Just entity framework
Just entity frameworkJust entity framework
Just entity framework
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
Introduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep LearningIntroduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep Learning
 
Think Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseThink Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use Case
 
Arduino reference
Arduino referenceArduino reference
Arduino reference
 
Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
Theory1&2
Theory1&2Theory1&2
Theory1&2
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptx
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
Terraform training 🎒 - Basic
Terraform training 🎒 - BasicTerraform training 🎒 - Basic
Terraform training 🎒 - Basic
 
Interm codegen
Interm codegenInterm codegen
Interm codegen
 
Mod01 tns e overview
Mod01 tns e overviewMod01 tns e overview
Mod01 tns e overview
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
OSA Con 2022: Streaming Data Made Easy
OSA Con 2022:  Streaming Data Made EasyOSA Con 2022:  Streaming Data Made Easy
OSA Con 2022: Streaming Data Made Easy
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
Keras and TensorFlow
Keras and TensorFlowKeras and TensorFlow
Keras and TensorFlow
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
 

Último

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Último (20)

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 

Avro2 tf: a data processing engine for tensorflow

  • 1. Avro2TF: A Data Processing Engine for TensorFlow Xuhong Zhang, Wensheng Sun, Chenya Zhang, Yiming Ma AI Computing Foundation Team
  • 2. Tensor Data Preparation: Avro2TF - A data preprocessing component under TensorFlowIn. - Read raw user input data with any format supported by Spark. Generate tensor metadata (e.g.shape, cardinality, dtype). Generate Avro- or TFRecord-based training data. Make your training data ready to be consumed by TF! 2
  • 7. Tensor Data Preparation: Avro2TF Deep dive into the user config. - Example of a tensorizeIn config. 7 columnConfig for NTV only Array of word tokens If need Id conversion
  • 8. Tensor Data Preparation: Avro2TF Deep dive into the user config. Shape. - []: scalar; - [-1] : 1D array of any length; - [6]: 1D array of length 6; - [2, 3]: a matrix with 2 rows and 3 columns. (Where is “shape” used?) Notice: We don’t do any reshaping here. The shape is just the shape of your raw data. 8
  • 9. Tensor Data Preparation: Avro2TF Deep dive into the user config. - For categorical/sparse features, we require them to be represented in NTV (name-term-value) format. 9 - We also support the following primitive types: int, long, float, double, String, bytes (for multimedia data such as image, audio, and video), boolean, enum, Array[NTV].
  • 10. Tensor Data Preparation: Avro2TF Deep dive into the tensorize process. Feature Mapping Table Generation. - Requirement: String-based indices Numerical Id-based indices - Usage: 1) Mapping table is limited (only based on training data). [Starting from 0] 2) If not available maps to same Id with UNK (unknown) token. [= Cardinality] 3) User can provide their own mapping table. 4) Different training can use the same mapping table. 10 Talk later!
  • 11. Tensor Data Preparation: Avro2TF Deep dive into the tensorize process. Feature Mapping Table Generation. - Feature: NTV format. - Feature: A list of String. Each row = name + term Each row = single word 11
  • 12. Tensor Data Preparation: Avro2TF Deep dive into the tensor metadata. Cardinality Computation: - sparseVector: the number of unique name + term across all records . - String: the number of unique Strings across all records. - long or int: the maximum long/int value. 12
  • 13. Tensor Data Preparation: Avro2TF Deep dive into the output tensor. - For categorical/sparse features, we provide a special data type: sparseVector. Indices: All the ids will be put into an array. Values: All the values in the NTV tuples will be put into an array. - We also support the following output tensor data type: int, long, float, double, String, boolean, bytes, sparseVector. 13
  • 14. 14 More Examples on Supported Data Types Tensor Metadata Avro2TF