Avro - More Than Just a Serialization Framework - CHUG - 20120416
Chicago Hadoop Users Group
View the accompanying video on vimeo: https://vimeo.com/40776630
1.
http://avro.apache.org
Apache Avro: More Than Just a Serialization Framework
Jim Scott, Lead Engineer / Architect
A ValueClick Company
2.
Agenda
• History / Overview
• Serialization Framework
• Supported Languages
• Performance
• Implementing Avro (Including Code Examples)
• Avro with Maven
• RPC (Including Code Examples)
• Resources
• Questions?
Not to be distributed without prior consent. Confidential. Copyright © 2011, Dotomi a ValueClick Company. All rights reserved.
3.
History / Overview
4.
History / Overview
Existing Serialization Frameworks
• protobuf, thrift, avro, kryo, hessian, activemq-protobuf, scala, sbinary, google-gson, jackson/JSON, javolution, protostuff, woodstox, aalto, fast-infoset, xstream, java serialization, etc.
Most popular frameworks
• JAXB, Protocol Buffers, Thrift
Avro
• Created by Doug Cutting, the creator of Hadoop
• Data is always accompanied by a schema:
    • Support for dynamic typing: code generation is not required
    • Supports schema evolution
    • The data is not tagged, resulting in smaller serialization size
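To make the "data is not tagged" point concrete, here is a minimal Java sketch of an untagged binary encoding. It is hand-written for illustration and does not use the Avro library; the class name `UntaggedEncodingSketch` and the record shape (a long `id` followed by a string `city`) are assumptions. It follows the encoding described in the Avro specification: integers as zigzag variable-length values, strings as a length followed by UTF-8 bytes, and record fields written back to back in schema order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative only: a hand-rolled Avro-style binary encoding, not the Avro library. */
final class UntaggedEncodingSketch {

    /** Writes a long as a zigzag-encoded variable-length integer. */
    static void writeLong(ByteArrayOutputStream out, long value) {
        // Zigzag maps small magnitudes (positive or negative) to small unsigned values.
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            // Emit 7 payload bits; the high bit signals that more bytes follow.
            out.write((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    /** Writes a string as its UTF-8 byte length followed by the bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Encodes a hypothetical record {id: long, city: string}: fields in schema order, no tags. */
    static byte[] encodeRecord(long id, String city) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, id);
        writeString(out, city);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeRecord(64, "Chicago");
        System.out.println(bytes.length + " bytes"); // prints "10 bytes"
    }
}
```

Because no field names or tags are written, a reader needs the writer's schema to decode these bytes, which is why the schema always travels with the data.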
5.
Serialization Framework
6.
Serialization Framework
Avro Limitations
• Map keys can only be Strings
Avro Benefits
• Interoperability
    • Can serialize into Avro/Binary or Avro/JSON
    • Supports reading and writing protobufs and thrift
• Supports multiple languages
• Rich data structures with a schema described via JSON
    • A compact, fast, binary data format
    • A container file, to store persistent data (schema ALWAYS available)
    • Remote procedure call (RPC)
• Simple integration with dynamic languages (via the generic type)
    • Unlike other frameworks, an unknown schema is supported at runtime
• Compressible and splittable by Hadoop MapReduce
7.
Supported Languages
Implementation  Core  Data file  Codec            RPC
C               yes   yes        deflate          yes
C++             yes   yes        ?                yes
C#              yes   no         n/a              no
Java            yes   yes        deflate, snappy  yes
Perl            yes   yes        deflate          no
Python          yes   yes        deflate, snappy  yes
Ruby            yes   yes        deflate          yes
PHP             yes   yes        ?                no

Core: Parse JSON schema, read / write binary schema
Data file: Read / write avro data files
RPC: Over HTTP
Source: https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages
8.
Framework - Performance
Comparison Metrics
Time to Serialize / Deserialize
• Avro is not the fastest, but is in the top half of all frameworks
Object Creation
• Avro falls to the bottom, because it always uses UTF-8 for Strings. In normal use cases this is not a problem, as this test was just to compare object creation, not object reuse.
Size of Serialized Objects (Compressed w/ deflate or nothing)
• Avro is only bested by Kryo by about 1 byte
Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
9.
Framework - Performance
Comparison Charts
[Charts: size of serialized data; total time to serialize data]
Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
10.
Implementing Avro
11.
Framework - Types
Generic
• All Avro records are represented by a generic attribute/value data structure. This style is most useful for systems which dynamically process datasets based on user-provided scripts. For example, a program may be passed a data file whose schema has not been previously seen by the program and told to sort it by the field named "city".
Specific
• Each Avro record corresponds to a different kind of object in the programming language. For example, in Java, C and C++, a specific API would generate a distinct class or struct definition for each record definition. This style is used for programs written to process a specific schema. RPC systems typically use this.
Reflect
• Avro schemas are generated via reflection to correspond to existing programming language data structures. This may be useful when converting an existing codebase to use Avro with minimal modifications.
Source: https://cwiki.apache.org/confluence/display/AVRO/Glossary
12.
Using Reflect Type
    // Classes come from org.apache.avro, org.apache.avro.file and org.apache.avro.reflect.
    // SomeObject stands for any existing Java class.
    Class<SomeObject> type = SomeObject.class;
    Schema schema = ReflectData.AllowNull.get().getSchema(type);
    DataFileWriter<SomeObject> writer =
        new DataFileWriter<SomeObject>(new ReflectDatumWriter<SomeObject>(schema));
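Conceptually, the reflect approach inspects an existing class and derives a record schema from its fields. The sketch below imitates that idea with plain `java.lang.reflect`. It is not Avro's `ReflectData` implementation: it handles only a few primitive types, it does not produce the nullable unions that `AllowNull` would, and `ReflectSchemaSketch` and `PageView` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

/** Illustrative only: derives an Avro-like JSON record schema from a class's fields. */
final class ReflectSchemaSketch {

    /** Hypothetical existing class that we want to describe without modifying it. */
    static class PageView {
        String city;
        long timestamp;
        int durationSeconds;
    }

    /** Maps a handful of Java types to Avro primitive type names. */
    static String avroType(Class<?> c) {
        if (c == String.class) return "string";
        if (c == int.class) return "int";
        if (c == long.class) return "long";
        if (c == boolean.class) return "boolean";
        if (c == float.class) return "float";
        if (c == double.class) return "double";
        throw new IllegalArgumentException("unsupported in this sketch: " + c);
    }

    /** Builds a JSON record schema with one entry per instance field. */
    static String recordSchema(Class<?> type) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            // Skip fields that are not part of the object's persistent state.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                continue;
            }
            fields.add("{\"name\":\"" + f.getName()
                    + "\",\"type\":\"" + avroType(f.getType()) + "\"}");
        }
        return "{\"type\":\"record\",\"name\":\"" + type.getSimpleName()
                + "\",\"fields\":" + fields + "}";
    }

    public static void main(String[] args) {
        System.out.println(recordSchema(PageView.class));
    }
}
```

The appeal of this style is that the class itself stays untouched, which matches the "minimal modifications" use case described for the Reflect type.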
13.
Using Specific Type
    // Classes come from org.apache.avro, org.apache.avro.file and org.apache.avro.specific.
    // SomeObject stands for a class generated from an Avro schema.
    Class<SomeObject> type = SomeObject.class;
    Schema schema = SpecificData.get().getSchema(type);
    DataFileWriter<SomeObject> writer =
        new DataFileWriter<SomeObject>(new SpecificDatumWriter<SomeObject>(schema));
14.
Using the DataFileWriter
Only one more thing to do: tell this writer where to write.
    writer.create(schema, outputStream);  // outputStream is any java.io.OutputStream
What if you want to append to an existing file instead of creating a new one?
    writer.appendTo(new File("Some File That exists"));
Time to write:
    writer.append(object);  // object is of type T
    writer.close();         // flushes any buffered data when you are done
15.
Don’t Forget About Reading
    Class<SomeObject> type = SomeObject.class;
    // Reflect:
    Schema schema = ReflectData.AllowNull.get().getSchema(type);
    DatumReader<SomeObject> datumReader = new ReflectDatumReader<SomeObject>(schema);
    // Or, for Specific:
    //   Schema schema = SpecificData.get().getSchema(type);
    //   DatumReader<SomeObject> datumReader = new SpecificDatumReader<SomeObject>(schema);
    DataFileStream<SomeObject> reader = new DataFileStream<SomeObject>(inputStream, datumReader);
    reader.iterator();
Remember that compressed data? The reader decompresses it automatically!
16.
Defining a Specific Schema
Create an Enum type: serverstate.avsc (name is arbitrary, extension is not)
    { "type":"enum",
      "namespace":"com.yourcompany.avro",
      "name":"ServerState",
      "symbols":[ "STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED" ]
    }
Create an Exception type: wrongstate.avsc
    { "type":"error",
      "namespace":"com.yourcompany.avro",
      "name":"WrongServerStateException",
      "fields":[
        { "name":"message", "type":"string" }
      ]
    }
17.
Defining a Specific Schema
Create a regular data object: historical.avsc
    { "type":"record",
      "namespace":"com.yourcompany.avro",
      "name":"NewHistoricalMessage",
      "aliases":[ "com.yourcompany.avro.datatypes.HistoricalMessage" ],
      "fields":[
        { "name":"dataSource", "type":[ "null", "string" ] }
      ]
    }
Aliases allow for schema evolution. All generated data objects are defined with simple JSON, and the documentation is very straightforward.
Source: http://avro.apache.org/docs/current/spec.html
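Two details of this schema are easy to illustrate in plain Java. The sketch below is a simplification, not the Avro library: `HistoricalSchemaSketch` and its method names are hypothetical, and the alias check handles only fully qualified names. It shows how a reader schema's aliases let it accept data written under the old record name, and how a `["null", "string"]` union value is encoded in binary as a branch index followed by the value, as described in the Avro specification.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: alias matching and union encoding, written without the Avro library. */
final class HistoricalSchemaSketch {

    /** True if the reader's name or one of its aliases matches the writer's record name. */
    static boolean readerAccepts(String writerName, String readerName, List<String> readerAliases) {
        return readerName.equals(writerName) || readerAliases.contains(writerName);
    }

    /** Encodes a value of the union ["null", "string"]: branch index first, then the value. */
    static byte[] encodeDataSource(String dataSource) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (dataSource == null) {
            writeLong(out, 0); // branch 0 is "null", which has no payload
        } else {
            byte[] utf8 = dataSource.getBytes(StandardCharsets.UTF_8);
            writeLong(out, 1); // branch 1 is "string"
            writeLong(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    /** Zigzag variable-length encoding used for ints and longs. */
    private static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    public static void main(String[] args) {
        List<String> aliases = Arrays.asList("com.yourcompany.avro.datatypes.HistoricalMessage");
        System.out.println(readerAccepts(
                "com.yourcompany.avro.datatypes.HistoricalMessage",
                "com.yourcompany.avro.NewHistoricalMessage",
                aliases)); // prints "true"
        System.out.println(encodeDataSource(null).length);  // prints "1"
        System.out.println(encodeDataSource("web").length); // prints "5"
    }
}
```

Declaring the field as a union with "null" costs one extra byte per value, which is the price of making the field optional.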
18.
Maven
19.
Avro With Maven
Maven Plugins
• This plugin assists with the Maven build lifecycle (may not be necessary in all use cases)
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>build-helper-maven-plugin</artifactId>
    </plugin>
• Compiles *.avdl, *.avpr, *.avsc, and *.genavro (define the goals accordingly)
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
    </plugin>
• Necessary for Avro to introspect generated rpc code (http://paranamer.codehaus.org/)
    <plugin>
      <groupId>com.thoughtworks.paranamer</groupId>
      <artifactId>paranamer-maven-plugin</artifactId>
    </plugin>
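As a rough illustration of "define the goals accordingly", the fragment below binds the avro-maven-plugin code generation goals to the generate-sources phase. Treat it as a sketch rather than a drop-in configuration: the `${avro.version}` property and both directory paths are assumptions, and you would normally keep only the goals that match the file types in your project.

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <!-- Assumed property; define avro.version in your own POM. -->
  <version>${avro.version}</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>        <!-- compiles *.avsc -->
        <goal>protocol</goal>      <!-- compiles *.avpr -->
        <goal>idl-protocol</goal>  <!-- compiles *.avdl -->
      </goals>
      <configuration>
        <!-- Assumed locations; adjust to your project layout. -->
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```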
20.
RPC
21.
RPC
How to utilize an Avro RPC Server
• Define the protocol
• Use specific types for the datatypes passed via RPC
• Implement the interface generated from the protocol
• Create and start an instance of an Avro RPC server in Java
• Create a client based on the interface generated from the protocol
22.
Define the Protocol
• Create an AVDL file: historytracker.avdl (name is arbitrary, but the extension is not)
    @namespace("com.yourcompany.rpc")
    protocol HistoryTracker {
      import schema "historical.avsc";
      import schema "serverstate.avsc";
      import schema "wrongstate.avsc";

      void somethingHappened(com.yourcompany.avro.NewHistoricalMessage Item) oneway;

      /**
       * You can add comments
       */
      com.yourcompany.avro.ServerState getState()
          throws com.yourcompany.avro.WrongServerStateException;
    }
23.
Create an RPC Server
Creating a server is fast and easy:
    InetSocketAddress address = new InetSocketAddress(hostname, port);
    // The responder needs an instance of the implementation, not the class name.
    Responder responder = new SpecificResponder(HistoryTracker.class, new HistoryTrackerImpl());
    Server avroServer = new NettyServer(responder, address);
    avroServer.start();
• The HistoryTracker is the interface generated from the AVDL file
• The HistoryTrackerImpl is an implementation of the HistoryTracker
• There are other service implementations beyond Netty, e.g. HTTP
24.
Create an RPC Client
Creating a client is easier than creating a server:
    InetSocketAddress address = new InetSocketAddress(hostname, port);
    Transceiver transceiver = new NettyTransceiver(address);  // may throw IOException
    // The client is typed by the generated interface.
    HistoryTracker client = SpecificRequestor.getClient(HistoryTracker.class, transceiver);
• The HistoryTracker is the interface generated from the AVDL file
• There are other service implementations beyond Netty, e.g. HTTP
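The idea of building a client from nothing but an interface and a transport can be sketched with a JDK dynamic proxy. This is an illustration of the general technique, under the assumption that a requestor-style client forwards each interface call over the transport; it is not Avro's implementation. `RequestorProxySketch`, the `Transport` interface and the simplified `HistoryTracker` (returning a String rather than the generated ServerState) are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Illustrative only: a client built from an interface plus a transport, via a dynamic proxy. */
final class RequestorProxySketch {

    /** Stand-in for the interface that code generation would produce from the protocol. */
    interface HistoryTracker {
        String getState();
    }

    /** Hypothetical transport: takes a message name and arguments, returns the response. */
    interface Transport {
        Object call(String messageName, Object[] args);
    }

    /** Returns an object implementing iface whose method calls are forwarded to the transport. */
    static <T> T getClient(Class<T> iface, Transport transport) {
        InvocationHandler handler =
                (proxy, method, args) -> transport.call(method.getName(), args);
        Object client = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(client);
    }

    public static void main(String[] argv) {
        // In-memory transport standing in for a real network connection.
        Transport inMemory = (name, a) -> {
            if ("getState".equals(name)) {
                return "IDLE";
            }
            throw new UnsupportedOperationException(name);
        };
        HistoryTracker client = getClient(HistoryTracker.class, inMemory);
        System.out.println(client.getState()); // prints "IDLE"
    }
}
```

Keeping the transport behind its own interface is what makes it possible to swap Netty for HTTP without touching the calling code.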
25.
Resources
26.
Resources
References
• Apache Website and Wiki
    http://avro.apache.org
    https://cwiki.apache.org/confluence/display/AVRO/Index
• Benchmarking Serialization Frameworks
    http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
• An Introduction to Avro (Chris Cooper)
    http://files.meetup.com/1634302/CHUG-ApacheAvro.pdf
Resources
• Mailing List: user@avro.apache.org
27.
Thanks for Attending
Questions?
jscott@dotomi.com