SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
Apache AVRO
  What's new?
Philip Zeyliger, Cloudera
   (AVRO committer)

     Boston HUG
   January 19, 2009
What's AVRO?

 A data serialization system
 Includes:
     A schema language
     A compact serialized form
     An RPC framework
     A handful of APIs, in a handful of languages
 Goals:
     Cross-language
     Support for dynamic access
     Simple but expressive schema evolution

 Same "space" as Apache Thrift, Google Protocol Buffers,
 Binary JSON, and XDR. Subtle differences with all of them.
AVRO Protocols & Schemas
@namespace("org.apache.avro.demo")
protocol CurrencyConversion {
  enum Currency {
    USD,
    GBP,
    EUR,
    JPY
  }
  record Money {
    Currency currency;
    int amount;
  }
  error UnknownRateError {
    Currency currency;
  }
  Money convert(Money input, Currency targetCurrency)
    throws UnknownRateError;
  double rate(Currency input, Currency output) throws UnknownRateError;
}




              "genavro" IDL (AVRO-258)
$java -jar avro-tools-1.2.0-dev.jar genavro < demo.genavro   "messages" : {
{                                                               "convert" : {
  "protocol" : "CurrencyConversion",                              "request" : [ {
  "namespace" : "org.apache.avro.demo",                            "name" : "input",
  "types" : [ {                                                    "type" : "Money"
   "type" : "enum",                                               }, {
   "name" : "Currency",                                            "name" : "targetCurrency",
   "symbols" : [ "USD", "GBP", "EUR", "JPY" ]                      "type" : "Currency"
  }, {                                                            } ],
   "type" : "record",                                             "response" : "Money",
   "name" : "Money",                                              "errors" : [ "UnknownRateError" ]
   "fields" : [ {                                               },
     "name" : "currency",                                       "rate" : {
     "type" : "Currency"                                          "request" : [ {
   }, {                                                            "name" : "input",
     "name" : "amount",                                            "type" : "Currency"
     "type" : "int"                                               }, {
   }]                                                              "name" : "output",
  }, {                                                             "type" : "Currency"
   "type" : "error",                                              } ],
   "name" : "UnknownRateError",                                   "response" : "double",
   "fields" : [ {                                                 "errors" : [ "UnknownRateError" ]
     "name" : "currency",                                       }
     "type" : "Currency"                                      }
   }]                                                        }[
  } ],




             JSON Representation of Protocol and Schemas
Types

            primitive             complex
string                  record
bytes                   array
int & long              map: string -> T
float & double          union
boolean                 fixed<N>
null                    enum
Schema Evolution & Projection
         AVRO binary data never travels without its schema. This
         allows dynamic tooling.
         Writer's Schema and Reader's Schema may be different.
{       /* Writer */                { /* Reader */
     "type" : "record",               "type" : "record",
     "name" : "Person",               "name" : "Person",
     "fields" : [ {                   "fields" : [ {
       "name" : "first",                "name" : "first",
       "type" : "string"                "type" : "string"
     }, {                             }, {
       "name" : "sport",                "name" : "age",
       "type" : "string",               "type" : "int",
     }                                  "default": 0,
 }                                    }
                                    }

Serialized Data:                   Data presented to application:

    "Alice", "Ultimate Frisbee"     "Alice", 0
APIs

 Python
    Dynamic
 Java
    Specific (generated code)
    Generic (container-based)
    Reflection (induces schemas from classes)
 C
 C++
 Ruby
C API

char buf[64];
avro_writer_t writer = avro_writer_memory(buf, sizeof(buf));
avro_schema_t writers_schema = avro_schema_string();
avro_datum_t datum = avro_string("Hello, world!");
avro_write_data(writer, writers_schema, datum);

avro_reader_t reader = avro_reader_memory(buf, sizeof(buf));
avro_schema_t readers_schema = avro_schema_string();
avro_datum_t read_datum;
avro_read_data(reader, writers_schema, readers_schema, &read_datum);
Data File Format
(AVRO-160)

Features:
* Splittable
 (important for Hadoop!)
* Append only with same
  schema.
* Compression
* Arbitrary metadata
* Simple
Hadoop Integration
 Users
    AvroInputFormat/AvroOutputFormat (MR-815)
    Using AVRO in the shuffle (MR-1126)
        Note that AVRO schemas let you specify sort order;
        binary comparators are a thing of the past
    Many Writables can be AVRO+Reflection instead
    AVRO sort order leaves hand-writing RawComparators in
    the past; for Streaming, you now get fast comparators for
    free!
 Framework
    AVRO for Hadoop RPC (e.g., HDFS-982)
 Goals
    Open up protocols for cross-language use
avro-tools

Available tools:
   compile Generates Java code for the given schema.
fragtojson Renders a binary-encoded Avro datum as JSON.
  fromjson Reads JSON records and writes an Avro data file.
   genavro Generates a JSON schema from a GenAvro file
 getschema Prints out schema of an Avro data file.
    induce Induce a schema/protocol from Java class/interface.
jsontofrag Renders a JSON-encoded Avro datum as binary.
rpcreceive Opens an HTTP RPC Server and listens for one message.
   rpcsend Sends a single RPC message.
    tojson Dumps an Avro data file as JSON, one record per line.
1.3 to be released soon...

Good time to try it out!

What's evolving?
  Trying not to evolve the serialized format.
  APIs are evolving.
  Transports are evolving.
Obligatory Links

  Web page:
  http://hadoop.apache.org/avro/
  Mailing list:
  avro-user-subscribe@hadoop.apache.org
  Source repository:
  http://svn.apache.org/repos/asf/hadoop/avro/
Thanks!


    Questions?


    Philip Zeyliger
philip@cloudera.com

Más contenido relacionado

La actualidad más candente

Apache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonApache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonLivePerson
 
Terrastore - A document database for developers
Terrastore - A document database for developersTerrastore - A document database for developers
Terrastore - A document database for developersSergio Bossa
 
F# Type Provider for R Statistical Platform
F# Type Provider for R Statistical PlatformF# Type Provider for R Statistical Platform
F# Type Provider for R Statistical PlatformHoward Mansell
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeRick Copeland
 
Introduction to the rust programming language
Introduction to the rust programming languageIntroduction to the rust programming language
Introduction to the rust programming languageNikolay Denev
 
Introduction to terrastore
Introduction to terrastoreIntroduction to terrastore
Introduction to terrastoresvjson
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Introduction To Scala
Introduction To ScalaIntroduction To Scala
Introduction To ScalaPeter Maas
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrKai Chan
 
Extending the Xbase Typesystem
Extending the Xbase TypesystemExtending the Xbase Typesystem
Extending the Xbase TypesystemSebastian Zarnekow
 
Few simple-type-tricks in scala
Few simple-type-tricks in scalaFew simple-type-tricks in scala
Few simple-type-tricks in scalaRuslan Shevchenko
 
2016 bioinformatics i_python_part_1_wim_vancriekinge
2016 bioinformatics i_python_part_1_wim_vancriekinge2016 bioinformatics i_python_part_1_wim_vancriekinge
2016 bioinformatics i_python_part_1_wim_vancriekingeProf. Wim Van Criekinge
 
Basics of JSON (JavaScript Object Notation) with examples
Basics of JSON (JavaScript Object Notation) with examplesBasics of JSON (JavaScript Object Notation) with examples
Basics of JSON (JavaScript Object Notation) with examplesSanjeev Kumar Jaiswal
 
Code for Startup MVP (Ruby on Rails) Session 2
Code for Startup MVP (Ruby on Rails) Session 2Code for Startup MVP (Ruby on Rails) Session 2
Code for Startup MVP (Ruby on Rails) Session 2Henry S
 

La actualidad más candente (20)

Javascript2839
Javascript2839Javascript2839
Javascript2839
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
 
Apache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonApache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePerson
 
Terrastore - A document database for developers
Terrastore - A document database for developersTerrastore - A document database for developers
Terrastore - A document database for developers
 
F# Type Provider for R Statistical Platform
F# Type Provider for R Statistical PlatformF# Type Provider for R Statistical Platform
F# Type Provider for R Statistical Platform
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForge
 
Introduction to the rust programming language
Introduction to the rust programming languageIntroduction to the rust programming language
Introduction to the rust programming language
 
Introduction to terrastore
Introduction to terrastoreIntroduction to terrastore
Introduction to terrastore
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Introduction To Scala
Introduction To ScalaIntroduction To Scala
Introduction To Scala
 
CBOR - The Better JSON
CBOR - The Better JSONCBOR - The Better JSON
CBOR - The Better JSON
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
Extending the Xbase Typesystem
Extending the Xbase TypesystemExtending the Xbase Typesystem
Extending the Xbase Typesystem
 
DSLs in JavaScript
DSLs in JavaScriptDSLs in JavaScript
DSLs in JavaScript
 
Few simple-type-tricks in scala
Few simple-type-tricks in scalaFew simple-type-tricks in scala
Few simple-type-tricks in scala
 
2016 bioinformatics i_python_part_1_wim_vancriekinge
2016 bioinformatics i_python_part_1_wim_vancriekinge2016 bioinformatics i_python_part_1_wim_vancriekinge
2016 bioinformatics i_python_part_1_wim_vancriekinge
 
Basics of JSON (JavaScript Object Notation) with examples
Basics of JSON (JavaScript Object Notation) with examplesBasics of JSON (JavaScript Object Notation) with examples
Basics of JSON (JavaScript Object Notation) with examples
 
Code for Startup MVP (Ruby on Rails) Session 2
Code for Startup MVP (Ruby on Rails) Session 2Code for Startup MVP (Ruby on Rails) Session 2
Code for Startup MVP (Ruby on Rails) Session 2
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
 

Destacado

Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
Apache Flume NG
Apache Flume NGApache Flume NG
Apache Flume NGhuguk
 
Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014
Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014
Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014Steve Hoffman
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Async Redux Actions With RxJS - React Rally 2016
Async Redux Actions With RxJS - React Rally 2016Async Redux Actions With RxJS - React Rally 2016
Async Redux Actions With RxJS - React Rally 2016Ben Lesh
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 
Java concurrency - Thread pools
Java concurrency - Thread poolsJava concurrency - Thread pools
Java concurrency - Thread poolsmaksym220889
 
Non blocking io with netty
Non blocking io with nettyNon blocking io with netty
Non blocking io with nettyZauber
 
Java SE 8 技術手冊第 15 章 - 通用API
Java SE 8 技術手冊第 15 章 - 通用APIJava SE 8 技術手冊第 15 章 - 通用API
Java SE 8 技術手冊第 15 章 - 通用APIJustin Lin
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyDaniel Bimschas
 
Netty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersNetty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersRick Hightower
 
Java SE 8 技術手冊第 9 章 - Collection與Map
Java SE 8 技術手冊第 9 章 - Collection與MapJava SE 8 技術手冊第 9 章 - Collection與Map
Java SE 8 技術手冊第 9 章 - Collection與MapJustin Lin
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem GetInData
 
Serialization and performance by Sergey Morenets
Serialization and performance by Sergey MorenetsSerialization and performance by Sergey Morenets
Serialization and performance by Sergey MorenetsAlex Tumanoff
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingRapheephan Thongkham-Uan
 

Destacado (20)

3 apache-avro
3 apache-avro3 apache-avro
3 apache-avro
 
Apache Avro and You
Apache Avro and YouApache Avro and You
Apache Avro and You
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Avro
AvroAvro
Avro
 
Apache Flume NG
Apache Flume NGApache Flume NG
Apache Flume NG
 
Notes on Netty baics
Notes on Netty baicsNotes on Netty baics
Notes on Netty baics
 
Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014
Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014
Chicago Hadoop User Group (CHUG) Presentation on Apache Flume - April 9, 2014
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Async Redux Actions With RxJS - React Rally 2016
Async Redux Actions With RxJS - React Rally 2016Async Redux Actions With RxJS - React Rally 2016
Async Redux Actions With RxJS - React Rally 2016
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kite
 
Java concurrency - Thread pools
Java concurrency - Thread poolsJava concurrency - Thread pools
Java concurrency - Thread pools
 
Non blocking io with netty
Non blocking io with nettyNon blocking io with netty
Non blocking io with netty
 
Java SE 8 技術手冊第 15 章 - 通用API
Java SE 8 技術手冊第 15 章 - 通用APIJava SE 8 技術手冊第 15 章 - 通用API
Java SE 8 技術手冊第 15 章 - 通用API
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
 
Netty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersNetty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and Buffers
 
Java SE 8 技術手冊第 9 章 - Collection與Map
Java SE 8 技術手冊第 9 章 - Collection與MapJava SE 8 技術手冊第 9 章 - Collection與Map
Java SE 8 技術手冊第 9 章 - Collection與Map
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
Avro introduction
Avro introductionAvro introduction
Avro introduction
 
Serialization and performance by Sergey Morenets
Serialization and performance by Sergey MorenetsSerialization and performance by Sergey Morenets
Serialization and performance by Sergey Morenets
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
 

Similar a Apache AVRO (Boston HUG, Jan 19, 2010)

JLIFF: Where we are, and where we're going
JLIFF: Where we are, and where we're goingJLIFF: Where we are, and where we're going
JLIFF: Where we are, and where we're goingChase Tingley
 
Strongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible SchemasStrongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible SchemasNorberto Leite
 
Writing Domain Specific Languages with JSON Schema
Writing Domain Specific Languages with JSON SchemaWriting Domain Specific Languages with JSON Schema
Writing Domain Specific Languages with JSON SchemaYos Riady
 
Webinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible SchemasWebinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible SchemasMongoDB
 
json.ppt download for free for college project
json.ppt download for free for college projectjson.ppt download for free for college project
json.ppt download for free for college projectAmitSharma397241
 
MongoDB + node.js で作るソーシャルゲーム
MongoDB + node.js で作るソーシャルゲームMongoDB + node.js で作るソーシャルゲーム
MongoDB + node.js で作るソーシャルゲームSuguru Namura
 
Introduction to Courier
Introduction to CourierIntroduction to Courier
Introduction to CourierJoe Betz
 
Copy/paste detector for source code on javascript
Copy/paste detector for source code on javascript Copy/paste detector for source code on javascript
Copy/paste detector for source code on javascript Andrey Kucherenko
 
RESTful APIs: Promises & lies
RESTful APIs: Promises & liesRESTful APIs: Promises & lies
RESTful APIs: Promises & liesTareque Hossain
 
Example-driven Web API Specification Discovery
Example-driven Web API Specification DiscoveryExample-driven Web API Specification Discovery
Example-driven Web API Specification DiscoveryJavier Canovas
 
Automatic discovery of Web API Specifications: an example-driven approach
Automatic discovery of Web API Specifications: an example-driven approachAutomatic discovery of Web API Specifications: an example-driven approach
Automatic discovery of Web API Specifications: an example-driven approachJordi Cabot
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Gera Shegalov
 
Building DSLs with Xtext - Eclipse Modeling Day 2009
Building DSLs with Xtext - Eclipse Modeling Day 2009Building DSLs with Xtext - Eclipse Modeling Day 2009
Building DSLs with Xtext - Eclipse Modeling Day 2009Heiko Behrens
 
BabelJS - James Kyle at Modern Web UI
BabelJS - James Kyle at Modern Web UIBabelJS - James Kyle at Modern Web UI
BabelJS - James Kyle at Modern Web UImodernweb
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for CassandraEdward Capriolo
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"DataStax Academy
 

Similar a Apache AVRO (Boston HUG, Jan 19, 2010) (20)

JLIFF: Where we are, and where we're going
JLIFF: Where we are, and where we're goingJLIFF: Where we are, and where we're going
JLIFF: Where we are, and where we're going
 
Strongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible SchemasStrongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible Schemas
 
Writing Domain Specific Languages with JSON Schema
Writing Domain Specific Languages with JSON SchemaWriting Domain Specific Languages with JSON Schema
Writing Domain Specific Languages with JSON Schema
 
Webinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible SchemasWebinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible Schemas
 
json.ppt download for free for college project
json.ppt download for free for college projectjson.ppt download for free for college project
json.ppt download for free for college project
 
MongoDB + node.js で作るソーシャルゲーム
MongoDB + node.js で作るソーシャルゲームMongoDB + node.js で作るソーシャルゲーム
MongoDB + node.js で作るソーシャルゲーム
 
Introduction to Courier
Introduction to CourierIntroduction to Courier
Introduction to Courier
 
Copy/paste detector for source code on javascript
Copy/paste detector for source code on javascript Copy/paste detector for source code on javascript
Copy/paste detector for source code on javascript
 
RESTful APIs: Promises & lies
RESTful APIs: Promises & liesRESTful APIs: Promises & lies
RESTful APIs: Promises & lies
 
Example-driven Web API Specification Discovery
Example-driven Web API Specification DiscoveryExample-driven Web API Specification Discovery
Example-driven Web API Specification Discovery
 
Automatic discovery of Web API Specifications: an example-driven approach
Automatic discovery of Web API Specifications: an example-driven approachAutomatic discovery of Web API Specifications: an example-driven approach
Automatic discovery of Web API Specifications: an example-driven approach
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
 
Json at work overview and ecosystem-v2.0
Json at work   overview and ecosystem-v2.0Json at work   overview and ecosystem-v2.0
Json at work overview and ecosystem-v2.0
 
Google Protocol Buffers
Google Protocol BuffersGoogle Protocol Buffers
Google Protocol Buffers
 
Building DSLs with Xtext - Eclipse Modeling Day 2009
Building DSLs with Xtext - Eclipse Modeling Day 2009Building DSLs with Xtext - Eclipse Modeling Day 2009
Building DSLs with Xtext - Eclipse Modeling Day 2009
 
BabelJS - James Kyle at Modern Web UI
BabelJS - James Kyle at Modern Web UIBabelJS - James Kyle at Modern Web UI
BabelJS - James Kyle at Modern Web UI
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
Json
JsonJson
Json
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Apache AVRO (Boston HUG, Jan 19, 2010)

  • 1. Apache AVRO What's new? Philip Zeyliger, Cloudera (AVRO committer) Boston HUG January 19, 2009
  • 2. What's AVRO? A data serialization system Includes: A schema language A compact serialized form An RPC framework A handful of APIs, in a handful of languages Goals: Cross-language Support for dynamic access Simple but expressive schema evolution Same "space" as Apache Thrift, Google Protocol Buffers, Binary JSON, and XDR. Subtle differences with all of them.
  • 3. AVRO Protocols & Schemas @namespace("org.apache.avro.demo") protocol CurrencyConversion { enum Currency { USD, GBP, EUR, JPY } record Money { Currency currency; int amount; } error UnknownRateError { Currency currency; } Money convert(Money input, Currency targetCurrency) throws UnknownRateError; double rate(Currency input, Currency output) throws UnknownRateError; } "genavro" IDL (AVRO-258)
  • 4. $java -jar avro-tools-1.2.0-dev.jar genavro < demo.genavro "messages" : { { "convert" : { "protocol" : "CurrencyConversion", "request" : [ { "namespace" : "org.apache.avro.demo", "name" : "input", "types" : [ { "type" : "Money" "type" : "enum", }, { "name" : "Currency", "name" : "targetCurrency", "symbols" : [ "USD", "GBP", "EUR", "JPY" ] "type" : "Currency" }, { } ], "type" : "record", "response" : "Money", "name" : "Money", "errors" : [ "UnknownRateError" ] "fields" : [ { }, "name" : "currency", "rate" : { "type" : "Currency" "request" : [ { }, { "name" : "input", "name" : "amount", "type" : "Currency" "type" : "int" }, { }] "name" : "output", }, { "type" : "Currency" "type" : "error", } ], "name" : "UnknownRateError", "response" : "double", "fields" : [ { "errors" : [ "UnknownRateError" ] "name" : "currency", } "type" : "Currency" } }] }[ } ], JSON Representation of Protocol and Schemas
  • 5. Types primitive complex string record bytes array int & long map: string -> T float & double union boolean fixed<N> null enum
  • 6. Schema Evolution & Projection AVRO binary data never travels without its schema. This allows dynamic tooling. Writer's Schema and Reader's Schema may be different. { /* Writer */ { /* Reader */ "type" : "record", "type" : "record", "name" : "Person", "name" : "Person", "fields" : [ { "fields" : [ { "name" : "first", "name" : "first", "type" : "string" "type" : "string" }, { }, { "name" : "sport", "name" : "age", "type" : "string", "type" : "int", } "default": 0, } } } Serialized Data: Data presented to application: "Alice", "Ultimate Frisbee" "Alice", 0
  • 7. APIs Python Dynamic Java Specific (generated code) Generic (container-based) Reflection (induces schemas from classes) C C++ Ruby
  • 8. C API char buf[64]; avro_writer_t writer = avro_writer_memory(buf, sizeof(buf)); avro_schema_t writers_schema = avro_schema_string(); avro_datum_t datum = avro_string("Hello, world!"); avro_write_data(writer, writers_schema, datum); avro_reader_t reader = avro_reader_memory(buf, sizeof(buf)); avro_schema_t readers_schema = avro_schema_string(); avro_datum_t read_datum; avro_read_data(reader, writers_schema, readers_schema, &read_datum);
  • 9. Data File Format (AVRO-160) Features: * Splittable (important for Hadoop!) * Append only with same schema. * Compression * Arbitrary metadata * Simple
  • 10. Hadoop Integration Users AvroInputFormat/AvroOutputFormat (MR-815) Using AVRO in the shuffle (MR-1126) Note that AVRO schemas let you specify sort order; binary comparators are a thing of the past Many Writables can be AVRO+Reflection instead AVRO sort order leaves hand-writing RawComparators in the past; for Streaming, you now get fast comparators for free! Framework AVRO for Hadoop RPC (e.g., HDFS-982) Goals Open up protocols for cross-language use
  • 11. avro-tools Available tools: compile Generates Java code for the given schema. fragtojson Renders a binary-encoded Avro datum as JSON. fromjson Reads JSON records and writes an Avro data file. genavro Generates a JSON schema from a GenAvro file getschema Prints out schema of an Avro data file. induce Induce a schema/protocol from Java class/interface. jsontofrag Renders a JSON-encoded Avro datum as binary. rpcreceive Opens an HTTP RPC Server and listens for one message. rpcsend Sends a single RPC message. tojson Dumps an Avro data file as JSON, one record per line.
  • 12. 1.3 to be released soon... Good time to try it out! What's evolving? Trying not to evolve the serialized format. APIs are evolving. Transports are evolving.
  • 13. Obligatory Links Web page: http://hadoop.apache.org/avro/ Mailing list: avro-user-subscribe@hadoop.apache.org Source repository: http://svn.apache.org/repos/asf/hadoop/avro/
  • 14. Thanks! Questions? Philip Zeyliger philip@cloudera.com