Hackolade Tutorial
Visual design of Avro schema
Copyright © 2016-2023 Hackolade
About Apache Avro
• Open-source project
• Provides
• Data serialisation framework
• Data exchange services
• Often used with
• Kafka pub-sub pipelines
• Data lake storage
• Row-oriented object container
• as opposed to Parquet which is column-oriented
• Language independent, compact and efficient
Components of an Avro Schema Model
• An Avro schema can be viewed as a language-agnostic contract for systems to interoperate.
• 4 attributes:
• Type: specifies the data type of the JSON record, whether it is a complex type or a primitive value. At the top level of an Avro schema, the type is normally “record”.
• Name: the name of the Avro schema being defined
• Namespace: a high-level logical qualifier for the name of the Avro schema
• Fields: the individual data elements of the JSON object. Fields can be of primitive or complex type, and complex types can in turn be composed of primitive and complex types.
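The four attributes above can be seen together in a small schema. The following is a minimal sketch: the record name `Customer`, the namespace `com.example.crm`, and all field names are hypothetical and chosen only for illustration. The schema is written as a Python dictionary and serialised to the JSON form that Avro tooling expects.

```python
import json

# Hypothetical example schema: all names below are illustrative assumptions.
customer_schema = {
    "type": "record",                  # Type: a record at the top level
    "name": "Customer",                # Name of the schema
    "namespace": "com.example.crm",    # Namespace qualifying the name
    "fields": [                        # Fields: primitive and complex types
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {
            "name": "address",
            "type": {
                "type": "record",      # a nested record is itself a complex type
                "name": "Address",
                "fields": [
                    {"name": "city", "type": "string"},
                    # a union with "null" first makes the field optional
                    {"name": "zip", "type": ["null", "string"], "default": None},
                ],
            },
        },
    ],
}


def full_name(schema):
    """Return the namespace-qualified name of a named schema."""
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


if __name__ == "__main__":
    print(full_name(customer_schema))
    print(json.dumps(customer_schema, indent=2))
```

The nested `Address` record and the `["null", "string"]` union show how a complex type is built from simpler ones.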
About Apache Avro
• Uses JSON to define schema and data types
• Warning: an Avro schema is quite different from a JSON Schema, even though both are written in JSON
• Container file has header containing the schema, plus 1 or more
storage blocks
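As a rough illustration of the container layout, the sketch below builds and parses only the file header, using the Python standard library. It assumes the layout given in the Avro specification: the 4-byte magic `Obj` plus the byte 1, a metadata map holding `avro.schema` and `avro.codec`, and a 16-byte sync marker. The helper names and the `Ping` schema are hypothetical. Data blocks are not handled here, and in practice an Avro library would do all of this.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # magic bytes that open every Avro object container file


def write_long(out, n):
    """Write an Avro long: zig-zag encoding, then a base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_long(inp):
    """Read an Avro long written by write_long."""
    shift = 0
    result = 0
    while True:
        byte = inp.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def write_bytes(out, data):
    """Write a length-prefixed byte string (Avro 'bytes' and 'string')."""
    write_long(out, len(data))
    out.write(data)


def read_bytes(inp):
    return inp.read(read_long(inp))


def build_header(schema, sync_marker):
    """Serialise a container file header for the given schema."""
    out = io.BytesIO()
    out.write(MAGIC)
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",  # "null" means the blocks are not compressed
    }
    write_long(out, len(meta))          # one map block holding all entries
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)                  # a zero count ends the map
    out.write(sync_marker)              # 16 bytes, repeated after every data block
    return out.getvalue()


def read_header(data):
    """Parse a header; return (schema, codec, sync_marker)."""
    inp = io.BytesIO(data)
    if inp.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(inp)
        if count == 0:
            break
        if count < 0:                   # a negative count is followed by a byte size
            read_long(inp)
            count = -count
        for _ in range(count):
            key = read_bytes(inp).decode("utf-8")
            meta[key] = read_bytes(inp)
    sync_marker = inp.read(16)
    return json.loads(meta["avro.schema"]), meta.get("avro.codec", b"null"), sync_marker


if __name__ == "__main__":
    ping = {"type": "record", "name": "Ping",
            "fields": [{"name": "ts", "type": "long"}]}
    header = build_header(ping, os.urandom(16))
    schema, codec, _ = read_header(header)
    print(schema["name"], codec)
```

Because the schema travels in the header, a reader can decode the blocks that follow without any out-of-band information.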
Avro schema evolution
• Avro allows for powerful schema evolution
• Can achieve backward- and forward-compatibility if done well
• Particularly useful for “integration” services, where many producers and consumers evolve their applications independently and may use different versions of the schema
• Schemas and their evolving versions can be published in a schema
registry
• Goal is to maximise compatibility when decoupling the lifecycle of
publishers and consumers
• Specific guidelines and best practices
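One common guideline is to add new fields with a default value. The sketch below is a deliberately simplified model of Avro schema resolution for top-level record fields only: fields are matched by name, reader fields missing from the writer schema take their default, and writer fields unknown to the reader are ignored. The `Order` schemas and the `resolve_record` helper are hypothetical, and type promotion, aliases and nested types are left out.

```python
# Two hypothetical versions of the same schema; v2 adds a field with a default.
ORDER_V1 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "double"},
    ],
}

ORDER_V2 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "EUR"},
    ],
}


def resolve_record(datum, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]          # present in both schemas
        elif "default" in field:
            result[name] = field["default"]     # filled in from the reader schema
        else:
            raise ValueError(
                f"reader field {name!r} has no default and is missing "
                "from the writer schema"
            )
    return result


if __name__ == "__main__":
    # Backward compatibility: a v2 consumer reads data produced with v1.
    print(resolve_record({"id": 1, "amount": 9.5}, ORDER_V1, ORDER_V2))
    # Forward compatibility: a v1 consumer reads data produced with v2.
    print(resolve_record({"id": 2, "amount": 3.0, "currency": "USD"},
                         ORDER_V2, ORDER_V1))
```

Had `currency` been added without a default, the first call would fail, which is the kind of incompatible change a schema registry's compatibility checks are meant to catch before publication.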
Benefits of Avro schema design in Hackolade Studio
• Visual design of Avro schema
• Anticipate future schema evolution
• Facilitate message validation
• Optimize message payload
• Enable compatibility and interoperability
• Integrate with Confluent Schema Registry and others
• Promote schema reuse (despite limited official specification and
documentation) via
• Avro namespace references
• Confluent Schema Registry’s schema references and union schemas
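To make namespace references concrete: in Avro, a named type (record, enum or fixed) is defined once and can afterwards be referred to by its full name. The sketch below walks a schema and checks that every such reference points at a type defined earlier. The `check_references` helper and the `com.example` schemas are hypothetical, and the check is simplified (no aliases, no protocol-level namespaces).

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_references(schema, namespace=None, defined=None):
    """Walk a schema depth-first and return the full names it defines.

    Raises ValueError when a name is referenced before it is defined.
    """
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            # a name without dots is resolved against the enclosing namespace
            full = schema if "." in schema or not namespace else f"{namespace}.{schema}"
            if full not in defined:
                raise ValueError(f"unknown type {full!r}")
    elif isinstance(schema, list):                       # union: check each branch
        for branch in schema:
            check_references(branch, namespace, defined)
    elif schema["type"] in ("record", "enum", "fixed"):  # named type definition
        name = schema["name"]
        if "." in name:
            namespace = name.rsplit(".", 1)[0]
            full = name
        else:
            namespace = schema.get("namespace", namespace)
            full = f"{namespace}.{name}" if namespace else name
        defined.add(full)
        for field in schema.get("fields", []):
            check_references(field["type"], namespace, defined)
    elif schema["type"] == "array":
        check_references(schema["items"], namespace, defined)
    elif schema["type"] == "map":
        check_references(schema["values"], namespace, defined)
    else:                                                # e.g. {"type": "string"}
        check_references(schema["type"], namespace, defined)
    return defined


# Address is defined once (for billing) and reused by full name (for shipping).
ORDER = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example.sales",
    "fields": [
        {
            "name": "billing",
            "type": {
                "type": "record",
                "name": "Address",
                "namespace": "com.example.common",
                "fields": [{"name": "city", "type": "string"}],
            },
        },
        {
            "name": "shipping",
            "type": ["null", "com.example.common.Address"],
            "default": None,
        },
    ],
}

if __name__ == "__main__":
    print(sorted(check_references(ORDER)))
```

Writing the reference as plain `Address` would fail here, because a name without dots is resolved against the enclosing namespace `com.example.sales`, not the one where `Address` was defined.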
Hackolade Studio support for Avro schema
• Graphical Avro schema design tool
• Can also import existing Avro schemas
• Integrates with the schema registries (Confluent Schema Registry, Azure Event Hubs Schema Registry, Pulsar Schema Registry, other CSR API-compatible registries)
• For forward- and reverse-engineering of the Avro schema with these registries
• Hackolade also integrates with Object Storage providers (S3, ADLS, …)
for data lakes
• Hackolade maintains (recursive) references between elements of the
Avro schema
Hackolade Studio support for Avro schema
• Outputs of Avro schema modeling in Hackolade Studio:
• Entity-Relationship Diagrams of multi-record/event environments
• Hierarchical view of nested objects
• Syntactically correct schema generation for
• users without technical knowledge
• developers wanting to improve quality and productivity
• Documentation of schema in different formats
• Conversion to/from other targets (OpenAPI, RDBMS, NoSQL) via Polyglot Data Modeling
Hackolade Studio and Confluent Schema Registry
• Central repository with RESTful interface
• Developers publish schemas for
• Versioning
• Safe schema evolution
• Enhanced integrity
• Data discoverability
• Self-hosted or SaaS
• Hackolade Studio native integration
• can forward/reverse engineer Avro schemas with the registry
• including support for the different Subject Name Strategies
• supports namespace references, schema references, and union schemas
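For orientation, the sketch below shows how the three Subject Name Strategies of Confluent Schema Registry derive a subject, and what a registration request for a union schema with schema references could look like. It only builds the request and sends nothing. The helper names, the `orders` topic, the `com.example.sales` records and the reference subjects are hypothetical. The endpoint path and body fields follow the Confluent REST API as I understand it and should be checked against the registry version in use.

```python
import json
from urllib.parse import quote


def subject_name(strategy, topic, record_full_name, is_key=False):
    """Derive the registry subject for a schema under a given strategy."""
    if strategy == "TopicNameStrategy":          # one schema per topic key/value
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "RecordNameStrategy":         # subject follows the record type
        return record_full_name
    if strategy == "TopicRecordNameStrategy":    # record type scoped to a topic
        return f"{topic}-{record_full_name}"
    raise ValueError(f"unknown strategy: {strategy}")


def registration_request(subject, schema, references=()):
    """Describe the HTTP request that registers a new schema version."""
    body = {"schema": json.dumps(schema), "schemaType": "AVRO"}
    if references:
        body["references"] = list(references)
    return {
        "method": "POST",
        "path": f"/subjects/{quote(subject, safe='')}/versions",
        "headers": {"Content-Type": "application/vnd.schemaregistry.v1+json"},
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    # A union schema: the topic may carry either of two previously registered records.
    union_schema = ["com.example.sales.Customer", "com.example.sales.Order"]
    references = [
        {"name": "com.example.sales.Customer", "subject": "customer", "version": 1},
        {"name": "com.example.sales.Order", "subject": "order", "version": 1},
    ]
    subject = subject_name("TopicNameStrategy", "orders", "com.example.sales.Order")
    request = registration_request(subject, union_schema, references)
    print(request["method"], request["path"])
    print(request["body"])
```

`TopicNameStrategy` ties a topic to a single subject, which is why a union schema with references is one way to carry several event types on one topic. The two record-based strategies achieve a similar result by keying the subject on the record's full name.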
Reading material
• See Hackolade Studio online documentation
• The Hackolade Blog
• These excellent new books:
• MongoDB Data Modeling & Schema Design
• Many of the principles in the book are related to query-driven
modeling based on access patterns
• Neo4j Data Modeling
• Hackolade on social media: LinkedIn page, Twitter page
• Download Hackolade Studio for free
Questions?
Answers!
Tutorial Expert How-To - Create a model for Avro schemas
