Tutorial Expert How-To - Create a model for Avro schemas
2. About Apache Avro
• Open-source project
• Provides
  • a data serialisation framework
  • data exchange services
• Often used with
  • Kafka pub-sub pipelines
  • data lake storage
• Row-oriented object container
  • as opposed to Parquet, which is column-oriented
• Language-independent, compact and efficient
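One way to see why Avro is compact: in its binary encoding, `int` and `long` values are written as zig-zag variable-length integers, strings as a length prefix followed by UTF-8 bytes, and a record as its field values concatenated in schema order, with no field names or delimiters. The Python sketch below hand-rolls that encoding with only the standard library for a hypothetical record with an `id` (long) and a `name` (string). `encode_long`, `encode_string` and `encode_record` are illustrative helpers, not an Avro library API.

```python
def zigzag(n: int) -> int:
    """Zig-zag map a signed 64-bit value so small magnitudes become small unsigned numbers."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint (low 7 bits first)."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string encoding: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(record_id: int, name: str) -> bytes:
    """Hypothetical record {id: long, name: string}: values back to back, no field names."""
    return encode_long(record_id) + encode_string(name)


payload = encode_record(1, "Ann")
print(payload, len(payload))  # b'\x02\x06Ann' 5
```

The same record written as the JSON text `{"id": 1, "name": "Ann"}` takes 24 bytes, because every message repeats the field names; Avro keeps those in the schema instead.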
Copyright © 2016-2023 Hackolade 2
3. Components of an Avro Schema Model
• An Avro schema can be viewed as a language-agnostic contract that allows systems to interoperate.
• 4 main attributes:
  • Type: specifies the data type being defined, whether a complex type or a primitive value. At the top level of an Avro schema modeled for records and events, the type is “record”; strictly speaking, the specification also accepts other types, such as a bare “string”, as a complete schema.
  • Name: the name of the Avro schema being defined
  • Namespace: a logical qualifier for the schema name; together, namespace and name form the schema's full name (e.g. com.example.Customer)
  • Fields: the individual data elements of the record. Fields can be of primitive or complex types, and complex types can in turn be made of primitive and complex data types.
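The four attributes map directly onto the JSON text of a schema. The sketch below writes a hypothetical `Customer` schema (namespace `com.example`) as a Python dictionary and uses two illustrative helpers: `full_name`, which follows the specification's rule that the namespace qualifies the name unless the name already contains a dot, and `field_kinds`, which classifies each field's type. Both helpers and the schema itself are assumptions made for the example.

```python
import json

# Hypothetical schema, for illustration only.
customer_schema = {
    "type": "record",            # Type: a record at the top level
    "name": "Customer",          # Name of the schema being defined
    "namespace": "com.example",  # Namespace qualifying the name
    "fields": [                  # Fields: the individual data elements
        {"name": "id", "type": "long"},                                  # primitive
        {"name": "email", "type": ["null", "string"], "default": None},  # union
        {"name": "address", "type": {                                    # nested record
            "type": "record",
            "name": "Address",   # inherits the enclosing namespace: com.example.Address
            "fields": [
                {"name": "street", "type": "string"},
                {"name": "city", "type": "string"},
            ],
        }},
    ],
}

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(schema: dict) -> str:
    """Namespace + name, unless the name already contains a dot."""
    name, namespace = schema["name"], schema.get("namespace")
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def field_kinds(schema: dict) -> dict:
    """Classify each field type as primitive, union, or other complex type."""
    kinds = {}
    for field in schema["fields"]:
        t = field["type"]
        if isinstance(t, str) and t in PRIMITIVES:
            kinds[field["name"]] = "primitive"
        elif isinstance(t, list):
            kinds[field["name"]] = "union"    # unions are one of Avro's complex types
        else:  # an inline complex type, or a reference to a named type
            kinds[field["name"]] = "complex"
    return kinds


print(full_name(customer_schema))
print(field_kinds(customer_schema))
print(json.dumps(customer_schema, indent=2))  # the schema as JSON text
```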
4. About Apache Avro
• Uses JSON to define schema and data types
  • Warning: an Avro schema is quite different from a JSON Schema, even though both are written in JSON format
• A container file has a header containing the schema, followed by one or more storage blocks
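Per the Avro specification, an object container file starts with the bytes `Obj` followed by byte value 1, then a metadata map (its `avro.schema` entry holds the schema as JSON and `avro.codec` names the compression codec), then a 16-byte random sync marker; data blocks follow, each ending with that marker. The standard-library sketch below writes and reads back only that header. `build_header` and `read_header` are hypothetical helpers and the `Ping` schema is made up; real files should be produced with an Avro library.

```python
import json
import os

MAGIC = b"Obj\x01"  # 'O', 'b', 'j', then byte value 1


def encode_long(n: int) -> bytes:
    """Zig-zag + varint encoding used by Avro for int/long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    """Inverse of encode_long; returns (value, next position)."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (result >> 1) ^ -(result & 1), pos


def build_header(schema: dict, sync: bytes) -> bytes:
    """Hypothetical helper: magic, metadata map (map of bytes), sync marker."""
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    out = bytearray(MAGIC)
    out += encode_long(len(meta))               # one map block holding all entries
    for key, value in meta.items():
        k = key.encode("utf-8")
        out += encode_long(len(k)) + k          # map key: string
        out += encode_long(len(value)) + value  # map value: bytes
    out += encode_long(0)                       # zero-count block ends the map
    out += sync                                 # 16-byte sync marker
    return bytes(out)


def read_header(buf: bytes):
    """Hypothetical helper: returns (metadata, sync marker, offset of first data block)."""
    if buf[:4] != MAGIC:
        raise ValueError("not an Avro object container file")
    pos, meta = 4, {}
    while True:
        count, pos = decode_long(buf, pos)
        if count == 0:
            break
        if count < 0:                           # negative count: a block byte size follows
            count = -count
            _, pos = decode_long(buf, pos)
        for _ in range(count):
            klen, pos = decode_long(buf, pos)
            key = buf[pos:pos + klen].decode("utf-8")
            pos += klen
            vlen, pos = decode_long(buf, pos)
            meta[key] = buf[pos:pos + vlen]
            pos += vlen
    return meta, buf[pos:pos + 16], pos + 16


sync = os.urandom(16)
schema = {"type": "record", "name": "Ping", "fields": [{"name": "id", "type": "long"}]}
header = build_header(schema, sync)
meta, marker, data_start = read_header(header)
print(json.loads(meta["avro.schema"]), meta["avro.codec"], marker == sync)
```

Because the schema travels in the header, any reader can decode the blocks that follow without out-of-band information.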
5. Avro schema evolution
• Avro allows for powerful schema evolution
  • Backward and forward compatibility can be achieved if evolution is done carefully
  • Especially valuable for “integration” services, where many producers and consumers may evolve their applications using different versions of the schema
• Schemas and their evolving versions can be published in a schema registry
  • The goal is to maximise compatibility while decoupling the lifecycles of publishers and consumers
• Specific guidelines and best practices
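Compatibility comes from Avro's schema resolution rules: when the reader's schema differs from the writer's, fields the reader does not know are skipped, and fields missing from the written data are filled from the reader's default (an error if there is none). The sketch below applies a simplified version of those rules to top-level record fields only, ignoring type promotion, aliases and nested types; `resolve_record` and the `Order` schemas are hypothetical.

```python
def resolve_record(datum: dict, writer: dict, reader: dict) -> dict:
    """Simplified Avro record resolution (top-level fields only)."""
    written = {f["name"] for f in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            result[name] = datum[name]          # present in both schemas
        elif "default" in field:
            result[name] = field["default"]     # new field: use the reader's default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return result                               # writer-only fields are dropped


v1 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "long"},
    {"name": "amount", "type": "double"},
]}
# v2 adds a field WITH a default: data written with v1 stays readable.
v2 = {**v1, "fields": v1["fields"] + [{"name": "currency", "type": "string", "default": "EUR"}]}
# v2_bad adds the same field WITHOUT a default.
v2_bad = {**v1, "fields": v1["fields"] + [{"name": "currency", "type": "string"}]}

old_msg = {"id": 7, "amount": 9.5}
new_msg = {"id": 8, "amount": 3.0, "currency": "USD"}

print(resolve_record(old_msg, v1, v2))   # backward: new reader, old data
print(resolve_record(new_msg, v2, v1))   # forward: old reader, new data
try:
    resolve_record(old_msg, v1, v2_bad)
except ValueError as err:
    print("incompatible:", err)
```

Confluent Schema Registry applies comparable checks per subject through compatibility modes such as BACKWARD and FORWARD.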
6. Benefits of Avro schema design in Hackolade Studio
• Visual design of Avro schema
• Anticipate future schema evolution
• Facilitate message validation
• Optimize message payload
• Enable compatibility and interoperability
• Integrate with Confluent Schema Registry and others
• Promote schema reuse (despite limited official specification and documentation) via
  • Avro namespace references
  • Confluent Schema Registry’s
    • Schema references
    • Union schemas
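With Confluent Schema Registry, a schema can point to schemas registered under other subjects through a `references` list, and a top-level union of such references lets one subject (and thus one topic) carry several event types. The sketch below only builds, and does not send, the registration request `POST /subjects/{subject}/versions` with the standard library; the registry URL, subject names, record names and the `union_registration` helper are all hypothetical.

```python
import json
import urllib.request


def reference(full_name: str, subject: str, version: int) -> dict:
    """For Avro, a reference's "name" is the full name of the referenced type."""
    return {"name": full_name, "subject": subject, "version": version}


def union_registration(registry_url: str, subject: str, refs: list) -> urllib.request.Request:
    """Build a request registering a union of referenced record types under `subject`."""
    union_schema = [r["name"] for r in refs]  # e.g. ["com.example.OrderCreated", ...]
    body = {"schema": json.dumps(union_schema), "schemaType": "AVRO", "references": refs}
    return urllib.request.Request(
        f"{registry_url}/subjects/{subject}/versions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


refs = [
    reference("com.example.OrderCreated", "order-created", 1),
    reference("com.example.OrderCancelled", "order-cancelled", 1),
]
req = union_registration("http://localhost:8081", "orders-value", refs)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit it; the referenced subjects must already be registered for the call to succeed.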
7. Hackolade Studio support for Avro schema
• Graphical Avro schema design tool
• Can also import existing Avro schemas
• Integrates with schema registries (Confluent Schema Registry, Azure Event Hubs Schema Registry, Pulsar Schema Registry, and other registries compatible with the Confluent Schema Registry API)
  • for forward- and reverse-engineering of Avro schemas with these registries
• Hackolade also integrates with object storage providers (S3, ADLS, …) for data lakes
• Hackolade maintains (recursive) references between elements of the Avro schema
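A recursive reference occurs when a named type refers to itself, as in the `LongList` linked-list example from the Avro specification, where the `next` field is a union of `null` and `LongList`. The sketch below walks a schema and reports which named types are referenced from within their own definition. `recursive_refs` and `qualify` are hypothetical helpers that handle records, unions, arrays and maps only, and the `Node` tree schema is made up for the example.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def qualify(name, namespace):
    """Full name: a name containing a dot is already fully qualified."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def recursive_refs(schema, namespace=None, stack=()):
    """Return full names of named types referenced inside their own definition."""
    found = set()
    if isinstance(schema, str):                       # primitive or named-type reference
        if schema not in PRIMITIVES and qualify(schema, namespace) in stack:
            found.add(qualify(schema, namespace))
    elif isinstance(schema, list):                    # union: check every branch
        for branch in schema:
            found |= recursive_refs(branch, namespace, stack)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            full = qualify(schema["name"], schema.get("namespace", namespace))
            inner_ns = full.rpartition(".")[0] or None  # nested names inherit this namespace
            for field in schema["fields"]:
                found |= recursive_refs(field["type"], inner_ns, stack + (full,))
        elif kind == "array":
            found |= recursive_refs(schema["items"], namespace, stack)
        elif kind == "map":
            found |= recursive_refs(schema["values"], namespace, stack)
        elif kind not in ("enum", "fixed"):           # e.g. {"type": "LongList"}
            found |= recursive_refs(kind, namespace, stack)
    return found


long_list = {
    "type": "record", "name": "LongList", "aliases": ["LinkedLongs"],
    "fields": [
        {"name": "value", "type": "long"},
        {"name": "next", "type": ["null", "LongList"]},  # optional self-reference
    ],
}
tree = {"type": "record", "name": "Node",
        "fields": [{"name": "children", "type": {"type": "array", "items": "Node"}}]}

print(recursive_refs(long_list))
print(recursive_refs(tree))
```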
8. Hackolade Studio support for Avro schema
• Outputs of Avro Schema modeling in Hackolade Studio:
  • Entity-Relationship Diagrams of multi-record/event environments
  • Hierarchical view of nested objects
  • Syntactically correct schema generation for
    • users without technical knowledge
    • developers wanting to improve quality and productivity
  • Documentation of schema in different formats
  • Conversion to/from other targets (OpenAPI, RDBMS, NoSQL) via Polyglot Data Modeling
9. Hackolade Studio and Confluent Schema Registry
• Central repository with a RESTful interface
• Developers publish schemas for
  • Versioning
  • Safe schema evolution
  • Enhanced integrity
  • Data discoverability
• Self-hosted or SaaS
• Hackolade Studio native integration
  • can forward- and reverse-engineer Avro schemas with the registry
  • including support for the different Subject Name Strategies
  • supports namespace references, schema references, and union schemas
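Confluent's serializers derive the registry subject for each message from a configurable Subject Name Strategy. The three built-in strategies name the subject after the topic (`<topic>-key` or `<topic>-value`), after the fully qualified record name, or after both (`<topic>-<record full name>`). The sketch below reproduces only those naming rules in plain Python; `subject_name` is a hypothetical helper, not the serializer's API, and the topic and record names are made up.

```python
def subject_name(strategy: str, topic: str, record_full_name: str, is_key: bool = False) -> str:
    """Subject naming rules of Confluent's built-in strategies (illustrative only)."""
    if strategy == "TopicNameStrategy":        # one schema lineage per topic key/value
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "RecordNameStrategy":       # one subject per record type, across topics
        return record_full_name
    if strategy == "TopicRecordNameStrategy":  # one subject per record type within a topic
        return f"{topic}-{record_full_name}"
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("TopicNameStrategy", "RecordNameStrategy", "TopicRecordNameStrategy"):
    print(strategy, "->", subject_name(strategy, "orders", "com.example.OrderCreated"))
```

TopicNameStrategy, the default, ties each topic's keys and values to a single evolving schema, while the two record-based strategies let one topic carry several record types.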
10. Reading material
• See Hackolade Studio online documentation
• The Hackolade Blog
• These excellent new books:
  • MongoDB Data Modeling & Schema Design
    • Many of the principles in the book relate to query-driven modeling based on access patterns
  • Neo4j Data Modeling
• Hackolade on social media: LinkedIn page, Twitter page
• Download Hackolade Studio for free