5. Secondary Concerns Grow
● As your market scales, the need for more organizational management and logistics increases
● Manual processes get more painful at scale, leading to inconsistencies and overhead
● Latency of information increases & quality decreases
6. Current State of your architectures
https://bit.ly/3bkSN8D
(Figure: unstandardized data processes vs. standardized data processes)
7. Gaps at scale
● Scale defined as the number of data definitions/versions (Variety)
● No single source of truth of your data assets
● Duplication of code to represent/process data models in each technology you use (Java, C++, C#)
● Updating a data definition has a “ripple effect” across all pieces that touch that data model
Summary: Operational costs balloon while the velocity of new features decreases, with added risk of breaking changes making it to production
8. Lingua Franca: Protobuf
● Developed by Google as an interface definition language (IDL)
● Used to define data assets/contracts in a language-agnostic way
● Popular usage in gRPC (https://grpc.io/) for service definitions
https://bit.ly/3rR65Aa
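For illustration, a minimal `.proto` definition in this spirit; the package, message, and field names below are invented, not from the talk:

```protobuf
// order_created.proto
syntax = "proto3";

package inventory.events.v1;

// A hypothetical event describing a new order. Any supported language
// (Java, C++, C#, Go, ...) can generate bindings from this one file.
message OrderCreated {
  string order_id = 1;
  string sku = 2;
  int32 quantity = 3;
  int64 occurred_at_ms = 4;  // epoch milliseconds
}
```

Because the definition is language agnostic, each team compiles the same contract instead of re-implementing the model per technology.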
9. Machine-Driven World
● Equivalent JSON payload takes up 82 bytes
● Protobuf takes up 33 bytes (2.5x smaller)
● JSON is wasteful because it retransmits human-readable schema information with every message
https://bit.ly/3deX9Ay
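To see where the waste comes from, here is a stdlib-only Python sketch. It is not Protobuf itself: `struct` stands in for a fixed binary layout, and the payload fields are made up, but it shows how repeating field names in every JSON message inflates the size.

```python
import json
import struct

# JSON repeats the human-readable field names in every single message.
json_payload = json.dumps(
    {"order_id": "A-1001", "sku": "SKU-42", "quantity": 3}
).encode("utf-8")

# A fixed binary layout as a stand-in for Protobuf's tag+varint encoding:
# two length-prefixed strings and an unsigned 32-bit integer, no field names.
def pack_order(order_id: str, sku: str, quantity: int) -> bytes:
    oid, s = order_id.encode("utf-8"), sku.encode("utf-8")
    return struct.pack(f"<B{len(oid)}sB{len(s)}sI", len(oid), oid, len(s), s, quantity)

binary_payload = pack_order("A-1001", "SKU-42", 3)
print(len(json_payload), len(binary_payload))  # the binary form is ~3x smaller here
```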
11. What types of event patterns are available?
3 main patterns for event expression in Protobuf:
● Bare Letter
● Deep Envelope
● Shallow Envelope
12. Bare Letter
Emit only what you need to, how you need to
Pros:
● Event independence
● Clear definition, no extra fields
Cons:
● Duplication of the same fields across events
● Hard to reconcile on the consumer side when performing multi-event analysis
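A sketch of the Bare Letter pattern in Protobuf (message and field names invented): each event stands alone, which is exactly where the field duplication comes from.

```protobuf
syntax = "proto3";

// Bare Letter: every event is its own top-level message,
// carrying exactly the fields it needs.
message ItemReceived {
  string event_id = 1;       // duplicated across events
  int64 occurred_at_ms = 2;  // duplicated across events
  string sku = 3;
  int32 quantity = 4;
}

message ItemSold {
  string event_id = 1;       // duplicated again
  int64 occurred_at_ms = 2;  // duplicated again
  string sku = 3;
  string store_id = 4;
}
```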
14. Deep Envelope
Place your events in a rigid envelope, sharing common fields
Pros:
● Encourages collaboration on definition
● Leverage common fields for generic processing
Cons:
● Harder to scope correctly
● Extra layer to understand
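A sketch of the Deep Envelope pattern (names invented): the envelope owns the shared fields, and every event type must be registered in its `oneof` up front, which is why scoping it correctly takes collaboration.

```protobuf
syntax = "proto3";

message ItemReceived {
  string sku = 1;
  int32 quantity = 2;
}

message ItemSold {
  string sku = 1;
  string store_id = 2;
}

// Deep Envelope: common fields live here once, and the set of
// allowed events is fixed at definition time.
message EventEnvelope {
  string event_id = 1;       // shared by all events
  int64 occurred_at_ms = 2;  // shared by all events
  oneof event {
    ItemReceived item_received = 3;
    ItemSold item_sold = 4;
  }
}
```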
16. Shallow Envelope
Place any event into the envelope, sharing common fields
Pros:
● No need to define events upfront
● Great for apps that are simply pass-through and do not process the attached event
Cons:
● Defers risk to runtime
● Needs explicit code to process the attached payload (usually via enums)
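A sketch of the Shallow Envelope pattern (names invented), here using `google.protobuf.Any` so that any event can ride in the envelope; the type check moves to runtime.

```protobuf
syntax = "proto3";

import "google/protobuf/any.proto";

// Shallow Envelope: shared fields plus an open payload slot.
// Pass-through apps never need to unpack the payload at all.
message EventEnvelope {
  string event_id = 1;
  int64 occurred_at_ms = 2;
  google.protobuf.Any payload = 3;  // concrete type resolved at runtime
}
```

A common alternative is a `bytes` payload plus an enum discriminator, which is the explicit-enum decoding the cons above refer to.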
20. Walmart’s Competitive Edge
As of 2018:
● 11,700 stores
● 2.3 million employees
● 28 countries
● $32 billion of inventory
Achieved this scale because of its focus on inventory management
● First company-wide adopter of the barcode (1983); could immediately analyze inventory on a per-store basis
● Now moving into RFID technology, which has decreased out-of-stock occurrences by 16% compared to barcodes
https://bit.ly/3itGE44
21. Small Companies, Global Reach
● The Cloud has empowered small teams to have global reach, competing with large enterprises that run their own data centers
● The inventory management challenges that traditionally only the largest companies faced now appear in “small” organizations
● Unlike large companies, small companies cannot afford to hire dozens of new people overnight to scale
26. Summary
● Did not need to add a single line of explicit code or any model dependencies into our app
● Lets us convert any data that has Confluent’s “barcode” embedded in the event
● Makes “onboarding” a new event automatic and immediate; Confluent’s Java client auto-registration takes care of the automation
● One less meeting or email to read about needing to update your code
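Confluent’s “barcode” is a small wire-format header the serializers prepend to every payload: a zero magic byte followed by a 4-byte big-endian schema ID (Protobuf payloads additionally carry message indexes, omitted here). A stdlib-only Python sketch of reading that header; the schema ID 42 is fabricated:

```python
import struct

MAGIC_BYTE = 0  # first byte of every Confluent-serialized payload

def read_schema_id(message: bytes) -> int:
    """Parse the 5-byte header: magic byte, then a big-endian schema ID."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id

# A fabricated message: header carrying schema ID 42, then the event bytes.
message = struct.pack(">bI", MAGIC_BYTE, 42) + b"<serialized protobuf>"
print(read_schema_id(message))  # 42
```

With the ID in hand, a consumer can fetch the schema from the registry and decode the payload without any model dependency baked into the app.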
28. Types of Organizations - Data Management
Look at the level of automation needed to operate the organization’s data assets
Define 3 types of possible systems:
● “Mentat” System - people are the system
● People Bridged Systems - siloed processes
● System-Driven Interactions - the person drives the system, the person is not the system itself
Note these are in no particular order; one is not necessarily better than the other in all contexts
29. Mentat System
● Mentat (Dune) - a human with immense mathematical skills and exceptional cognitive abilities of memory and perception
● People handle the distribution and crafting of all data definitions into code, spreadsheets, and dashboards
● Usually lots of duplication & manual effort in data asset generation and validation (data quality)
● Technologies of choice: emails, meetings; no SRE mindset
30. People Bridged Systems
● Add more automation between people to handle what computers are good at
● Generate boilerplate code, distribute and store artifacts, and apply other CI/CD principles
● Still, manual toil is incurred across teams/departments
● Duplication of processes across silos and divergence of data definitions
31. System-Driven Interactions
● Can center the system around version control
● Run automated checks on data asset changes
● Input changes once and trigger many downstream changes automatically
● Standardizes the process for getting new data definitions into your organization
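One way to sketch this with version control at the center is a CI job that runs schema checks on every pull request. This example uses GitHub Actions with Buf (covered later in the talk); the workflow filename and action versions are assumptions:

```yaml
# .github/workflows/proto-checks.yml -- illustrative CI sketch
name: proto-checks
on: [pull_request]
jobs:
  lint-and-breaking:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
      # Lint every schema in the repository
      - run: buf lint
      # Fail the build if the change breaks existing consumers
      - run: buf breaking --against '.git#branch=main'
```

Input the change once in the pull request, and downstream artifact generation and distribution can hang off the same trigger.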
33. Tool for Protobuf Automation
A great CLI tool called Buf:
● Lets you lint schemas
● Runs compatibility checks
● Can be run from Docker or through a local installation
Written in Go
https://docs.buf.build/tour-1
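The two checks above look like this on the command line; the Docker invocation follows Buf’s published image name, so treat that part as an assumption:

```shell
# Lint all .proto files in the current Buf module
buf lint

# Fail on breaking changes relative to the main branch
buf breaking --against '.git#branch=main'

# Or run from Docker instead of a local install
docker run --volume "$(pwd):/workspace" --workdir /workspace bufbuild/buf lint
```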
34. Speed, not Haste
● Pick the right event patterns; they affect how your teams work, or don’t work, together
● Leverage language-agnostic IDLs like Protobuf to reduce manual toil
● Utilize Schema Registry to centralize your “inventory management” system
● Fit people & software into the right places