5. Secondary Concerns Grow
● As your market scales, the need for more organizational management and logistics increases
● Manual processes get more painful at scale, leading to inconsistencies and overhead
● Latency of information increases & quality decreases
6. Current State of your architectures
https://bit.ly/3bkSN8D
(Figure: unstandardized data processes vs. standardized data processes)
7. Gaps at scale
● Scale defined as the number of data definitions/versions (Variety)
● No single source of truth of your data assets
● Duplication of code to represent/process data models in each technology you use (Java, C++, C#)
● Updating a data definition has a “ripple effect” across all pieces that touch that data model
Summary: Operational costs balloon while the velocity of new features decreases, with added risk of breaking changes making it to production
8. Lingua Franca: Protobuf
● Developed by Google as an interface definition language (IDL)
● Used to define data assets/contracts in a language-agnostic way
● Popular usage in gRPC (https://grpc.io/) for service definitions
https://bit.ly/3rR65Aa
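For illustration, a minimal `.proto` definition in this spirit; the package, message, and field names below are invented, not from the talk:

```protobuf
// order_created.proto
syntax = "proto3";

package inventory.events.v1;

// A hypothetical event describing a new order. Any supported language
// (Java, C++, C#, Go, ...) can generate bindings from this one file.
message OrderCreated {
  string order_id = 1;
  string sku = 2;
  int32 quantity = 3;
  int64 occurred_at_ms = 4;  // epoch milliseconds
}
```

Because the definition is language agnostic, each team compiles the same contract instead of re-implementing the model per technology.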
9. Machine-Driven World
● Equivalent JSON payload takes up 82 bytes
● Protobuf takes up 33 bytes (2.5x smaller)
● JSON is wasteful because it retransmits human-readable schema information with every message
https://bit.ly/3deX9Ay
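To see where the waste comes from, here is a stdlib-only Python sketch. It is not Protobuf itself: `struct` stands in for a fixed binary layout, and the payload fields are made up, but it shows how repeating field names in every JSON message inflates the size.

```python
import json
import struct

# JSON repeats the human-readable field names in every single message.
json_payload = json.dumps(
    {"order_id": "A-1001", "sku": "SKU-42", "quantity": 3}
).encode("utf-8")

# A fixed binary layout as a stand-in for Protobuf's tag+varint encoding:
# two length-prefixed strings and an unsigned 32-bit integer, no field names.
def pack_order(order_id: str, sku: str, quantity: int) -> bytes:
    oid, s = order_id.encode("utf-8"), sku.encode("utf-8")
    return struct.pack(f"<B{len(oid)}sB{len(s)}sI", len(oid), oid, len(s), s, quantity)

binary_payload = pack_order("A-1001", "SKU-42", 3)
print(len(json_payload), len(binary_payload))  # the binary form is ~3x smaller here
```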
11. What types of event patterns are available?
3 main patterns for event expression in Protobuf:
● Bare Letter
● Deep Envelope
● Shallow Envelope
12. Bare Letter
Emit only what you need to, how you need to
Pros:
● Event independence
● Clear definition, no extra fields
Cons:
● Duplication of the same fields across events
● Hard to reconcile on the consumer side when performing multi-event analysis
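A sketch of the Bare Letter pattern in Protobuf (message and field names invented): each event stands alone, which is exactly where the field duplication comes from.

```protobuf
syntax = "proto3";

// Bare Letter: every event is its own top-level message,
// carrying exactly the fields it needs.
message ItemReceived {
  string event_id = 1;       // duplicated across events
  int64 occurred_at_ms = 2;  // duplicated across events
  string sku = 3;
  int32 quantity = 4;
}

message ItemSold {
  string event_id = 1;       // duplicated again
  int64 occurred_at_ms = 2;  // duplicated again
  string sku = 3;
  string store_id = 4;
}
```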
14. Deep Envelope
Place your events in a rigid envelope, sharing common fields
Pros:
● Encourages collaboration on definition
● Leverage common fields for generic processing
Cons:
● Harder to scope correctly
● Extra layer to understand
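A sketch of the Deep Envelope pattern (names invented): the envelope owns the shared fields, and every event type must be registered in its `oneof` up front, which is why scoping it correctly takes collaboration.

```protobuf
syntax = "proto3";

message ItemReceived {
  string sku = 1;
  int32 quantity = 2;
}

message ItemSold {
  string sku = 1;
  string store_id = 2;
}

// Deep Envelope: common fields live here once, and the set of
// allowed events is fixed at definition time.
message EventEnvelope {
  string event_id = 1;       // shared by all events
  int64 occurred_at_ms = 2;  // shared by all events
  oneof event {
    ItemReceived item_received = 3;
    ItemSold item_sold = 4;
  }
}
```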
16. Shallow Envelope
Place any event into the envelope, sharing common fields
Pros:
● No need to define events upfront
● Great for apps that are simply pass-through and do not process the attached event
Cons:
● Defers risk to runtime
● Needs explicit code to process the attached payload (usually via enums)
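A sketch of the Shallow Envelope pattern (names invented), here using `google.protobuf.Any` so that any event can ride in the envelope; the type check moves to runtime.

```protobuf
syntax = "proto3";

import "google/protobuf/any.proto";

// Shallow Envelope: shared fields plus an open payload slot.
// Pass-through apps never need to unpack the payload at all.
message EventEnvelope {
  string event_id = 1;
  int64 occurred_at_ms = 2;
  google.protobuf.Any payload = 3;  // concrete type resolved at runtime
}
```

A common alternative is a `bytes` payload plus an enum discriminator, which is the explicit-enum decoding the cons above refer to.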
20. Walmart’s Competitive Edge
As of 2018:
● 11,700 stores
● 2.3 million employees
● 28 countries
● $32 billion of inventory
Achieved this scale because of its focus on inventory management
● First company-wide adopter of the barcode (1983); could immediately analyze inventory on a per-store basis
● Now moving into RFID technology, which has decreased out-of-stock occurrences by 16% compared to barcodes
https://bit.ly/3itGE44
21. Small Companies, Global Reach
● The Cloud has empowered small teams to have global reach, competing with large enterprises that run their own data centers
● The inventory management challenges that traditionally only the largest companies faced now appear in “small” organizations
● Unlike large companies, small companies cannot afford to hire dozens of new people overnight to scale
26. Summary
● Did not need to add a single line of explicit code or any model dependencies into our app
● Lets us convert any data that has Confluent’s “barcode” embedded in the event
● Makes “onboarding” a new event automatic and immediate; Confluent’s Java client auto-registration takes care of the automation
● One less meeting or email to read about needing to update your code
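Confluent’s “barcode” is a small wire-format header the serializers prepend to every payload: a zero magic byte followed by a 4-byte big-endian schema ID (Protobuf payloads additionally carry message indexes, omitted here). A stdlib-only Python sketch of reading that header; the schema ID 42 is fabricated:

```python
import struct

MAGIC_BYTE = 0  # first byte of every Confluent-serialized payload

def read_schema_id(message: bytes) -> int:
    """Parse the 5-byte header: magic byte, then a big-endian schema ID."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id

# A fabricated message: header carrying schema ID 42, then the event bytes.
message = struct.pack(">bI", MAGIC_BYTE, 42) + b"<serialized protobuf>"
print(read_schema_id(message))  # 42
```

With the ID in hand, a consumer can fetch the schema from the registry and decode the payload without any model dependency baked into the app.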
28. Types of Organizations - Data Management
Look at the level of automation needed to operate the organization’s data assets
Define 3 types of possible systems:
● “Mentat” System - people are the system
● People Bridged Systems - siloed processes
● System-Driven Interactions - the person drives the system, the person is not the system itself
Note these are in no particular order; one is not necessarily better than the other in all contexts
29. Mentat System
● Mentat (Dune) - a human with immense mathematical skills and exceptional cognitive abilities of memory and perception
● People handle the distribution and crafting of all data definitions into code, spreadsheets, and dashboards
● Usually lots of duplication & manual effort in data asset generation and validation (data quality)
● Technologies of choice: emails, meetings; no SRE mindset
30. People Bridged Systems
● Add more automation between people to handle what computers are good at
● Generate boilerplate code, distribute and store artifacts, and apply other CI/CD principles
● Still, manual toil is incurred across teams/departments
● Duplication of processes across silos and divergence of data definitions
31. System-Driven Interactions
● Can center the system around version control
● Run automated checks on data asset changes
● Input changes once and trigger many downstream changes automatically
● Standardizes the process for getting new data definitions into your organization
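One way to sketch this with version control at the center is a CI job that runs schema checks on every pull request. This example uses GitHub Actions with Buf (covered later in the talk); the workflow filename and action versions are assumptions:

```yaml
# .github/workflows/proto-checks.yml -- illustrative CI sketch
name: proto-checks
on: [pull_request]
jobs:
  lint-and-breaking:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
      # Lint every schema in the repository
      - run: buf lint
      # Fail the build if the change breaks existing consumers
      - run: buf breaking --against '.git#branch=main'
```

Input the change once in the pull request, and downstream artifact generation and distribution can hang off the same trigger.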
33. Tool for Protobuf Automation
A great CLI tool called Buf:
● Lets you lint schemas
● Runs compatibility checks
● Can be run from Docker or through a local installation
Written in Go
https://docs.buf.build/tour-1
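The two checks above look like this on the command line; the Docker invocation follows Buf’s published image name, so treat that part as an assumption:

```shell
# Lint all .proto files in the current Buf module
buf lint

# Fail on breaking changes relative to the main branch
buf breaking --against '.git#branch=main'

# Or run from Docker instead of a local install
docker run --volume "$(pwd):/workspace" --workdir /workspace bufbuild/buf lint
```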
34. Speed, not Haste
● Pick the right event patterns; they affect how your teams work, or don’t work, together
● Leverage language-agnostic IDLs like Protobuf to reduce manual toil
● Utilize Schema Registry to centralize your “inventory management” system
● Fit people & software into the right places