Sea of data
Story of data, scale and how we evolve
architecture to handle it.
Daniel Marchant (@driedtoast)
What do you think of when
you hear the word “data”?
Setting the stage
Data – Things known or assumed as facts,
making the basis of reasoning or calculation
Time – the indefinite continued progress of
existence and events in the past, present and
future regarded as a whole
What types of data
are there?
Types of data
● Customer Data - Data the customer provides,
the lifeblood of your application
● Business Data - Metrics on growth,
customer attrition, marketing, etc.
● Operation Data - Metrics and log messages
that help troubleshoot / monitor your
application
Let’s jump into the story...
Once upon a time...
A company was founded to
produce the best seamonkey
management application ever
produced. (purely fictional for
now)
More details: http://www.seamonkey.xyz (eventually)
A hypothetical system timeline
● Launch of application
● Reddit posts promote application
● Hacker News promotes application
● Product Hunt promotes application
Launch
Look ma, I got an app online!
Initial dataset
● Operation Data
○ cpu / memory / disk metrics
○ error messages in logs
● Business Data
○ Signup metrics
○ Access usage
● Customer Data
○ User
○ Seamonkey info
Launch Architecture
Architecture
● Load balancer - route traffic to application
● Application - handles requests and manages
data to the database
● Database - data storage
So simple, life is good! Some reads and writes!
Integrations
● Metric Service - google analytics,
kilometer.io, kissmetrics, mixpanel, etc...
● Operation Events - datadog, graylog,
newrelic, etc…
Troubleshooting
● Pretty straightforward
● Check application can write to DB
● Make sure database user can access tables
● Make sure the transactions scoped in the
application make sense
● Check rollback scenarios
A little about ACID
● Atomicity: all task(s) within a transaction are performed or
none of them are. An all-or-none principle.
● Consistency: a transaction does not violate the database's
rules (constraints), so the data is in a consistent state at the
beginning and end of a transaction; no half-completed
transactions.
● Isolation: concurrent transactions do not interfere with one
another; each behaves as if it were running on its own.
● Durability: Once complete the transaction will persist as
complete; it will survive system failure, power loss and other
types of system breakdowns.
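
As a rough illustration of atomicity and rollback, here is a minimal sketch using Python's built-in `sqlite3` module. The `tank` table, its `CHECK` constraint, and the `transfer_seamonkeys` helper are assumptions made up for this example, not part of the original application.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `tank` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tank ("
        "name TEXT PRIMARY KEY, "
        "population INTEGER NOT NULL CHECK (population >= 0))"
    )
    conn.executemany("INSERT INTO tank VALUES (?, ?)", [("a", 5), ("b", 0)])
    conn.commit()
    return conn


def transfer_seamonkeys(conn, from_tank, to_tank, count):
    """Move seamonkeys between tanks: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE tank SET population = population + ? WHERE name = ?",
                (count, to_tank),
            )
            # If this violates the CHECK constraint, the update above is undone too.
            conn.execute(
                "UPDATE tank SET population = population - ? WHERE name = ?",
                (count, from_tank),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def populations(conn):
    """Return the current population of every tank as a dict."""
    return dict(conn.execute("SELECT name, population FROM tank"))
```

Asking for more seamonkeys than the source tank holds makes the second update fail, and the first update is rolled back with it, which is the "check rollback scenarios" case from the troubleshooting list.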
Reddit
Oh cool, some people are looking at it!
Data evolution
● Operation Data Additions
○ Timers on critical logic
○ Customer requests
● Business Data Additions
○ Customer emails on problems
● Customer Data Additions
○ Seamonkey Tank
○ Seamonkey Social interactions
Architecture Changes
Architecture
● Load balancer - route traffic to application
● Application - still managing data, more
nodes added
● Worker - handles work from the db ‘queue’
table
● Cache - used to taper database reads
● Database - data storage master
● Read Only Database - slave data storage
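
The worker reading from a db 'queue' table could look roughly like the sketch below. It uses `sqlite3` as a stand-in database; the `job_queue` table, its columns, and the single-worker assumption are all hypothetical choices for illustration.

```python
import sqlite3


def make_queue_db():
    """Create an in-memory database with a hypothetical `job_queue` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn, payload):
    """Application side: record work for the worker to pick up later."""
    with conn:
        conn.execute("INSERT INTO job_queue (payload) VALUES (?)", (payload,))


def work_once(conn, handler):
    """Worker side: process the oldest pending job.

    Returns False when the queue is empty. Assumes a single worker; several
    workers would need row locking or an atomic claim step.
    """
    row = conn.execute(
        "SELECT id, payload FROM job_queue "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, payload = row
    handler(payload)
    with conn:
        conn.execute("UPDATE job_queue SET status = 'done' WHERE id = ?", (job_id,))
    return True
```

A table like this is easy to start with because it reuses the existing database, at the cost of extra read and write load on it.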
More integrations
● Gmail - customer emails
● DataLoop - Timers and statsd data
● Open Tracing - distributed event tracing
http://opentracing.io/
More Troubleshooting
● If the application isn’t displaying the right data,
is the cache being invalidated properly?
● Has the worker overwritten the application's
changes while processing queued
work?
● Is replication working from master to
slave?
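
The cache invalidation question can be reproduced in a few lines. This is a minimal cache-aside sketch in which plain dicts stand in for the database and the cache; the `SeamonkeyStore` class and its `invalidate` flag are assumptions for illustration only.

```python
class SeamonkeyStore:
    """Cache-aside reads with optional invalidation on write."""

    def __init__(self):
        self.db = {}       # stands in for the master database
        self.cache = {}    # stands in for the cache
        self.db_reads = 0  # counts how often a read falls through to the db

    def read(self, key):
        """Serve from the cache when possible, else load from the db and cache it."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value, invalidate=True):
        """Write to the db; skipping invalidation leaves a stale cached copy behind."""
        self.db[key] = value
        if invalidate:
            self.cache.pop(key, None)
```

Calling `write(..., invalidate=False)` after a cached read reproduces the "application isn't displaying the right data" symptom: the database holds the new value while reads keep returning the old one.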
Hacker News
What have I gotten myself into?
Data evolution
● Operation Data / Business Data convergence
○ Customer requests
○ Customer emails to support cases
○ Customer usage to product roadmap
● Customer Data requirements stabilize
Architecture Changes
Architecture
● Application - still managing data, more
nodes added, application pushes writes to a
queue for non-critical work
● Worker - handles work coming from queue
vs db, and writes from application. Also
invalidates cache now.
● Cache - used to taper database reads. App is
getting more complex invalidation logic
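
The split between the application and the worker at this stage might be sketched as follows. Python's in-process `queue.Queue` stands in for a real message queue, dicts stand in for the database and cache, and the function names are hypothetical.

```python
import queue


def handle_request(write_queue, key, value):
    """Application side: enqueue the non-critical write instead of touching the db."""
    write_queue.put({"key": key, "value": value})


def drain(write_queue, db, cache):
    """Worker side: apply each queued write, then invalidate the cached copy.

    Returns the number of events processed.
    """
    processed = 0
    while True:
        try:
            event = write_queue.get_nowait()
        except queue.Empty:
            return processed
        db[event["key"]] = event["value"]
        cache.pop(event["key"], None)  # invalidation now lives in the worker
        processed += 1
```

Until the worker runs, the database does not contain the write, so any debugging of "data isn't updated" has to check the queue and the worker as well as the database.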
CAPs off to you!
● Consistency: not the same idea as the C in ACID.
Here it means all data storage nodes see the
same data at the same time.
● Availability: every request to a working node
gets a response
● Partition Tolerance: system continues to
operate even when the network between nodes
drops or delays messages. A single node failure
should not cause the entire system to
collapse.
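
When the network between nodes is partitioned, a system has to choose between refusing requests (staying consistent) and answering from whatever node is reachable (staying available). The toy model below illustrates that choice with two in-memory replicas; the `Cluster` class and its `prefer_consistency` flag are invented for this sketch and do not describe any particular datastore.

```python
class Replica:
    """One data storage node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas that are either connected or partitioned."""

    def __init__(self, prefer_consistency):
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False
        self.prefer_consistency = prefer_consistency

    def write(self, key, value):
        """Write through replica `a`, which is the node the client can reach."""
        if self.partitioned:
            if self.prefer_consistency:
                # Consistency over availability: refuse rather than diverge.
                raise RuntimeError("unavailable: cannot reach all replicas")
            # Availability over consistency: accept, replicas now disagree.
            self.a.data[key] = value
            return
        self.a.data[key] = value
        self.b.data[key] = value
```

Neither choice is wrong in general; which one fits depends on whether stale data or a failed request hurts the customer more.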
Troubleshooting
● Oh boy, more systems more debugging
“opportunities”
● If data isn’t updated, has the queue gotten
the event from the application? Has the
worker processed the change event and
written to db?
● Is the queue up? Is the worker up?
Product Hunt
There’s too many people on this planet.
We need another plague.
Data evolution
● Operation Data
○ Hopes for attrition
● Business Data
○ Monitors customer attrition
○ Hopes for NO attrition
● Customer Data
○ Grows insane
○ Working out archive strategies
Architecture Changes
Architecture
● Lifecycle service / database - added a service
to take over part of the monolith app; the service
just handles seamonkey growth and lifecycle
● Worker - still listens for events, writes to
lifecycle service
● Stream - swapped out the queue with an
immutable stream, better data recovery
BASE
● Basically Available: the system guarantees availability of
the data in the CAP Theorem sense; there will be a
response to any request. The response could be a failure to find
data, or the data could be in an inconsistent state.
● Soft state: the state of the data could change over time, even
without new input, as ‘eventual consistency’ catches up
● Eventual consistency: data will eventually become
consistent once it stops receiving changes. The system will
continue to receive changes and is not checking the
consistency of every transaction before it moves onto the
next one.
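
Eventual consistency can be pictured with a primary that records every change and a replica that applies those changes late. This is a simplified in-memory sketch; `Primary`, `LaggingReplica`, and the `max_changes` parameter are assumptions for illustration.

```python
class Primary:
    """Accepts writes immediately and records each one in a change log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class LaggingReplica:
    """Applies the primary's changes some time after they happen."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # how many log entries have been applied so far

    def sync(self, primary, max_changes=None):
        """Apply pending changes, optionally only the first `max_changes`."""
        pending = primary.log[self.applied:]
        if max_changes is not None:
            pending = pending[:max_changes]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
```

Between syncs, a read from the replica can return an older value than the primary holds; once writes stop and the replica catches up, the two agree.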
Troubleshooting
● If seamonkeys aren’t progressing, debug the
new service: is it up? Is the service's
database up?
● If an event isn’t processed, reset the stream
position to catch up; handle duplicate events on
the worker rather than in the stream.
● UI not finding events? Check the service is up.
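
Resetting the stream position means the worker will see some events twice, so the worker needs to be idempotent. A minimal sketch, assuming each event carries a unique `id` and that a Python list stands in for the stream (the event fields and the `LifecycleWorker` class are hypothetical):

```python
class LifecycleWorker:
    """Consumes events from a stream position and ignores duplicates."""

    def __init__(self):
        self.offset = 0        # position of the next event to read
        self.seen_ids = set()  # ids of events already applied
        self.ages = {}         # seamonkey name -> age in days

    def consume(self, stream):
        """Apply every event from the current offset to the end of the stream."""
        for event in stream[self.offset:]:
            if event["id"] not in self.seen_ids:
                name = event["seamonkey"]
                self.ages[name] = self.ages.get(name, 0) + event["days"]
                self.seen_ids.add(event["id"])
            self.offset += 1

    def reset(self, offset=0):
        """Rewind to an earlier stream position, for example after a failure."""
        self.offset = offset
```

In a real deployment the set of seen ids would need to be stored durably and bounded in size; here it only shows why replaying the stream is safe.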
Immutability for the
changing chaos...
Time and data
As the growth stages show, time and data
start to trade off against each other, raising
questions such as:
● How fast does the data update?
● How do we support a backup and restore?
● How do we ensure no data loss?
Immutability and Time
● If a point in time never changes, immutability
is achieved
● Pointer vs point in time: the current data version
is a pointer to the latest point in time
● A timeline of data changes provides for
restoration and easier debugging
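
The pointer versus point-in-time idea can be sketched as an append-only list of versions where "current" is just the last entry. The `Timeline` class below is a hypothetical in-memory illustration, not a storage design.

```python
class Timeline:
    """Append-only history of states; the current state is the latest version."""

    def __init__(self):
        self._versions = []  # earlier entries are never modified

    def commit(self, state):
        """Record a new point in time and return its version number."""
        self._versions.append(dict(state))
        return len(self._versions) - 1

    @property
    def current(self):
        """The pointer: a copy of the most recent version (empty if none)."""
        return dict(self._versions[-1]) if self._versions else {}

    def at(self, version):
        """Restore a copy of the state as it was at a given version."""
        return dict(self._versions[version])
```

Because old versions are never rewritten, restoring is a lookup and debugging can compare any two points in time.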
Distributed immutability
● Database transaction log is an immutable
stream of changes
○ Used for replication, most database /
datastores use this approach
● Immutable stream (Kafka, Kinesis) provides
an incoming change log; the latest changes can
be pointed to as a position in the stream. The
reverse of the db approach
What’s the point of all this?
Some plankton for thought
● If you have an idea of where you'll end up,
you’ll have a better idea of where to start
● Understanding reactions to growth will help
with setting up services as you grow
● Misery loves company, knowing everyone
has these pain points somehow makes you
happier
● Knowing where you’ve been helps you now
Thank you!

Sea of Data

  • 1. Sea of data Story of data, scale and how we evolve architecture to handle it. Daniel Marchant (@driedtoast)
  • 2. What do you think of when you hear the word “data”?
  • 3. Setting the stage Data – Things known or assumed as facts, making the basis of reasoning or calculation Time – the indefinite continued progress of existence and events in the past, present and future regarded as a whole
  • 4. What types of data are there?
  • 5. Types of data ● Customer Data - Data the customer provides, the lifeblood of your application ● Business Data - Metrics on how growth, customer attrition, marketing, etc... ● Operation Data - Metrics and log messages that help troubleshoot / monitor your application
  • 6. Let’s jump into the story...
  • 7. Once upon a time... A company was founded to produce the best seamonkey management application ever produced. (purely fictional for now) More details: http://www.seamonkey.xyz (eventually)
  • 8. A hypothetical system timeline ● Launch of application ● Reddit posts promote application ● Hacker News promotes application ● Product Hunt promotes application
  • 9. Launch Look ma, I got an app online!
  • 10. Initial dataset ● Operation Data ○ cpu / memory / disk metrics ○ error messages in logs ● Business Data ○ Signup metrics ○ Access usage ● Customer Data ○ User ○ Seamonkey info
  • 12. Architecture ● Load balancer - route traffic to application ● Application - handles requests and manages data to the database ● Database - data storage So simple, life is good! Some reads and writes!
  • 13. Integrations ● Metric Service - google analytics, kilometer.io, kissmetrics, mixpanel, etc... ● Operation Events - datadog, graylog, newrelic, etc…
  • 14. Troubleshooting ● Pretty straight forward ● Check application can write to DB ● Make sure database user can access tables ● Make sure the transactions scoped in the application make sense ● Check rollback scenarios
  • 15. A little about ACID ● Atomicity: all task(s) within a transaction are performed or none of them are. An all-or-none principle. ● Consistency: transaction does not violate those protocols and the data must remain in a consistent state at the beginning and end of a transaction; no half-completed transactions. ● Isolation: each transaction is independent unto itself for both performance and consistency of transactions. ● Durability: Once complete the transaction will persist as complete; it will survive system failure, power loss and other types of system breakdowns.
  • 16. Reddit Oh, cool some people are looking at it!
  • 17. Data evolution ● Operation Data Additions ○ Timers on critical logic ○ Customer requests ● Business Data Additions ○ Customer emails on problems ● Customer Data Additions ○ Seamonkey Tank ○ Seamonkey Social interactions
  • 19. Architecture ● Load balancer - route traffic to application ● Application - still managing data, more nodes added ● Worker - handles work from the db ‘queue’ table ● Cache - used to taper database reads ● Database - data storage master ● Read Only Database - slave data storage
  • 20. More integrations ● Gmail - customer emails ● DataLoop - Timers and statsd data ● Open Tracing - distributed event tracing http://opentracing.io/
  • 21. More Troubleshooting ● If the application isn’t displaying the right data, is the cache invalidated properly? ● Has the worker overwritten the application’s changes while processing queued work? ● Is replication working from master to slave?
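The cache invalidation question above can be sketched as a cache-aside read path with invalidate-on-write. This is an illustration only: plain dicts stand in for the cache and the database, and the `SeamonkeyStore` class and its `db_reads` counter are hypothetical names, not something from the deck.

```python
class SeamonkeyStore:
    """Cache-aside reads with invalidate-on-write.

    Plain dicts stand in for the real cache and database (an assumption
    made to keep the sketch self-contained).
    """

    def __init__(self):
        self.db = {}
        self.cache = {}
        self.db_reads = 0  # counts how often a read had to hit the database

    def read(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write to the database, then drop the cached copy so the next
        # read cannot return stale data.
        self.db[key] = value
        self.cache.pop(key, None)
```

If the `cache.pop` line is removed, a read after a write keeps returning the old value, which is the "application isn’t displaying the right data" symptom.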
  • 22. Hacker News What have I gotten myself into?
  • 23. Data evolution ● Operation Data / Business Data convergence ○ Customer requests ○ Customer emails to support cases ○ Customer usage to product roadmap ● Customer Data requirements stabilize
  • 25. Architecture ● Application - still managing data, more nodes added, application pushes writes to a queue for non-critical work ● Worker - handles work coming from queue vs db, and writes from application. Also invalidates cache now. ● Cache - used to taper database reads. App is getting more complex invalidation logic
  • 26. CAPs off to you! ● Consistency: related to, but narrower than, the idea presented in ACID. All data storage nodes see the same data at the same time. ● Availability: every request gets a response, so the data stays available ● Partition Tolerance: system continues to operate even when messages between nodes are lost or part of the system fails. A single node failure should not cause the entire system to collapse.
  • 27. Troubleshooting ● Oh boy, more systems more debugging “opportunities” ● If data isn’t updated, has the queue gotten the event from the application? Has the worker processed the change event and written to db? ● Is the queue up? Is the worker up?
  • 28. Product Hunt There’s too many people on this planet. We need another plague.
  • 29. Data evolution ● Operation Data ○ Hopes for attrition ● Business Data ○ Monitors customer attrition ○ Hopes for NO attrition ● Customer Data ○ Grows insane ○ Working out archive strategies
  • 31. Architecture ● Lifecycle service / database - added a service to migrate some of the monolith app, service just handles seamonkey growth and lifecycle ● Worker - still listens for events, writes to lifecycle service ● Stream - swapped out the queue with an immutable stream, better data recovery
  • 32. BASE ● Basically Available: system does guarantee the availability of the data as regards CAP Theorem; there will be a response to any request. Response could be a failure to find data or data could be in an inconsistent state. ● Soft state: state of the data could change over time, there may be changes going on due to ‘eventual consistency’ ● Eventual consistency: data will eventually become consistent once it stops receiving changes. The system will continue to receive changes and is not checking the consistency of every transaction before it moves onto the next one.
  • 33. Troubleshooting ● If seamonkeys aren’t progressing, debug new service, is it up? Database for service up? ● If event isn’t processed reset stream point to catch up, handle duplicate events on the worker vs stream. ● UI not finding events, check service up.
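A rough sketch of "reset stream point to catch up" combined with duplicate handling on the worker. The `Stream` and `Worker` classes are hypothetical in-memory stand-ins, not the API of any real streaming system, and the sketch assumes every event carries a unique id.

```python
class Stream:
    """Append-only list of events; an event's offset is its list index."""

    def __init__(self):
        self._events = []

    def append(self, event_id, payload):
        self._events.append((event_id, payload))
        return len(self._events) - 1

    def read_from(self, offset):
        return self._events[offset:]


class Worker:
    """Applies growth events at most once, even when the stream is replayed."""

    def __init__(self, stream):
        self.stream = stream
        self.offset = 0      # next stream position to read
        self.seen = set()    # ids of events already applied
        self.growth = {}     # seamonkey name -> total growth

    def poll(self):
        for event_id, (name, amount) in self.stream.read_from(self.offset):
            if event_id not in self.seen:  # skip duplicates seen during a replay
                self.growth[name] = self.growth.get(name, 0) + amount
                self.seen.add(event_id)
            self.offset += 1

    def reset(self, offset):
        """Rewind the read position, e.g. after a suspected missed event."""
        self.offset = offset
```

Because the worker remembers which event ids it has applied, rewinding with `reset(0)` and polling again leaves `growth` unchanged for events that were already processed.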
  • 35. Time and data As you see through the growth patterns, time and data start to have trade-offs, raising questions such as: ● How fast does the data update? ● How do we support a backup and restore? ● How do we ensure no data loss?
  • 36. Immutability and Time ● If point in time never changes, immutability is achieved ● Pointer vs point in time, current data version is a pointer to the latest point in time ● A timeline of data changes provides for restoration and easier debugging
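One way to picture "pointer vs point in time" is an append-only list of versions plus an index that marks the current one. The `VersionedRecord` class below is a hypothetical sketch under two assumptions: timestamps are written in increasing order, and stored values are themselves immutable (strings here).

```python
class VersionedRecord:
    """Append-only history of (timestamp, value) with a pointer to the current version."""

    def __init__(self):
        self._versions = []  # never modified in place, only appended to
        self._current = -1   # index of the current version, -1 when empty

    def write(self, timestamp, value):
        self._versions.append((timestamp, value))
        self._current = len(self._versions) - 1

    def _index_as_of(self, timestamp):
        # Index of the latest version written at or before the timestamp.
        index = -1
        for i, (ts, _) in enumerate(self._versions):
            if ts <= timestamp:
                index = i
        return index

    def current(self):
        return None if self._current < 0 else self._versions[self._current][1]

    def as_of(self, timestamp):
        index = self._index_as_of(timestamp)
        return None if index < 0 else self._versions[index][1]

    def restore(self, timestamp):
        """Repoint 'current' at an earlier point in time; history is untouched."""
        self._current = self._index_as_of(timestamp)

    def history(self):
        return list(self._versions)
```

Restoring only moves the pointer, so the full timeline stays available for debugging, which is the benefit the slide describes.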
  • 37. Distributed immutability ● Database transaction log is an immutable stream of changes ○ Used for replication, most databases / datastores use this approach ● Immutable stream (Kafka, Kinesis) provides an incoming change log, latest changes can be pointed to part of stream. Reverse db approach
  • 38. What’s the point of all this?
  • 39. Some plankton for thought ● If you have any idea where you’ll end up, you’ll have a better idea where to start ● Understanding reactions to growth will help with setting up services as you grow ● Misery loves company, knowing everyone has these pain points somehow makes you happier ● Knowing where you’ve been helps you now