SlideShare una empresa de Scribd logo
1 de 46
@ODSC
THE
DATAOPS
MANIFESTO
Boston | April 30 - May 4, 2019
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Topics
Why DataOps Is Essential
Seven Steps to DataOps
DataOps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Strategic Trend: DataOps
• Increased rate of market adoption of
DataOps principles by leaders of data and
analytic teams
• Gartner Hype Cycle in late 2018
• Increased Analysts Coverage
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DevOps has resulted in a transformative
improvement in Software Development
• High-performing IT
organizations deploy 200
times more frequently
• They have 24 times faster
recovery times and three
times lower change
failure rates
• And they spend 22
percent less time on
unplanned work and
rework
Source: State of DevOps Report
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Lean has resulted in a transformative
improvement in manufacturing
• Lean manufacturing improves
efficiency, reduces waste, and
increases productivity.
• The benefits are manifold:
• Increased product quality
• Reduces rework
• Employee satisfaction
• Higher profits
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps – Transformative to Data Analytics
DataOps – Continuous
Delivery of Analytics
• Delivery insights faster
• Ensure high quality
• Add features at the speed of
business
• Automate, orchestrate complex
environment of people and
technology
Source: Gartner
“Organizations that adopt a DevOps- and DataOps-based approach are
more successful in implementing end-to-end, reliable, robust, scalable and
repeatable solutions.”
Sumit Pal, Gartner, November 2018
People,
Process,
Organization
Technical
Environment
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
From To
Change Fear Change Velocity
Manual Operations Automated Operations
Hope For Quality Integrated Quality
Hero Mentality Repeatable Processes
Tool Centric Code Centric
Vendor Lock-In Diverse Tools
How To Succeed?
A Mindset Change to DataOps…
…to power your highly agile data culture.
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Have High Errors
DataKitchen/Eckerson Survey (May 2019)
Errors In Production:
• 2% Acceptable Number Of Errors
(None)
• 67% High Number Of Errors
• 31% Dangerous Number Of Errors
(Greater Than 11 Per Month)
None
3%
1 to 2
18%
3 to 5
29%
6 to 10
20%
11+
30%
ON AVERAGE, HOW MANY ERRORS (E.G.,
INCORRECT DATA, BROKEN REPORTS, LATE
DELIVERY, CUSTOMER COMPLAINTS) DO YOU
HAVE EACH MONTH?
Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Deploy Too Slowly
DataKitchen/Eckerson Survey (May 2019)
Minutes
9%
Hours
15%
Days
36%
Weeks
27%
Months
13%
ON AVERAGE, HOW LONG DOES IT TAKE TO MOVE A
NEW OR MODIFIED DATA ANALYTIC PIPELINE FROM
DEVELOPMENT TO PRODUCTION?
Pipeline (Model, etc.) Deployment:
• 9% Acceptable Deployment
Speed (minutes or less)
• 78% Slow Deployment Speed
• 13% Dangerously Slow
Deployment Speed (months or
longer)
Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Struggle to Develop
DataKitchen/Eckerson Survey (May 2019)
New Development Environment:
• 12% Acceptable Env. Creation
Speed (minutes or less)
• 50% Slow Env. Creation Speed
• 38% Dangerously Slow Env.
Creation Speed (weeks or
longer)
Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
Days
Hours
Minutes
Months
Weeks
(blank)
NEW DEVELOPMENT ENVIRONMENT WITH THE
APPROPRIATE TEST DATA, SERVERS, AND TOOLS
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Figure 1: Only a small fraction of real-world ML systems is
composed of the ML code, as shown by the small black
box in the middle. The required surrounding infrastructure
is vast and complex.
Google
Advances in Neural Information Processing Systems 28 (NIPS 2015)
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Business Need
Prep Data
Feature Extraction
Build Model
Evaluate Model
Deploy Model
Monitor Model
Iterate, Test and Improve
Model building
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Topics
Why DataOps Is Essential
Seven Steps to DataOps
Next Steps With DataOps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Seven Steps to DataOps
1. Orchestrate Two Journeys
2. Add Tests And Monitoring
3. Use a Version Control
System
4. Branch and Merge
5. Use Multiple Environments
6. Reuse & Containerize
7. Parameterize Your
Processing
People,
Process,
Organization
Technical
Environment
= 7 steps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Journey 1: Orchestrate data to customer value
Analytic process are like manufacturing: materials (data) and
production outputs (refined data, charts, graphs, model)
Access:
Python Code
Transform:
SQL Code, ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau Online
❶
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Journey 2: Speed deployment to production
Analytic processes are like software development: deliverables
continually move from development to production
Data
Engineers
Data
Scientists
Data
Analysts
Diverse Team
Diverse Tools
Diverse Customers
Business
Customer
Products &
Systems
❶
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Innovation and Value Pipeline Together
Focus on both orchestration and deployment while automating &
monitoring quality
Don’t want break production
when I deploy my changes
Don’t want to learn about data quality issues from my customers
❶
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Add Automated Monitoring And Tests
Monitoring: To ensure that
during in the Value Pipeline, the
data quality remains high.
Tests: Before promoting work,
running new and old tests gives
high confidence that the change did
not break anything in the
Innovation Pipeline
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Automate Monitoring & Tests In Production
Test Every Step And Every Tool in Your Value Pipeline
Are your outputs
consistent?
And Save Test Results!
Are data inputs
free from
issues?
Is your business logic
still correct?
Access:
Python Code
Transform:
SQL Code, ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau Online
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Support Multiple Types Of Tests
Testing Data Is Not Just Pass/Fail in Your Value Pipeline
Support Test Types
• Error – stop the line
• Warning – investigate later
• Info – list of changes
Keep Test History
• Statistical Process Control
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Types of Tests
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Tests
Simple
❷
Example Test
More Complex
Make sure all
table counts are
the same in the
production and
development
environment
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Test
Location Balance❷
Access:
Python Code
Transform:
SQL Code, ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau Online
source 1 million
rows
database
1 million
rows
1 million
facts
300K
dimension
report
1 million
facts
300K
dimensions
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Test Historical Balance
❷
SKU Product Product Group Volume
SKU1 P1
G1
100
SKU2 P1 50
SKU3 P2 75
SKU4 P3
G2
125
SKU5 P4 200
SKU6 P5 25
575
Production
Data, Pipeline
& Environment
Pre-Production
Data, Pipeline
& Environment
SKU Product Product Group Volume
SKU1 P1
G1
101
SKU2 P1 55
SKU3 P2 76
SKU4 P3 126
SKU5 P4
G2
200
SKU6 P5 29
587
Access:
Python
Code
Transfor
m: SQL
Code, ETL
Model:
R Code
Visualiz
e:
Tableau
Workbook
Report
: Tableau
Online
Access:
Python
Code
Transfor
m: SQL
Code, ETL
Model:
R Code
Visualiz
e:
Tableau
Workbook
Report
: Tableau
Online
Histbal
G1 225
G2 350
Histbal
G1 358
G2 229
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Automated ‘Tests’ Serve
a Dual Purpose:
1. Data Tests and
Monitoring in
Production
2. Regression, Functional
and Performance Tests
in Development
Data Fixed Data Variable
Code Fixed Value Pipeline
Code Variable
Innovation
Pipeline
Quality Your
Customer
Receives
= f (data, code)
https://medium.com/data-ops/disband-your-impact-review-board-automate-analytics-testing-42093d09fe11
Duality of Tests❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
For the Innovation Pipeline
Tests Are For Also Code: Keep Data Fixed
Deploy Feature
Run all tests here before
promoting
❷
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Use a Version Control System
At The End Of The Day, Analytic Work Is All Just Code
Access:
Python Code
Transform:
SQL Code,
ETL Code
Model:
R Code
Visualize:
Tableau
Workbook XML
Report:
Tableau Online
Source Code
Control
❸
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Branch & Merge
Source Code
Control
Branching & Merging enables people to safely work on their own tasks
❹
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Branch And Merge Pattern
Sprint 1 Sprint 2
f1 f2
f3
main / master / trunk
f5
❹
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Access:
Python Code
Transform:
SQL Code,
ETL Code
Model:
R Code
Visualize:
Tableau
Workbook XML
Report:
Tableau Online
Use Multiple Environments
Analytic Environment
Your Analytic Work Requires Coordinating Tools And Hardware
❺
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Use Multiple Environments
Provide an Analytic Environment for each branch
• Analysts need a controlled environment for their experiments
• Engineers need a place to develop outside of production
• Update Production only after all tests are run!
❺
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Environments Are Complex
Analytic Environment
❺
Data Engineers,
Scientists or Analytics Team’s Analytic Tools
R
(model)
Alteryx
(business ETL)
Redshift
(data)
SQL
(ETL)
Hardware & Network
Configurations
Right Hardware and
Software Versions
Tableau
(workbook)
Python
Test Data Sets
Code Branch
Test Result
History
Analytic Environment/
Development Sandbox
Creation is Complex:
Hard to create the
right set of data,
tools, people,
history and
configuration for a
fast build test debug
cycle
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Reuse & Containerize
Containerize
1. Manage the environment for each
component (e.g. Docker, AMI)
2. Practice Environment Version Control
Reuse
1. Do not create one ‘monolith’ of code
2. Reuse the code and results
❻
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Parameterize Your Processing
Think Of Your Value Pipeline Like A Big Function
• Named sets of parameters will
increase your velocity
• With parameters, you can vary
• Inputs
• Outputs
• Steps in the workflow
• You can make a time machine
• Secure storage for credentials
❼
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
The Seven Steps In Action
1. Select story
2. Create branch
3. Create environment
4. Implement feature
5. Write new tests
6. Run new and existing tests
7. Check in to branch
8. Merge to parent
9. Delete environment
When sprint ends
• Deliver all completed features to
customer
• Merge sprint branch to master
• Roll un-merged features into the
next sprint
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
The 7 Steps and Data Science
Journeys Tests Version Control Branch and Merge Environments Reuse / Containerize Parameterize
Business Need Agile
Prep Data x x x x x x x
Feature Extraction x x x x x x x
Build Model x x x x x x x
Evaluate Model x
Deploy Model x x x x x x x
Monitor Model x
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Topics
Why DataOps Is Essential
Seven Steps to DataOps
Next Steps With DataOps
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Where to Start With DataOps?
Look to manufacturing/DevOps
‘Theory of Constraints’
• Where are ‘bottlenecks’ (or
constraints in your data science
or analytic process?
• What impedes from creating new
insight for you customers?
• Iterate & improve
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Bottlenecks to Start
• “I don’t want to learn about
data quality issues from my
customers”
• “I don’t want break production
when I deploy my changes”
• “I don’t like the Hatfields vs
Mccoys war between data
science and analytic teams”
None
3%
1 to 2
18%
3 to 5
29%
6 to 10
20%
11+
30%
ON AVERAGE, HOW MANY ERRORS (E.G.,
INCORRECT DATA, BROKEN REPORTS,
LATE DELIVERY, CUSTOMER
COMPLAINTS) DO YOU HAVE EACH
MONTH?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example Bottlenecks to Start
• “I don’t want to learn about
data quality issues from my
customers”
• “I don’t want break
production when I deploy my
changes”
• “I don’t like the Hatfields vs
Mccoys war between data
science and analytic teams”
Minutes
9%
Hours
15%
Days
36%
Weeks
27%
Months
13%
ON AVERAGE, HOW LONG DOES IT TAKE TO MOVE A
NEW OR MODIFIED DATA ANALYTIC PIPELINE FROM
DEVELOPMENT TO PRODUCTION?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example: Which Constraint?
• “I don’t want to learn about
data quality issues from my
customers”
• “I don’t want break production
when I deploy my changes”
• “I don’t like the Hatfields vs
Mccoys war between data
science and analytic teams”
Errors : Constraint
Deployment : Constraint
Team Coordination : Constraint
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example: Which Constraint?
• “I don’t want to learn about
data quality issues from my
customers”
• “I don’t want break production
when I deploy my changes”
• “I don’t like the Hatfields vs
Mccoys war between data
science and analytic teams”
Errors, Deployment, and Team
Coordination Are Bottlenecks or
Constraints That Inhibit
GOAL: Flow of Innovation
“How do measure team progress
and show results to leadership?”
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataKitchen Software Platform
Our cloud platform orchestrates data to customer value,
speeds features to production, and automates quality.
Kitchens
Recipes & Tests
Orders
Ingredients
1. Orchestrate Two Journeys
2. Add Tests And Monitoring
3. Use a Version Control System
4. Branch and Merge
5. Use Multiple Environments
6. Reuse & Containerize
7. Parameterize Your Processing
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
For More Information
• For These Slides, Contact Me:
• cbergh@datakitchen.io
• DataOps Manifesto:
• http://dataopsmanifesto.org
• DataOps Blog:
• http://medium.com/data-ops
• Follow Twitter:
• #DataOps

Más contenido relacionado

La actualidad más candente

Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019DataKitchen
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
 
Accelerate and modernize your data pipelines
Accelerate and modernize your data pipelinesAccelerate and modernize your data pipelines
Accelerate and modernize your data pipelinesPaul Van Siclen
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Modern Data Architecture
Modern Data Architecture Modern Data Architecture
Modern Data Architecture Mark Hewitt
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityDevOps.com
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...HostedbyConfluent
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 

La actualidad más candente (20)

Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
Accelerate and modernize your data pipelines
Accelerate and modernize your data pipelinesAccelerate and modernize your data pipelines
Accelerate and modernize your data pipelines
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Modern Data Architecture
Modern Data Architecture Modern Data Architecture
Modern Data Architecture
 
Data ops in practice
Data ops in practiceData ops in practice
Data ops in practice
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 

Similar a Provisioning- Configuration- OrchestrationData:- Raw- Staging- Training- Testing- ProductionTools:- ETL- Databases- Notebooks- Modeling- VisualizationInfrastructure:- Compute- Storage- NetworkingMonitoring:- Logs- Metrics- AlertsSecurity:- Authentication- Authorization- EncryptionPeople, Process, Organization Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.Reuse & ContainerizeAnalytic Environment❻ Copyright © 2019 by Data

seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019DataKitchen
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsDataKitchen
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOpsChristopher Bergh
 
Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You! DataKitchen
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Elemica
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Bitrock manufacturing
Bitrock manufacturing Bitrock manufacturing
Bitrock manufacturing cosma_r
 
MongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital DecouplingMongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital DecouplingMongoDB
 
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...DataKitchen
 
Data Science Case Studies: The Internet of Things: Implications for the Enter...
Data Science Case Studies: The Internet of Things: Implications for the Enter...Data Science Case Studies: The Internet of Things: Implications for the Enter...
Data Science Case Studies: The Internet of Things: Implications for the Enter...VMware Tanzu
 
Big Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationBig Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationIntellipaat
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?SnapLogic
 
Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...
Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...
Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...Sogeti Nederland B.V.
 
ENT206 Product Development in the Cloud
ENT206 Product Development in the CloudENT206 Product Development in the Cloud
ENT206 Product Development in the CloudAmazon Web Services
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 
Agile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged ApplicationsAgile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged ApplicationsWorksoft
 

Similar a Provisioning- Configuration- OrchestrationData:- Raw- Staging- Training- Testing- ProductionTools:- ETL- Databases- Notebooks- Modeling- VisualizationInfrastructure:- Compute- Storage- NetworkingMonitoring:- Logs- Metrics- AlertsSecurity:- Authentication- Authorization- EncryptionPeople, Process, Organization Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.Reuse & ContainerizeAnalytic Environment❻ Copyright © 2019 by Data (20)

seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
ODSC data science to DataOps
ODSC data science to DataOpsODSC data science to DataOps
ODSC data science to DataOps
 
Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!Your Data Nerd Friends Need You!
Your Data Nerd Friends Need You!
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Bitrock manufacturing
Bitrock manufacturing Bitrock manufacturing
Bitrock manufacturing
 
MongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital DecouplingMongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital Decoupling
 
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
Strata+hadoop data kitchen-seven-steps-to-high-velocity-data-analytics-with d...
 
Data Science Case Studies: The Internet of Things: Implications for the Enter...
Data Science Case Studies: The Internet of Things: Implications for the Enter...Data Science Case Studies: The Internet of Things: Implications for the Enter...
Data Science Case Studies: The Internet of Things: Implications for the Enter...
 
DevOps for Enterprise Systems - Rosalind Radcliffe
DevOps for Enterprise Systems - Rosalind RadcliffeDevOps for Enterprise Systems - Rosalind Radcliffe
DevOps for Enterprise Systems - Rosalind Radcliffe
 
Cisco & Open Source
Cisco & Open SourceCisco & Open Source
Cisco & Open Source
 
Big Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationBig Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview Preparation
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...
Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...
Revolutionising Testing with the Power of AI - Deepa Mamtani, Pillay Almira &...
 
ENT206 Product Development in the Cloud
ENT206 Product Development in the CloudENT206 Product Development in the Cloud
ENT206 Product Development in the Cloud
 
Enabling Agility Through DevOps
Enabling Agility Through DevOpsEnabling Agility Through DevOps
Enabling Agility Through DevOps
 
Adopting DevOps for 2-Speed IT
Adopting DevOps for 2-Speed ITAdopting DevOps for 2-Speed IT
Adopting DevOps for 2-Speed IT
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
Agile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged ApplicationsAgile-plus-DevOps Testing for Packaged Applications
Agile-plus-DevOps Testing for Packaged Applications
 

Más de DataKitchen

Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015DataKitchen
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile DataDataKitchen
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...DataKitchen
 
Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!DataKitchen
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & RedshiftDataKitchen
 
Redshift Introduction
Redshift IntroductionRedshift Introduction
Redshift IntroductionDataKitchen
 

Más de DataKitchen (7)

Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile Data
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!Do Agile Data in Just 5 Shocking Steps!
Do Agile Data in Just 5 Shocking Steps!
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
Redshift Introduction
Redshift IntroductionRedshift Introduction
Redshift Introduction
 

Último

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Provisioning- Configuration- OrchestrationData:- Raw- Staging- Training- Testing- ProductionTools:- ETL- Databases- Notebooks- Modeling- VisualizationInfrastructure:- Compute- Storage- NetworkingMonitoring:- Logs- Metrics- AlertsSecurity:- Authentication- Authorization- EncryptionPeople, Process, Organization Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.Reuse & ContainerizeAnalytic Environment❻ Copyright © 2019 by Data

  • 2. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
  • 3. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Topics Why DataOps Is Essential Seven Steps to DataOps DataOps
  • 4. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Strategic Trend: DataOps • Increased rate of market adoption of DataOps principles by leaders of data and analytic teams • Gartner Hype Cycle in late 2018 • Increased Analysts Coverage
  • 5. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DevOps has resulted in a transformative improvement in Software Development • High-performing IT organizations deploy 200 times more frequently • They have 24 times faster recovery times and three times lower change failure rates • And they spend 22 percent less time on unplanned work and rework Source: State of DevOps Report
  • 6. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Lean has resulted in a transformative improvement in manufacturing • Lean manufacturing improves efficiency, reduces waste, and increases productivity. • The benefits are manifold: • Increased product quality • Reduces rework • Employee satisfaction • Higher profits
  • 7. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps – Transformative to Data Analytics DataOps – Continuous Delivery of Analytics • Delivery insights faster • Ensure high quality • Add features at the speed of business • Automate, orchestrate complex environment of people and technology Source: Gartner “Organizations that adopt a DevOps- and DataOps-based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions.” Sumit Pal, Gartner, November 2018 People, Process, Organization Technical Environment
  • 8. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. From To Change Fear Change Velocity Manual Operations Automated Operations Hope For Quality Integrated Quality Hero Mentality Repeatable Processes Tool Centric Code Centric Vendor Lock-In Diverse Tools How To Succeed? A Mindset Change to DataOps… …to power your highly agile data culture.
  • 9. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Have High Errors DataKitchen/Eckerson Survey (May 2019) Errors In Production: • 2% Acceptable Number Of Errors (None) • 67% High Number Of Errors • 31% Dangerous Number Of Errors (Greater Than 11 Per Month) None 3% 1 to 2 18% 3 to 5 29% 6 to 10 20% 11+ 30% ON AVERAGE, HOW MANY ERRORS (E.G., INCORRECT DATA, BROKEN REPORTS, LATE DELIVERY, CUSTOMER COMPLAINTS) DO YOU HAVE EACH MONTH? Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
  • 10. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Deploy Too Slowly DataKitchen/Eckerson Survey (May 2019) Minutes 9% Hours 15% Days 36% Weeks 27% Months 13% ON AVERAGE, HOW LONG DOES IT TAKE TO MOVE A NEW OR MODIFIED DATA ANALYTIC PIPELINE FROM DEVELOPMENT TO PRODUCTION? Pipeline (Model, etc.) Deployment: • 9% Acceptable Deployment Speed (minutes or less) • 78% Slow Deployment Speed • 13% Dangerously Slow Deployment Speed (months or longer) Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad
  • 11. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Struggle to Develop DataKitchen/Eckerson Survey (May 2019) New Development Environment: • 12% Acceptable Env. Creation Speed (minutes or less) • 50% Slow Env. Creation Speed • 38% Dangerously Slow Env. Creation Speed (weeks or longer) Forthcoming DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Abroad Days Hours Minutes Months Weeks (blank) NEW DEVELOPMENT ENVIRONMENT WITH THE APPROPRIATE TEST DATA, SERVERS, AND TOOLS
  • 12. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex. Google Advances in Neural Information Processing Systems 28 (NIPS 2015)
  • 13. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Business Need Prep Data Feature Extraction Build Model Evaluate Model Deploy Model Monitor Model Iterate, Test and Improve Model building
  • 14. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Topics Why DataOps Is Essential Seven Steps to DataOps Next Steps With DataOps
  • 15. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Seven Steps to DataOps 1. Orchestrate Two Journeys 2. Add Tests And Monitoring 3. Use a Version Control System 4. Branch and Merge 5. Use Multiple Environments 6. Reuse & Containerize 7. Parameterize Your Processing People, Process, Organization Technical Environment = 7 steps
  • 16. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Journey 1: Orchestrate data to customer value Analytic process are like manufacturing: materials (data) and production outputs (refined data, charts, graphs, model) Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online ❶
  • 17. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Journey 2: Speed deployment to production Analytic processes are like software development: deliverables continually move from development to production Data Engineers Data Scientists Data Analysts Diverse Team Diverse Tools Diverse Customers Business Customer Products & Systems ❶
  • 18. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Innovation and Value Pipeline Together Focus on both orchestration and deployment while automating & monitoring quality Don’t want break production when I deploy my changes Don’t want to learn about data quality issues from my customers ❶
  • 19. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Add Automated Monitoring And Tests Monitoring: To ensure that during in the Value Pipeline, the data quality remains high. Tests: Before promoting work, running new and old tests gives high confidence that the change did not break anything in the Innovation Pipeline ❷
  • 20. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Automate Monitoring & Tests In Production Test Every Step And Every Tool in Your Value Pipeline Are your outputs consistent? And Save Test Results! Are data inputs free from issues? Is your business logic still correct? Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online ❷
  • 21. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Support Multiple Types Of Tests Testing Data Is Not Just Pass/Fail in Your Value Pipeline Support Test Types • Error – stop the line • Warning – investigate later • Info – list of changes Keep Test History • Statistical Process Control ❷
  • 22. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Types of Tests ❷
  • 23. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Tests Simple ❷
  • 24. Example Test More Complex Make sure all table counts are the same in the production and development environment ❷
  • 25. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Test Location Balance❷ Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online source 1 million rows database 1 million rows 1 million facts 300K dimension report 1 million facts 300K dimensions
  • 26. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Test Historical Balance ❷ SKU Product Product Group Volume SKU1 P1 G1 100 SKU2 P1 50 SKU3 P2 75 SKU4 P3 G2 125 SKU5 P4 200 SKU6 P5 25 575 Production Data, Pipeline & Environment Pre-Production Data, Pipeline & Environment SKU Product Product Group Volume SKU1 P1 G1 101 SKU2 P1 55 SKU3 P2 76 SKU4 P3 126 SKU5 P4 G2 200 SKU6 P5 29 587 Access: Python Code Transfor m: SQL Code, ETL Model: R Code Visualiz e: Tableau Workbook Report : Tableau Online Access: Python Code Transfor m: SQL Code, ETL Model: R Code Visualiz e: Tableau Workbook Report : Tableau Online Histbal G1 225 G2 350 Histbal G1 358 G2 229
  • 27. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Automated ‘Tests’ Serve a Dual Purpose: 1. Data Tests and Monitoring in Production 2. Regression, Functional and Performance Tests in Development Data Fixed Data Variable Code Fixed Value Pipeline Code Variable Innovation Pipeline Quality Your Customer Receives = f (data, code) https://medium.com/data-ops/disband-your-impact-review-board-automate-analytics-testing-42093d09fe11 Duality of Tests❷
  • 28. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. For the Innovation Pipeline Tests Are For Also Code: Keep Data Fixed Deploy Feature Run all tests here before promoting ❷
  • 29. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Use a Version Control System At The End Of The Day, Analytic Work Is All Just Code Access: Python Code Transform: SQL Code, ETL Code Model: R Code Visualize: Tableau Workbook XML Report: Tableau Online Source Code Control ❸
  • 30. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Branch & Merge Source Code Control Branching & Merging enables people to safely work on their own tasks ❹
  • 31. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Branch And Merge Pattern Sprint 1 Sprint 2 f1 f2 f3 main / master / trunk f5 ❹
  • 32. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Access: Python Code Transform: SQL Code, ETL Code Model: R Code Visualize: Tableau Workbook XML Report: Tableau Online Use Multiple Environments Analytic Environment Your Analytic Work Requires Coordinating Tools And Hardware ❺
  • 33. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Use Multiple Environments Provide an Analytic Environment for each branch • Analysts need a controlled environment for their experiments • Engineers need a place to develop outside of production • Update Production only after all tests are run! ❺
  • 34. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Environments Are Complex Analytic Environment ❺ Data Engineers, Scientists or Analytics Team’s Analytic Tools R (model) Alteryx (business ETL) Redshift (data) SQL (ETL) Hardware & Network Configurations Right Hardware and Software Versions Tableau (workbook) Python Test Data Sets Code Branch Test Result History Analytic Environment/ Development Sandbox Creation is Complex: Hard to create the right set of data, tools, people, history and configuration for a fast build test debug cycle
  • 35. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Reuse & Containerize Containerize 1. Manage the environment for each component (e.g. Docker, AMI) 2. Practice Environment Version Control Reuse 1. Do not create one ‘monolith’ of code 2. Reuse the code and results ❻
  • 36. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Parameterize Your Processing Think Of Your Value Pipeline Like A Big Function • Named sets of parameters will increase your velocity • With parameters, you can vary • Inputs • Outputs • Steps in the workflow • You can make a time machine • Secure storage for credentials ❼
  • 37. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. The Seven Steps In Action 1. Select story 2. Create branch 3. Create environment 4. Implement feature 5. Write new tests 6. Run new and existing tests 7. Check in to branch 8. Merge to parent 9. Delete environment When sprint ends • Deliver all completed features to customer • Merge sprint branch to master • Roll un-merged features into the next sprint
  • 38. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. The 7 Steps and Data Science Journeys Tests Version Control Branch and Merge Environments Reuse / Containerize Parameterize Business Need Agile Prep Data x x x x x x x Feature Extraction x x x x x x x Build Model x x x x x x x Evaluate Model x Deploy Model x x x x x x x Monitor Model x
  • 39. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Topics Why DataOps Is Essential Seven Steps to DataOps Next Steps With DataOps
  • 40. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Where to Start With DataOps? Look to manufacturing/DevOps ‘Theory of Constraints’ • Where are ‘bottlenecks’ (or constraints in your data science or analytic process? • What impedes from creating new insight for you customers? • Iterate & improve
  • 41. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Bottlenecks to Start • “I don’t want to learn about data quality issues from my customers” • “I don’t want break production when I deploy my changes” • “I don’t like the Hatfields vs Mccoys war between data science and analytic teams” None 3% 1 to 2 18% 3 to 5 29% 6 to 10 20% 11+ 30% ON AVERAGE, HOW MANY ERRORS (E.G., INCORRECT DATA, BROKEN REPORTS, LATE DELIVERY, CUSTOMER COMPLAINTS) DO YOU HAVE EACH MONTH?
  • 42. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example Bottlenecks to Start • “I don’t want to learn about data quality issues from my customers” • “I don’t want break production when I deploy my changes” • “I don’t like the Hatfields vs Mccoys war between data science and analytic teams” Minutes 9% Hours 15% Days 36% Weeks 27% Months 13% ON AVERAGE, HOW LONG DOES IT TAKE TO MOVE A NEW OR MODIFIED DATA ANALYTIC PIPELINE FROM DEVELOPMENT TO PRODUCTION?
  • 43. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example: Which Constraint? • “I don’t want to learn about data quality issues from my customers” • “I don’t want break production when I deploy my changes” • “I don’t like the Hatfields vs Mccoys war between data science and analytic teams” Errors : Constraint Deployment : Constraint Team Coordination : Constraint
  • 44. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example: Which Constraint? • “I don’t want to learn about data quality issues from my customers” • “I don’t want break production when I deploy my changes” • “I don’t like the Hatfields vs Mccoys war between data science and analytic teams” Errors, Deployment, and Team Coordination Are Bottlenecks or Constraints That Inhibit GOAL: Flow of Innovation “How do measure team progress and show results to leadership?”
  • 45. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataKitchen Software Platform Our cloud platform orchestrates data to customer value, speeds features to production, and automates quality. Kitchens Recipes & Tests Orders Ingredients 1. Orchestrate Two Journeys 2. Add Tests And Monitoring 3. Use a Version Control System 4. Branch and Merge 5. Use Multiple Environments 6. Reuse & Containerize 7. Parameterize Your Processing
  • 46. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. For More Information • For These Slides, Contact Me: • cbergh@datakitchen.io • DataOps Manifesto: • http://dataopsmanifesto.org • DataOps Blog: • http://medium.com/data-ops • Follow Twitter: • #DataOps