Starting Your Modern DataOps Journey
Lucas Stone
Solutions Engineer
01.12.2020
The DataOps Story
Waterfall (pre-2000)
Linear approach
Good for projects where the end state is well-defined (e.g. physical infrastructure)
Not as good where the product is continually changing or developing (e.g. software)
Agile (2000)
Began with the "Agile Manifesto"
Designed for the production of software products
A response to how rapidly business requirements could change
Emphasised iteration
DevOps (2007)
Even with the rise of Agile, dev and ops teams remained siloed
DevOps aimed to bring them together
A set of practices to release high-quality code faster
DataOps (2014)
Brings the DevOps approach to the data function
Aligns data science / data management and operations teams
Ensures the business can fully leverage data and convert it into actionable insight
Where is DataOps in the “Hype Cycle”?
DataOps Search Trends: Past 5 Years
DataOps
Data + Operations: ensure that data value is delivered to the business as soon as possible
DataOps sits at the intersection of the Value Pipeline and the Innovation Pipeline:
Value Pipeline: Data → Production → Value
Innovation Pipeline: Idea → Development → Production
Quality applies along both pipelines
DataOps Misconceptions
DataOps is not…
A technology – although a set of technologies is commonly used to support its implementation within an organisation
Restricted to either 1) "big data" (the scale and complexity of your data don't preclude the benefits) or 2) only advanced data science applications (e.g. ML)
DataOps is…
More than just DevOps for Data! It brings together 3 distinct elements: Agile development, DevOps, and Statistical Process Control (Lean)
A methodology – it brings together a number of principles and practices around the way an organisation manages and processes data
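Statistical Process Control, one of the three elements named above, treats each pipeline run as a process measurement and flags values that fall outside control limits. A minimal Python sketch, assuming a Shewhart-style rule of mean ± 3 standard deviations over recent history; the metric (daily row counts) and its values are hypothetical:

```python
from statistics import mean, stdev

def control_limits(history, k=3.0):
    """Return (lower, upper) limits: mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def out_of_control(history, new_value, k=3.0):
    """True if new_value falls outside the limits derived from history."""
    lower, upper = control_limits(history, k)
    return not (lower <= new_value <= upper)

# Hypothetical daily row counts from a recurring pipeline load.
daily_rows = [10_120, 9_980, 10_050, 10_210, 9_900, 10_075, 10_130]

print(out_of_control(daily_rows, 10_090))  # within limits -> False
print(out_of_control(daily_rows, 6_500))   # sudden drop -> True
```

In practice the flag would raise an alert or pause downstream steps rather than just print; the choice of k trades false alarms against missed anomalies.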
DataOps – What are the benefits?
Faster Deployment & Feedback
Consumers of data analytics will get what they need faster and be able to give feedback more often, creating a virtuous circle with business requirements as the starting point
Happier Colleagues
Introducing DataOps means those involved in the process can see the positive impact of their work more quickly, leading to more engaged and productive teams
Higher Data Quality
Increased automation (particularly of testing) and standardised processes lead to higher data quality, which in turn leads to better insights from Machine Learning models
Collaboration
DataOps promotes collaboration, communication, and coordination between teams that might otherwise remain siloed
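Automated testing is the main lever behind the data quality benefit. A minimal Python sketch of batch-level data quality checks (not-null, uniqueness, value range) that a pipeline could run before promoting data; the record layout, column names, and thresholds are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    failures: int  # number of offending rows

def not_null(rows, column):
    """Fail if any row has a missing (None) value in `column`."""
    bad = sum(1 for r in rows if r.get(column) is None)
    return CheckResult(f"not_null({column})", bad == 0, bad)

def unique(rows, column):
    """Fail if any value in `column` appears more than once."""
    seen, dupes = set(), 0
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes += 1
        seen.add(value)
    return CheckResult(f"unique({column})", dupes == 0, dupes)

def in_range(rows, column, low, high):
    """Fail if any non-null value in `column` lies outside [low, high]."""
    bad = sum(1 for r in rows
              if r.get(column) is not None and not low <= r[column] <= high)
    return CheckResult(f"in_range({column})", bad == 0, bad)

# Hypothetical batch of customer records.
rows = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 151},
    {"customer_id": 2, "age": None},
]
results = [not_null(rows, "customer_id"),
           unique(rows, "customer_id"),
           in_range(rows, "age", 0, 120)]
for res in results:
    print(res)

# Gate: only promote the batch if every check passed.
batch_ok = all(res.passed for res in results)
```

Returning structured results rather than raising on the first failure lets one run report every problem in the batch at once.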
Which companies use DataOps?
Please see Qubole’s Creating a Data-Driven Enterprise with DataOps ebook for further
information on how each of these organisations implements DataOps
Case Study: Facebook & Apache Hive
Stage 1
No structure around data requests; the business would request data on an ad-hoc basis – the data function acted as a service
Stage 2
Created a Hadoop data lake and developed Hive to make it more accessible – the data team evolved from a service team to building self-service platforms for data extraction
Stage 3
Combined metadata services with Hive, allowing users to look at both data and metadata – however, data was still not accessible to non-technical users
Stage 4
Developed UIs that business users could easily understand and use to extract data independently – data becomes fully democratised
What is required to begin implementing DataOps?
1. People & Culture
The foundation for introducing DataOps lies first with buy-in from the key stakeholder groups, particularly the business, so that business requirements are understood
2. Processes
It is then important to establish clear processes, including who is Responsible, Accountable, Consulted, and Informed (RACI). Those responsible should receive appropriate training. Measuring process effectiveness with appropriate KPIs is also crucial
3. Technologies
Once the right culture and processes have been established, an organisation can introduce tooling to support related activities, notably automation, testing, and orchestration
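A RACI matrix is easy to keep as data and check automatically. A minimal Python sketch, assuming the widely used convention that each task has exactly one Accountable ("A") and at least one Responsible ("R"); the tasks and role assignments are hypothetical:

```python
from collections import Counter

# Hypothetical RACI matrix: task -> {role: "R" | "A" | "C" | "I"}
raci = {
    "Maintain source extracts": {"Source Owner": "A", "DBA": "R",
                                 "Data Engineer": "C"},
    "Build customer pipeline": {"Data Architect": "A", "Data Engineer": "R",
                                "Data Scientist": "C", "Business Analyst": "I"},
    "Publish churn model scores": {"Data Scientist": "R",
                                   "Business Analyst": "R"},
}

def validate_raci(matrix):
    """Return a list of problems; an empty list means every task passes."""
    problems = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())  # absent letters count as 0
        if counts["A"] != 1:
            problems.append(
                f"{task}: expected exactly one 'A', found {counts['A']}")
        if counts["R"] < 1:
            problems.append(f"{task}: no 'R' assigned")
    return problems

for problem in validate_raci(raci):
    print(problem)
```

Here the third task is flagged because nobody is accountable for it.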
People: Stakeholder Groups
Data Consumers
Those who use data to perform analysis and extract insights, then deliver them to those in the business who can use them to drive value
Data Suppliers
Those managing the integrity of Authoritative Data Sources to ensure data quality and availability
Data Preparers
Those who build data pipelines linking one source to another, and who manage the transformation of data into a usable format for Data Consumers
Business
Other parts of the organisation do not use DataOps directly – rather, they rely on and benefit from its better outputs (insights / BI / analytics) and convey Business Requirements
DataOps builds two crucial bridges: first between the business and technology functions, and second within the data function itself
People: Ingraining a DataOps Culture
1. Push from the Top
Cultural changes must be endorsed by senior management, both within and outside the data function, before being pushed down to individual teams
2. Embrace the Process
Acknowledge that change won't happen overnight and that improvements will be incremental – allow a realistic timeframe for implementing DataOps
3. Remove Silos
Breaking down organisational barriers between Data Suppliers, Preparers, Consumers, and the Business will be crucial to the smooth flow of data to those making decisions
4. Emphasise Data
Data should be front and centre of strategic decision-making for DataOps to realise its full potential – this should be embedded as a company value
5. Invest in Tools
Carefully selecting a complementary set of technologies to underpin the implementation of DataOps is essential, as is providing the relevant training to upskill your teams
Processes: Building a "Data Supply Chain"
Data Suppliers: Source Owners, DBAs, Infrastructure and Ops Personnel, Application Admins + Developers
Data Preparers: Data Engineers, Data Architects, Data Stewards, Integration Architects + Developers, Data Modelers
Data Consumers: Machine Learning Model Developers, Data Scientists
The Business: HR, Finance, Strategy, Operations, etc.
Other roles involved: Data Product Managers, Business Analysts, Data Security Teams, Data Privacy Officers
Technology: Agile, Collaboration, Automation, Infrastructure as Code
Agile
Small but frequent deliveries of new features
Constant feedback loop between the business and tech
Collaboration
Version control to decrease risk and increase productivity
Job / issue tracking to ensure even minor feedback is captured
Automation
Continuous Integration, Deployment, Delivery
Automate testing and speed up getting code into production
Infrastructure as Code
Manage IT infrastructure using code
Make changes to existing infrastructure much more easily
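With Continuous Integration, a server typically runs automated tests like these on every commit, e.g. with `pytest`, which discovers functions whose names start with `test_`. A minimal sketch; `normalise_email` is a hypothetical transformation step, not taken from any specific tool:

```python
def normalise_email(raw):
    """Hypothetical transformation: trim and lower-case an email address.

    Returns None for missing or obviously malformed values.
    """
    if raw is None:
        return None
    cleaned = raw.strip().lower()
    # Require exactly one "@" with text on both sides.
    if cleaned.count("@") != 1 or cleaned.startswith("@") or cleaned.endswith("@"):
        return None
    return cleaned

def test_normalise_email_trims_and_lowercases():
    assert normalise_email("  Alice@Example.COM ") == "alice@example.com"

def test_normalise_email_rejects_malformed():
    assert normalise_email("not-an-email") is None
    assert normalise_email("@example.com") is None
    assert normalise_email(None) is None
```

Because the tests live next to the transformation under version control, a change that breaks the rule fails the build before it reaches production.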
CloverDX & DataOps
Gartner identified 5 "key techniques" that support the delivery of DataOps – CloverDX can support Data Preparers with each:
Increased deployment frequency
Package, share, and reuse any functionality you design
Automated testing
Incorporate data quality tests and build error handling into your data pipelines
Consistent metadata and version control
CloverDX is easy to integrate with most version control tools, and metadata can easily be tracked and visualised
Monitoring
The CloverDX Server has a monitoring suite that can be applied to individual jobs or whole business processes
Collaboration across all stakeholders
CloverDX's visual design allows technical and non-technical users to "speak the same language"
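Building error handling into a pipeline often means routing invalid records to a separate reject output, with reasons, instead of failing the whole run. The sketch below is a generic Python illustration of that pattern, not CloverDX's API; the field names and validation rules are hypothetical:

```python
def split_valid_and_rejected(records, validators):
    """Route each record to the valid or rejected output, keeping reasons."""
    valid, rejected = [], []
    for rec in records:
        reasons = [message for check, message in validators if not check(rec)]
        if reasons:
            rejected.append({"record": rec, "reasons": reasons})
        else:
            valid.append(rec)
    return valid, rejected

# Hypothetical validation rules: (predicate, message shown on rejection).
validators = [
    (lambda r: r.get("order_id") is not None, "missing order_id"),
    (lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
     "amount must be a non-negative number"),
]

orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5},
    {"order_id": 3, "amount": -2},
]
valid, rejected = split_valid_and_rejected(orders, validators)
# Counts like these are a natural input for job-level monitoring.
print(f"{len(valid)} valid, {len(rejected)} rejected")  # prints: 1 valid, 2 rejected
```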
Upcoming Webinar
Code Management with Version Control in CloverDX
December 8th, 11am EST / 4pm GMT / 5pm CET
Register
Q&A