DataDevOps: A Manifesto for a
DevOps-like Culture Shift
in Data & Analytics
Dr. Arif Wider & Sebastian Herold
Munich, Feb 7th, 2018
Page 2
Dr. Arif Wider
- Senior Consultant/Dev
- Scala/FP enthusiast
- ThoughtWorks Germany data strategy group
@arifwider
Sebastian Herold
- Chief Data Architect @Scout24 until Dec
- BigData Architect @Zalando from Jan
- Data Evangelist
@heroldamus
Page 3
Road to MicroService Architecture – How we started in 2007
[Architecture diagram, 2007. Components: Web Tier, Middle Tier, Core DB, CRM, Staging, DWH, BI Tool. Roles: Analyst, BI Dev.]
DataDevOps – Data Manifesto | Sebastian Herold & Arif Wider
Page 4
Road to MicroService Architecture – How things got complicated in 2011
[Architecture diagram, 2011. Components: Web, API, Middle Tier, Core DB, CRM, Staging, DWH, BI Tool, one APP, and several APP/MySQL pairs; one element is labeled "$$$". Roles: Analyst, BI Dev.]
Page 5
Road to MicroService Architecture – How we sliced the monolith in 2013
[Architecture diagram, 2013. Components: Web, Core DB, CRM, Staging, DWH, BI Tool, Sync APP, HADOOP, REST API, AWS, EXP, Mongo, SEA, Elastic, several APIs, several APPs, and several APP/MySQL pairs. Roles: Analyst, BI Dev, DE.]
Page 6
Road to MicroService Architecture – How a central data team doesn’t scale
[Architecture diagram, 2015. Components: Web, Core DB, CRM, Staging, DWH, BI Tool, Sync APP, HADOOP, REST API, AWS, EXP, Mongo, SEA, Elastic, an APP/MySQL pair, several APIs, and many APPs. Roles: Analyst, BI Dev, DE.]
Page 7
Road to MicroService Architecture – How we re-architected our Data Landscape
[Architecture diagram, 2017. Components: Central Data Lake on S3, Core DB, CRM, DWH, BI Tool, REST API, several APPs. Roles: Analyst, DE, BI Dev.]
Page 8
Scout24 wants to become a truly data-driven company
Fast & easy data-driven product development…
…supported by Data & Analytics
Page 9
Scout24 wants to become a truly data-driven company
Everywhere in the company…
…without bloating up D’n’A
Image source: https://www.oddsemiconductorservices.com/
Page 10
SCOUT24 DATA LANDSCAPE MANIFESTO
ROLES, RESPONSIBILITIES, AND VALUES FOR A DATA-DRIVEN COMPANY AT SCALE
Page 11
#1 Preamble
Data is a key asset of our company.
Page 12
#2 Our Responsibility
We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape.
[Diagram labels: Data Platform, D’n’A, Data Landscape]
Page 13
#3 Data Autonomy, Not Anarchy
Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility.
[Diagram labels: Data Platform, Data, Producer, Consumer, D’n’A, Data Landscape]
Page 14
Roles & Responsibilities
[Diagram: Central Data Lake on S3, Data Catalog, Checkout service, Special offer service. Role labels: Producer, Consumer, D’n’A.]
Page 15
#4 Producer’s Responsibility
Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data.
[Diagram labels: Data Platform, Data Catalog, Metadata, Data, Producer, D’n’A, Data Landscape]
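As a rough illustration of the producer's side, the sketch below pairs a dataset with the kind of metadata a catalog entry might carry (schema, description, version) and checks records against that schema before they are published. Every name here (`DatasetMetadata`, `validate_record`, the `order_events` fields) is hypothetical; the slides do not describe the actual Data Lake or Data Catalog interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical catalog entry a producer publishes alongside its data."""
    name: str
    version: int
    description: str
    schema: dict  # field name -> expected Python type


def validate_record(record: dict, metadata: DatasetMetadata) -> list:
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    for field_name, expected_type in metadata.schema.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


# Assumed example dataset; the field names are illustrative only.
ORDER_EVENTS = DatasetMetadata(
    name="order_events",
    version=1,
    description="One record per order placed via the checkout service.",
    schema={"order_id": str, "user_id": str, "amount_eur": float},
)

good = {"order_id": "o-1", "user_id": "u-7", "amount_eur": 19.99}
bad = {"order_id": "o-2", "amount_eur": "19.99"}

print(validate_record(good, ORDER_EVENTS))
print(validate_record(bad, ORDER_EVENTS))
```

Keeping the schema next to the description and version means a consumer can discover what a dataset contains without asking the producing team.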
Page 16
Roles & Responsibilities
[Diagram: Central Data Lake on S3, Data Catalog, Checkout service, Special offer service, order events. Role labels: Producer, Consumer, D’n’A.]
Page 17
Roles & Responsibilities
[Diagram: Central Data Lake on S3, Checkout service, Special offer service, order events, Ingestion Template. Role labels: Producer, Consumer, D’n’A.]
Page 18
#5 Consumer’s Responsibility
Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics.
[Diagram labels: Data Platform, Data Catalog, Producer, Consumer, D’n’A, Data Landscape]
Page 19
Roles & Responsibilities
[Diagram: Central Data Lake on S3, Checkout service, Special offer service, order events, Ingestion Template, View: order history by user. Role labels: Producer, Consumer, D’n’A.]
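To make the consumer-owned view named on this slide more concrete, here is a minimal sketch of how an "order history by user" view could be derived from raw order events. The event fields (`order_id`, `user_id`, `ordered_at`) are assumptions for illustration; the slides do not show the real event schema or the tooling used to define views.

```python
from collections import defaultdict


def order_history_by_user(order_events):
    """Group order events by user, oldest first within each user."""
    history = defaultdict(list)
    for event in sorted(order_events, key=lambda e: e["ordered_at"]):
        history[event["user_id"]].append(event["order_id"])
    return dict(history)


# Illustrative events; field names are assumptions, not the real schema.
events = [
    {"order_id": "o-2", "user_id": "u-1", "ordered_at": "2018-01-05"},
    {"order_id": "o-1", "user_id": "u-1", "ordered_at": "2018-01-02"},
    {"order_id": "o-3", "user_id": "u-2", "ordered_at": "2018-01-03"},
]

print(order_history_by_user(events))
```

The view is defined entirely by the consuming team from published events, so the producer does not need to tailor its output to this use case.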
Page 20
#6 Exception: Core KPIs
We, Data & Analytics, take full ownership of and responsibility for the few top company-wide core KPIs.
[Diagram labels: Data Platform, Data Catalog, Producer, Consumer, D’n’A, Data Landscape, Core metric]
Page 21
Roles & Responsibilities
[Diagram: BI Tool, Central Data Lake on S3, Checkout service, Special offer service, order events, Ingestion Template, View: order history by user, View: revenue generated from orders by segments. Role labels: Analyst, Producer, Consumer, D’n’A.]
Page 22
#7 Transparency Over Continuity
We value data transparency over data continuity, which means we may break metric comparability if doing so enables better insights.
[Diagram labels: Data Platform, Producer, Consumer, D’n’A, Data Landscape, Core metric]
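One possible way to keep such a break transparent is to tag every metric value with the version of the definition that produced it, so that a change in definition is visible rather than silently mixed into a trend. This is only a sketch under that assumption; `MetricPoint`, `definition_version`, and the sample values are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    period: str
    value: float
    definition_version: int  # bumped whenever the metric's definition changes


def comparable(a: MetricPoint, b: MetricPoint) -> bool:
    """Two points are directly comparable only under the same definition."""
    return a.definition_version == b.definition_version


def change(a: MetricPoint, b: MetricPoint):
    """Return the change from a to b, or None where the definition changed."""
    if not comparable(a, b):
        return None
    return b.value - a.value


# Made-up series: the definition changes between the second and third point.
series = [
    MetricPoint("2017-11", 100.0, definition_version=1),
    MetricPoint("2017-12", 110.0, definition_version=1),
    MetricPoint("2018-01", 80.0, definition_version=2),
]

deltas = [change(a, b) for a, b in zip(series, series[1:])]
print(deltas)
```

A report built on this can show a gap or an annotation at the version boundary instead of a misleading drop.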
Page 23
The Ultimate Goal
A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy.
[Diagram labels: Data Platform, Metadata, Data, Producer, Consumer, D’n’A, Data Landscape, Core metric, Data products]
Page 24
Consequences for Product Development Teams?
- Think about data & reporting
- Deliver your data to the lake
- Provide metadata (schema, descriptions, versions)
- Eat your own dog food: consume your own data for reporting and thereby take responsibility for data quality
Page 25
Benefits for Product Development Teams?
- Work with data independently
- No dependencies on data teams
- Company data is curated, and it’s easy to consume data produced by other teams
DevOps
Page 26
#DataDevOps
Page 27
Lessons learned
- Publish exhaustive, general, and denormalized event data
- Avoid consumer-specific tailoring of the data you publish
- Consume your own data, e.g. for KPI reports
- Try out ad-hoc analytics notebooks to get better insights
- Inform data producers if you rely on their data
- Invest in documentation and guidelines for your data platform to keep your support effort low
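The first lesson can be illustrated with a small sketch: instead of publishing an order that only carries foreign keys, the producer embeds the referenced attributes so that consumers can work with the event on its own. The lookup tables and field names below (`users`, `products`, `user_segment`, `price_eur`) are hypothetical and chosen only for the example.

```python
def denormalize_order(order: dict, users: dict, products: dict) -> dict:
    """Build a self-contained order event by embedding referenced entities."""
    user = users[order["user_id"]]
    product = products[order["product_id"]]
    return {
        "order_id": order["order_id"],
        "user_id": order["user_id"],
        "user_segment": user["segment"],
        "product_id": order["product_id"],
        "product_name": product["name"],
        "price_eur": product["price_eur"],
    }


# Illustrative lookup data a producing service might hold internally.
users = {"u-1": {"segment": "private"}}
products = {"p-9": {"name": "Premium listing", "price_eur": 49.0}}
order = {"order_id": "o-1", "user_id": "u-1", "product_id": "p-9"}

event = denormalize_order(order, users, products)
print(event)
```

Because the event is general rather than shaped for one report, several consumers can build different views from it without asking the producer for changes.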
www.scout24.com
Thanks!
Questions?
Sebastian Herold Arif Wider

DNT_Corporate presentation know about usDynamic Netsoft
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 

Último (20)

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 

DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics

  • 1. DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics. Dr. Arif Wider & Sebastian Herold. Munich, Feb 7th, 2018.
  • 2. Dr. Arif Wider: Senior Consultant/Dev; Scala/FP enthusiast; ThoughtWorks Germany data strategy group; @arifwider. Sebastian Herold: Chief Data Architect @Scout24 until Dec; BigData Architect @Zalando from Jan; Data Evangelist; @heroldamus.
  • 3. Road to MicroService Architecture – How we started in 2007. Diagram labels (2007): Web Tier, Middle Tier, Core DB, CRM, Staging, DWH, BI Tool; roles: Analyst, BI Dev.
  • 4. Road to MicroService Architecture – How things got complicated in 2011. Diagram labels (2011): Web, API, APP, Middle Tier, Core DB, CRM, several APP/MySQL pairs, Staging, DWH, BI Tool, $$$; roles: Analyst, BI Dev.
  • 5. Road to MicroService Architecture – How we sliced the monolith in 2013. Diagram labels (2013): Web, API, APP/MySQL, Core DB, EXP, Mongo, SEA, Elastic, Sync APP, HADOOP, REST API, CRM, Staging, DWH, BI Tool; roles: Analyst, BI Dev, DE.
  • 6. Road to MicroService Architecture – How a central data team doesn’t scale. Diagram labels (2015): Web, API, APP/MySQL, Core DB, EXP, Mongo, SEA, Elastic, Sync APP, HADOOP, REST API, AWS, CRM, Staging, DWH, BI Tool; roles: Analyst, BI Dev, DE.
  • 7. Road to MicroService Architecture – How we rearchitected our Data Landscape. Diagram labels (2017): AWS, Core DB, APP, REST API, CRM, Central Data Lake on S3, DWH, BI Tool; roles: Analyst, DE, BI Dev.
  • 8. Scout24 wants to become a truly data-driven company: fast & easy data-driven product development, supported by Data & Analytics.
  • 9. Scout24 wants to become a truly data-driven company: everywhere in the company, without bloating up D’n’A. Image source: https://www.oddsemiconductorservices.com/
  • 10. SCOUT24 DATA LANDSCAPE MANIFESTO: roles, responsibilities, and values for a data-driven company at scale.
  • 11. SCOUT24 DATA LANDSCAPE MANIFESTO #1 Preamble: Data is a key asset of our company.
  • 12. #2 Our Responsibility: We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape. Diagram labels: Data Platform, D’n’A, Data Landscape.
  • 13. #3 Data Autonomy, Not Anarchy: Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility. Diagram labels: Data Platform, Data, Producer, Consumer, D’n’A, Data Landscape.
  • 14. Roles & Responsibilities. Diagram labels: Checkout service, Special offer service, Central Data Lake on S3, Data Catalog; roles: Producer, Consumer, D’n’A.
  • 15. #4 Producer’s Responsibility: Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data. Diagram labels: Data Platform, Metadata, Data, Producer, D’n’A, Data Landscape.
  • 16. Roles & Responsibilities. Diagram labels: Checkout service, order events, Special offer service, Central Data Lake on S3, Data Catalog; roles: Producer, Consumer, D’n’A.
  • 17. Roles & Responsibilities. Diagram labels: Checkout service, order events, Special offer service, Ingestion Template, Central Data Lake on S3, Data Catalog; roles: Producer, Consumer, D’n’A.
  • 18. #5 Consumer’s Responsibility: Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics. Diagram labels: Data Platform, Producer, Consumer, D’n’A, Data Landscape.
  • 19. Roles & Responsibilities. Diagram labels: Checkout service, order events, Special offer service, View: order history by user, Ingestion Template, Central Data Lake on S3, Data Catalog; roles: Producer, Consumer, D’n’A.
  • 20. #6 Exception: Core KPIs: We, Data & Analytics, take the full ownership and responsibility of the few top company-wide core KPIs. Diagram labels: Data Platform, Producer, Consumer, D’n’A, Data Landscape, Core metric.
  • 21. Roles & Responsibilities. Diagram labels: Checkout service, order events, Special offer service, View: order history by user, View: revenue generated from orders by segments, Ingestion Template, Central Data Lake on S3, Data Catalog, BI Tool; roles: Producer, Consumer, D’n’A, Analyst.
  • 22. #7 Transparency Over Continuity: We value data transparency over data continuity, which means we may break metric comparability if it is for the cause of enabling better insights. Diagram labels: Data Platform, Producer, Consumer, D’n’A, Data Landscape, Core metric.
  • 23. The Ultimate Goal: A federal landscape of data producers and consumers with just enough rules to ensure seamless co-operation without severely impeding autonomy. Diagram labels: Data Platform, Metadata, Data, Producer, Consumer, D’n’A, Data Landscape, Core metric, Data products.
  • 24. Consequences for Product Development Teams? - Think about data & reporting - Deliver your data to the lake - Provide metadata (schema, descriptions, versions) - Eat your own dog food: consume your own data for reporting -> take responsibility for data quality
  • 25. Benefits for Product Development Teams? - Independently work with data - No dependencies on data teams - Company data is curated and it’s easy to consume data produced by other teams
  • 26. DevOps, #DataDevOps
  • 27. Learnings and lessons: - Publish exhaustive, general, and denormalized event data - Avoid consumer-specific tailoring of data you publish - Consume your own data, e.g. for KPI reports - Try out ad-hoc analytics notebooks to get better insights - Inform data producers if you rely on their data - Invest in documentation and guidelines for your data platform to keep your effort for support low
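
Slides 15 to 19 describe a producer (the checkout service) publishing order events together with metadata, and a consumer building a view such as "order history by user". The following Python sketch illustrates that split of responsibilities. It is only an illustration under stated assumptions: the `DatasetMetadata` class, the field names (`order_id`, `user_id`, `amount_eur`), and the in-memory list standing in for the data lake are all hypothetical and not taken from the Scout24 platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical catalog entry a producer publishes alongside its data."""
    name: str
    version: int
    description: str
    schema: dict  # field name -> expected Python type


def validate(event: dict, meta: DatasetMetadata) -> bool:
    """Producer-side quality check: exact field set and matching types."""
    if set(event) != set(meta.schema):
        return False
    return all(isinstance(event[f], t) for f, t in meta.schema.items())


def order_history_by_user(events: list) -> dict:
    """Consumer-side view: order ids grouped by user, in arrival order."""
    history = defaultdict(list)
    for e in events:
        history[e["user_id"]].append(e["order_id"])
    return dict(history)


# Assumed example metadata; a real catalog entry would carry more detail.
ORDER_EVENTS = DatasetMetadata(
    name="order_events",
    version=1,
    description="One record per completed checkout (illustrative).",
    schema={"order_id": str, "user_id": str, "amount_eur": float},
)

raw = [
    {"order_id": "o1", "user_id": "u1", "amount_eur": 19.9},
    {"order_id": "o2", "user_id": "u2", "amount_eur": 5.0},
    {"order_id": "o3", "user_id": "u1", "amount_eur": "n/a"},  # wrong type
    {"order_id": "o4", "user_id": "u1", "amount_eur": 42.0},
]

# Only events that pass the producer's check reach the (in-memory) lake.
lake = [e for e in raw if validate(e, ORDER_EVENTS)]
view = order_history_by_user(lake)
```

In this sketch the producer owns the schema and the quality check, while the consumer owns the view definition, which mirrors principles #4 and #5 of the manifesto.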

Editor's notes

  1. Welcome to our talk: DataDevOps – Data Manifesto
  2. We worked together in Data & Analytics (D’n’A).
  3. This is the perspective of a data engineer; reality is much more complex, so things are simplified here. Let’s go back 10 years to 2007 (some parts are even older than that). Application: a clean 3-tier architecture of web tier, middle tier, and an operational Oracle DB. (Click) Analysts wanted to create reports. (Click) We set up our own DWH so that analytical queries would not block the Core DB: Core DB -> Staging -> DWH -> BI Tool.
  4. 2011: more and more systems needed to be integrated into the DWH. The one-size-fits-all database approach no longer scaled: there were more and more different load profiles, and we paid a large amount of money to Oracle.
  5. 2013 (4 years ago): the beginning of the chaos. The DB scaling problem was solved by denormalising the data: separate DBs for search and for detail pages, with synchronization of the data between them. More microservices showed up that provide data, along with more unstructured data that does not fit into classical relational data stores. We built a Hadoop cluster. The cluster is not meant for inserting single events, so a REST API in front collects events of the same type, packages them into bigger chunks, and copies them to HDFS. This gave applications easy reporting. JSON is the new standard for business reporting, completely different from the previous relational world. Standardisation through company-wide unique IDs. Direct connection to BI tools. More and more analysts and data scientists work directly on the cluster using Hive, Spark, etc.
  6. 2015: we had complete chaos. More and more applications. Cloud strategy: in the long run we should move everything to AWS. Most of the time we were maintaining mappings. My team needed to collect metadata all the time and deeply understand the different domains. We were a central bottleneck for the whole company: no one could introduce new data or change existing data without us. People got mad at us. We needed to change something.
  7. 2017: (Click) Merge BI developers and data engineers into one team. (Click) Establish a central data lake within AWS: the leading system for structured and unstructured data, easy to connect/join things. Why S3? It is cheap and reliable, at least cheaper and more reliable than most of the people in the room could provide; it is integrated into most current big data technologies; it can be accessed from many clusters at once; the performance disadvantage is not that big, and intermediate results are sometimes kept in HDFS. (Click) The DWH is just a cache for analytical queries. (Click) Old applications in our on-premise data center still use the Hadoop REST API. (Click) Direct exports from databases. (Click) The CRM imports and exports data. (Click) New applications stream data through Kinesis Firehose. These are the prerequisites for dev teams to easily ingest data into the data platform and for data to be joinable. Of course, this is a bird's-eye view; in reality it is much more complex. And then another topic came along, but Arif will tell you about it.
  8. - Because of microservices, the number and heterogeneity of data sources has multiplied. - Sebastian explained nicely how this can be tackled with a more appropriate technical approach. - However, in parallel to this technical development, there was also a strong push for data-driven product development at Scout. - What does this mean? A culture of experimentation (small cycles).
  9. - ...this means that the number of data consumers in the company has now multiplied as well. - These consumers want to correlate their specific data with the general data warehouse data. - D’n’A wants to help, but the company does not want to spend the resources to multiply the data team accordingly. - As a result, the data team became even more of a bottleneck and the frustration on both sides went up, - often because of unclear responsibilities or a distribution of responsibilities that had not changed since 2007. - Therefore we realised that it is not enough to put the technical organisation on a new solid foundation; the way people interact with data and manage responsibilities for data also needed a new foundation.
  10. - To signal new thinking here, we had the idea to formulate a Data Landscape Manifesto that we as a company would agree on. - It is about roles, responsibilities, and common values. - It consists of 7 principles, each based on an assumption or belief from which we derived that principle.
  11. We believe that collecting & analyzing data is crucial to understand our business, our customers, and the market in order to provide the right services & products. Although this is nothing surprising these days, we wanted to start with this in order to ensure a common understanding of why all of this is important in the first place. --> Loosely coupled (microservices), strongly ALIGNED (Jez Humble, Adrian Cockcroft)
  12. We therefore believe that everyone in the company must have easy access to the data available, and it must be easy to publish data which can be used by others. This requires a solid Data Platform: easy-to-use tools, reliable infrastructure, and simple guidelines for publishing & consuming data. … This is our core responsibility (and we wanted to start with this side). The data landscape is the playground on which data producers and data consumers interact. We provide the platform and the clear guidelines, but we do not own that space. The reason for this is that we believe...
  13. We believe that an exhaustive centralized data management does not allow us to scale to the level of data creation and consumption we aspire to as a company, because it creates a bottleneck and introduces accidental, indirect dependencies. Instead, we believe that data autonomy is the only way for data usage to scale across the company. However, for data autonomy to not become data anarchy, there has to be a clear set of basic rules and responsibilities. Data autonomy puts…
  14. Introduce roles
  15. We believe that extensive data availability, data discoverability, and data usability are crucial and that, at scale, no one can ensure this other than the one controlling the source where the data is originally generated.
  16. We believe that the stakeholder of a metric has to be the single owner of that metric and its definition, and has to drive its implementation. Without a single source of truth about what a metric means, we risk that multiple diverging and possibly contradicting understandings and implementations develop over time.
  17. We believe that a minimum level of company-wide comparability & reliability of core KPIs is crucial for leading the company in the right direction. The management is the owner of these core KPIs and the data group represents the management here in terms of metric ownership.
  18. We believe that transparency is crucial for understanding what the meaning of a metric is. If month-to-month comparability must never break, there is no way to continuously improve metrics and their transparency based on new insights. To stay with the example: if we come to understand that a certain number of orders are actually fraud, then we want to report the real revenue.
  19. A federal landscape of data producers and consumers with just enough rules to ensure seamless co-operation without severely impeding autonomy.
  20. What does it mean for product development teams in their day-to-day business? (Click) Think about data: reporting, how to structure data, and which database to use (at least in AWS there are tons of options); maybe you need to maintain it yourself. (Click) They need to deliver the data themselves (supported by the data platform team and documentation). (Click) They need to provide metadata: schema, description, connectivity (IDs matching other IDs in the lake), versioning. (Click) Eat your own dog food: use your delivered data for your own reporting. Twist in responsibility: data quality is managed by the producer -> understand the reporting infrastructure -> take the view of a data consumer and understand what other people do with the data.
  21. What is the benefit? No waiting for the data team -> work independently. Their data and data from other teams is easier to use and can be easily integrated into their own, because everybody is using the same paradigm.
  22. So we just heard: more responsibility and required skills on the one side, but in return fewer dependencies and decreased cycle time on the other side. This sounds a lot like what DevOps is preaching. …
  23. - Publish exhaustive, general, and denormalized event data - Avoid consumer-specific tailoring of data you publish - Consume your own data, e.g. for KPI reports - Try out ad-hoc analytics notebooks to get better insights - Inform data producers if you rely on their data - Invest in documentation and guidelines for your data platform to keep your effort for support low
  24. Done
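
Note 5 mentions a REST API that collects events of the same type, packages them into bigger chunks, and copies them to HDFS. A minimal Python sketch of that batching idea follows. It is an assumption-laden illustration, not the actual Scout24 service: `EventBatcher`, `chunk_size`, and the `sink` callback are hypothetical names, and the sink only records what would be written instead of talking to HDFS.

```python
from collections import defaultdict
from typing import Callable


class EventBatcher:
    """Buffers events per type and hands full chunks to a sink."""

    def __init__(self, chunk_size: int, sink: Callable[[str, list], None]):
        self.chunk_size = chunk_size
        self.sink = sink  # stand-in for "write one file to HDFS"
        self.buffers = defaultdict(list)

    def add(self, event_type: str, event: dict) -> None:
        """Buffer one event; flush its type once the chunk is full."""
        buf = self.buffers[event_type]
        buf.append(event)
        if len(buf) >= self.chunk_size:
            self.flush(event_type)

    def flush(self, event_type: str) -> None:
        """Write out whatever is buffered for one type, if anything."""
        buf = self.buffers.pop(event_type, [])
        if buf:
            self.sink(event_type, buf)

    def flush_all(self) -> None:
        """Flush every partially filled buffer, e.g. on shutdown."""
        for event_type in list(self.buffers):
            self.flush(event_type)


written = []  # records (event type, chunk length) per simulated write
batcher = EventBatcher(
    chunk_size=3,
    sink=lambda event_type, chunk: written.append((event_type, len(chunk))),
)

for i in range(7):
    batcher.add("order", {"id": i})
batcher.add("search", {"q": "flat in Munich"})
batcher.flush_all()
```

Grouping by event type keeps each chunk homogeneous, which is what makes a few large files per type possible instead of one tiny write per event. A production version would probably also flush on a timer so that rare event types do not wait indefinitely.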