www.scling.com
Data democratised
Next data analytics & protection, 2019-12-11
Lars Albertsson (@lalleal)
Scling
Big data adoption
● 2003-2007: Only Google
● 2007-2014: Hadoop era (Europe). Highly technical companies succeed and disrupt.
● 2015-2019: Enterprise adoption (Europe). Big data dropped from the Gartner hype cycle. “New normal”
● 2019: Many enterprises in production, but big data and machine learning ROI still confined to high-tech.
Data value efficiency gap, aka disrupted or disruptor
Early Spotify recommendations
Creator of Luigi, Annoy
Efficiency gap, latency
We just took a machine learning pipeline in production after 8 months. Great success!
Scandinavian retail (pycon.se, 2019)
Document similarity pipeline finally in production. Estimated 3 months, took 8 months.
Scandinavian telecom (NDSML Summit 2019)
2016: Data platform approval
2018: Pipeline in production
Dutch bank (Dataworks Summit 2018)
Bonnier News (Riga DevOpsDays 2018)
Platform + 1st pipeline in production. Seven weeks, 1 person.
Scandinavian retail 2018
New pipeline: < 1 day
Mend pipeline: < 1 hour
Spotify DataOps transform, 2013
Platform + 1st pipeline in production. Three weeks, 4 persons. 20 pipelines in 8 months.
Efficiency gap, data cost & value
● Data processing produces datasets
● Each dataset has business value
○ Financial, sales, forecasting reports
○ A/B test, auto completion, insights
○ Recommendations, fraud
● Proxy metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 10-1000
○ Spotify: 20 000 datasets / day (2016); 100B events collected / day (2017)
○ Google: 1 600 000 000 datasets / day (2016)
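As a rough illustration of the datasets / day proxy metric, the minimal sketch below counts datasets produced per day from a list of dataset records. The record format (a production date plus a dataset name), the sample data, and the function name `datasets_per_day` are assumptions made for this example and do not come from the talk.

```python
from collections import Counter
from datetime import date


def datasets_per_day(records):
    """Count how many datasets were produced on each day.

    `records` is a hypothetical list of (production_date, dataset_name) tuples.
    """
    return Counter(day for day, _name in records)


# Made-up sample data, for illustration only.
records = [
    (date(2019, 12, 10), "sales_report"),
    (date(2019, 12, 10), "ab_test_results"),
    (date(2019, 12, 11), "recommendations"),
]

counts = datasets_per_day(records)
# Average number of datasets per day over the days that produced any.
average = sum(counts.values()) / len(counts)
```

In practice the records would come from wherever dataset production is logged, for example a listing of the data lake or the orchestrator's task history.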
Data efficiency key factors
Data democratisation
● Making data available, usable, accessible
DataOps
● Short path from idea to production
● Cross-functional teams
○ Data engineering, domain experts, product, (data science)
○ Aligned with value, not function
● Low cost of failure
○ Machine and human failure
○ Risks ok → move fast
● Engineered operations
Service-oriented organisations
● Teams own services
● Teams own data
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ quality?
○ extraction?
○ data governance?
○ history?
● Innovation friction
[Diagram: value adding vs. waste]
Centralising data
[Diagram: data lake]
More data - decreased friction
[Diagram: data lake, stream storage]
Hadoop is dead?
Traditional systems
[Diagram label: mutation]
Data pipelines at a glance
[Diagram labels: mutation; immutable, shareable; data lake, transformation, cold store]
Early Hadoop:
● Weak indexing
● No transactions
● Weak security
● Batch transformations
DataOps workflows:
● Immutable, shared data
● Resilient to failure
● Quick error recovery
● Low-risk experiments
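The workflow properties listed above can be illustrated with a minimal sketch, assuming a file-based data lake where each dataset for a given day is written once and never changed. The directory layout, the dataset names (`raw_events`, `events_per_user`), and the helper functions are hypothetical and chosen only for this example.

```python
import json
import tempfile
from pathlib import Path


def dataset_path(lake: Path, name: str, day: str) -> Path:
    """Each (dataset, day) pair gets its own path, which is written once."""
    return lake / name / f"date={day}" / "data.json"


def transform(lake: Path, day: str) -> Path:
    """Derive a per-user event count dataset from one day of raw events."""
    out = dataset_path(lake, "events_per_user", day)
    if out.exists():
        # Output already produced: a rerun is a no-op, never a mutation.
        return out
    events = json.loads(dataset_path(lake, "raw_events", day).read_text())
    counts = {}
    for event in events:
        counts[event["user"]] = counts.get(event["user"], 0) + 1
    out.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file, then rename, so readers never see partial output.
    tmp = out.parent / (out.name + ".tmp")
    tmp.write_text(json.dumps(counts, sort_keys=True))
    tmp.rename(out)
    return out


# Made-up input data in a throwaway directory standing in for the lake.
lake = Path(tempfile.mkdtemp())
raw = dataset_path(lake, "raw_events", "2019-12-11")
raw.parent.mkdir(parents=True)
raw.write_text(json.dumps([{"user": "a"}, {"user": "b"}, {"user": "a"}]))

first = transform(lake, "2019-12-11")
second = transform(lake, "2019-12-11")  # safe to rerun after a failure
result = json.loads(first.read_text())
```

Because inputs are never modified, recovering from a faulty transformation in this sketch amounts to deleting the bad output and running the corrected code again, and an experiment can write to a new dataset name without touching existing data.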
Late Hadoop adoption
[Diagram label: mutation]
“Can you please implement mutability, transactions, SQL, etc? We would like to keep our workflows.”
“Anything, as long as you are buying.”
DataOps workflows:
● Immutable, shared data
● Resilient to failure
● Quick error recovery
● Low-risk experiments
Complex business logic - MDM @ Spotify ~2014
● 10 pipelines like this
● Pipeline dev environment
● Pipeline continuous deployment infrastructure
One team of five engineers
Data value = data + domain expertise + data practices
● Disrupt? https://xkcd.com/1831/
● Adapt?
● Collaborate? Data-value-as-a-service
○ Client data + domain expertise
○ Practices from data leaders
+ 1000s of failures...
[Diagram: data lake, stream storage]
Factors of democratisation
Axes: siloed → shared, organic → coordinated
● Distributed storage → homogeneous storage
● Need-to-know basis → documentation read+write access
● Closed code ownership → code read+write access
● Local rituals → coordinated data governance
● Tribal knowledge → common glossary, semantics
● Lay-on-hands deployment → common DataOps procedures
● Unclear data origin → common data provenance
An e-shopping tale
Actual experience:
1. Log in, search for product X
○ X + 100s of accessories, random order
2. Find X in product catalog
○ No link to web shop
3. Put in cart, delivery?
○ Ask for address, customer club number
4. ...
Expected experience:
1. Log in, search for product X
○ Popular items first
2. Find X in product catalog
○ Take me to shop
3. Put in cart, delivery?
○ I am logged in
4. ...
Full story: “Avoid artificial stupidity” blog post
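The "popular items first" step in the tale can be sketched as a ranking of search matches by purchase count. This is only a minimal illustration: the product names, the counts, and the function `rank_search_results` are made up, and a real shop would likely combine several signals.

```python
def rank_search_results(matches, purchase_counts):
    """Order matching products by purchase count, most popular first."""
    return sorted(matches, key=lambda p: purchase_counts.get(p, 0), reverse=True)


# Hypothetical search matches and purchase statistics.
matches = ["X case", "X", "X charger"]
purchase_counts = {"X": 950, "X charger": 120, "X case": 40}

ranked = rank_search_results(matches, purchase_counts)
```

The purchase counts are themselves a dataset that a daily pipeline could produce from order data.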
Document a clean architecture
● Include minimal governance, security, privacy
[Diagram labels: data lake, transformation, cold store; mutation; immutable, shareable]
A lean start
● Align team with use case
○ Zero budget
● Ingest only necessary data
● Key technical component: Workflow orchestrator (Luigi / Airflow)
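To make the role of a workflow orchestrator concrete, here is a minimal sketch of the core idea: tasks declare their upstream dependencies and an output, and a task runs only if its output does not exist yet. This is not the Luigi or Airflow API; the `Task` class, the `build` function, and the file names are hypothetical and use only the Python standard library.

```python
import tempfile
from pathlib import Path


class Task:
    """A unit of work with one output file and a list of upstream tasks."""

    def __init__(self, output: Path, requires=(), action=None):
        self.output = output
        self.requires = list(requires)
        self.action = action

    def complete(self) -> bool:
        # A task counts as done when its output exists.
        return self.output.exists()


def build(task: Task, log: list) -> None:
    """Run upstream tasks first, then the task itself, skipping completed ones."""
    if task.complete():
        return
    for upstream in task.requires:
        build(upstream, log)
    task.action(task)
    log.append(task.output.name)


workdir = Path(tempfile.mkdtemp())

# Hypothetical two-step pipeline: ingest order amounts, then sum them.
ingest = Task(workdir / "orders.txt",
              action=lambda t: t.output.write_text("3\n5\n"))
report = Task(workdir / "total.txt", requires=[ingest],
              action=lambda t: t.output.write_text(
                  str(sum(int(x) for x in ingest.output.read_text().split()))))

log = []
build(report, log)  # runs ingest, then report
build(report, log)  # nothing to do: outputs already exist
```

Real orchestrators add scheduling, retries, parallelism, and monitoring on top of this dependency-and-output model.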
An MVP is minimal
In scope:
● Minimal privacy (limiting access)
● Security
● One DB source
● One use case
Out of scope:
● Data scalability
● High availability
● Durability
● Most privacy
● Self service
● Data quality
● Automation
● Clusters
● Auditability
● Scalable BI
● Fill lake
● Real-time
● Lineage
Journey towards data value
● Remove complexity wherever possible
○ Unfamiliar tools may be less complex
● Pay attention to human and social factors
“Five dysfunctions of a data engineering team” - Jesse Anderson
● Only database admins
● Set up for failure
● No one understands schema
● No veterans
● Too ambitious
“Avoiding big data antipatterns” - Alex Holmes
● Big data tech for small data
● Point-to-point data integration
● Single tool for the job
● Excess volume or precision
● Lack of security
