The Curse of the Data Lake Monster

Artificial intelligence and machine learning are currently all the rage. Every organisation is trying to jump on this bandwagon and cash in on their data reserves. At ThoughtWorks, we’d agree that this technology has huge potential, but as with all things, realising value depends on understanding how best to use it.

Published in: Software

  1. The Curse of the Data Lake Monster Kiran Prakash and Lucy Chambers
  2. We have a problem @lucyfedia
  3. So what is a data lake? ● Democratisation of Data ● Centralised and Monolithic ● Domain Agnostic ● Structured and unstructured data @kiran_p
  4. https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  5. Why do Data Lakes Fail? @kiran_p
  6. Build it and they will come! ● Seen primarily as an infrastructure problem ● Pinning down use cases & value streams is hard ● Analysis paralysis & overengineering @kiran_p
  7. Centralised and Monolithic https://martinfowler.com/articles/data-monolith-to-mesh.html @kiran_p
  8. Functional Decomposition @kiran_p https://martinfowler.com/articles/data-monolith-to-mesh.html
  9. Axis of change @kiran_p https://martinfowler.com/articles/data-monolith-to-mesh.html
  10. Data Swamps @kiran_p
  11. The Data Mesh Paradigm ● Product Thinking: focus on initiatives which align with business outcomes; structure teams around business capabilities. ● Platform Thinking: a self-service platform for storage, catalogue, computation, access rights, pipelines, etc. ● Domain Driven Design: autonomous teams with clear bounded contexts, building and running products independently. @kiran_p
  12. Product Thinking for Data Projects @lucyfedia
  13. Project vs Product ● START. Project mode: solution (often) defined at outset. Product mode: problem identified at outset; solution developed iteratively and tested. ● STOP. Project mode: team moves on when solution delivered. Product mode: team moves on when problem verifiably fixed. ● FOCUS. Project mode: features delivered in a given time & budget. Product mode: progress made on key business goals (measured by metrics). ● HAS FIXED SCOPE? Project mode: usually. Product mode: almost never. @lucyfedia
  14. Product teams have two jobs and two customers ● Deliver business capabilities - External User ● Expose their domain’s data for others to consume - (often) Internal User @lucyfedia
  15. A data product is: ● Discoverable ● Addressable ● Trustworthy ● Self-describing ● Interoperable ● Secure
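One possible way to make the six data-product properties concrete is a small descriptor that a domain team publishes, plus a catalogue that refuses entries which do not meet them. This is a minimal sketch under stated assumptions: `DataProduct`, `Catalogue`, `missing_properties`, the `SHARED_TYPES` vocabulary and the `mesh://` address scheme are all hypothetical, invented for illustration rather than taken from the deck or from any real tool. The mapping is one reading of the slide: discoverable means registered and searchable, addressable means a unique stable address, trustworthy means a named owner and a published freshness promise, self-describing means a schema plus description, interoperable means types from a shared vocabulary, and secure means an explicit reader list.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical shared type vocabulary every domain agrees to use (interoperable).
SHARED_TYPES = {"string", "int", "decimal", "date", "timestamp", "bool"}


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside its data."""
    name: str
    domain: str
    address: str                # unique, stable location (addressable)
    owner: str                  # accountable team (trustworthy)
    schema: dict[str, str]      # column name -> type (self-describing)
    description: str = ""       # what the data means (self-describing)
    freshness_sla_hours: int | None = None       # published quality promise (trustworthy)
    allowed_roles: frozenset[str] = frozenset()  # who may read it (secure)


def missing_properties(p: DataProduct) -> list[str]:
    """Return the data-product properties this descriptor does not yet satisfy."""
    checks = {
        "addressable": bool(p.address),
        "self-describing": bool(p.schema) and bool(p.description),
        "trustworthy": bool(p.owner) and p.freshness_sla_hours is not None,
        "interoperable": all(t in SHARED_TYPES for t in p.schema.values()),
        "secure": bool(p.allowed_roles),
    }
    return [prop for prop, ok in checks.items() if not ok]


class Catalogue:
    """Minimal in-memory catalogue; registering a product makes it discoverable."""

    def __init__(self) -> None:
        self._by_address: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        missing = missing_properties(product)
        if missing:
            raise ValueError(f"{product.name!r} is missing: {', '.join(missing)}")
        if product.address in self._by_address:
            raise ValueError(f"address already in use: {product.address}")
        self._by_address[product.address] = product

    def find(self, keyword: str) -> list[DataProduct]:
        """Case-insensitive search over name, description and domain."""
        kw = keyword.lower()
        return [
            p for p in self._by_address.values()
            if kw in p.name.lower() or kw in p.description.lower() or kw == p.domain.lower()
        ]

    def read_schema(self, address: str, role: str) -> dict[str, str]:
        """Look a product up by its address and enforce its reader list."""
        product = self._by_address[address]
        if role not in product.allowed_roles:
            raise PermissionError(f"{role!r} may not read {address}")
        return product.schema


if __name__ == "__main__":
    # Example based on the deck's fictional insurance company.
    claims = DataProduct(
        name="Vehicle claims",
        domain="claims",
        address="mesh://claims/vehicle-claims/v1",
        owner="claims-team",
        schema={"claim_id": "string", "amount": "decimal", "filed_on": "date"},
        description="One row per vehicle damage claim",
        freshness_sla_hours=24,
        allowed_roles=frozenset({"fraud-detection"}),
    )
    catalogue = Catalogue()
    catalogue.register(claims)
    print([p.address for p in catalogue.find("vehicle")])
```

The point of the design is the "data swamp" quote: data that never makes it into the catalogue, or lacks an owner and schema, simply cannot be published, so "no-one can find it" becomes a rejected registration instead of a silent failure.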
  16. Data Swamps @lucyfedia
  17. “If a tree falls in a wood, and no-one is around to hear it, does it make a sound?” - Some philosopher @lucyfedia
  18. “If someone puts data into a data lake, and no-one can find it, is it even there?” - Me @lucyfedia
  19. Data Mesh Architecture
  20. ● Domain Driven Design ● Self Service Platforms @kiran_p
  21. Distributed Pipelines @kiran_p https://martinfowler.com/articles/data-monolith-to-mesh.html
  22. Self service platforms for: ● Storage ● Data pipeline ● Discovery & Catalogue ● Access control ● Archiving ● Encryption @kiran_p
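The self-service idea on slide 22 can be sketched as a platform that turns a short, declarative request from a domain team into one provisioning step per capability. This is a hypothetical illustration only: `ProductRequest`, `provisioning_plan`, the capability keys (shortened from the slide's "Data pipeline", "Discovery & Catalogue" and "Access control") and the wording of each step are assumptions, and a real platform would call actual storage, IAM and catalogue services rather than return strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six capabilities listed on slide 22; the step wording below is hypothetical.
CAPABILITIES = ("storage", "pipeline", "catalogue", "access", "archiving", "encryption")


@dataclass
class ProductRequest:
    """What a domain team declares; the platform handles the rest (hypothetical)."""
    domain: str
    product: str
    readers: list[str] = field(default_factory=list)
    retention_days: int = 365


def provisioning_plan(req: ProductRequest) -> dict[str, str]:
    """Expand one domain request into one step per platform capability.

    The template is owned by the platform team and is identical for every
    domain; only the request's fields vary.
    """
    if not req.domain or not req.product:
        raise ValueError("domain and product are required")
    if req.retention_days <= 0:
        raise ValueError("retention_days must be positive")
    path = f"{req.domain}/{req.product}"
    readers = ", ".join(sorted(req.readers)) or "owning team only"
    return {
        "storage": f"create storage location {path}",
        "pipeline": f"register a pipeline that writes to {path}",
        "catalogue": f"publish a catalogue entry for {path}",
        "access": f"grant read on {path} to: {readers}",
        "archiving": f"archive records in {path} older than {req.retention_days} days",
        "encryption": f"encrypt {path} at rest with a key owned by {req.domain}",
    }


if __name__ == "__main__":
    request = ProductRequest("claims", "vehicle-claims", readers=["fraud-detection"])
    for capability, step in provisioning_plan(request).items():
        print(f"{capability:>10}: {step}")
```

The split of responsibilities is the design choice being illustrated: the domain team supplies only domain knowledge (what the product is, who reads it, how long to keep it), while the domain-agnostic plumbing lives in the platform, so no central data team has to build each pipeline by hand.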
  23. Data Mesh @kiran_p https://martinfowler.com/articles/data-monolith-to-mesh.html
  24. Example: a fictional insurance company
  25. The Use-Cases ● Reduce fraud by 5% per year: identify fraudulent claims ● Reduce vehicle damage claims by 2% per year: predict weather patterns ● Increase conversion rate by 2%: upselling insurance products @lucyfedia
  26. [Diagram: a Fraud Detection use case fed by "Lake Shore Marts" labelled Customer, Claims, Health, Vehicle and House, which sit on top of a Data Lake (for Raw Data).] @lucyfedia
  27. [Diagram: as slide 26, with an Upselling use case added, drawing on Customer and Products data.] @lucyfedia
  28. [Diagram: as slide 27, with an "Alert Customer" element and Weather data added.] @lucyfedia
  29. Key Takeaways ● Not a technology problem: becoming data-driven is usually an organisational problem. ● Domain data is a product: work with cross-functional product teams and real use cases to deliver business value. ● Distributed Data Mesh: built by autonomous cross-functional teams using data platforms instead of a centralised data lake. @kiran_p & @lucyfedia
  30. Thank you. Kiran Prakash @kiran_p, Lucy Chambers @lucyfedia. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh: martinfowler.com/articles/data-monolith-to-mesh.html. The Curse of the Data Lake Monster: thoughtworks.com/insights/blog/curse-data-lake-monster
