Data Engineering Efficiency @ Netflix - Strata 2017

Slides from Strata 2017 talk, "Data Engineering Efficiency @ Netflix."

Michelle Ufford explains how Netflix’s data engineering and analytics team is using data to find common patterns among the chaos that enable the company to automate repetitive and time-consuming tasks and discover ways to improve data quality, reduce costs, and quickly identify and respond to issues. Michelle provides a quick overview of Netflix’s analytics environment before diving into some of the major challenges facing the company’s data engineers. Along the way, Michelle shares how Netflix is building more intelligent data platform services and tools to improve data quality, automate data maintenance, alert on job optimization opportunities, and more.

  1. Working smarter, not harder. DATA ENGINEERING EFFICIENCY @ NETFLIX. MICHELLE UFFORD, MANAGER, CORE INNOVATION, DATA ENGINEERING & ANALYTICS. STRATA NYC, FALL 2017
  2. A brief stroll down memory lane. Part one.
  3. Michelle’s Wildly Subjective & Completely Unscientific Observations of Engineering Efforts over Time. Circa 2007. Chart labels: Engineer Time; year 1, year 2, year 3; new development, support & maintenance, everything else.
  4. Michelle’s Wildly Subjective & Completely Unscientific Observations of Engineering Efforts over Time. Circa 2007. Chart labels: Engineer Time; year 1, year 2, year 3; new development, support & maintenance, everything else. Annotations: ~10%, ~75%, "when Michelle jumps ship".
  5. Support & Maintenance. ● troubleshooting failed jobs ● investigating data quality issues ● migrating to newer releases ● optimizing job performance ● archiving old data or unused tables ● fixing & reflowing bad data ● documenting lineage & relationships ● etc. etc. etc.
  6. There must be a better way.
  7. A peek at data engineering at Netflix. Part two.
  8. Diagram labels: data access, 20170914, Amazon Redshift, data processing, fast storage, data viz, events data, RAW, data storage, DW, RPT, METACAT, data catalog, api, job execution, data ingestion, harlotte
  9. Diagram labels: data scientists, business analysts, data engineers, data viz engineers, quantitative analysts, product managers, analytics engineers, software engineers, ML scientists; Planning & Analysis, Data Engineering & Analytics, Science & Algorithms; algorithm engineers, research scientists; Algorithms Engineering; executives, executives; Business, Engineering
  10. 2016, 2015, 2014, 2013, 2012
  11. 2017
  12. Diagram labels: data scientists, business analysts, data engineers, data viz engineers, quantitative analysts, product managers, analytics engineers, software engineers, ML scientists; Planning & Analysis, Data Engineering & Analytics, Science & Algorithms; algorithm engineers, research scientists; Algorithms Engineering; executives, executives; Business, Engineering
  13. How can we do more?
  14. Driving data engineering efficiency. Part three.
  15. Simplify & Automate
  16. Simplify & Automate: Data Maintenance. ● data archival ● unused data assets ● table metadata ● data lineage ● etc.
  17. 20170612 Quinto evaluations ● intelligent recommendations ● multiple tiers of coverage ● configurable rules Jumpstarter. Python Library
  18. Simplify & Automate: Data Quality. ● identify appropriate level of quality coverage for a given table based upon usage data ● provide initial configuration of quality thresholds based upon table behavior patterns ● simplify integration of quality checks into data pipelines ● etc.
  19. 20170612 Metacat Federated Metastore. dw.fact_table_f: s3://…/dw/fact_table_f/utc_date=20170101/batchid=1483229855 … s3://…/dw/fact_table_f/utc_date=20170611/batchid=1497226702 s3://…/dw/fact_table_f/utc_date=20170612/batchid=1497312541
  20. 20170612 Metacat Federated Metastore. dw.fact_table_f: s3://…/dw/fact_table_f/utc_date=20170101/batchid=1483229855 … s3://…/dw/fact_table_f/utc_date=20170611/batchid=1497226702 s3://…/dw/fact_table_f/utc_date=20170612/batchid=1497312541. Partitions: utc_date=20170101 … utc_date=20170611 utc_date=20170612
  21. 20170612 Metacat Federated Metastore utc_date=20170101
  22. 20170612 Metacat Federated Metastore utc_date=20170101
  23. 20170612 Data Quality ● intelligent recommendations ● multiple tiers of coverage ● configurable rules Jumpstarter. Python Library
  24. ETL Pattern. WAP Stage-0: Prep. 20170612. dw.my_table_f: s3://…/utc_date=20170101/batchid=1483229855 … s3://…/utc_date=20170611/batchid=1497226702. audit.my_table_f_1497312000
  25. ETL Pattern. WAP Stage-1: Write. 20170612. s3://…/utc_date=20170101/batchid=1483229855 … s3://…/utc_date=20170611/batchid=1497226702 s3://…/utc_date=20170612/batchid=1497312541. audit.my_table_f_1497312000, dw.my_table_f
  26. ETL Pattern. WAP Stage-2: Audit. 20170612. s3://…/utc_date=20170101/batchid=1483229855 … s3://…/utc_date=20170611/batchid=1497226702 s3://…/utc_date=20170612/batchid=1497312541. audit.my_table_f_1497312000, dw.my_table_f
  27. ETL Pattern. WAP Stage-3: Publish. 20170612. s3://…/utc_date=20170101/batchid=1483229855 … s3://…/utc_date=20170611/batchid=1497226702 s3://…/utc_date=20170612/batchid=1497312541. audit.my_table_f_1497312000, dw.my_table_f
  28. Simplify & Automate: Data Insight. ● provide easy visibility into current state & changes over time ● provide prescriptive guidance on impactful optimization opportunities ● notify users of unexpected conditions which may indicate problems ● etc.
  29. Data Engineering @ Netflix. Support & maintenance: 35%. New development & functionality: 45%.
  30. Good. But we can do better.
  31. A sneak peek at what we’re working on now. Part four.
  32. Michelle’s Wildly Subjective & Currently Unproven Theory of the Impact of ‘Smarter’ Solutions. Circa 2017. Chart labels: Engineer Time; year 1, year 2, year 3; new development, support & maintenance, everything else. Annotations: ~20% ???, ~60% ???
  33. Faster & Smarter: Data Maintenance. ● multi-node object deprecation ● field-level deprecation ● beyond pattern matching ● etc.
  34. Faster & Smarter: Data Quality. ● additional Metacat statistics ● robust anomaly detection ● aggressively experiment with configurations ● etc.
  35. Faster & Smarter: Data Insight. ● … next year’s Strata talk?
  36. Thank you! MICHELLE UFFORD: mufford@netflix.com, twitter.com/MichelleUfford. DATA: techblog.netflix.com, medium.com/netflix-techblog, twitter.com/NetflixData, tinyurl.com/NetflixData. WE’RE HIRING! jobs.netflix.com
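
The four WAP stages on slides 24 to 27 (Prep, Write, Audit, Publish) can be illustrated with a minimal in-memory sketch. This is not Netflix's implementation: the `Catalog` class, the `wap_load` function, the row-count audit, the audit-table naming, and the `s3://example-bucket/...` location are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for a metastore: table name -> {partition: location}."""
    tables: dict = field(default_factory=dict)

    def partitions(self, table):
        # Create the table entry on first use and return its partition map.
        return self.tables.setdefault(table, {})


def wap_load(catalog, table, partition, location, rows, min_rows=1):
    """Register a batch under an audit table, check it, then publish it."""
    # Assumed naming scheme: audit.<table name>_<partition>.
    audit_table = f"audit.{table.split('.', 1)[-1]}_{partition}"

    # Stage 0 (prep): start from an empty audit table.
    catalog.tables[audit_table] = {}

    # Stage 1 (write): the new batch is visible only through the audit table.
    catalog.partitions(audit_table)[partition] = location

    # Stage 2 (audit): a deliberately simple check (assumed, not from the talk).
    if len(rows) < min_rows:
        return False  # the production table is left untouched

    # Stage 3 (publish): point the production partition at the audited data.
    catalog.partitions(table)[partition] = location
    del catalog.tables[audit_table]
    return True


catalog = Catalog()
published = wap_load(
    catalog,
    "dw.my_table_f",
    "utc_date=20170612",
    "s3://example-bucket/batch-a",  # hypothetical location
    rows=[{"id": 1}],
)
```

In this sketch a failed audit leaves the audit table in place for inspection, so consumers of the production table never see the rejected batch.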

Editor's notes

  • https://conferences.oreilly.com/strata/strata-ny/user/proposal/status/59963
    Working smarter, not harder: Driving data engineering efficiency at Netflix

    Complex data structures. Incomplete data. Upstream failures. Cryptic error messages. Rapidly evolving technology. No question, data engineering can be hard. But what if we used the wealth of data and experience at our disposal to make data engineering a little easier?

    Michelle Ufford explains how Netflix’s Data Engineering and Analytics team is using data to find common patterns among the chaos that enable the company to automate repetitive and time-consuming tasks and discover ways to improve data quality, reduce costs, or quickly identify and respond to issues. Michelle provides a quick overview of Netflix’s analytics environment before diving into some of the major challenges facing the company’s data engineers. Along the way, Michelle shares how Netflix is building more intelligent data platform services and tools to improve data quality, automate data maintenance, alert on job optimization opportunities, and more.
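
The abstract's goal of improving data quality, together with the deck's bullet about providing an initial configuration of quality thresholds from table behavior patterns, could be approached along the following lines. This is only a sketch under assumptions: the function names, the use of partition row counts as the signal, and the mean plus or minus `k` standard deviations rule are hypothetical and are not described in the talk.

```python
from statistics import mean, pstdev


def suggest_row_count_bounds(history, k=3.0):
    """Suggest (low, high) row-count thresholds from past partition row counts."""
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    mu = mean(history)
    sigma = pstdev(history)
    # Row counts cannot be negative, so clamp the lower bound at zero.
    return max(0.0, mu - k * sigma), mu + k * sigma


def is_anomalous(count, bounds):
    """Return True when a new partition's row count falls outside the bounds."""
    low, high = bounds
    return not (low <= count <= high)


# Hypothetical daily row counts for one table.
bounds = suggest_row_count_bounds([90, 110])
suspicious = is_anomalous(10, bounds)
```

Starting from thresholds learned this way gives a table an initial quality check without manual tuning; the multiplier `k` is the knob an engineer would adjust afterwards.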
