Evolution of Data at Nubank - Product.io Meetup 2019-01-29

Nubank is using machine learning, analytics and data engineering to disrupt the financial industry in Brazil. With over 5 million customers, it's already the biggest fintech outside Asia.

In this presentation I go over the company's main learnings in the data space since it was founded 5 years ago. What did the company look like in each of those years? What mistakes did we make? What lessons did we learn?

This deck was presented at the São Paulo Product.io Meetup.


  1. Evolution of Data at Nubank. 29/01/2019. André Tavares, Product Manager
  2. Nubank
  3. • Biggest fintech outside Asia • 5 million+ credit card customers • 2.5 million+ NuContas • 1300 employees
  4. • 60 squads • 200 microservices and 30 models in production • 40 TB of data processed every day • 550 DAUs on data tools
  5. Team Mission
  6. Provide a reliable and efficient platform, services and stewardship for Nubank to make better decisions with data
  7. 2013
  8. 2013: • Company started in May 2013 • 10 employees by the end of the year • Mostly engineers, no one directly working with data • No product yet
  9. Learning 1: No Product = No Data • Getting to product-market fit is priority #1 • You won’t even have that much data to work with until you get there • Early-stage startups are not the right place to work as a Data Product Manager
  10. 2014
  11. 2014: • First credit card transaction in April 2014 • Product launched for friends & family • Manual credit approval • From 10 to 35 employees; head of credit and first 4 analysts hired • 10,000 customers by the end of the year
  12. Learning 2: Credit is hard! • It takes a long time for credit decisions to be evaluated (in our case, several months) • An incorrect policy could cause the company to go bankrupt before anyone notices
  13. 2015
  14. 2015: • Product goes viral: from 10,000 customers to 400,000 in a single year! • Surge in number of customers requires very fast growth of customer service: from 35 to 250 employees • Business Analysts and Data Scientists now number 10 • Squad data science created
  15. 2015: • First policies built to predict how much customers would spend and how likely they are to pay back their cards
  16. Learning 3: Data itself is a product • Do we have all the data we need? Obtaining it is part of the problem • Is it complete? Correct? Of good quality? Do we need backfills? • Need to follow all regulations
  17. 2015 Failure: “We don’t need SQL”
  18. 2016
  19. 2016: • Hit a million customers during the year • Finished the year with 400 employees • 30 BAs and DSs • Squad DS is broken up, with data people working from various teams • Some engineers start specializing in data pipelines
  20. Learning 4: Centralized BI doesn’t scale • A central team can be effective to establish standards and best practices, and to prioritize an overwhelming number of requests • As the company grows, you need to embed analytics into each team to stay agile
  21. 2016: • Model creation starts to become more industrialized • Automating key reports for the central bank leads us to create our ETL and our analytical environment
  22. ETL • Extract: Data is extracted from the production environment and sent to the analytical environment • Transform: Data is refined into cleaner and easier-to-use datasets • Load: Datasets are loaded into databases that can be accessed by consumers
  23. Learning 5: You need an ETL • High latency, high throughput • Horizontally scalable • High accessibility • Heterogeneous data • Pain on write • Unified, global
  24. 2017
  25. 2017: • Over 3 million customers • Launched our next two products: Rewards and NuConta • 700 employees • 50 BAs and DSs • Squad data infra
  26. 2017: • Structuring our data warehouse • Dimensional modeling • Batch models running on the ETL • First BI tool: Metabase
  27. First BI tool: Metabase • Open source, self-hosted • Allows querying our data warehouse (ETL results) • Go-to tool for writing simple queries and creating simple dashboards • Point-and-click interface empowers users who don’t know SQL
  28. 2017 Failure: Contribution Margin Dataset
  29. ETL Jobs • Anyone in the company can contribute ETL jobs by opening a PR in our monorepo • Teams are responsible for writing and maintaining their jobs • Jobs are written in Scala (Spark SQL); some DSLs are provided • Use Databricks to iterate on logic • Peer review to ensure quality and consistency • 100 contributors making 400+ contributions per month
  30. Learning 6: Focus on the Platform. Problem: Data team creating datasets (tables) for the entire company • Lack of context • Hard to prioritize among various teams • Becoming a bottleneck
  31. Learning 6: Focus on the Platform. Solution: Empower vertical teams to own dataset creation • Focus on tooling, training and support • Remove interdependencies
  32. 2018
  33. 2018: • Over 5 million customers • Launched debit cards • 1200 employees • 90 BAs and DSs • Squad data infra in Berlin office, squad data access in São Paulo office
  34. 2018: • Models starting to pop up in several areas of the company
  35. Data Services • Trainings: Weekly trainings on SQL, Python or Scala, new employee onboarding, new tool rollout • Support: Dedicated Slack support channels; community of users support each other • Meetings: Forums for sharing data scientist and analyst work, monthly meetings to discuss the state of Data • Data Analysts: Function focused on improving data usage in the company (not SQL slaves!)
  36. Learning 7: Invest in your people • Training employees is not only HR’s job • Proactive investment in training can avoid reactive support work • Sometimes the problem is behavioral, not technological
  37. 2018 Failure: Moving users to a new BI tool too fast
  38. Learning 8: Building is not enough • Internal launches are also launches • You need training and support • Do the benefits of your new internal product outweigh the switching costs?
  39. 2019 and beyond
  40. 2019 and beyond: • Future: tens of millions of customers • Thousands of employees • Hundreds of analysts, dozens of data scientists • Growing data org
  41. 2019 and beyond: Things we’ll work on: • New data protection law • Giving employees even more data ownership • Data Portal • New Data Warehouse • Infra refactors to better support new products and refactors
  42. RECAP
  43. • No Product = No Data • Credit is hard! • Data itself is a product • Centralized BI doesn’t scale • You need an ETL • Focus on the Platform • Invest in your people • Building is not enough
  44. Interested in working with us? sou.nu/jobs-at-nubank
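
The Extract, Transform and Load steps described on the ETL slide can be illustrated with a minimal sketch. This is not Nubank's pipeline (the deck says their jobs are written in Scala with Spark SQL); it is a toy Python version using only the standard library. The record fields, the `customer_spend` table and all function names are hypothetical, and an in-memory SQLite database stands in for the analytical environment.

```python
import sqlite3

# Hypothetical raw records, standing in for data extracted from production.
RAW_TRANSACTIONS = [
    {"customer_id": "c1", "amount_cents": "1250", "status": "settled"},
    {"customer_id": "c1", "amount_cents": "800", "status": "settled"},
    {"customer_id": "c2", "amount_cents": "4300", "status": "declined"},
    {"customer_id": "c2", "amount_cents": "990", "status": "settled"},
]


def extract():
    """Extract: copy raw records out of the (simulated) production source."""
    return [dict(row) for row in RAW_TRANSACTIONS]


def transform(rows):
    """Transform: keep settled transactions and total the spend per customer."""
    totals = {}
    for row in rows:
        if row["status"] != "settled":
            continue
        customer = row["customer_id"]
        totals[customer] = totals.get(customer, 0) + int(row["amount_cents"])
    return sorted(totals.items())


def load(dataset, connection):
    """Load: write the refined dataset to a table that consumers can query."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customer_spend "
        "(customer_id TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO customer_spend VALUES (?, ?)", dataset
    )
    connection.commit()


def run_etl(connection):
    """Run the three steps in order against the given database connection."""
    load(transform(extract()), connection)


# In-memory SQLite plays the role of the analytical database in this sketch.
connection = sqlite3.connect(":memory:")
run_etl(connection)
result = connection.execute(
    "SELECT customer_id, total_cents FROM customer_spend ORDER BY customer_id"
).fetchall()
print(result)
```

Keeping each step as its own function mirrors the separation on the slide: consumers only ever query the loaded table, never the production source.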