Building a Financial Data Warehouse: A lesson in empathy
Solmaz Shahalizadeh
Data Analysis Lead, Shopify
@solmaz_sh
@solmaz_sh
Data Team Lead @ Shopify
Statistics/ Machine Learning in Cancer Research
Analytics and development in Banking
What is this talk about?
1. How did our data journey start at Shopify?
2. Why did we need a massive change?
3. Change is difficult; how did we push forward?
4. After all of this, where are we now? What next?
5. Takeaways
Where did our data journey start?
What is a Data Warehouse?
ETL systems overview
1st generation data warehouse
ETL:
- E: Extract from production MySQL and other sources using an in-house Ruby ETL
- !T: very little (almost no) T in the ETL
- L: Load into a Vertica database
* a copy of “any” operational table in the warehouse
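As a rough illustration of what "almost no T" means in practice, the sketch below copies a table from a source database into a warehouse row for row. It is not the pipeline from the talk (which was written in Ruby): in-memory SQLite databases stand in for MySQL and Vertica, and the `shops` table, its columns, and the `copy_table` helper are hypothetical.

```python
import sqlite3


def copy_table(source, warehouse, table):
    """Extract every row of `table` and load it unchanged (no transform step)."""
    cur = source.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    warehouse.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    warehouse.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    warehouse.commit()
    return len(rows)


# Stand-in for the production database (hypothetical schema and values).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE shops (id INTEGER, country TEXT, plan TEXT)")
source.executemany(
    "INSERT INTO shops VALUES (?, ?, ?)",
    [(1, "CA", "basic"), (2, "ca", "Basic"), (3, None, "pro")],
)

# Stand-in for the warehouse.
warehouse = sqlite3.connect(":memory:")
copied = copy_table(source, warehouse, "shops")
loaded = warehouse.execute("SELECT * FROM shops ORDER BY id").fetchall()
```

Inconsistent values such as "CA" and "ca" arrive in the warehouse untouched, so every report built on top has to clean them again.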
1st generation reporting
In-house reporting (SQL and CoffeeScript):
- Clean the data at the report level using SQL
- Define the metric in the report
- Use D3 for visualizations
* JIT (just-in-time) metric definition
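A small sketch of why defining metrics inside each report can go wrong: two reports read the same raw table but clean it and define "customers" differently, so their numbers disagree. The table, the status values, and both report queries are invented for illustration; SQLite stands in for the warehouse.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shops (id INTEGER, country TEXT, status TEXT)")
db.executemany(
    "INSERT INTO shops VALUES (?, ?, ?)",
    [
        (1, "CA", "active"),
        (2, "ca", "active"),
        (3, "US", "frozen"),
        (4, "US", "active"),
    ],
)

# Report A: normalises country casing and counts only active shops.
report_a = """
    SELECT UPPER(country) AS country, COUNT(*) AS customers
    FROM shops
    WHERE status = 'active'
    GROUP BY UPPER(country)
    ORDER BY country
"""

# Report B: no cleaning, and counts every shop that is not cancelled.
report_b = """
    SELECT country, COUNT(*) AS customers
    FROM shops
    WHERE status != 'cancelled'
    GROUP BY country
    ORDER BY country
"""

customers_a = db.execute(report_a).fetchall()
customers_b = db.execute(report_b).fetchall()
```

Report A yields two countries and three customers, report B three "countries" and four customers, although both claim to show customers by country.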
KPI for an e-commerce SaaS company
Customers
GMV (Gross Merchandise Volume)
Revenue
KPI for an e-commerce SaaS company
Customers: by cohort, by monthly recurring revenue, by partnership
GMV: by payment gateway, by industry, by monthly recurring revenue
Revenue: by cohort, by subscription type, by partnership
shop details
billing information
partnership information
hard to connect silos
image link https://goo.gl/aOkbXU
Different business areas, different definitions
● What is the geographical location of a customer?
● What do we mean when we talk about cohorts?
● Did the customer churn?
● ...
Things that are called the same thing do not mean the same thing
Difficult to talk about data
image link https://goo.gl/VJwFYx
Exploratory analysis for all:
While the analyst's idea of the question != the business's idea of the question:
1) write an ad-hoc SQL query
2) assume the analyst 100% understands the business process
3) assume the ad-hoc cleaning and conforming of values is reproducible
4) extract the data and send a data dump
5) upon seeing the results, the business user realizes this is not what they wanted
sad users & analysts
image link https://goo.gl/zdatv6
“The world as we have created it is a process of our thinking. It cannot be
changed without changing our thinking.”
- Albert Einstein
Dimensional modelling
Dimensional Modelling is a design technique for databases intended to support
end-user queries in a data warehouse. It is oriented around understandability and
performance.
- Wikipedia
Focus on a business process:
- What are the dimensions?
- What are the measures?
- What is the grain?
Focus on modelling a process
Not specific source tables, but a business process
e.g. processing a financial transaction in exchange for a product as part of GMV
Define dimensions, facts, grain
Business Process: processing a financial transaction
Dimensions: payment gateway that processed the transaction, industry of the shop, monthly recurring revenue (subscription amount) of the shop
Measures: $ amount of the transaction
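The model above could be laid out as a small star schema. This is a minimal sketch, not the schema used at Shopify: SQLite stands in for the warehouse, all table and column names are hypothetical, the grain (one row per processed transaction) is an assumption since the slide does not state it, and placing monthly recurring revenue on a shop dimension is one possible choice among several.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE dim_gateway (gateway_key INTEGER PRIMARY KEY, gateway_name TEXT);
    CREATE TABLE dim_shop (
        shop_key INTEGER PRIMARY KEY,
        industry TEXT,
        monthly_recurring_revenue INTEGER
    );
    -- Assumed grain: one row per processed transaction.
    CREATE TABLE fact_transaction (
        transaction_key INTEGER PRIMARY KEY,
        gateway_key INTEGER REFERENCES dim_gateway (gateway_key),
        shop_key INTEGER REFERENCES dim_shop (shop_key),
        amount_cents INTEGER
    );
""")
dw.executemany(
    "INSERT INTO dim_gateway VALUES (?, ?)", [(1, "gateway_a"), (2, "gateway_b")]
)
dw.executemany(
    "INSERT INTO dim_shop VALUES (?, ?, ?)",
    [(1, "apparel", 29), (2, "electronics", 79)],
)
dw.executemany(
    "INSERT INTO fact_transaction VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1000), (2, 1, 2, 2500), (3, 2, 2, 4000)],
)

# The same measure sliced by two different dimensions.
gmv_by_gateway = dw.execute("""
    SELECT g.gateway_name, SUM(f.amount_cents)
    FROM fact_transaction AS f
    JOIN dim_gateway AS g ON g.gateway_key = f.gateway_key
    GROUP BY g.gateway_name
    ORDER BY g.gateway_name
""").fetchall()

gmv_by_industry = dw.execute("""
    SELECT s.industry, SUM(f.amount_cents)
    FROM fact_transaction AS f
    JOIN dim_shop AS s ON s.shop_key = f.shop_key
    GROUP BY s.industry
    ORDER BY s.industry
""").fetchall()
```

Because the measure is defined once in the fact table, both breakdowns add up to the same total.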
Remove anything without analytical value: name of the customer, exact
address of the customer, etc.
image link https://goo.gl/b8jYVk
2nd generation data warehouse
E: in-house Ruby extracts to
T: Heavy transforms in
L: Load into Amazon Redshift
Building the ETL framework in Spark
2nd generation reporting
KPI for an e-commerce SaaS company
Customers: by cohort, by monthly recurring revenue, by partnership
Sales: by payment gateway, by industry, by monthly recurring revenue
Revenue: by cohort, by subscription type, by partnership
shop details
billing information
partnership information
A walk in the data garden
Insightful reports
Self-service analytics (Tableau / IPython)
New Business Analyst role
Faster feature selection (ML)
image link https://goo.gl/MtVWcJ
Takeaways
- Separation of business process / data model from ETL implementation
- Dimensional modelling for business process / data models
- Clear data definitions to enable self-service analytics
- Reproducible data models for reliable (and faster) predictive analysis
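One way to picture the first takeaway is to describe the model declaratively and keep the code that builds it separate. The sketch below is an assumption about how that could look, not the framework from the talk: `FactModel`, `build_fact_rows`, the column names, and the sample row are all hypothetical, and the plain-Python builder could be swapped for a Spark job without touching the model definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactModel:
    """Declarative description of a business process, independent of the ETL engine."""

    name: str
    grain: str
    dimensions: tuple
    measures: tuple


def build_fact_rows(model, source_rows):
    """One possible implementation: keep only the modelled columns, in plain Python."""
    keep = model.dimensions + model.measures
    return [{col: row[col] for col in keep} for row in source_rows]


transactions = FactModel(
    name="transaction",
    grain="one row per processed transaction",  # assumed grain
    dimensions=("gateway", "industry", "monthly_recurring_revenue"),
    measures=("amount",),
)

source_rows = [
    {
        "gateway": "gateway_a",
        "industry": "apparel",
        "monthly_recurring_revenue": 29,
        "amount": 1000,
        "customer_name": "Jane Doe",  # no analytical value, dropped by the model
        "address": "1 Example St",  # no analytical value, dropped by the model
    },
]

fact_rows = build_fact_rows(transactions, source_rows)
```

The model carries the agreed definitions (dimensions, measures, grain); the builder only decides how rows get produced.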
Questions?
We are hiring!
http://shopify.com/careers
Thank you
@solmaz_sh

Editor's Notes

  1. Are we growing? Are we growing as fast as we think?
  2. A fragile process with not much of the T in ETL. It was supposed to be fast, but was very, very slow.
  3. A paradigm that has been around since the 1960s.
  4. Re-write the text. Common comments: "I know all my source tables, I have always got all my data in Excel and got my own insights; why do you make me think about questions?" "I know how I want to define customer for my product; why do I need to agree with C{*}O on this definition?"