Data Modeling Trends for Analytics
How to model data for analytics in a modern world with a data lake and Power BI
Ike Ellis, Microsoft MVP
General Manager – Data & AI Practice
Solliance
@ike_ellis
www.ikeellis.com
youtube.com/IkeEllisOnTheMic
• Founder of the San Diego Power BI and PowerApps User Group
• Founder of the San Diego Software Architecture Group
• MVP since 2011
• Author of Developing Azure Solutions and the Power BI MVP Book
• Speaker at PASS Summit, SQLBits, DevIntersections, TechEd, Craft, and the Microsoft Azure & AI Conference
Agenda
• Traditional EDAs
• Problems with past EDAs
• Problems with how the business views data
• How data lakes solve this
• Different types of solutions for different problems
• Where to put what data
• The joy of copying data
Reasons to build a data system for analytics
• Alerting on things like fraud
• Reporting to Wall Street, auditors, and compliance
• Reporting to upper management, board of directors
• Tactical reporting to other management
• Data analysis, machine learning, deep learning
• Data lineage
• Data governance
• Data brokerage between transactional applications
• Historical data, archiving data
Common Enterprise Data Architecture (EDA)
[Diagram: several sources → ETL → staging and/or ODS → ETL → data warehouse]
Star schemas
• group related dimensions into dimension tables
• group related measures into fact tables
• relate fact tables to dimension tables by using foreign keys
[Diagram: star schema with FactOrders at the center, related to five dimension tables]
• FactOrders: CustomerKey, SalesPersonKey, ProductKey, ShippingAgentKey, TimeKey, OrderNo, LineItemNo, Quantity, Revenue, Cost, Profit
• DimCustomer: CustomerKey, CustomerName, City, Region
• DimSalesPerson: SalesPersonKey, SalesPersonName, StoreName, StoreCity, StoreRegion
• DimProduct: ProductKey, ProductName, ProductLine, SupplierName
• DimShippingAgent: ShippingAgentKey, ShippingAgentName
• DimDate: DateKey, Year, Quarter, Month, Day
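A minimal sketch of the FactOrders star schema, using Python's built-in `sqlite3` module as a stand-in for whatever warehouse or lake engine you actually use. Only a few of the slide's tables and columns are kept, and the sample rows are made up for illustration. The query shows the point of the design: one pass over the fact table, sliced by attributes from more than one dimension table.

```python
import sqlite3

# Cut-down FactOrders star schema. REFERENCES is declarative here: SQLite only
# enforces foreign keys when PRAGMA foreign_keys = ON is set.
DDL = """
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT, Region TEXT);
CREATE TABLE DimProduct  (ProductKey  INTEGER PRIMARY KEY, ProductName TEXT, ProductLine TEXT);
CREATE TABLE DimDate     (DateKey     INTEGER PRIMARY KEY, Year INTEGER, Quarter INTEGER);
CREATE TABLE FactOrders (           -- grain: one row per order line item
    CustomerKey INTEGER REFERENCES DimCustomer (CustomerKey),
    ProductKey  INTEGER REFERENCES DimProduct (ProductKey),
    TimeKey     INTEGER REFERENCES DimDate (DateKey),
    OrderNo     TEXT,               -- degenerate dimension
    LineItemNo  INTEGER,            -- degenerate dimension
    Quantity    INTEGER,            -- additive measure
    Revenue     REAL                -- additive measure
);
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory star schema and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?)",
                     [(1, "Contoso", "West"), (2, "Fabrikam", "East")])
    conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?)",
                     [(10, "Bike", "Cycling"), (11, "Helmet", "Safety")])
    conn.executemany("INSERT INTO DimDate VALUES (?, ?, ?)",
                     [(20240115, 2024, 1), (20240420, 2024, 2)])
    conn.executemany("INSERT INTO FactOrders VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 10, 20240115, "SO1", 1, 2, 1000.0),
        (1, 11, 20240115, "SO1", 2, 2, 80.0),
        (2, 10, 20240420, "SO2", 1, 1, 500.0),
    ])
    return conn


def revenue_by_region_and_quarter(conn: sqlite3.Connection):
    """Aggregate the fact table, grouped by attributes of two dimensions."""
    return conn.execute("""
        SELECT c.Region, d.Quarter, SUM(f.Revenue) AS Revenue
        FROM FactOrders AS f
        JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
        JOIN DimDate     AS d ON d.DateKey     = f.TimeKey
        GROUP BY c.Region, d.Quarter
        ORDER BY c.Region, d.Quarter
    """).fetchall()


if __name__ == "__main__":
    print(revenue_by_region_and_quarter(build_demo_warehouse()))
```

Adding another slice, such as revenue by ProductLine, is just one more join to DimProduct; the fact table itself does not change.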
Considerations for fact tables
• grain:
  • use the lowest level of detail that relates to all dimensions
  • create multiple fact tables if multiple grains are required
• keys:
  • the primary key is usually a composite key that includes dimension foreign keys
• measures:
  • additive: measures that can be aggregated across all dimensions
  • nonadditive: measures that cannot be aggregated
  • semi-additive: measures that can be aggregated across some dimensions, but not others
• degenerate dimensions:
  • dimension attributes stored directly in the fact table, with no dimension table of their own
[Diagram: two example fact tables with their columns classified]
• FactOrders (grain = order line item): CustomerKey, SalesPersonKey, ProductKey, TimeKey; degenerate dimensions OrderNo, LineItemNo, PaymentMethod; additive measures Quantity, Revenue, Cost, Profit; nonadditive measure Margin
• FactAccountTransaction: CustomerKey, BranchKey, AccountTypeKey; degenerate dimension AccountNo; additive measure CreditDebitAmount; semi-additive measure AccountBalance
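The measure types above behave differently when aggregated. This sketch uses hypothetical rows shaped like FactOrders and FactAccountTransaction (plain Python lists with made-up values) to show three cases: summing an additive measure, re-deriving a nonadditive ratio such as Margin from its additive parts, and treating AccountBalance as semi-additive, summed across accounts but not across time.

```python
# Hypothetical FactOrders rows at order-line grain: (OrderNo, LineItemNo, Revenue, Profit)
order_lines = [
    ("SO1", 1, 1000.0, 200.0),
    ("SO1", 2, 100.0, 50.0),
]

# Hypothetical FactAccountTransaction rows: (AccountNo, DateKey, AccountBalance)
balances = [
    ("A1", 20240101, 100.0),
    ("A2", 20240101, 50.0),
    ("A1", 20240102, 120.0),
    ("A2", 20240102, 40.0),
]


def total_revenue(lines):
    # Additive: revenue can be summed across every dimension.
    return sum(rev for _, _, rev, _ in lines)


def overall_margin(lines):
    # Nonadditive: margin = profit / revenue. Summing or averaging per-line
    # margins gives the wrong answer, so rebuild it from the additive parts.
    revenue = sum(rev for _, _, rev, _ in lines)
    profit = sum(p for _, _, _, p in lines)
    return profit / revenue


def total_balance_on(rows, date_key):
    # Semi-additive: balances CAN be summed across accounts for a single date...
    return sum(bal for _, d, bal in rows if d == date_key)


def closing_balance_per_account(rows):
    # ...but NOT across time. Keep the latest balance for each account instead.
    latest = {}
    for acct, _, bal in sorted(rows, key=lambda r: r[1]):
        latest[acct] = bal
    return latest


if __name__ == "__main__":
    print(total_revenue(order_lines), overall_margin(order_lines))
    print(total_balance_on(balances, 20240102), closing_balance_per_account(balances))
```

Averaging the two line margins here would give 0.35, while the correct overall margin is 250 / 1100; likewise, summing every balance row would double-count money that simply sat in the accounts for two days.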
Reasons to make a star schema
• Easy to use and understand
• One version of the truth
• Easy to create aggregations in a single pass over the data
• Much smaller table count (typically 12 to 25 tables)
• Faster queries
• Good place to feed cubes (either Azure Analysis Services or Power BI shared datasets)
• Supported by many business intelligence tools (Excel PivotTables, Power BI, Tableau, etc.)
• What I always say:
  • “You can either start out by making a star schema, or you can one day wish you did. Those are the two choices.”
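Weakness #1: let’s add a single column
[Diagram: the same EDA (sources → ETL → staging and/or ODS → ETL → data warehouse), with markers 1 through 9 on each place that must change to carry one new column from source to warehouse]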
So many of you have decided to just go directly to the source!
[Diagram: many sources, each feeding its own Power Query, labeled “Mayhem”]
• business logic is spread everywhere
• when something changes, you have to change it in a ton of places
• results are inconsistent
• the same data gets cleaned over and over
And I’ve seen these data models.
Weakness #2: SQL Server for staging/ODS
• SQL Server actually writes data multiple times on an insert
  • one write to the transaction log (traversing the log structure), and then the write through the disk subsystem
  • one write to the data file (.mdf), and then the write through the disk subsystem
  • writes to maintain indexes, and then those writes through the disk subsystem
• SQL Server is strongly consistent
  • the write isn’t successful until all the tables and indexes reflect it consistently and all the triggers have fired
• SQL Server is expensive
  • you are charged by the number of cores you use
Weakness #3: great big data warehouses are very difficult to change and maintain
• all tables need to be consistent with one another
• historical data makes queries slow
• historical data makes DWs hard to back up, restore, and manage
• indexes take too long to maintain
• dev and other environments are too difficult to spin up
• shared database environments are too hard to use for development
• keeping track of PII and other sensitive information is very difficult
• creating automated tests is very difficult
Weakness #4: very difficult to move this to the cloud
• In the cloud you pay for four things, each packaged differently:
  • CPU
  • Memory
  • Network
  • Disk
• Most expensive:
  • CPU/compute!
• When you have a big data warehouse, you often need a lot of memory
  • but memory is anchored to the CPU, so getting more of it means paying for more compute
• Big things in the cloud are a lot more expensive than a lot of small things
  • Make the big things cheap and the small things expensive, so you have lots of knobs to turn
No separation of concerns in the architecture
• trying to make one star schema do everything: fraud alerting; reporting to Wall Street, auditors, and compliance; reporting to upper management and the board; tactical reporting; data analysis and machine learning; data lineage; data governance; data brokerage between transactional applications; and historical archiving
[Diagram: several sources → ETL → staging and/or ODS → ETL → data warehouse]
Data latency
[Diagram: source → ETL → staging/ODS → ETL → data warehouse]
• data movement takes a long time
The Traditional Data Lake
Don’t be afraid of files
• files are fast!
• files are flexible
• new files can have a new data structure without changing the old data structure
• files write the data only once, through a single data structure (no separate log, data file, and index writes)
• files can be indexed
• file storage is cheap
• files can have high data integrity
• files can be unstructured or non-relational
The whole idea of an analytical system is that data duplication will speed up
aggregations and reporting. Files allow for cheap duplication, which allows us to
duplicate more data more frequently.
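To make the schema-flexibility point concrete, here is a minimal sketch using CSV files and the Python standard library; the file names and columns are hypothetical. A new file arrives with an extra column, the old file is never rewritten, and the reader fills the missing column with `None`. Lake engines offer comparable behavior for Parquet (for example Spark's `mergeSchema` read option), but this is not their implementation.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write one file with its own header; each file carries its own schema."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_all(folder):
    """Read every CSV in a folder, tolerating files with different columns.

    Columns missing from an older file come back as None, so old files never
    have to be rewritten when a new column appears."""
    records = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".csv"):
            with open(os.path.join(folder, name), newline="") as f:
                records.extend(csv.DictReader(f))
    # Union of all column names, in first-seen order.
    columns = []
    for rec in records:
        for col in rec:
            if col not in columns:
                columns.append(col)
    return [{col: rec.get(col) for col in columns} for rec in records]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Day 1: the original layout.
        write_csv(os.path.join(folder, "orders_2024_01_01.csv"),
                  ["OrderNo", "Revenue"], [["SO1", "1000"]])
        # Day 2: PaymentMethod appears; the day 1 file is left untouched.
        write_csv(os.path.join(folder, "orders_2024_01_02.csv"),
                  ["OrderNo", "Revenue", "PaymentMethod"], [["SO2", "500", "card"]])
        for row in read_all(folder):
            print(row)
```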
Parquet files
• Organizing by column allows for better compression, since values in the same column share a type and often repeat
  • The space savings are very noticeable at the scale of a Hadoop cluster.
• I/O is reduced because we can efficiently scan only a subset of the columns while reading the data.
• Better compression also reduces the bandwidth required to read the input.
• Splittable
• Horizontally scalable
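A toy illustration of why a columnar layout lets a reader skip data: each column is serialized and compressed on its own, so scanning one column decompresses only that column's bytes. This is plain `json` plus `zlib`, not the real Parquet format, which layers row groups, pages, encodings, and statistics on top of the same idea; the sample rows are made up.

```python
import json
import zlib

# Hypothetical order rows.
rows = [
    {"OrderNo": f"SO{i}", "Region": "West" if i % 2 else "East", "Revenue": float(i * 10)}
    for i in range(1000)
]


def to_column_chunks(rows):
    """Store each column as its own compressed chunk (toy columnar layout)."""
    return {
        col: zlib.compress(json.dumps([r[col] for r in rows]).encode())
        for col in rows[0].keys()
    }


def read_column(chunks, col):
    """Decompress only the requested column; other chunks are never touched."""
    return json.loads(zlib.decompress(chunks[col]))


chunks = to_column_chunks(rows)
bytes_for_region = len(chunks["Region"])
bytes_for_all = sum(len(c) for c in chunks.values())

if __name__ == "__main__":
    print(f"Region scan reads {bytes_for_region} of {bytes_for_all} compressed bytes")
```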
Basic physical idea of a data lake
[Diagram: source → ETL → staging and ODS → ETL → several data marts]
Example modern data architecture
[Diagram: sources flow through Azure Data Factory (orchestration: Mapping Dataflows, pipelines, SSIS packages, triggered & scheduled pipelines) into Data Lake Store / Azure Blob Storage for long-term storage; Azure Databricks handles data processing (ETL logic, calculations); results are served from SQL DB / SQL Server, SQL DW, and Azure Analysis Services to web applications and dashboards, with direct download from Azure Storage]
[Diagram: a simplified version in which Azure Databricks / Synapse does the data processing (ETL logic, calculations) over Data Lake Store / Azure Blob Storage and feeds alerting, web applications, dashboards, and direct download]
Data virtualization as a concept
• Data stays in its original place
  • SQL Server
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
• A metadata repository sits over the data where it lives
• Data can then be queried and joined in a single location
  • Spark SQL
  • PolyBase
  • Hive
  • Power BI
  • SQL Server Big Data Clusters
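SQLite's `ATTACH DATABASE` gives a small, local analogue of data virtualization: two database files stay where they are, and one connection joins across them without copying rows into a third store. It is only a stand-in for engines like PolyBase or Spark SQL, and the file names, tables, and rows here are hypothetical.

```python
import os
import sqlite3
import tempfile


def make_db(path, ddl, insert_sql, rows):
    """Create a standalone database file holding one table."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def join_in_place(sales_path, crm_path):
    """Join tables from two separate files without copying either one."""
    conn = sqlite3.connect(sales_path)
    # The CRM file is attached under the schema name "crm"; rows stay in that file.
    conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    result = conn.execute("""
        SELECT c.CustomerName, SUM(o.Revenue)
        FROM main.Orders AS o
        JOIN crm.Customers AS c ON c.CustomerKey = o.CustomerKey
        GROUP BY c.CustomerName
        ORDER BY c.CustomerName
    """).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sales = os.path.join(folder, "sales.db")
        crm = os.path.join(folder, "crm.db")
        make_db(sales, "CREATE TABLE Orders (CustomerKey INTEGER, Revenue REAL)",
                "INSERT INTO Orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])
        make_db(crm, "CREATE TABLE Customers (CustomerKey INTEGER, CustomerName TEXT)",
                "INSERT INTO Customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")])
        print(join_in_place(sales, crm))
```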
Demo data virtualization
How to organize a data lake
• Folders!
  • staging = raw = bronze
  • ods = silver
Where do we put the star schema?
• Folders!
  • star schema = gold (a data mart)
• or a relational database, or a Power BI dataset
Where do we put aggregations?
• We can create aggregation files (in a data mart)
• or aggregation tables, or DAX in Power BI
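A minimal sketch of the bronze/silver/gold folder convention, with an aggregation file written to the gold layer. The folder names follow the slides; the file names, columns, and cleaning rules are hypothetical, and a real lake would more likely store Parquet than CSV.

```python
import csv
import os
import tempfile
from collections import defaultdict


def land_raw(lake, rows):
    """Bronze: store the source extract exactly as received."""
    path = os.path.join(lake, "bronze", "orders.csv")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["OrderNo", "Region", "Revenue"], *rows])
    return path


def clean(lake):
    """Silver: trim and standardize values, and drop rows with no revenue."""
    src = os.path.join(lake, "bronze", "orders.csv")
    dst = os.path.join(lake, "silver", "orders.csv")
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(["OrderNo", "Region", "Revenue"])
        for row in csv.DictReader(fin):
            if row["Revenue"].strip():
                writer.writerow([row["OrderNo"].strip(),
                                 row["Region"].strip().title(),
                                 float(row["Revenue"])])
    return dst


def aggregate(lake):
    """Gold: write a small aggregation file that reports can read directly."""
    totals = defaultdict(float)
    with open(os.path.join(lake, "silver", "orders.csv"), newline="") as f:
        for row in csv.DictReader(f):
            totals[row["Region"]] += float(row["Revenue"])
    dst = os.path.join(lake, "gold", "revenue_by_region.csv")
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Region", "Revenue"])
        writer.writerows(sorted(totals.items()))
    return dict(totals)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake:
        land_raw(lake, [["SO1", " west ", "100"], ["SO2", "EAST", "50"]])
        clean(lake)
        print(aggregate(lake))
```

Each layer is only a folder, so copying data forward is cheap, and any layer can be rebuilt from the one before it.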
Other uses of folders
• Temporal tables or folders
• Snapshotting
• Archiving
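Snapshotting with folders can be as simple as writing each snapshot to its own dated folder and reading "as of" a date by picking the newest folder on or before it. The `snapshot_date=YYYY-MM-DD` naming is an assumption, borrowed from the Hive-style `key=value` partition folders many lake engines understand; the rows are hypothetical. Archiving could then be a matter of moving older snapshot folders to cheaper storage.

```python
import json
import os
import tempfile


def write_snapshot(root, snapshot_date, rows):
    """Write one immutable snapshot into its own dated folder."""
    folder = os.path.join(root, f"snapshot_date={snapshot_date}")
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "data.json"), "w") as f:
        json.dump(rows, f)


def read_as_of(root, as_of):
    """Return the newest snapshot taken on or before `as_of` (ISO date string)."""
    dates = sorted(
        name.split("=", 1)[1]
        for name in os.listdir(root)
        if name.startswith("snapshot_date=")
    )
    eligible = [d for d in dates if d <= as_of]  # ISO dates sort lexicographically
    if not eligible:
        return None
    with open(os.path.join(root, f"snapshot_date={eligible[-1]}", "data.json")) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_snapshot(root, "2024-01-31", [{"CustomerKey": 1, "Region": "West"}])
        write_snapshot(root, "2024-02-29", [{"CustomerKey": 1, "Region": "East"}])
        print(read_as_of(root, "2024-02-15"))
```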
The modern data lake
[Diagram: Bronze → Silver → Gold (star schema) data marts inside an Azure Synapse shared workspace, where queries and scripts in Python or SQL join the data in one consistent way]
Conclusion
• yes, we still make star schemas
• yes, we still use slowly changing dimensions
• yes, we still use cubes
• we need to understand their limitations
• don’t be afraid of files and just-in-time analytics
• don’t conflate alerting and speed with consistency
• consistent reporting should be kept to 50 reports or fewer
• everything else should be decoupled and flexible so it can change quickly
• we can create analytic systems without SQL Server (but with SQL)
  • file-based (Parquet)
  • we still primarily use SQL as a language
  • cheap
  • massively parallel
  • easy to change
  • distilled using a star schema and data virtualization

 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Último (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Data modeling trends for analytics

  • 1. Data Modeling Trends for Analytics: How to model data for analytics in a modern world with a data lake and Power BI. Ike Ellis, Microsoft MVP, General Manager – Data & AI Practice, Solliance
  • 2. Ike Ellis, MVP, General Manager – Data & AI Practice, Solliance (@ike_ellis, www.ikeellis.com, youtube.com/IkeEllisOnTheMic)
    • Founder of the San Diego Power BI and PowerApps User Group
    • Founder of the San Diego Software Architecture Group
    • MVP since 2011
    • Author of Developing Azure Solutions and the Power BI MVP Book
    • Speaker at PASS Summit, SQLBits, DevIntersections, TechEd, Craft, Microsoft Azure & AI Conference
  • 3. Agenda
    • Traditional EDAs
    • Problems with past EDAs
    • Problems with how the business views data
    • How data lakes solve this
    • Different types of solutions for different problems
    • Where to put what data
    • The joy of copying data
  • 4. Reasons to build a data system for analytics
    • Alerting for things like fraud
    • Reporting to Wall Street, auditors, compliance
    • Reporting to upper management, board of directors
    • Tactical reporting to other management
    • Data analysis, machine learning, deep learning
    • Data lineage
    • Data governance
    • Data brokerage between transactional applications
    • Historical data, archiving data
  • 5. Common Enterprise Data Architecture (EDA) [Diagram: several sources are loaded by ETL into staging, then by further ETL into an ODS and/or a data warehouse]
  • 6. Star schemas
    • group related dimensions into dimension tables
    • group related measures into fact tables
    • relate fact tables to dimension tables by using foreign keys
    • Example schema:
      • FactOrders: CustomerKey, SalesPersonKey, ProductKey, ShippingAgentKey, TimeKey, OrderNo, LineItemNo, Quantity, Revenue, Cost, Profit
      • DimCustomer: CustomerKey, CustomerName, City, Region
      • DimSalesPerson: SalesPersonKey, SalesPersonName, StoreName, StoreCity, StoreRegion
      • DimProduct: ProductKey, ProductName, ProductLine, SupplierName
      • DimShippingAgent: ShippingAgentKey, ShippingAgentName
      • DimDate: DateKey, Year, Quarter, Month, Day
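To make the fact-to-dimension relationship concrete, here is a minimal sketch using Python's built-in sqlite3 module. It builds a hypothetical subset of the slide's schema (DimProduct, DimCustomer, and FactOrders, with made-up rows) and sums Revenue by ProductLine and Region. It is an illustration only, not the deck's implementation.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical subset of the slide's star schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DimProduct (
            ProductKey  INTEGER PRIMARY KEY,
            ProductName TEXT,
            ProductLine TEXT
        );
        CREATE TABLE DimCustomer (
            CustomerKey  INTEGER PRIMARY KEY,
            CustomerName TEXT,
            Region       TEXT
        );
        -- The fact table holds foreign keys to each dimension plus numeric measures.
        CREATE TABLE FactOrders (
            CustomerKey INTEGER REFERENCES DimCustomer (CustomerKey),
            ProductKey  INTEGER REFERENCES DimProduct (ProductKey),
            OrderNo     INTEGER,
            LineItemNo  INTEGER,
            Quantity    INTEGER,
            Revenue     REAL
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?)",
                     [(1, "Road Bike", "Bikes"), (2, "Helmet", "Accessories")])
    conn.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?)",
                     [(10, "Contoso", "West"), (20, "Fabrikam", "East")])
    conn.executemany("INSERT INTO FactOrders VALUES (?, ?, ?, ?, ?, ?)",
                     [(10, 1, 1001, 1, 2, 2000.0),
                      (10, 2, 1001, 2, 2, 100.0),
                      (20, 1, 1002, 1, 1, 1000.0)])
    return conn


def revenue_by_line_and_region(conn: sqlite3.Connection) -> list:
    """Join the fact table to two dimensions and sum the additive Revenue measure."""
    query = """
        SELECT p.ProductLine, c.Region, SUM(f.Revenue) AS Revenue
        FROM FactOrders AS f
        JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
        JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
        GROUP BY p.ProductLine, c.Region
        ORDER BY p.ProductLine, c.Region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_by_line_and_region(build_star_schema()))
```

Every attribute used for slicing sits one join away from the fact table. That single hop is what keeps star-schema queries simple for reporting tools.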
  • 7. Considerations for fact tables
    • grain:
      • use the lowest level of detail that relates to all dimensions
      • create multiple fact tables if multiple grains are required
    • keys:
      • the primary key is usually a composite key that includes the dimension foreign keys
    • measures:
      • additive: measures that can be aggregated across all dimensions
      • nonadditive: measures that cannot be aggregated across any dimension
      • semi-additive: measures that can be aggregated across some dimensions, but not others
    • degenerate dimensions:
      • dimension attributes stored directly in the fact table, with no dimension table of their own
    • Examples (FactOrders grain = order line item):
      • FactOrders: CustomerKey, SalesPersonKey, ProductKey, TimeKey, OrderNo, LineItemNo, PaymentMethod, Quantity, Revenue, Cost, Profit, Margin
      • FactAccountTransaction: CustomerKey, BranchKey, AccountTypeKey, AccountNo, CreditDebitAmount, AccountBalance
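The measure types are easiest to see in a small sketch. The following minimal Python sketch uses made-up rows and assumes Margin means (revenue - cost) / revenue. It shows that a nonadditive ratio must be recomputed from summed additive parts. It also shows that a semi-additive balance can be summed across accounts for one day but not across days. The OrderLine class and the balances data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    order_no: int      # degenerate dimension: lives in the fact table, no Dim table
    line_item_no: int  # with order_no, identifies the order-line-item grain
    revenue: float     # additive
    cost: float        # additive

    @property
    def margin(self) -> float:
        """Nonadditive ratio, assumed here to mean (revenue - cost) / revenue."""
        return (self.revenue - self.cost) / self.revenue


def overall_margin(lines: list) -> float:
    """Correct: sum the additive parts first, then recompute the ratio."""
    revenue = sum(line.revenue for line in lines)
    cost = sum(line.cost for line in lines)
    return (revenue - cost) / revenue


def summed_margins(lines: list) -> float:
    """Incorrect for a nonadditive measure: adds per-line ratios together."""
    return sum(line.margin for line in lines)


# Semi-additive: hypothetical daily balance snapshots keyed by (day, account_no).
balances = {(1, "A"): 100.0, (1, "B"): 50.0, (2, "A"): 120.0, (2, "B"): 40.0}


def total_balance_on(day: int, snapshots: dict) -> float:
    """Valid: summing balances across accounts for a single day."""
    return sum(v for (d, _), v in snapshots.items() if d == day)


def closing_balance(account: str, snapshots: dict) -> float:
    """Across time, take the latest balance instead of summing the days."""
    last_day = max(d for (d, a) in snapshots if a == account)
    return snapshots[(last_day, account)]


if __name__ == "__main__":
    lines = [OrderLine(1001, 1, 100.0, 60.0), OrderLine(1001, 2, 300.0, 240.0)]
    print(overall_margin(lines), summed_margins(lines))
    print(total_balance_on(2, balances), closing_balance("A", balances))
```

Summing account A across both days would report 220.0, which is not a balance the account ever had. That is why the time dimension gets "last value" rather than "sum".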
  • 8. Reasons to make a star schema
    • Easy to use and understand
    • One version of the truth
    • Easy to create aggregations in a single pass over the data
    • Much smaller table count (12 to 25 tables)
    • Faster queries
    • Good place to feed cubes (either Azure Analysis Services or Power BI shared datasets)
    • Supported by many business intelligence tools (Excel pivot tables, Power BI, Tableau, etc.)
    • What I always say:
      • “You can either start out by making a star schema, or you can one day wish you did. Those are the two choices.”
  • 9. Common Enterprise Data Architecture (EDA) [Diagram: several sources are loaded by ETL into staging, then by further ETL into an ODS and/or a data warehouse]
  • 10. Weakness #1: let’s add a single column [Diagram: the same EDA pipeline, with nine numbered points (1 to 9) across the sources, ETL steps, staging, ODS, and data warehouse that must each change to add that one column]
  • 11. So many of you have decided to just go directly to the source! [Diagram: source → Power Query]
  • 12. Mayhem [Diagram: many separate source → Power Query connections]
    • business logic is spread out
    • when something changes, you have to change it in many places
    • results are inconsistent
    • the same data is cleaned repeatedly
  • 13. And I’ve seen these data models
  • 14. Weakness #2: SQL Server for staging/ODS
    • SQL Server actually writes data multiple times on an insert:
      • one write to the transaction log (traversing the log structure), and then the write through the disk subsystem
      • one write to the data file (.mdf), and then the write through the disk subsystem
      • writes to maintain indexes, and then the writes through the disk subsystem
    • SQL Server is strongly consistent
      • the write isn’t successful until all the tables and indexes reflect it consistently and all the triggers have fired
    • SQL Server is expensive
      • you are charged based on the number of cores you use
  • 15. Weakness #3: great big data warehouses are very difficult to change and maintain
    • all tables need to be consistent with one another
    • historical data makes queries slow
    • historical data makes data warehouses hard to back up, restore, and manage
    • indexes take too long to maintain
    • dev and other environments are too difficult to spin up
    • shared database environments are too hard to use for development
    • keeping track of PII and other sensitive information is very difficult
    • creating automated tests is very difficult
  • 16. Weakness #4: very difficult to move this to the cloud
    • In the cloud you pay for four things, each packaged differently:
      • CPU
      • memory
      • network
      • disk
    • The most expensive is CPU/compute
    • A big data warehouse often needs a lot of memory
      • but memory is anchored to the CPU, so the two scale (and are billed) together
    • Big things in the cloud are a lot more expensive than a lot of small things
    • Make the big things cheap and the small things expensive, so you have lots of knobs to turn
  • 17. No separation of concerns in the architecture
    • trying to make a star schema do everything
    • [Diagram: every goal from slide 4 (alerting for things like fraud; reporting to Wall Street, auditors, compliance; reporting to upper management and the board; tactical reporting; data analysis, machine learning, deep learning; data lineage; data governance; data brokerage between transactional applications; historical data and archiving) is served by the single pipeline: sources → ETL → staging → ETL → ODS and/or data warehouse]
  • 18. Data latency: data movement takes a long time [Diagram: source → ETL → staging → ETL → ODS → ETL → data warehouse]
  • 20. Don’t be afraid of files
    • files are fast!
    • files are flexible
    • new files can have a new data structure without changing the old data structure
    • writing a file writes the data through only one data structure
    • files can be indexed
    • file storage is cheap
    • files can have high data integrity
    • files can be unstructured or non-relational
    The whole idea of an analytical system is that data duplication speeds up aggregations and reporting. Files allow for cheap duplication, which lets us duplicate more data more frequently.
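The point that new files can carry a new structure without rewriting old ones can be sketched with the Python standard library. This minimal sketch assumes JSON Lines files in a local temporary folder. The function names and the "channel" column are hypothetical. The reader aligns all rows to the union of columns, so older rows simply show None for the new column.

```python
import json
import tempfile
from pathlib import Path


def write_batch(folder: Path, name: str, rows: list) -> Path:
    """Write one new batch file as JSON Lines; existing files are never rewritten."""
    path = folder / f"{name}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


def read_all(folder: Path) -> list:
    """Read every batch and align rows to the union of all columns seen."""
    rows = []
    for path in sorted(folder.glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    columns = sorted({column for row in rows for column in row})
    # Older rows get None for columns that did not exist when they were written.
    return [{column: row.get(column) for column in columns} for row in rows]


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        write_batch(folder, "orders_2020-01-01", [{"order_no": 1, "revenue": 10.0}])
        # The next day's file adds a "channel" column without touching yesterday's file.
        write_batch(folder, "orders_2020-01-02",
                    [{"order_no": 2, "revenue": 20.0, "channel": "web"}])
        return read_all(folder)


if __name__ == "__main__":
    print(demo())
```

Compare this with slide 10. Adding the column here touched one writer and zero existing files.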
  • 21. Parquet files
    • Organizing by column allows for better compression
      • the space savings are very noticeable at the scale of a Hadoop cluster
    • I/O is reduced because we can efficiently scan only a subset of the columns while reading the data
    • Better compression also reduces the bandwidth required to read the input
    • Splittable
    • Horizontally scalable
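The following is not the Parquet file format. Real Parquet files are written with libraries such as Apache Arrow. It is a toy sketch using plain Python lists with hypothetical rows. It illustrates two ideas from the slide. First, storing values column by column lets a reader touch only the columns it needs. Second, a low-cardinality column stored contiguously collapses well under run-length encoding, which is one of the encodings the Parquet format supports.

```python
from itertools import groupby


def to_columns(rows: list, columns: list) -> dict:
    """Pivot row-oriented records into one list per column (a columnar layout)."""
    return {column: [row[column] for row in rows] for column in columns}


def run_length_encode(values: list) -> list:
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def scan(columnar: dict, wanted: list) -> dict:
    """Read only the requested columns; the others are never touched."""
    return {column: columnar[column] for column in wanted}


rows = [
    {"region": "West", "product": "Bike", "revenue": 10},
    {"region": "West", "product": "Helmet", "revenue": 5},
    {"region": "West", "product": "Bike", "revenue": 7},
    {"region": "East", "product": "Bike", "revenue": 3},
]

if __name__ == "__main__":
    table = to_columns(rows, ["region", "product", "revenue"])
    print(run_length_encode(table["region"]))  # [('West', 3), ('East', 1)]
    print(scan(table, ["revenue"]))            # {'revenue': [10, 5, 7, 3]}
```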
  • 22. Basic physical idea of a data lake [Diagram: source → ETL → staging → ETL → ODS → ETL → several data marts]
  • 23. Example modern data architecture [Diagram]
    • sources loaded via ETL
    • data processing: Azure Databricks (ETL logic, calculations)
    • serving storage: SQL DB / SQL Server, SQL DW, Azure Analysis Services
    • long-term storage: Data Lake Store / Azure Blob Storage
    • direct download: Azure Storage
    • orchestration: Data Factory (Mapping Dataflows, Pipelines, SSIS Packages, Triggered & Scheduled Pipelines)
    • consumers: web applications, dashboards
  • 24. Alerting [Diagram]
    • sources loaded via ETL
    • data processing: Azure Databricks / Synapse (ETL logic, calculations)
    • long-term storage: Data Lake Store / Azure Blob Storage
    • consumers: web applications, dashboards, direct download
  • 25. Data virtualization as a concept
    • Data stays in its original place
      • SQL Server
      • Azure Blob Storage
      • Azure Data Lake Storage Gen2
    • A metadata repository sits over the data where it is
    • Data can then be queried and joined in a single location
      • Spark SQL
      • PolyBase
      • Hive
      • Power BI
      • SQL Server Big Data Clusters
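Spark SQL, PolyBase, and the other engines on the slide do this at scale. The following is only a toy model of the concept, built from Python's standard library. It assumes two local files (a CSV and a JSON Lines file) stand in for the original stores. A hypothetical Catalog class stands in for the metadata repository. The catalog records only where each table lives. Data is read from its original location when queried and joined in one place.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_csv(path: Path) -> list:
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_jsonl(path: Path) -> list:
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


LOADERS = {"csv": load_csv, "jsonl": load_jsonl}


class Catalog:
    """Hypothetical metadata repository: it records where each table lives, not the data."""

    def __init__(self) -> None:
        self._tables = {}

    def register(self, name: str, fmt: str, path: Path) -> None:
        self._tables[name] = (fmt, path)

    def read(self, name: str) -> list:
        fmt, path = self._tables[name]
        return LOADERS[fmt](path)  # data is read from its original location on demand


def inner_join(left: list, right: list, key: str) -> list:
    """Hash join; keys are compared as strings because CSV yields text values."""
    index = {}
    for right_row in right:
        index.setdefault(str(right_row[key]), []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(str(left_row[key]), [])]


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "customers.csv").write_text(
            "CustomerKey,CustomerName\n10,Contoso\n20,Fabrikam\n", encoding="utf-8")
        (root / "orders.jsonl").write_text(
            '{"CustomerKey": 10, "Revenue": 2000.0}\n'
            '{"CustomerKey": 30, "Revenue": 5.0}\n', encoding="utf-8")
        catalog = Catalog()
        catalog.register("customers", "csv", root / "customers.csv")
        catalog.register("orders", "jsonl", root / "orders.jsonl")
        return inner_join(catalog.read("orders"), catalog.read("customers"), "CustomerKey")


if __name__ == "__main__":
    print(demo())
```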
  • 27. How to organize a data lake • Folders! staging = raw = bronze
  • 28. How to organize a data lake • Folders! ods = silver
  • 29. Where do we put the star schema? • Folders! star schema = gold data mart, or a relational database, or a Power BI dataset
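The folder zones from slides 27 to 29 can be captured in a small path-building helper. This is a minimal sketch. The zone aliases come from the slides. The root path, the subject/table levels, and the Hive-style year=/month=/day= partition naming are assumptions, a common convention rather than something the deck prescribes.

```python
from datetime import date
from pathlib import PurePosixPath

# Zone aliases taken from the slides: staging = raw = bronze, ods = silver,
# star schema = gold.
ZONES = {
    "staging": "bronze", "raw": "bronze", "bronze": "bronze",
    "ods": "silver", "silver": "silver",
    "star schema": "gold", "gold": "gold",
}


def lake_path(zone: str, subject: str, table: str, load_date: date,
              root: str = "/datalake") -> PurePosixPath:
    """Build a folder path for one table's data loaded on one day.

    The year=/month=/day= partition naming is an assumption (a common
    Hive-style convention), not something the slides prescribe.
    """
    return PurePosixPath(
        root, ZONES[zone.lower()], subject, table,
        f"year={load_date:%Y}", f"month={load_date:%m}", f"day={load_date:%d}",
    )


if __name__ == "__main__":
    print(lake_path("staging", "crm", "customers", date(2020, 6, 15)))
    print(lake_path("star schema", "sales", "FactOrders", date(2020, 6, 15)))
```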
  • 30. Where do we put aggregations? We can create aggregation files in the data mart, aggregation tables, or DAX in Power BI
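For the aggregation-file option, here is a minimal sketch assuming made-up FactOrders rows that already carry a Month value. It pre-sums the additive Revenue measure by Month and ProductLine in one pass and serializes the result as CSV. In a lake that text would be saved as a file in the data mart folder. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate(fact_rows: list, keys: list, measure: str) -> list:
    """Pre-sum an additive measure by the given keys (one pass over the facts)."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return [{**dict(zip(keys, group)), measure: total}
            for group, total in sorted(totals.items())]


def to_csv(rows: list, fieldnames: list) -> str:
    """Serialize the aggregate; in a lake this text would be written as a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


fact_orders = [
    {"Month": "2020-01", "ProductLine": "Bikes", "Revenue": 2000.0},
    {"Month": "2020-01", "ProductLine": "Bikes", "Revenue": 1000.0},
    {"Month": "2020-01", "ProductLine": "Accessories", "Revenue": 100.0},
    {"Month": "2020-02", "ProductLine": "Bikes", "Revenue": 500.0},
]

if __name__ == "__main__":
    summary = aggregate(fact_orders, ["Month", "ProductLine"], "Revenue")
    print(to_csv(summary, ["Month", "ProductLine", "Revenue"]))
```

Only additive measures should be pre-summed this way. A ratio such as a margin would be recomputed from the summed parts.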
  • 31. Other uses of folders • Temporal tables or folders • Snapshotting • Archiving
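Folders also make temporal reads, snapshotting, and archiving simple to reason about. This minimal sketch assumes one folder per snapshot, named with a hypothetical snapshot_date= convention. An "as of" read picks the latest snapshot taken on or before the requested date. Archiving splits off snapshots older than a cutoff. Moving the files to cheaper storage is left out.

```python
from datetime import date
from typing import Optional


def snapshot_folder(table: str, taken_on: date, root: str = "/datalake/silver") -> str:
    """One folder per snapshot (hypothetical naming convention)."""
    return f"{root}/{table}/snapshot_date={taken_on.isoformat()}"


def snapshot_as_of(snapshot_dates: list, as_of: date) -> Optional[date]:
    """Temporal read: the latest snapshot taken on or before `as_of` (None if none)."""
    candidates = [d for d in snapshot_dates if d <= as_of]
    return max(candidates) if candidates else None


def split_for_archive(snapshot_dates: list, cutoff: date) -> tuple:
    """Snapshots older than `cutoff` can move to archive storage; the rest stay hot."""
    archive = sorted(d for d in snapshot_dates if d < cutoff)
    keep = sorted(d for d in snapshot_dates if d >= cutoff)
    return archive, keep


if __name__ == "__main__":
    taken = [date(2020, 1, 31), date(2020, 2, 29), date(2020, 3, 31)]
    chosen = snapshot_as_of(taken, date(2020, 3, 15))
    print(snapshot_folder("customers", chosen))
    print(split_for_archive(taken, date(2020, 3, 1)))
```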
  • 32. The modern data lake [Diagram: an Azure Synapse shared workspace over a data lake whose data marts are organized as Bronze, Silver, and Gold (star schema); queries and scripts in Python or SQL do the joining in one fashion]
  • 33. Conclusion
    • yes, we still make star schemas
    • yes, we still use slowly changing dimensions
    • yes, we still use cubes
    • we need to understand their limitations
    • don’t be afraid of files and just-in-time analytics
    • don’t conflate alerting and speed with consistency
    • consistent reporting should be kept to 50 reports or fewer
    • everything else should be decoupled and flexible so it can change quickly
    • we can create analytic systems without SQL Server (but with SQL)
      • file-based (Parquet)
      • we still primarily use SQL as a language
      • cheap
      • massively parallel
      • easily changeable
      • distilled using a star schema and data virtualization
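Since slowly changing dimensions are still part of the toolkit, here is a minimal sketch of a Type 2 change in plain Python. It assumes a hypothetical customer dimension that tracks City. When the tracked attribute changes, the current version gets an end date and a new version with a new surrogate key is appended. Unchanged values produce no new version. Treating valid_to as exclusive is an assumed convention.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_key: int         # surrogate key: a new value for every version
    customer_id: str          # business key from the source system
    city: str                 # the tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None marks the current version; otherwise exclusive end


def apply_scd2(dim: list, customer_id: str, new_city: str, change_date: date) -> list:
    """Type 2 change: expire the current version and append a new one."""
    current = next((r for r in dim
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == new_city:
        return dim  # no change, so no new version
    rows = [replace(r, valid_to=change_date) if r is current else r for r in dim]
    next_key = max((r.customer_key for r in dim), default=0) + 1
    rows.append(CustomerVersion(next_key, customer_id, new_city, change_date, None))
    return rows


if __name__ == "__main__":
    dim = [CustomerVersion(1, "C001", "San Diego", date(2019, 1, 1), None)]
    for row in apply_scd2(dim, "C001", "Seattle", date(2020, 6, 1)):
        print(row)
```

Fact rows loaded before the change keep pointing at surrogate key 1, so historical reports still show San Diego.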