SlideShare una empresa de Scribd logo
1 de 36
Data Warehousing Trends,
Best Practices, and Future
Outlook
James Serra
Data Platform Architecture Lead
EY
JamesSerra3@gmail.com
Blog: JamesSerra.com
About Me
 EY, Data Platform Architecture Lead
 Was previously a Data & AI Architect at Microsoft for seven years
 In IT for 35 years, worked on many BI and DW projects
 Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
 Been perm employee, contractor, consultant, business owner
 Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference
Europe, SQL Saturdays
 Blog at JamesSerra.com
 Former SQL Server MVP
 Author of book “Reporting with Microsoft SQL Server 2012”
Agenda
 Why Data Warehouse
 Data Warehouse vs Data Mart
 Kimball vs Inmon
 Data Vault
 Data Lake
 Why Data Warehouses fail
 How Data Warehouses succeed
 Data Warehouse concepts
 Modern Data Warehouse
 Data Fabric
 Data Lakehouse
 Data Mesh
I tried to build a data warehouse on my own…
And ended up passed-out drunk in a Denny’s
parking lot
Let’s prevent that from happening…
What a Data Warehouse is not
• A data warehouse is not a copy of a source database with the name prefixed with “DW”
• It is not a copy of multiple tables (i.e. customer) from various sources systems unioned
together in a view
• It is not a dumping ground for tables from various sources with not much design put into it
What is a Data Warehouse and why use one?
A data warehouse is where you store data from multiple data sources to be used for historical and
trend analysis reporting. It acts as a central repository for many subject areas and contains the
"single version of truth". It is NOT to be used for OLTP applications.
Reasons for a data warehouse:
 Reduce stress on production system
 Optimized for read access, sequential disk scans
 Integrate many sources of data
 Keep historical records (no need to save hardcopy reports)
 Restructure/rename tables and fields, model data
 Protect against source system upgrades
 Use Master Data Management, including hierarchies
 No IT involvement needed for users to create reports
 Improve data quality and plugs holes in source systems
 One version of the truth
 Easy to create BI solutions on top of it (i.e. Azure Analysis Services Cubes)
Why use a Data Warehouse?
Legacy applications + databases = chaos
Production
Control
MRP
Inventory
Control
Parts
Management
Logistics
Shipping
Raw Goods
Order Control
Purchasing
Marketing
Finance
Sales
Accounting
Management
Reporting
Engineering
Actuarial
Human
Resources
Continuity
Consolidation
Control
Compliance
Collaboration
Enterprise data warehouse = order
Single version
of the truth
Enterprise Data
Warehouse
Every question = decision
Two purposes of data warehouse: 1) save time building reports; 2) slice in dice in ways you could not do before
Enterprise Data Maturity Stages
Structured data is
transacted and
locally managed.
Data used
reactively
STAGE 2:
Informative
STAGE 1:
Reactive
Structured data is
managed and
analyzed centrally
and informs the
business
Data capture is
comprehensive and
scalable and leads
business decisions
based on advanced
analytics
STAGE 4:
Transformative
STAGE 3:
Predictive Data transforms
business to drive
desired outcomes.
Real-time
intelligence
Rear-view
mirror
Any data, any
source, anywhere at
scale
Data Warehouse vs Data Mart
 Data Warehouse: A single organizational repository of enterprise wide data
across many or all subject areas
 Holds multiple subject areas
 Holds very detailed information
 Works to integrate all data sources
 Feeds data mart
 Data Mart: Subset of the data warehouse that is usually oriented to specific
subject (finance, marketing, sales)
• The logical combination of all the data marts is a data warehouse
In short, a data warehouse as contains many subject areas, and a data mart
contains just one of those subject areas
Kimball and Inmon Methodologies
Two approaches for building data warehouses
Kimball and Inmon Myths
 Myth: Kimball is a bottom-up approach without enterprise focus
 Really top-down: BUS matrix (choose business processes/data sources), conformed
dimensions, MDM
 Myth: Inmon requires a ton of up-front design that takes a long time
 Inmon says to build DW iteratively, not big bang approach (p. 91 BDW, p. 21 Imhoff)
 Myth: Star schema data marts are not allowed in Inmon’s model
 Inmon says they are good for direct end-user access of data (p. 365 BDW), good for data
marts (p. 12 TTA)
 Myth: Very few companies use the Inmon method
 Survey had 39% Inmon vs 26% Kimball. Many have a EDW
 Myth: The Kimball and Inmon architectures are incompatible
 They can work together to provide a better solution
Kimball and Inmon Methodologies
 Relational (Inmon) vs Dimensional (Kimball)
 Relational Modeling:
 Entity-Relationship (ER) model
 Normalization rules
 Many tables using joins
 History tables, natural keys
 Good for indirect end-user access of data
 Dimensional Modeling:
 Facts and dimensions, star schema
 Less tables but have duplicate data (de-normalized)
 Easier for user to understand (but strange for IT people used to relational)
 Slowly changing dimensions, surrogate keys
 Good for direct end-user access of data
Relational Model vs Dimensional Model
Relational Model Dimensional Model
If you are a business user, which model is easier to use?
Kimball and Inmon Methodologies
• Kimball:
• Logical data warehouse (BUS), made up of subject areas (data marts)
• Business driven, users have active participation
• Decentralized data marts (not required to be a separate physical data store)
• Independent dimensional data marts optimized for reporting/analytics
• Integrated via Conformed Dimensions (provides consistency across data sources)
• 2-tier (data mart, cube), less ETL, no data duplication
• Inmon:
• Enterprise data model (CIF) that is a enterprise data warehouse (EDW)
• IT Driven, users have passive participation
• Centralized atomic normalized tables (off limit to end users)
• Later create dependent data marts that are separate physical subsets of data and can
be used for multiple purposes
• Integration via enterprise data model
• 3-tier (data warehouse, data mart, cube), duplication of data
Kimball Model
Why staging: Limit source contention (ELT), Recoverability, Backup, Auditing
Inmon Model
Reasons to add a Enterprise Data Warehouse
 Single version of the truth
 May make building dimensions easier using lightly denormalized tables in EDW instead of
going directly from the OLTP source
 Normalized EDW results in enterprise-wide consistency which makes it easier to spawn-off
the data marts at the expense of duplicated data
 Less daily ETL refresh and reconciliation if have many sources and many data marts in
multiple databases
 One place to control data (no duplication of effort and data)
 Reason not to: If have a few sources that need reporting quickly
Which model to use?
 Models are not that different, having become similar over the years, and can compliment
each other
 Boils down to Inmon creates a normalized DW before creating a dimensional data mart and
Kimball skips the normalized DW
 With tweaks to each model, they look very similar (adding a normalized EDW to Kimball,
dimensionally structured data marts to Inmon)
 Bottom line: Understand both approaches and pick parts from both for your situation – no
need to just choose just one approach
 BUT, no solution will be effective unless you possess soft skills (leadership, communication,
planning, and interpersonal relationships)
Hybrid Model
Advice: Use SQL Server Views to interface between each level in the model (default values, computations, rename fields, etc)
In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one
database. Another option is to have each data mart in its own database with all databases on one server or spread among
multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP)
Kimball Methodology
From: Kimball’s The Microsoft Data Warehouse Toolkit
Kimball defines a development lifecycle, where Inmon is just about the data warehouse (not “how” used)
Data Vault
• Daniel Linstedt, author of Building Scalable Data Warehouse with Data Vault 2.0
• Not just a modeling technique, but entire methodology for DW projects
• Layer between Star Schema and Staging (use in combination with Kimball model, modeled
somewhere between the 3NF and star schema)
• Improves audit and historical tracking
• Hubs (business entities)
• Links (relationships)
• Satellites (descriptive attributes)
https://www.indellient.com/blog/data-vault-what-is-it-and-when-should-it-be-used/
Observation
Pattern
Theory
Hypothesis
What will
happen?
How can we
make it happen?
Predictive
Analytics
Prescriptive
Analytics
What
happened?
Why did
it happen?
Descriptive
Analytics
Diagnostic
Analytics
Confirmation
Theory
Hypothesis
Observation
Two Approaches to getting value out of data: Top-Down +
Bottoms-Up
Implement Data Warehouse
Physical Design
ETL
Development
Reporting &
Analytics
Development
Install and Tune
Reporting &
Analytics Design
Dimension Modelling
ETL Design
Setup Infrastructure
Understand
Corporate
Strategy
Data Warehousing Uses A Top-Down Approach
Data sources
Gather
Requirements
Business
Requirements
Technical
Requirements
The “data lake” Uses A Bottoms-Up Approach
Ingest all data
regardless of requirements
Store all data
in native format without
schema definition
Do analysis
Using analytic engines
like Hadoop
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices
Data Lake + Data Warehouse Better Together
Data sources
What happened?
Descriptive
Analytics
Diagnostic
Analytics
Why did it happen?
What will happen?
Predictive
Analytics
Prescriptive
Analytics
How can we make it happen?
Exactly what is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until
it is needed.
Reasons to use a data lake:
• Inexpensively store unlimited data
• Centralized place for multiple subjects (single version of the truth)
• Collect all data “just in case” (data hoarding). The data lake is a good place for data that you “might” use down the road
• Easy integration of differently-structured data
• Store data with no modeling – “Schema on read”
• Complements enterprise data warehouse (EDW)
• Frees up expensive EDW resources for queries instead of using EDW resources for transformations (avoiding user contention)
• Wanting to use technologies/tools (i.e Databricks) to refine/filter data that do the refinement quicker/better than your EDW
• Quick user access to data for power users/data scientists (allowing for faster ROI)
• Data exploration to see if data valuable before writing ETL and schema for relational database, or use for one-time report
• Allows use of Hadoop tools such as ETL and extreme analytics
• Place to land IoT streaming data
• On-line archive or backup for data warehouse data
• With Hadoop/ADLS, high availability and disaster recovery built in
• It can ingest large files quickly and provide data redundancy
• ELT jobs on EDW are taking too long because of increasing data volumes and increasing rate of ingesting (velocity), so offload some of them to the Hadoop data lake
• Have a backup of the raw data in case you need to load it again due to an ETL error (and not have to go back to the source). You can keep a long history of raw data
• Allows for data to be used many times for different analytic needs and use cases
• Cost savings and faster transformations: storage tiers with lifecycle management; separation of storage and compute resources allowing multiple instances of different
sizes working with the same data simultaneously vs scaling data warehouse; low-cost storage for raw data saving space on the EDW
• Extreme performance for transformations by having multiple compute options each accessing different folders containing data
• The ability for an end-user or product to easily access the data from any location
Data Warehouse
Serving, Security & Compliance
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc query
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Easily create reports (Self-service BI)
• Know questions
Why Data Warehouses fail
• 70-80% corporate BI projects fail (Gartner)
• The executive who thinks data warehousing is “easy”
• Starting with an end date and working backwards
• Lack of user involvement
• Insufficient business requirements or too much
• Insufficient architect/design or too much
• Not enough planning for data governance
• Chose the wrong technology
Why Data Warehouses fail
• Poor communication between IT and the business
• Not setting up time to “validate the numbers”
• Performance issues (slow response times)
• Inexperience and lack of technical knowledge
• Modeling the data warehouse to reflect the source data rather than the
business
Why Data Warehouses fail
• The wrong consulting company is hired
• Lack of experience
• Using offshore developers
• Owning the whole project
• No mentoring or knowledge transfer
• Budget runs out
• Developers
• Testers
• DBA
• PM
• Cloud costs
• Software licenses
• Training
How DW project succeed
• Choosing the right technology
• Understand all aspects of data warehouse architectures
• Kimball vs Inmon
• Star Schema vs Relational
• Modern data warehouse vs data fabric vs data lakehouse vs data mesh
• Find a project champion/sponsor
• Have DW director who keeps everyone running at 80% efficiency
• Interative/Agile approach to delivery with frequent releases
• Break project into phases
• Start with a high-priority subject but one that is small and easy
• Get a quick win that shows lot’s of benefits and market it
• Build a solid foundation with a repeatable process
• Get people excited about DW thru training
• User Power BI to prototype/requirements
Modern Data Warehouse
Data Fabric adds: data access, data policies, data catalog, MDM, data virtualization,
data scientist tools, APIs, building blocks, products
Data Lakehouse
Data Mesh
Data Mesh
Credit to Zhamak Dehghani
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com

Más contenido relacionado

La actualidad más candente

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?James Serra
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 

La actualidad más candente (20)

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 

Similar a Data Warehousing Trends, Best Practices, and Future Outlook

Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructureprojectandppt
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseRussel Chowdhury
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading StrategiesMongoDB
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateEnable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateCCG
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.pptBsMath3rdsem
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBigDataExpo
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
DA_04_SQL_Modern_DW.pptx
DA_04_SQL_Modern_DW.pptxDA_04_SQL_Modern_DW.pptx
DA_04_SQL_Modern_DW.pptxAlok Mohapatra
 

Similar a Data Warehousing Trends, Best Practices, and Future Outlook (20)

Dbms and it infrastructure
Dbms and  it infrastructureDbms and  it infrastructure
Dbms and it infrastructure
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional Database
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data EstateEnable Better Decision Making with Power BI Visualizations & Modern Data Estate
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
 
DWBASIC.ppt
DWBASIC.pptDWBASIC.ppt
DWBASIC.ppt
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
DA_04_SQL_Modern_DW.pptx
DA_04_SQL_Modern_DW.pptxDA_04_SQL_Modern_DW.pptx
DA_04_SQL_Modern_DW.pptx
 

Más de James Serra

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric IntroductionJames Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernanceJames Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI OverviewJames Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...James Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
How to build your career
How to build your careerHow to build your career
How to build your careerJames Serra
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed InstanceJames Serra
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at itJames Serra
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine LearningJames Serra
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 

Más de James Serra (20)

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 

Último

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Último (20)

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Data Warehousing Trends, Best Practices, and Future Outlook

  • 1. Data Warehousing Trends, Best Practices, and Future Outlook James Serra Data Platform Architecture Lead EY JamesSerra3@gmail.com Blog: JamesSerra.com
  • 2. About Me  EY, Data Platform Architecture Lead  Was previously a Data & AI Architect at Microsoft for seven years  In IT for 35 years, worked on many BI and DW projects  Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer  Been perm employee, contractor, consultant, business owner  Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference Europe, SQL Saturdays  Blog at JamesSerra.com  Former SQL Server MVP  Author of book “Reporting with Microsoft SQL Server 2012”
  • 3. Agenda  Why Data Warehouse  Data Warehouse vs Data Mart  Kimball vs Inmon  Data Vault  Data Lake  Why Data Warehouses fail  How Data Warehouses succeed  Data Warehouse concepts  Modern Data Warehouse  Data Fabric  Data Lakehouse  Data Mesh
  • 4. I tried to build a data warehouse on my own… And ended up passed-out drunk in a Denny’s parking lot Let’s prevent that from happening…
  • 5. What a Data Warehouse is not • A data warehouse is not a copy of a source database with the name prefixed with “DW” • It is not a copy of multiple tables (i.e. customer) from various sources systems unioned together in a view • It is not a dumping ground for tables from various sources with not much design put into it
  • 6. What is a Data Warehouse and why use one? A data warehouse is where you store data from multiple data sources to be used for historical and trend analysis reporting. It acts as a central repository for many subject areas and contains the "single version of truth". It is NOT to be used for OLTP applications. Reasons for a data warehouse:  Reduce stress on production system  Optimized for read access, sequential disk scans  Integrate many sources of data  Keep historical records (no need to save hardcopy reports)  Restructure/rename tables and fields, model data  Protect against source system upgrades  Use Master Data Management, including hierarchies  No IT involvement needed for users to create reports  Improve data quality and plugs holes in source systems  One version of the truth  Easy to create BI solutions on top of it (i.e. Azure Analysis Services Cubes)
  • 7. Why use a Data Warehouse? Legacy applications + databases = chaos Production Control MRP Inventory Control Parts Management Logistics Shipping Raw Goods Order Control Purchasing Marketing Finance Sales Accounting Management Reporting Engineering Actuarial Human Resources Continuity Consolidation Control Compliance Collaboration Enterprise data warehouse = order Single version of the truth Enterprise Data Warehouse Every question = decision Two purposes of data warehouse: 1) save time building reports; 2) slice in dice in ways you could not do before
  • 8. Enterprise Data Maturity Stages Structured data is transacted and locally managed. Data used reactively STAGE 2: Informative STAGE 1: Reactive Structured data is managed and analyzed centrally and informs the business Data capture is comprehensive and scalable and leads business decisions based on advanced analytics STAGE 4: Transformative STAGE 3: Predictive Data transforms business to drive desired outcomes. Real-time intelligence Rear-view mirror Any data, any source, anywhere at scale
  • 9. Data Warehouse vs Data Mart  Data Warehouse: A single organizational repository of enterprise wide data across many or all subject areas  Holds multiple subject areas  Holds very detailed information  Works to integrate all data sources  Feeds data mart  Data Mart: Subset of the data warehouse that is usually oriented to specific subject (finance, marketing, sales) • The logical combination of all the data marts is a data warehouse In short, a data warehouse as contains many subject areas, and a data mart contains just one of those subject areas
  • 10. Kimball and Inmon Methodologies Two approaches for building data warehouses
  • 11. Kimball and Inmon Myths  Myth: Kimball is a bottom-up approach without enterprise focus  Really top-down: BUS matrix (choose business processes/data sources), conformed dimensions, MDM  Myth: Inmon requires a ton of up-front design that takes a long time  Inmon says to build DW iteratively, not big bang approach (p. 91 BDW, p. 21 Imhoff)  Myth: Star schema data marts are not allowed in Inmon’s model  Inmon says they are good for direct end-user access of data (p. 365 BDW), good for data marts (p. 12 TTA)  Myth: Very few companies use the Inmon method  Survey had 39% Inmon vs 26% Kimball. Many have a EDW  Myth: The Kimball and Inmon architectures are incompatible  They can work together to provide a better solution
  • 12. Kimball and Inmon Methodologies  Relational (Inmon) vs Dimensional (Kimball)  Relational Modeling:  Entity-Relationship (ER) model  Normalization rules  Many tables using joins  History tables, natural keys  Good for indirect end-user access of data  Dimensional Modeling:  Facts and dimensions, star schema  Less tables but have duplicate data (de-normalized)  Easier for user to understand (but strange for IT people used to relational)  Slowly changing dimensions, surrogate keys  Good for direct end-user access of data
  • 13. Relational Model vs Dimensional Model Relational Model Dimensional Model If you are a business user, which model is easier to use?
  • 14. Kimball and Inmon Methodologies • Kimball: • Logical data warehouse (BUS), made up of subject areas (data marts) • Business driven, users have active participation • Decentralized data marts (not required to be a separate physical data store) • Independent dimensional data marts optimized for reporting/analytics • Integrated via Conformed Dimensions (provides consistency across data sources) • 2-tier (data mart, cube), less ETL, no data duplication • Inmon: • Enterprise data model (CIF) that is a enterprise data warehouse (EDW) • IT Driven, users have passive participation • Centralized atomic normalized tables (off limit to end users) • Later create dependent data marts that are separate physical subsets of data and can be used for multiple purposes • Integration via enterprise data model • 3-tier (data warehouse, data mart, cube), duplication of data
  • 15. Kimball Model Why staging: Limit source contention (ELT), Recoverability, Backup, Auditing
  • 17. Reasons to add a Enterprise Data Warehouse  Single version of the truth  May make building dimensions easier using lightly denormalized tables in EDW instead of going directly from the OLTP source  Normalized EDW results in enterprise-wide consistency which makes it easier to spawn-off the data marts at the expense of duplicated data  Less daily ETL refresh and reconciliation if have many sources and many data marts in multiple databases  One place to control data (no duplication of effort and data)  Reason not to: If have a few sources that need reporting quickly
  • 18. Which model to use?  Models are not that different, having become similar over the years, and can compliment each other  Boils down to Inmon creates a normalized DW before creating a dimensional data mart and Kimball skips the normalized DW  With tweaks to each model, they look very similar (adding a normalized EDW to Kimball, dimensionally structured data marts to Inmon)  Bottom line: Understand both approaches and pick parts from both for your situation – no need to just choose just one approach  BUT, no solution will be effective unless you possess soft skills (leadership, communication, planning, and interpersonal relationships)
  • 19. Hybrid Model Advice: Use SQL Server Views to interface between each level in the model (default values, computations, rename fields, etc) In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one database. Another option is to have each data mart in its own database with all databases on one server or spread among multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP)
  • 20. Kimball Methodology From: Kimball’s The Microsoft Data Warehouse Toolkit Kimball defines a development lifecycle, where Inmon is just about the data warehouse (not “how” used)
  • 21. Data Vault • Daniel Linstedt, author of Building Scalable Data Warehouse with Data Vault 2.0 • Not just a modeling technique, but entire methodology for DW projects • Layer between Star Schema and Staging (use in combination with Kimball model, modeled somewhere between the 3NF and star schema) • Improves audit and historical tracking • Hubs (business entities) • Links (relationships) • Satellites (descriptive attributes) https://www.indellient.com/blog/data-vault-what-is-it-and-when-should-it-be-used/
  • 22. Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why did it happen? Descriptive Analytics Diagnostic Analytics Confirmation Theory Hypothesis Observation Two Approaches to getting value out of data: Top-Down + Bottoms-Up
  • 23. Implement Data Warehouse Physical Design ETL Development Reporting & Analytics Development Install and Tune Reporting & Analytics Design Dimension Modelling ETL Design Setup Infrastructure Understand Corporate Strategy Data Warehousing Uses A Top-Down Approach Data sources Gather Requirements Business Requirements Technical Requirements
  • 24. The “data lake” Uses A Bottoms-Up Approach Ingest all data regardless of requirements Store all data in native format without schema definition Do analysis Using analytic engines like Hadoop Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics Devices
  • 25. Data Lake + Data Warehouse Better Together Data sources What happened? Descriptive Analytics Diagnostic Analytics Why did it happen? What will happen? Predictive Analytics Prescriptive Analytics How can we make it happen?
  • 26. Exactly what is a data lake? A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed. Reasons to use a data lake: • Inexpensively store unlimited data • Centralized place for multiple subjects (single version of the truth) • Collect all data “just in case” (data hoarding). The data lake is a good place for data that you “might” use down the road • Easy integration of differently-structured data • Store data with no modeling – “Schema on read” • Complements enterprise data warehouse (EDW) • Frees up expensive EDW resources for queries instead of using EDW resources for transformations (avoiding user contention) • Wanting to use technologies/tools (i.e Databricks) to refine/filter data that do the refinement quicker/better than your EDW • Quick user access to data for power users/data scientists (allowing for faster ROI) • Data exploration to see if data valuable before writing ETL and schema for relational database, or use for one-time report • Allows use of Hadoop tools such as ETL and extreme analytics • Place to land IoT streaming data • On-line archive or backup for data warehouse data • With Hadoop/ADLS, high availability and disaster recovery built in • It can ingest large files quickly and provide data redundancy • ELT jobs on EDW are taking too long because of increasing data volumes and increasing rate of ingesting (velocity), so offload some of them to the Hadoop data lake • Have a backup of the raw data in case you need to load it again due to an ETL error (and not have to go back to the source). You can keep a long history of raw data • Allows for data to be used many times for different analytic needs and use cases • Cost savings and faster transformations: storage tiers with lifecycle management; separation of storage and compute resources allowing multiple instances of different sizes working with the same data simultaneously vs scaling data warehouse; low-cost storage for raw data saving space on the EDW • Extreme performance for transformations by having multiple compute options each accessing different folders containing data • The ability for an end-user or product to easily access the data from any location
  • 27. Data Warehouse Serving, Security & Compliance • Business people • Low latency • Complex joins • Interactive ad-hoc query • High number of users • Additional security • Large support for tools • Dashboards • Easily create reports (Self-service BI) • Know questions
  • 28. Why Data Warehouses fail • 70-80% corporate BI projects fail (Gartner) • The executive who thinks data warehousing is “easy” • Starting with an end date and working backwards • Lack of user involvement • Insufficient business requirements or too much • Insufficient architect/design or too much • Not enough planning for data governance • Chose the wrong technology
  • 29. Why Data Warehouses fail • Poor communication between IT and the business • Not setting up time to “validate the numbers” • Performance issues (slow response times) • Inexperience and lack of technical knowledge • Modeling the data warehouse to reflect the source data rather than the business
  • 30. Why Data Warehouses fail • The wrong consulting company is hired • Lack of experience • Using offshore developers • Owning the whole project • No mentoring or knowledge transfer • Budget runs out • Developers • Testers • DBA • PM • Cloud costs • Software licenses • Training
  • 31. How DW project succeed • Choosing the right technology • Understand all aspects of data warehouse architectures • Kimball vs Inmon • Star Schema vs Relational • Modern data warehouse vs data fabric vs data lakehouse vs data mesh • Find a project champion/sponsor • Have DW director who keeps everyone running at 80% efficiency • Interative/Agile approach to delivery with frequent releases • Break project into phases • Start with a high-priority subject but one that is small and easy • Get a quick win that shows lot’s of benefits and market it • Build a solid foundation with a repeatable process • Get people excited about DW thru training • User Power BI to prototype/requirements
  • 32. Modern Data Warehouse Data Fabric adds: data access, data policies, data catalog, MDM, data virtualization, data scientist tools, APIs, building blocks, products
  • 35. Data Mesh Credit to Zhamak Dehghani
  • 36. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com