Self-service BI empowers users to reach analytic outputs through data visualizations and reporting tools. Solution Architect and Cloud Solution Specialist, James McAuliffe, will be taking you through a journey of Azure's Modern Data Estate.
Enable Better Decision Making with Power BI Visualizations & Modern Data Estate
1. Building a Modern Data Estate
Enable Better Decision Making With The Modern Data Estate & Power BI Visualizations
2. Housekeeping
Please message Sami
with any questions,
concerns or if you need
assistance during this
workshop.
Please mute your line!
We will be applying mute.
This session will be
recorded.
If you do not want to be recorded,
please disconnect at this time.
Links:
See chat window.
Worksheet:
See handouts.
To make presentation
larger, draw the bottom
half of screen ‘up’.
3. Introductions
Introduction to the Azure Modern Data Estate
Designing the Modern Data Estate
Visualizations and Analytics through Power BI
Q & A
Agenda
4. Introduction to the Azure Modern Data Estate
What are the pressures on legacy systems and why do you need to change?
• What's wrong with good ol' SQL?
Is the Data Warehouse Still Relevant?
Designing the Modern Data Estate
What is the architecture of the Modern Data Estate?
Key design principles
Do I Need a Data Lake?
Optimizing Analytics using Synapse Analytics
Data Ingestion Using Azure Data Factory
Visualizations and Analytics through Power BI
Integrating Power BI with big data platforms for data-driven insights
Agenda
Details
5. James McAuliffe,
Cloud Solution Architect
James McAuliffe is a Cloud Solution Architect with over 20 years of technology
industry experience. During this journey into data and analytics, he’s held all
of the traditional Business Intelligence Solution project roles, ranging from
design and development to complete life cycle BI implementations. He is a
Microsoft Preferred Partner Solutions expert and has worked with clients of all
sizes, from local businesses to Fortune 500 companies.
And I like old Italian cars.
linkedin.com/in/jamesmcauliffesql/
10. Data Landscape – Volume and Pressure
IDC Data Age 2025 - The Digitization of the World
11. Data Landscape - Different Types of Data
• Mobile
• Social
• Scanners
• Sensors
• RFID
• Devices - IoT
• Feeds/APIs
• Other, non-traditional sources
85%
13. 10%
of organizations are expected
to have a highly profitable
business unit specifically for
productizing and
commercializing data by 2020
$100M
The most digitally transformed
enterprises generate on
average $100 million in
additional operating income
each year
5,247GB
Approximate amount of data
for every man, woman and
child on earth in 2020
Data is a key strategic asset
14. Every industry is benefiting from
big data advanced analytics
Heartland Bank prevents fraud
and boosts profits
The UK NHS transforms healthcare
with faster access to information.
City of Barcelona boosts citizen
engagement with intelligent app
Jet.com transforms customer engagement
with truly personalized experience
Rolls Royce decreases costs with
Predictive Maintenance
Manufacturing
Eliminate downtime and
increase efficiency by enabling
better predictive maintenance
for your capital assets.
Banking
Minimize losses with more
accurate fraud detection and
assess exposure to asset,
credit and market risk using a
holistic approach
Healthcare
Boost operational efficiency
and improve patient care
experience with intelligent
detection and in time service.
Government
Empower citizens and
improve their engagement
with relevant information and
personalized citizen services.
Retail
Turn individual customer
interactions into contextual
engagements and increase
customer satisfaction with highly
personalized offers and content
20. The new economy
thrives on data literacy
Communicating with data is a critical skill in
the new economy
21. Users and IT must come
together in the new enterprise
Get over the IT / business divide
22. Governance and self-service
enhance decision-making
Governance is not about making the right decisions,
it is about making decisions the right way
23. The importance of data models
BI models Power BI
• Built and maintained by business users or BI developers
• Use enterprise models, departmental data, and external sources
• Focused on a single subject area, but often widely shared
Machine Learning
models
Azure
Databricks
• Built and maintained by data scientists
• Mostly developed from raw sources in the data lake
• Often experimental, needing a data engineer for production use
Azure Synapse
Analytics
Enterprise models
• Built and maintained by IT architects
• Consolidated data from many systems
• Centralized as an authoritative source for reporting and analysis
24. Enterprise models in the
self-service environment
If business users
are tech-smart and
data literate, why
do they need
enterprise models?
Consistency
Governance
Efficiency
Enterprise models
Line-of-Business sources
Azure Synapse
Analytics
Azure Data
Factory
Power BI
25. BI models in the enterprise
environment
If enterprise
models are so
important, why do
users need self-service
BI models?
Enterprise models
Flexibility
Efficiency
BI models
Ad-hoc, departmental and
external sources
Azure Synapse
Analytics
Line-of-Business sources
Power BI
Azure Data
Factory
26. Data science models in the
enterprise environment
What is the role of
the data
warehouse with
data science?
Enterprise models
Azure
Databricks
Integrating results with enterprise models
Serving enterprise data for data scientists
Data science results
Azure Synapse
Analytics
Power BI
30. Cloud Statistics
• Cloud data centers will process 94% of workloads in 2021 (Source: Cisco)
• Main reason for cloud adoption (Source: Sysgroup)
o Access to data anytime (42%)
o Disaster recovery (38%)
o Flexibility (37%)
• The US is the largest public cloud market, with expected spending of $124.6 billion in 2019 (Source: IDC)
1. United States – $124.6 billion
2. China – $10.5 billion
3. UK – $10 billion
4. Germany – $9.5 billion
5. Japan – $7.4 billion
35. Structured, unstructured, and streaming data
integrated in a single, scalable, environment
The modern data warehouse
is the hub for all data models
36. Power BI and Azure Synapse Analytics
Bring together the market-leading BI platform and the industry-leading data warehouse engine in Azure Synapse
Power BI can analyze and visualize
massive volumes of data
Azure Synapse Analytics provides a scalable
platform to enable real-time BI
37.
38. Is the data warehouse
still relevant?
The data warehouse itself
Commerce and technology
What’s changed since 1988?
A 30-year-old architecture, still going strong
39. • “Relational” stores.
• Most of the work is gathering data from disparate stores and known, structured files, then reshaping it from 3NF into a dimensional (star) model
• Typically there is an OLAP (cube, semantic) solution in the mix, consumed by a reporting layer
• Typically these are on-premises, but not always, and can be cloud based
• Technologies vary, but are usually OLEDB, ODBC, File connections. Typically interacting with some form of
• Level of effort (LOE) varies, and tools can be disparate
Traditional RDBMS Approach to Data Warehouse Reporting
Data Warehouse Analysis Reporting
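The move from normalized (3NF) sources to a dimensional (star) model can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (customer, region, orders, dim_customer, fact_sales) and the sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (3NF-style) source tables; all names are hypothetical.
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

    INSERT INTO region VALUES (1, 'East'), (2, 'West');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2);
    INSERT INTO orders VALUES (100, 10, 250.0), (101, 10, 50.0), (102, 11, 75.0);

    -- Dimension: customer flattened together with its region.
    CREATE TABLE dim_customer AS
        SELECT c.customer_id, c.name, r.region_name
        FROM customer c JOIN region r ON r.region_id = c.region_id;

    -- Fact: one row per order, keyed to the dimension.
    CREATE TABLE fact_sales AS
        SELECT order_id, customer_id, amount FROM orders;
""")


def sales_by_region(connection):
    """Reporting query against the star: one join from fact to dimension."""
    rows = connection.execute("""
        SELECT d.region_name, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
        GROUP BY d.region_name
        ORDER BY d.region_name
    """).fetchall()
    return dict(rows)
```

Calling sales_by_region(conn) returns the total per region. The reporting layer only ever joins the fact table to a flattened dimension, which is the main appeal of the star shape.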
41. RDBMS – Relational
Columnar data stores (very similar to relational)
Optimized for structuring sparse data efficiently
Document – string and object data: XML, YAML, JSON, BSON, plain text, or binary (PDFs, Excel, Word)
Typically contains ALL the data about the entity within one “unit”
Key-value – a large hash table in which each key is associated with a value; highly optimized for applications performing simple lookups
Graph data stores – nodes (entities) and edges (relationships between entities)
Optimized to analyze the relationships between entities
Time series data stores: set of values organized by time
Optimal for queries described in terms of windows of time.
Object data stores
External Index
Different Types of Data Stores
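To make the differences concrete, here is a small sketch of how the same kind of information might be shaped in several of these store types, using plain Python structures as stand-ins. The entities, keys, and helper functions are hypothetical, and no real database is involved.

```python
# Relational: fixed columns, one row per entity.
customer_rows = [(1, "Acme", "East")]

# Document: everything about the entity lives inside one unit.
customer_doc = {
    "id": 1,
    "name": "Acme",
    "region": "East",
    "orders": [{"id": 100, "amount": 250.0}],
}

# Key-value: an opaque value behind a key, built for simple lookups.
kv_store = {"customer:1": '{"name": "Acme"}'}

# Graph: nodes (entities) and edges (relationships between entities).
edges = [
    ("Acme", "BOUGHT", "Widget"),
    ("Globex", "BOUGHT", "Widget"),
    ("Acme", "BOUGHT", "Gadget"),
]


def neighbors(node, relation):
    """Follow edges of one relation type out of a node."""
    return sorted(dst for src, rel, dst in edges if src == node and rel == relation)


# Time series: (timestamp_seconds, value) pairs, queried by window of time.
readings = [(0, 20.5), (60, 21.0), (120, 22.5), (180, 23.0)]


def window_average(start, end):
    """Average of readings whose timestamp falls in [start, end)."""
    values = [value for ts, value in readings if start <= ts < end]
    return sum(values) / len(values)
```

The graph helper answers a relationship question, and the time series helper answers a window question; each shape makes one kind of query natural.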
44. Conceptual Framework for Data Movement, Management, and Analytics
SOURCES, INGEST, STORE, PREP & TRAIN, MODEL & SERVE
45. Key Points
A place to put all kinds of data in all kinds of formats
A set of technologies that allow you to work with that data in its raw form, make sense of it, and derive value
and meaning from it.
Technologies to store and retrieve the data
Technologies to support exploration and analytics on the data
Generally, a data lake contains lots of data, but volume is not always the reason to adopt the concept.
Some Interesting Points
A “big data” solution is one part of an overall analytic platform
The analytic platform does not usually exclude “traditional” relational tools and OLAP technologies.
A well balanced analytic solution will usually include all types of technologies, used in many fluid ways.
What Is a Data Lake?
46. When You Want To
Store and process data in volumes too large for a traditional database.
Transform unstructured data for analysis and reporting.
Capture, process, and analyze unbounded streams of data in real time, or with low latency.
The 3 “V’s” (4 really)
Volume: you have a lot of data
Variety: the data is arriving in many types of forms, not always following known patterns
Velocity: the data arrives quickly, and you need to understand how to use it just as quickly.
Veracity: you need to adopt new techniques to make the data usable, or trustworthy, in order to consume it.
Why Do You Want “Big Data”?
54. Introducing
Azure Synapse
Analytics
A limitless analytics service with
unmatched time to insight, that
delivers insights from all your data,
across data warehouses and big data
analytics systems, with blazing speed
Simply put, Azure Synapse is Azure SQL
Data Warehouse evolved
We have taken the same industry leading
data warehouse and elevated it to a whole
new level of performance and capabilities
56. Limitless analytics service with unmatched time to insight
Power BI
Cloud data
SaaS data
On-premises
data
Azure Data Lake Storage
SQL
Analytics Runtimes
Azure Synapse Studio
Unified experience
Integration Management Monitoring Security
PREVIEW GA
PREVIEW
Simplify Analytics with Azure Synapse
Azure Machine
Learning
57. Multiple clusters over
shared data
Online scaling Workload aware
query scheduling
Single Access and Security
for Data Warehouse and
Data Lake workloads
Spark + SQL
integrated runtime
Cluster + Serverless
Innovations in Azure Synapse
PREVIEW PREVIEW GA
PREVIEW
Spark: PREVIEW PREVIEW
SQL: GA
58. The cloud data warehouse in the data-driven business
Azure
Databricks
Azure Data
Lake Storage
Business
services
Power BI
Azure
Data Factory
Azure Synapse
Analytics
59. Data sources for analytics
Azure Synapse
Analytics
Azure
Databricks
Azure Data
Lake Storage
Business
services
Power BI
Azure
Data Factory
60. Data ingestion
LOB sources
Logs and
streams
(unstructured)
Media
(unstructured)
Files
(unstructured)
Business
services
Power BI
Azure
Databricks
Azure
Data Factory
Azure Data
Lake Storage
Azure Synapse
Analytics
61. Data storage & serving
LOB sources
Logs and
streams
(unstructured)
Media
(unstructured)
Files
(unstructured)
Business
services
Power BI
Azure Data
Lake Storage
Azure
Databricks
Azure
Data Factory
Azure Synapse
Analytics
63. Limitless scale GA Preview
Provisioned compute (data warehouse)
Materialized views
Workload importance
Workload isolation
On-demand query
Powerful insights
Power BI integration
Azure Machine Learning integration
Data lake exploration
Streaming analytics (data warehouse)
Apache Spark integration
Unified experience
Hybrid data ingestion
Azure Synapse studio
Unmatched security
Column- and row-level security
Dynamic data masking
Private endpoints
Azure Synapse
Analytics features
64. Best-in-class price
per performance
Price-performance is calculated by GigaOm as the TPC-H metric of cost of ownership divided by composite query. Results based on GigaOm’s TPC-H results, published in January 2019
Leader in price per performance
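The price-performance ratio described above can be written as a one-line calculation. This is a sketch of the arithmetic only: the function name, the argument names, and the sample numbers are hypothetical, and the exact definition of the "composite query" metric is GigaOm's and is not spelled out in the slide.

```python
def price_performance(cost_of_ownership: float, composite_query_metric: float) -> float:
    """Cost of ownership divided by the composite query metric, as the slide words it."""
    if composite_query_metric <= 0:
        raise ValueError("composite_query_metric must be positive")
    return cost_of_ownership / composite_query_metric


# Hypothetical inputs, not benchmark results.
example_ratio = price_performance(1200.0, 30.0)
```

If the composite query metric is read as a throughput figure, a lower ratio means less cost per unit of query work.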
65. Best-in-class price
per performance
[Figure: bar chart of price-performance, in dollars]
Price-performance is calculated by GigaOm as the TPC-H metric of cost of ownership divided by composite query.
Results based on GigaOm’s TPC-H results, published in January 2019
66. Results based on GigaOm’s TPC-DS results, published in April 2019
Best-in-class price
per performance
Price-performance is calculated by GigaOm as the TPC-DS metric of cost of ownership divided by composite query.
67. Most secure data
warehouse in the cloud
Multiple levels of security between the
user and the data warehouse
...at no additional cost
Threat Protection
Network Security
Authentication
Access Control
Data Protection
68. Category Feature Synapse Analytics
Data Protection Data In Transit Yes
Data encryption at rest (Service & User Managed Keys) Yes
Data Discovery and Classification Yes
Native Row Level Security Yes
Table and View Security (GRANT / DENY) Yes
Column Level Security Yes
Dynamic Data Masking Yes
SQL Authentication Yes
Native Azure Active Directory Yes
Integrated Security Yes
Multi-Factor Authentication Yes
Virtual Network (VNET) Yes
SQL Firewall (server) Yes
Integration with ExpressRoute Yes
SQL Threat Detection Yes
SQL Auditing Yes
Vulnerability Assessment Yes
Access control for
complete security
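As a rough illustration of two of the features listed above, row-level security and dynamic data masking, the sketch below filters rows by user and masks a sensitive column for users without unmask rights. It is plain Python, not Synapse syntax; the users, regions, masking rule, and function names are all hypothetical.

```python
rows = [
    {"region": "East", "customer": "Acme", "email": "ops@acme.example"},
    {"region": "West", "customer": "Globex", "email": "it@globex.example"},
]

# Hypothetical policy data: which regions a user may see, and who may see raw emails.
user_regions = {"alice": {"East"}, "bob": {"East", "West"}}
unmasked_users = {"bob"}


def mask_email(email):
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain


def query(user):
    """Apply the row filter first, then mask the email column if required."""
    allowed = user_regions.get(user, set())
    visible = [row for row in rows if row["region"] in allowed]
    if user in unmasked_users:
        return visible
    return [{**row, "email": mask_email(row["email"])} for row in visible]
```

The point of the design is that the policy sits with the data, so every query path gets the same filtering and masking without the caller having to remember it.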
70. Create a data lake and information supply chain to curate ‘business ready’
data and analytical assets published in a marketplace for users to consume
IoT
RDBMS
Office docs
Social
Cloud
clickstream
Web logs
XML, JSON
Web services
NoSQL
Files
Information
consumers access
the data marketplace
to shop for business
ready data and
analytical assets
shop for
data
Data marketplace
Info catalog
Business ready data assets
Ingestion zone Curation zone Trusted zone
Common vocabulary
Data Lake
Information supply chain
(curation process)
Data factory processing
Project
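The zone-by-zone flow in this slide (ingestion, curation, trusted) can be sketched as three small functions. This is a minimal sketch under stated assumptions: the field names, the vocabulary mapping, and the "required fields" rule are hypothetical, and a real pipeline would run in a service such as Azure Data Factory rather than in-process Python.

```python
# Hypothetical common vocabulary: source field name -> agreed business name.
VOCABULARY = {
    "cust_nm": "customer_name",
    "CustomerName": "customer_name",
    "amt": "amount",
}

REQUIRED_FIELDS = {"customer_name", "amount"}


def ingest(raw_records):
    """Ingestion zone: land the data as-is, with no cleaning."""
    return list(raw_records)


def curate(ingested):
    """Curation zone: rename fields to the common vocabulary."""
    return [
        {VOCABULARY.get(key, key): value for key, value in record.items()}
        for record in ingested
    ]


def publish_trusted(curated):
    """Trusted zone: keep only records that are complete enough to be business ready."""
    return [
        record for record in curated
        if REQUIRED_FIELDS <= record.keys() and record["amount"] is not None
    ]


raw = [
    {"cust_nm": "Acme", "amt": 10.0},
    {"CustomerName": "Globex", "amount": None},
    {"CustomerName": "Initech", "amount": 5.0},
]
trusted = publish_trusted(curate(ingest(raw)))
```

Only the trusted output would be listed in the data marketplace; the raw copy stays in the ingestion zone so curation rules can be changed and replayed later.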
71. Unified data integration
Provision common trusted data assets in shared storage for easy consumption
Trusted data zone - common data assets on shared storage
Data lake
Compute only or load & compute
Curate once, share everywhere
Data lake ingestion zone (Untrusted raw data)
XML, JSON, Feeds, IoT, RDBMS, Files, Office docs, Social, Cloud, Web logs, Web svcs, NoSQL
Ingest
Curate
Common data management platform (Business & IT data integration)
Advanced analytics
(structured data)
DW appliance
MDM
CRUD
Cust
Prod
Asset
NoSQL DB
e.g. Graph DB
DW & marts
EDW
mart
Streaming data
Analytical tools & applications
Logical data warehouse (Data virtualization with a common vocabulary)
Shared metadata
Commonly understood, trusted data and insights
Enterprise data
marketplace
75. Transforms data from multiple sources to
provide feature rich experiences for clients
Focuses on the data itself rather than
the logistics of ingesting
Approaching disparate data sources with
automated and scalable processes
Organizations that fully harness their data outperform
76.
77. Serverless, scalable, hybrid data integration service
Lift existing SQL Server ETL
to Azure
Use existing tools
(SSMS, SSDT)
Azure Data Factory
Cloud and hybrid w/
80+ connectors
Up to 2 GB/s ETL/ELT
in the cloud
Seamlessly span on-prem,
Azure, other clouds, SaaS
Run on-demand, scheduled,
or on-event data-availability
Programmability with
multi-language SDK
Visual tools
Data movement
and transformation
at scale
Hybrid
pipeline model
Author
and monitor
SSIS package
execution
78. The data warehouse in the data-driven business
Azure Synapse
Analytics
Azure
Databricks
Azure Data
Lake Storage
Business
services
Power BI
Transform
and enrich
Prepare
Ingest
Azure
Data Factory
79. ADF’s execution engine
• Data movement
• Pipeline activity execution
• SSIS package execution
Azure
Integration runtime
Self-hosted
Integration runtime
Cloud services
Apps & Data
Pipeline SSIS package
Command
and control
LEGEND
Data
Integration Runtime (IR)
Azure Data Factory v2 Service Scheduling | Orchestration | Monitoring
UX & SDK Authoring | Monitoring/Management
80. No-code data transformation at scale
Focus on building business
logic and transforming data
• Data cleansing, transformation,
aggregation, conversion, etc.
• Cloud scale via Spark execution
• Resilient data flows with ease
82. Use templates to quickly get started
Quickly build data
integration solutions
Avoid rebuilding workflows—
instantiate a template
Improve developer productivity
and reduce development
time for repeat processes
83. Best-in-class monitoring and management
Monitor pipeline and activity runs
Query runs with rich language
Operational lineage between
parent-child pipelines
Azure Monitor Integration
• Diagnostics logging
• Metrics and alerts
• Events
Restate pipeline and activities
85. New Tooling in Azure Data Factory v2
Source New Branch
Join Conditional Split
Union Lookup
Derived Column Surrogate Key
Pivot Unpivot
Window Exists
Select Filter
Sort Alter Row
Sink
Mapping Data Flow
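A few of the transformations named on this slide (Source, Filter, Derived Column, Lookup, Sort, Sink) can be imitated in plain Python to show how they chain together. This is not the Azure Data Factory API or its data flow language; the function names, sample rows, and product table are hypothetical stand-ins.

```python
orders = [
    {"order_id": 1, "sku": "A", "qty": 2, "unit_price": 5.0},
    {"order_id": 2, "sku": "B", "qty": 0, "unit_price": 9.0},
    {"order_id": 3, "sku": "A", "qty": 1, "unit_price": 5.0},
]
products = {"A": "Widget", "B": "Gadget"}


def source(rows):
    """Source: start a stream of rows."""
    return iter(rows)


def filter_rows(rows, predicate):
    """Filter: keep rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))


def derived_column(rows, name, fn):
    """Derived Column: add a column computed from each row."""
    return ({**row, name: fn(row)} for row in rows)


def lookup(rows, table, key, out):
    """Lookup: enrich each row from a reference table."""
    return ({**row, out: table.get(row[key])} for row in rows)


def sort_rows(rows, key, descending=False):
    """Sort: order rows by one column."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


def sink(rows):
    """Sink: materialize the result."""
    return list(rows)


flow = source(orders)
flow = filter_rows(flow, lambda row: row["qty"] > 0)
flow = derived_column(flow, "total", lambda row: row["qty"] * row["unit_price"])
flow = lookup(flow, products, key="sku", out="product_name")
result = sink(sort_rows(flow, key="total", descending=True))
```

Each step takes rows in and hands rows out, which is what lets a visual designer snap the transformations together in any order.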
86. Real-time data warehouse
Integrating Azure Synapse with streaming data in Azure Data Lake via PolyBase
File/event store/NoSQL
SQL
BI Tool
Azure Data Factory
Streaming data
Azure Data Lake Storage
Data virtualization
DW
External table Internal table
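The external table versus internal table distinction in this slide can be sketched as follows: internal rows are already loaded into the warehouse, while external rows stay in a lake file and are read at query time. This is a conceptual stand-in in plain Python, not PolyBase or Synapse syntax; the file name, columns, and functions are hypothetical.

```python
import csv
import io

# Stand-in for the data lake: file name -> file content (hypothetical data).
lake = {"readings.csv": "sensor_id,reading\ns1,20.5\ns2,21.0\n"}

# Internal table: rows already loaded into the warehouse.
internal_table = [{"sensor_id": "s0", "reading": 19.0}]


def read_external_table(file_name):
    """Parse the lake file on every call, so new rows appear without a load step."""
    reader = csv.DictReader(io.StringIO(lake[file_name]))
    return [
        {"sensor_id": row["sensor_id"], "reading": float(row["reading"])}
        for row in reader
    ]


def query_all_readings():
    """Data virtualization: one result over internal and external rows."""
    return internal_table + read_external_table("readings.csv")
```

Appending a line to the lake file changes the next query result immediately, which is the behavior the "real-time data warehouse" slide relies on.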
90. 2020 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms
91. BI is more than just reporting & dashboards
Prescriptive Analytics
Algorithms
Predictive Analytics
Machine
Learning
Self-Service
Analytics
Interactive
Data
Exploration
Reporting
Dashboards
92. The data warehouse + Power BI in the data-driven business
Azure Synapse
Analytics
Azure
Databricks
Azure
Data Factory
Azure Data
Lake Storage
Business
services
Power BI
93. Customers are searching for ways to level-up their data
However, their data remains…
Siloed Under-utilized Slow to provide insights
Dashboards Ad-hoc
Reporting
Advanced
Visualization
94. Combining BI tools and data warehouses expands
analytics capabilities
BI tools + Azure Synapse Analytics
Unify all
your data
One source
of truth
Instant insights
on-the-fly
95. Data silos and incompatible tooling inhibit collaboration
Marketing Sales Product Ops Finance
IT tools Data science tools BI tools
Incompatible tooling
Siloed data
97. Unleash data value with Power BI and Azure Data Services
Azure Data Services
Power BI
Unified data
Powerful and
integrated tooling
Business analysts IT professionals Data scientists
Frictionless
collaboration
Petabyte-scale
analytics
Advanced analytics
and AI
Powerful visualization
and reporting
Unmatched
capabilities
Business value
Common Data Model on Azure Data Lake Storage
98. Leverage powerful visualization and reporting
Power BI
• Scale and govern efficiently
• Semantic modeling
• Improve management
• Open-platform connectivity
Visualize &
report
Power BI
Azure Data
Lake Storage
CDM
folders
99. Deliver petabyte-scale analytics
• Power BI Aggregations
• DirectQuery
• 14x faster and 94% cheaper
• One-click connection
Power BI + Azure Synapse Analytics
Visualize &
report
Power BI
Model &
serve
Azure Synapse
Analytics
Azure Data
Lake Storage
CDM
folders
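The idea behind pairing aggregations with DirectQuery can be sketched like this: answer a query from a small pre-aggregated table when its grain matches, and fall back to the detail data otherwise. This is an illustration in plain Python, not how Power BI is implemented or configured; the data and function names are hypothetical.

```python
detail_rows = [
    {"date": "2020-01-01", "product": "A", "amount": 10.0},
    {"date": "2020-01-01", "product": "B", "amount": 5.0},
    {"date": "2020-01-02", "product": "A", "amount": 7.0},
]

# Pre-aggregated table at the "date" grain, built once ahead of time.
agg_by_date = {}
for row in detail_rows:
    agg_by_date[row["date"]] = agg_by_date.get(row["date"], 0.0) + row["amount"]


def total_amount(group_by):
    """Return (source_used, totals) for a sum of amount grouped by one column."""
    if group_by == "date":
        # Aggregation hit: no scan of the detail rows is needed.
        return "aggregation", dict(agg_by_date)
    # Aggregation miss: fall through to the detail data.
    totals = {}
    for row in detail_rows:
        totals[row[group_by]] = totals.get(row[group_by], 0.0) + row["amount"]
    return "detail", totals
```

Queries at the date grain never touch the detail rows, while a query by product still gets a correct answer from the full data.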
100. • Solve complex challenges
• Reduce time-to-insight
• Start with no-code ML
• Extend with Azure
Machine Learning
Visualize &
report
Power BI
Train &
predict
Azure Machine
Learning
Azure Data
Lake Storage
CDM
folders
Gain deep insights with advanced analytics and AI
Power BI + Azure Machine Learning
104. BI Embedded
Embedded Analytics = Self-
service BI + Dashboards + ML +
AI + Cognitive Services
Enterprise BI (IaaS)
Business Intelligence = Dashboards
+ Scorecards + Collaboration +
Canned Reports
Enterprise BI (SaaS)
Business Intelligence = Dashboards +
Scorecards + Collaboration + Mobile
Departmental BI
Data Science = Data Wrangling +
Data Exploration + Data Curation +
Data Visualizations + Machine
Learning
Power BI
Service
Power BI
Premium
Power BI
Embedded
Power BI
Report
Server
106. Power BI service
Cloud-based SaaS solutions
Get started quickly
Secure, live connection to your data sources,
on-premises and in the cloud
Auto insights and intuitive data exploration using
natural language query
Deliver insights through other services such as
SharePoint, PowerApps & Teams
Pre-built dashboards and reports for popular SaaS
solutions
Sharing and collaboration of dashboards, reports & datasets
Live, real-time dashboard updates
107. Deliver insights through other services
Collaborate and share insights with teams in your
organization using existing services
Fully interactive reports integrated into your service
108. Data Connectivity Modes in Power BI Desktop
With Power BI Desktop, you can connect to your data in three ways:
• Import
• DirectQuery
• Live/Exploration (LiveConnect)
Overview
• Import: ETL; data download
• DirectQuery: select specific tables; no data download; queries triggered from report visuals
• Live/Exploration: explore source objects from the report surface; no data download; queries triggered from report visuals
Supported Data Sources
• Import: all sources (>80 sources)
• DirectQuery: SQL Server, Azure SQL Database, Azure SQL Data Warehouse, SAP HANA, Oracle, Teradata
• Live/Exploration: SQL Server Analysis Services (Tabular & Multidimensional)
Max # of data sources per report
• Import: unlimited
• DirectQuery: one
• Live/Exploration: one
Data Transformations
• Import: all transformations (100s)
• DirectQuery: partial support (varies by data source)
• Live/Exploration: none
Mashup Capabilities
• Import: merge (joins), append (union), parameterized queries
• DirectQuery: merge (joins), append (union)
• Live/Exploration: none
Modeling Capabilities
• Import: relationships, calculated columns & tables, measures, hierarchies
• DirectQuery: calculated columns, measures, change column types
• Live/Exploration: none
Flexibility to license by capacity
Greater scale and performance
Extending on-premises capabilities
Premium capacity – P3
Premium capacity – P2
Premium capacity – P1
My workspace
User 2
My workspace
User 3
App workspace
Marketing
App workspace
Sales
My workspace
User 1
APIs
Custom app
Power BI service – Contoso organization
Power BI Premium
113. Thank You!!
The Modern Data Estate
Enable Better Decision Making With The Modern Data Estate & Power BI Visualizations
114. Azure Synapse Analytics
Overview of Microsoft Azure compliance
Microsoft Compliance Offerings
Azure Integration Services Whitepaper
Azure Data Factory Overview
Automate Data Flow Governance
Power BI Governance Admin
2020 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms
References and Links