SlideShare una empresa de Scribd logo
1 de 49
Azure Storage
Options for
Analytics
Dustin Vannoy
Data Engineer
Cloud + Streaming
Please silence
cell phones
everything PASS
has to offer
Free online
webinar events
Free 1-day local
training events
Local user groups
around the world
Online special
interest user groups
Business analytics
training
Get involved
Free Online Resources
Newsletters
PASS.org
Explore
Dustin Vannoy
Data Engineering Consultant
Co-founder Data Engineering San
Diego
/dustinvannoy
@dustinvannoy
dustin@dustinvannoy.com
Technologies
• Azure & AWS
• Spark
• Kafka
• Python
Modern Data Systems
• Data Lakes
• Analytics in Cloud
• Streaming
PASS Summit Learning Pathway:
Becoming an Azure Data Engineer
Roles and Responsibilities of the Azure Data Engineer
Jes Borland
Wednesday November 06, 10:15 AM
Room: TCC Tahoma 2
Azure Storage Options for Analytics
Dustin Vannoy
Wednesday, November 06, 3:15 PM
Room: TCC Skagit 4
An Azure Data Engineer’s ETL Toolkit
Simon Whiteley
Thursday, November 07, 3:15 PM
Room: TCC Tahoma 4
Data Modeling Trends for 2019 and Beyond
Ike Ellis
Friday, November 08, 9:30 AM
Room: 2AB
Azure Storage
for Analytics 1. Data Lakes
2. Data Warehouses
3. Analytics
Data Lakes in
Azure
Data Lake Defined
Varied Data
Raw, intermediate,
and fully
processed
Ready for Analysts
Query layer, other
analytic tools access
Big Data Capable
Store first,
evaluate and
model later
* Not just a file system
Store Everything
Why Data Lakes?
• CSV, JSON, Logs, Text
• No schema on write
• Cheaper storage
Reason #1
Massive Scale (Big Data)
Why Data Lakes?
• Serverless Hadoop
• Span hot and cold
storage
• Pay for what you use
Reason #2
Reason #3
Storage + Compute
Separate
Why Data Lakes?
• Cost savings
• Multiple analytics tools /
same data
D E M O
Example Data
Lake Querying
Data Lake Best Practices
• Metadata portal
• Not just raw data
• Dataset certification
• Not too much governance
Azure Blob Storage
• Storage for pretty much
anything
• Can choose from Block blob,
Append blob, or Page blob
• Low cost: $
Azure Blob Storage
Structure
Storage Account
Containers
Blobs
ADLS Gen 1 ADLS Gen 2
Azure Data Lake Storage
File system semantics
Granular security
Scale
Benefits from Gen 1
+ Low cost
+ Hierarchical namespace
Data Lake Storage, Gen 2
• Built on Azure Blob Storage
• Hadoop compatible access
• Optimized for cloud analytics
• Low cost: $$
ADLS Gen 2
Structure
Storage Account
File System
Files
Options for Import
Getting Data into ADLS Gen 2
• Azure Databricks
• Azure Data Factory
• AzCopy
• Azure Storage Explorer
Options for Access
Accessing Data From ADLS Gen 2
• Azure Databricks
• HD Insight
• Polybase (SQL DW / SQL Server)
• Power BI
D E M O
ADLS Gen 2:
Setup and Upload
Archive Storage
• Still part of Azure Blob Storage
• Seamless integration with hot/cool
• Keep everything
• Very low cost
but...
• High read cost
• Early deletion charges
Cost Comparison – Hot LRS
Type
Storage
(Dollars/GB)
Reads
(per 10,000)
Writes
(per 10,000)
Blob Storage (Hot) .021 .004 .055
ADLS Gen 2 (Hot) .021 .006 .072
* for ADLS every 4MB is considered an operation
Cost Comparison – Cool LRS
Type
Storage
(Dollars/GB)
Reads
(per 10,000)
Writes
(per 10,000)
Blob Storage (Cool) .015 .010 .100
ADLS Gen 2 (Cool) .015 .013 .130
* for ADLS every 4MB is considered an operation
Cost Comparison – Archive LRS
Type
Storage
(Dollars/GB)
Reads
(per 10,000)
Writes
(per 10,000)
Blob Storage
(Archive)
.002 5.500 .110
ADLS Gen 2 (Archive) .002 7.15 .143
* for ADLS every 4MB is considered an operation
Storage Redundancy Options
Review redundancy and cost implications: https://azure.microsoft.com/en-
us/pricing/details/storage/
Data Warehouses
in Azure
Data Warehouse Defined
Structured Data
Processed and
modeled for
analytics use
Interactive queries
Analysts can get
answers to
questions quickly
BI tool support
Reporting tools
can query
efficiently
Speed of thought
Why Data Warehouses?
• Fast query response
• Indexing or column store
• SQL with analytic functions
Reason #1
Reason #2
Ready to use data
Why Data Warehouses?
• Useful column names
• Cleaned and standardized
• Focused
Update/Delete
Why Data Warehouses?
• Support for real-time
ingestion
• Keep latest view or
manage history
Reason #3
Data Warehouse Best Practices
• Staging data off limits
• Star schema design
• Indexing strategies
• Read replicas
Azure SQL DB
• Good ole relational database
• Less DBA work required
• Scalable on demand
• Medium cost: $$ - $$$$
Managed SQL Server
Azure SQL DB – Elastic pools
• DBs can auto-scale within the pool
• Can move DB to different pool
• Want DBs peak usage at different times
• Important to understand utilization of DBs
Resources shared among DBs
Azure SQL DB – Managed Instances
Most on-premise features supported
• SQL Agent jobs
• Change Data Capture
• Enabled CLR
• Cross database queries
• DB Mail
• Service Broker
• Transactional Replication
Best for migrations
Azure SQL DB – Hyperscale
• Storage, Compute, and Log scale separately
• Backups, restores and scaling not tied to volume of data
• Optimized for OLTP, but supports analytical workloads
• One way migration
Highly scalable storage and compute
Hyperscale
Architecture
http://aka.ms/
SQLDB_Hyperscale
D E M O
Azure SQL DB:
Analytics querying
Azure Synapse Analytics - SQL DW
• MPP - fast reads, many users
• Supports Polybase
• Scalable on demand
• High cost: $$$$
High performance Analytic DB
D E M O
Synapse Analytics
(SQL DW):
Analytics querying
Cosmos DB
• Useful for in-app analytics
• Best with known search key, e.g. CustomerID
• Key-value, Column-family, Document, Graph
• SQL, Cassandra, MongoDB, Gremlin, Table, etcd, Spark
• Medium cost: $$ - $$$
Managed NoSQL
Analytics in
Azure
Shared semantic model Cache data
Azure Analysis Services
Build calculations and
aggregations into a model
that can be used by many
analytics tools
Improve query speeds by
caching data
Visual report tool Supports most sources
Power BI
Build interactive dashboards
and reports or do
exploratory data analysis
Connects to everything
Azure and many other
source types
D E M O
Power BI:
Connect to Data
Lake
Final Thoughts
Keep Learning!
Databricks / ETL
10 Cool Things You Can Do With Azure Databricks – Ike, Simon, Dustin
An Azure Data Engineer's ETL Toolkit – Simon Whiteley
Code Like a Snake Charmer - Introduction to Python! – Jamey Johnston
Code Like a Snake Charmer – Advanced Data Modeling in Python! – Jamey Johnston
Cosmos
Cosmic DBA - Cosmos DB for SQL Server Admins and Developers – Michael Donnelly
CosmosDB - Designing and Troubleshooting Lessons – Neil Hambly
Data Modeling
Data Modeling Trends for 2019 and Beyond – Ike Ellis
Innovative Data Modeling for Cool Data Warehouses – Jeff Renz, Leslie Weed
Data Warehouse / SQL DB
Best, Better, Hyperscale! The Last Database You will Ever Need in the Cloud – Denzil Ribeiro
Introducing Azure Synapse Analytics: The End-to-End Analytics Platform Built for Every Data Professional – Saveen
Reddy
Azure SQL Database: Maximizing Cloud Performance and Availability – Joe Sack, Denzil Ribeiro
Delivering a Data Warehouse in the Cloud – Jeff Renz
Data Warehousing: Which of the Many Cloud Products is the Right One for You? – Ginger Grant
Session
Evaluations
Submit by 5pm Friday,
November 15th to
win prizes.
Download the GuideBook App and
search: PASS Summit 2019
Follow the QR code link on session
signage
Go to PASSsummit.com
3 W A Y S T O A C C E S S
Thank You
Dustin Vannoy
@dustinvannoy
dustin@dustinvannoy.com

Más contenido relacionado

La actualidad más candente

AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS Amazon Web Services
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...Databricks
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform Amazon Web Services
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsBob Pusateri
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBAdnan Hashmi
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014
(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014
(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014Amazon Web Services
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data servicesRajesh Kolla
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSAmazon Web Services
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsAvere Systems
 

La actualidad más candente (20)

AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database Syst...
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAs
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DB
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014
(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014
(SOV202) Choosing Among AWS Managed Database Services | AWS re:Invent 2014
 
DynamodbDB Deep Dive
DynamodbDB Deep DiveDynamodbDB Deep Dive
DynamodbDB Deep Dive
 
Azure Data services
Azure Data servicesAzure Data services
Azure Data services
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for Analysts
 

Similar a PASS_Summit_2019_Azure_Storage_Options_for_Analytics

Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesCCG
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseEric Bragas
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayAmazon Web Services Korea
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenMS Cloud Summit
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
AWS Community Day 2022 Brock DeLong_Serverless is More_ The scale and benefi...
AWS Community Day 2022  Brock DeLong_Serverless is More_ The scale and benefi...AWS Community Day 2022  Brock DeLong_Serverless is More_ The scale and benefi...
AWS Community Day 2022 Brock DeLong_Serverless is More_ The scale and benefi...AWS Chicago
 
Move your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudMove your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudCAMMS
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with AzureMarco Parenzan
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your StartupAmazon Web Services
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 

Similar a PASS_Summit_2019_Azure_Storage_Options_for_Analytics (20)

Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Create cloud service on AWS
Create cloud service on AWSCreate cloud service on AWS
Create cloud service on AWS
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
AWS Community Day 2022 Brock DeLong_Serverless is More_ The scale and benefi...
AWS Community Day 2022  Brock DeLong_Serverless is More_ The scale and benefi...AWS Community Day 2022  Brock DeLong_Serverless is More_ The scale and benefi...
AWS Community Day 2022 Brock DeLong_Serverless is More_ The scale and benefi...
 
Move your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudMove your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in Cloud
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 

Último

2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 

Último (20)

2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 

PASS_Summit_2019_Azure_Storage_Options_for_Analytics

  • 1. Azure Storage Options for Analytics Dustin Vannoy Data Engineer Cloud + Streaming
  • 3. everything PASS has to offer Free online webinar events Free 1-day local training events Local user groups around the world Online special interest user groups Business analytics training Get involved Free Online Resources Newsletters PASS.org Explore
  • 4. Dustin Vannoy Data Engineering Consultant Co-founder Data Engineering San Diego /dustinvannoy @dustinvannoy dustin@dustinvannoy.com Technologies • Azure & AWS • Spark • Kafka • Python Modern Data Systems • Data Lakes • Analytics in Cloud • Streaming
  • 5. PASS Summit Learning Pathway: Becoming an Azure Data Engineer Roles and Responsibilities of the Azure Data Engineer Jes Borland Wednesday November 06, 10:15 AM Room: TCC Tahoma 2 Azure Storage Options for Analytics Dustin Vannoy Wednesday, November 06, 3:15 PM Room: TCC Skagit 4 An Azure Data Engineer’s ETL Toolkit Simon Whiteley Thursday, November 07, 3:15 PM Room: TCC Tahoma 4 Data Modeling Trends for 2019 and Beyond Ike Ellis Friday, November 08, 9:30 AM Room: 2AB
  • 6. Azure Storage for Analytics 1. Data Lakes 2. Data Warehouses 3. Analytics
  • 8. Data Lake Defined Varied Data Raw, intermediate, and fully processed Ready for Analysts Query layer, other analytic tools access Big Data Capable Store first, evaluate and model later * Not just a file system
  • 9. Store Everything Why Data Lakes? • CSV, JSON, Logs, Text • No schema on write • Cheaper storage Reason #1
  • 10. Massive Scale (Big Data) Why Data Lakes? • Serverless Hadoop • Span hot and cold storage • Pay for what you use Reason #2
  • 11. Reason #3 Storage + Compute Separate Why Data Lakes? • Cost savings • Multiple analytics tools / same data
  • 12. D E M O Example Data Lake Querying
  • 13. Data Lake Best Practices • Metadata portal • Not just raw data • Dataset certification • Not too much governance
  • 14. Azure Blob Storage • Storage for pretty much anything • Can choose from Block blob, Append blob, or Page blob • Low cost: $
  • 15. Azure Blob Storage Structure Storage Account Containers Blobs
  • 16. ADLS Gen 1 ADLS Gen 2 Azure Data Lake Storage File system semantics Granular security Scale Benefits from Gen 1 + Low cost + Hierarchical namespace
  • 17. Data Lake Storage, Gen 2 • Built on Azure Blob Storage • Hadoop compatible access • Optimized for cloud analytics • Low cost: $$
  • 18. ADLS Gen 2 Structure Storage Account File System Files
  • 19. Options for Import Getting Data into ADLS Gen 2 • Azure Databricks • Azure Data Factory • AzCopy • Azure Storage Explorer
  • 20. Options for Access Accessing Data From ADLS Gen 2 • Azure Databricks • HD Insight • Polybase (SQL DW / SQL Server) • Power BI
  • 21. D E M O ADLS Gen 2: Setup and Upload
  • 22. Archive Storage • Still part of Azure Blob Storage • Seamless integration with hot/cool • Keep everything • Very low cost but... • High read cost • Early deletion charges
  • 23. Cost Comparison – Hot LRS Type Storage (Dollars/GB) Reads (per 10,000) Writes (per 10,000) Blob Storage (Hot) .021 .004 .055 ADLS Gen 2 (Hot) .021 .006 .072 * for ADLS every 4MB is considered an operation
  • 24. Cost Comparison – Cool LRS Type Storage (Dollars/GB) Reads (per 10,000) Writes (per 10,000) Blob Storage (Cool) .015 .010 .100 ADLS Gen 2 (Cool) .015 .013 .130 * for ADLS every 4MB is considered an operation
  • 25. Cost Comparison – Archive LRS Type Storage (Dollars/GB) Reads (per 10,000) Writes (per 10,000) Blob Storage (Archive) .002 5.500 .110 ADLS Gen 2 (Archive) .002 7.15 .143 * for ADLS every 4MB is considered an operation
  • 26. Storage Redundancy Options Review redundancy and cost implications: https://azure.microsoft.com/en- us/pricing/details/storage/
  • 28. Data Warehouse Defined Structured Data Processed and modeled for analytics use Interactive queries Analysts can get answers to questions quickly BI tool support Reporting tools can query efficiently
  • 29. Speed of thought Why Data Warehouses? • Fast query response • Indexing or column store • SQL with analytic functions Reason #1
  • 30. Reason #2 Ready to use data Why Data Warehouses? • Useful column names • Cleaned and standardized • Focused
  • 31. Update/Delete Why Data Warehouses? • Support for real-time ingestion • Keep latest view or manage history Reason #3
  • 32. Data Warehouse Best Practices • Staging data off limits • Star schema design • Indexing strategies • Read replicas
  • 33. Azure SQL DB • Good ole relational database • Less DBA work required • Scalable on demand • Medium cost: $$ - $$$$ Managed SQL Server
  • 34. Azure SQL DB – Elastic pools • DBs can auto-scale within the pool • Can move DB to different pool • Want DBs peak usage at different times • Important to understand utilization of DBs Resources shared among DBs
  • 35. Azure SQL DB – Managed Instances Most on-premise features supported • SQL Agent jobs • Change Data Capture • Enabled CLR • Cross database queries • DB Mail • Service Broker • Transactional Replication Best for migrations
  • 36. Azure SQL DB – Hyperscale • Storage, Compute, and Log scale separately • Backups, restores and scaling not tied to volume of data • Optimized for OLTP, but supports analytical workloads • One way migration Highly scalable storage and compute
  • 38. D E M O Azure SQL DB: Analytics querying
  • 39. Azure Synapse Analytics - SQL DW • MPP - fast reads, many users • Supports Polybase • Scalable on demand • High cost: $$$$ High performance Analytic DB
  • 40. D E M O Synapse Analytics (SQL DW): Analytics querying
  • 41. Cosmos DB • Useful for in-app analytics • Best with known search key, e.g. CustomerID • Key-value, Column-family, Document, Graph • SQL, Cassandra, MongoDB, Gremlin, Table, etcd, Spark • Medium cost: $$ - $$$ Managed NoSQL
  • 43. Shared semantic model Cache data Azure Analysis Services Build calculations and aggregations into a model that can be used by many analytics tools Improve query speeds by caching data
  • 44. Visual report tool Supports most sources Power BI Build interactive dashboards and reports or do exploratory data analysis Connects to everything Azure and many other source types
  • 45. D E M O Power BI: Connect to Data Lake
  • 47. Keep Learning! Databricks / ETL 10 Cool Things You Can Do With Azure Databricks – Ike, Simon, Dustin An Azure Data Engineer's ETL Toolkit – Simon Whiteley Code Like a Snake Charmer - Introduction to Python! – Jamey Johnston Code Like a Snake Charmer – Advanced Data Modeling in Python! – Jamey Johnston Cosmos Cosmic DBA - Cosmos DB for SQL Server Admins and Developers – Michael Donnelly CosmosDB - Designing and Troubleshooting Lessons – Neil Hambly Data Modeling Data Modeling Trends for 2019 and Beyond – Ike Ellis Innovative Data Modeling for Cool Data Warehouses – Jeff Renz, Leslie Weed Data Warehouse / SQL DB Best, Better, Hyperscale! The Last Database You will Ever Need in the Cloud – Denzil Ribeiro Introducing Azure Synapse Analytics: The End-to-End Analytics Platform Built for Every Data Professional – Saveen Reddy Azure SQL Database: Maximizing Cloud Performance and Availability – Joe Sack, Denzil Ribeiro Delivering a Data Warehouse in the Cloud – Jeff Renz Data Warehousing: Which of the Many Cloud Products is the Right One for You? – Ginger Grant
  • 48. Session Evaluations Submit by 5pm Friday, November 15th to win prizes. Download the GuideBook App and search: PASS Summit 2019 Follow the QR code link on session signage Go to PASSsummit.com 3 W A Y S T O A C C E S S

Notas del editor

  1. Part of “Becoming an Azure Data Engineer” learning pathway - https://www.pass.org/summit/2019/Learn/LearningPathways.aspx#AzureDataEngineer Azure Storage Options for Analytics - https://www.pass.org/summit/2019/Learn/SessionDetails.aspx?sid=94120
  2. I’m easy to find – just look for my full name or go to dustinvannoy.com
  3. Things a data lake will have: Varied data – raw, intermediate, and fully processed data all included. Varied type – normally multiple file formats and includes data that isn’t fully structured/modeled Usable by analysts – some type of query layer or other analytic access should be available Large capacity – assumed that a data lake isn’t a place where we question the value of every file and field, typically the history kept here is large Where does the analogy come from: James Dixon from Pentaho in 2010 – if thinking about data marts or analytic data tables, they are your bottled water – structured and refined, ready to go. The data lake is a place that data streams in and people can come to examine it, dive in, or take a sample. Reference: Stacia Varga on RunAs Radio podcast.
  4. Data Marts need to be cleaned. Too much data flowing in is imposible to clean, so we store it all in raw form and do some processing in the data lake layer. Instead of a backlog of data that needs cleaned and structured for analytics, we make the data available prior to much cleaning happening
  5. 10:00 Duration: 5 minutes Overview of querying a data lake in Azure without explaining the storage and tools involved. Quick overview of Azure Databricks as a place for data lake analytics. Show using azure databricks, use million songs dataset and nyc trips Describe how storage is separate from querying, data_lake_sql_demo: show different ways of using SQL only in Databricks – discuss that data is actually stored in Azure Storage create_spark_tables_v2: show how by learning a little bit of PySpark code you can create tables or transform data using data frames
  6. Metadata portal – some type of data discovery and documentation is really beneficial. The tools out there to enable this are never out of the box, a lot of work has to happen to get enough metadata captured for users to actually find what they want. Some processing as done and that processed data is stored back into the lake. Not necessarily all processed data needs to go back to the lake, but just dumping data into Azure Storage is not enough to expect the results you desire from building a Data Lake. Have some certified data sets – this is one that Finance has used for their monthly reporting so you can count on it to be maintained and align with what stakeholders have seen as top level numbers Balanced access - few users have access to ALL data, but a good amount of data is available by default for analysts and users trained in data privacy and confidentiality. If you put all the data in the lake and make it a pain to get to, you will not get the experimentation and unplanned discoveries that are possible when data is made available to smart people.
  7. Blobs can be one of three types: Block blobs Append blobs Page blobs
  8. To store data we have to create an Azure Storage Account. You may think of these as a namespace or root directory. Within each storage account we may create many containers, similar to directories that help us organize data. Within a container we can store our data in blobs which is easiest to think of as a file, though it is a bit more complex than that. Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
  9. "The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access. A common object store naming convention uses slashes in the name to mimic a hierarchical directory structure. This structure becomes real with Data Lake Storage Gen2. Operations such as renaming or deleting a directory become single atomic metadata operations on the directory rather than enumerating and processing all objects that share the name prefix of the directory.” - https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
  10. ” Hadoop compatible access: Data Lake Storage Gen2 allows you to manage and access data just as you would with a Hadoop Distributed File System (HDFS). The new ABFS driver is available within all Apache Hadoop environments, including Azure HDInsight, Azure Databricks, and SQL Data Warehouse to access data stored in Data Lake Storage Gen2.” https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
  11. The options for analytics will be discussed more in a later section but a quick mention of how we expect to access ADLS data Polybase - There is no pushdown computation support, so PolyBase is mostly used for data loading from ADLS Gen2 - https://www.jamesserra.com/archive/2019/09/ways-to-access-data-in-adls-gen2/ Power BI – directly (beta) or in dataflows (preview)
  12. 25:00 Duration 5 min Should show uploading data and using databricks
  13. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers
  14. See FAQ on billing scenario for example: https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/
  15. See FAQ on billing scenario for example: https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/
  16. See FAQ on billing scenario for example: https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/
  17. Raw and intermediate data is normally in a backend staging area (separate database or schema) that only the data warehouse development team can get to Star schema design to balance storage, indexing, and joining Conformed dimensions – one version of customer dimension data, common calendar table, etc. Slow cahnging dimensions – there are techniques used for tracking history to dimension values. An example is if a product is re-assigned to a new product category. You can choose to just overwrite product category which simplifies new queries but means all reports built using product category in the past will not be possible to recreate. Indexing or partitioning carefully considered Read replicas to reduce locking and resource contention for those reading data and the jobs writing data
  18. Elastic querying with external tables
  19. When to use: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-pool#when-should-you-consider-a-sql-database-elastic-pool
  20. SQL Server Agent jobs Change Data Capture  * Enabled CLR * Cross database queries * DB Mail enabled * Service Broker Transactional Replication References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-paas-vs-sql-server-iaas https://docs.microsoft.com/en-us/azure/sql-database/sql-database-features
  21. Scaling transactional systems horizontally is something that the industry has struggled with forever. Hyperscale is going to keep your data consistent while at the same time scaling storage and compute. Hyperscale – really cool technology where they separate the storage engine used by SQL Server and scale that out – called Page Servers instead of Storage Engine. Each Page server stores up to 128 GB of data pages and has secondary. Scales out horizontally by adding more page servers. Multi tiered architecture, SSD based caching on compute later, SSD based cache on page server. Scale up by adding more cores very rapidly (spin up new compute in a couple minutes and failover to the new compute near instantaneous) Scale out with ready only compute Built on SQL Server engine so same experience you are used to 100 TB storage (will expand) Compute scales fast and independently of storage References: Kevin Farlee - https://www.youtube.com/watch?v=Z9AFnKI7sfI
  22. Hyperscale – really cool technology where they separate the storage engine used by SQL Server and scale that out – called Page Servers instead of Storage Engine. Each Page server stores up to 128 GB of data pages and has secondary. Scales out horizontally by adding more page servers. Multi tiered architecture, SSD based caching on compute later, SSD based cache on page server. Scale up by adding more cores very rapidly (spin up new compute in a couple minutes and failover to the new compute near instantaneous) Scale out with ready only compute Built on SQL Server engine so same experience you are used to 100 TB storage (will expand) Compute scales fast and independently of storage References: Kevin Farlee - https://www.youtube.com/watch?v=Z9AFnKI7sfI
  23. 40:00 Duration: 5 min Show options of general purpose, business critical, and hypserscale
  24. SQL DW – trade off some of the SQL features but able to scale as MPP – going to lose some things like foreign keys which may not be required for analytics but consider that carefully to make sure you are comfortable without the features that don’t fit in this MPP service. Usually cost is the main factor, expecting you to need to query multiple terabytes of structured data and get much faster performance than a standard database solution provides. Will not be best option for random seeks, such as looking up a single item or small amount of items in a large dataset. Expects you to do operations that would require table scans and is built to handle those way better by parallelizing the load.
  25. 50:00 Duration: 5 min
  26. Document – Microsoft Document (recommended) or MongoDB (migrations). Set at collection level, have to use that API for that collection. SQL API – work on top of Microsoft Document Cassandra – eventually consistent option, different tradeoffs than Document option Graph – Gremlin, etc KeyValue – Azure Table Storage API – highly consistent
  27. Typically you will build out a star schema in SQL DB and then import to analysis services
  28. Can import data into its own data model so may skip analysis services cube and only store in Power BI dataset. Possible to share via Power BI Shared Datasets, but development experience will be different than with cubes and you can only use from Power BI (though some additional options if you have Power BI Premium, may be a good option for larger organizations).
  29. 65:00 Duration: 5 min https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-power-bi Path and key in presentationfolder/powerbi_adls_info.txt https://dvtrainingadls.dfs.core.windows.net/demo/spotify/
  30. Can import data into its own data model so may skip analysis services cube and only store in Power BI dataset. Possible to share via Power BI Shared Datasets, but development experience will be different than with cubes and you can only use from Power BI (though some additional options if you have Power BI Premium, may be a good option for larger organizations).
  31. See FAQ on billing scenario for example: https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/