Designing a Modern Data Warehouse in Azure
Antonios Chatzipavlis
Data Solutions Consultant & Trainer
1988 Beginning of my professional career
1996 I started working with SQL Server 6.0
1998 Certified as MCSD (3rd in Greece)
1999 Became an MCT
2010 Microsoft MVP on Data Platform
Created www.sqlschool.gr
2012 Became MCT Regional Lead by Microsoft Learning
2013 Certified as MCSE: Data Platform and
MCSE: Business Intelligence
2016 Certified as MCSE: Data Management & Analytics
2018 Certified as MCSA: Machine Learning
Recertified as MCSE: Data Management & Analytics
What we are doing
• Articles
• SQL Server in Greek
• SQL Nights
• Webcasts
• SQL Server News
• Downloads
• Resources
Follow us
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQLschool.gr Group
A community for Greek professionals who use the Microsoft Data Platform
Ask your question at help@sqlschool.gr
Explore everything PASS has to offer
• Free online resources
• Newsletters
• Free online webinar events
• Local user groups around the world
• Free 1-day local training events
• Online special interest user groups
• Business analytics training
Get involved: PASS.org
bit.ly/AAB2019Evaluation
WHAT IS A DATA WAREHOUSE?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.
TRADITIONAL DATA WAREHOUSE
SELF-SERVICE DATA WAREHOUSE
TRADITIONAL DW LIMITATIONS
• Data sources
• User
• Competition
• Scaling up
• Data platforms
CURRENT DW CHALLENGES
• Timeliness
• Flexibility
• Quality
• Findability
RECENT RESEARCH SURVEYS
More than 50% of respondents report that they will replace their primary DW platform and analytics tools within 3 years.
The data tsunami
WHY YOU NEED A MODERN DATA WAREHOUSE
• Customer experience
• Quality assurance
• Operational efficiency
• Innovation
THE CRITERIA FOR SELECTING A MODERN DW
Meets current and future needs
ON-PREMISES VS. CLOUD DW
• Evaluating Time to Value
• Accounting for Storage and Computing Costs
• Sizing, Balancing and Tuning
• Considering Data Preparation and ETL Costs
• Cost of Specialized Business Analytic Tools
• Scaling and Elasticity
• Delays and Downtime
• Cost of Security Breaches
• Data Protection and Recovery
STEPS TO GETTING STARTED WITH CLOUD DW
• Evaluate your data warehousing needs.
• Migrate or start fresh.
• Establish success criteria.
• Evaluate cloud data warehouse solutions.
• Calculate your total cost of ownership.
• Set up a proof of concept (POC).
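The "calculate your total cost of ownership" step, together with the storage, compute and scaling cost factors on the previous slide, can be sketched as a simple multi-year comparison. This is a minimal sketch under loudly stated assumptions: every price and quantity is a hypothetical placeholder, not a real Azure or hardware rate, and a real TCO model would also cover migration, ETL tooling, downtime and security costs.

```python
from dataclasses import dataclass

# Illustrative only: all figures used with these classes are hypothetical placeholders.

@dataclass
class OnPremCosts:
    hardware: float                # one-time up-front purchase
    annual_licenses: float
    annual_staff: float
    annual_power_and_space: float

@dataclass
class CloudCosts:
    compute_per_hour: float        # billed only while the warehouse is running
    hours_per_month: float         # pausing the warehouse lowers this number
    storage_per_tb_month: float    # storage is billed separately from compute
    storage_tb: float
    annual_staff: float

def on_prem_tco(c: OnPremCosts, years: int) -> float:
    """Up-front hardware plus recurring yearly costs."""
    return c.hardware + years * (c.annual_licenses + c.annual_staff + c.annual_power_and_space)

def cloud_tco(c: CloudCosts, years: int) -> float:
    """Pay-as-you-go compute and storage plus recurring staff costs."""
    monthly = c.compute_per_hour * c.hours_per_month + c.storage_per_tb_month * c.storage_tb
    return years * (12 * monthly + c.annual_staff)

on_prem = OnPremCosts(hardware=300_000, annual_licenses=50_000,
                      annual_staff=120_000, annual_power_and_space=20_000)
cloud = CloudCosts(compute_per_hour=10.0, hours_per_month=300,
                   storage_per_tb_month=25.0, storage_tb=20, annual_staff=60_000)

for years in (1, 3, 5):
    print(years, on_prem_tco(on_prem, years), cloud_tco(cloud, years))
```

Keeping compute hours as an explicit input makes the effect of pausing or scaling the cloud warehouse visible: halving `hours_per_month` halves the compute share of the cloud cost, while the on-premises hardware cost is fixed regardless of usage.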
Azure Modern Data Warehouse
MODERN DATA WAREHOUSE
MODERN DATA WAREHOUSE IN AZURE
ADVANCED ANALYTICS ON BIG DATA
REAL-TIME ANALYTICS
SQL SERVER 2019 BIG DATA CLUSTERS
INGEST DATA
ADF
• PaaS
• Mapping Data Flows transform data (ETL)
• Copy Data tool easily copies data from source to destination
• Templates
• Any new project
• Converting SSIS packages
• Row-by-row ETL can be slower
• Data needs to be moved to Databricks, limited by compute size
• Mapping Data Flows take time to start up
SSIS
• SSDT (Visual Studio)
• Very popular product
• Used for on-premises ETL for many years
• Migrating existing packages is too big an effort
• Skill set is staying on-premises
• Can move to the SSIS Integration Runtime (IR) in ADF
• Row-by-row ETL can be slower
• Data needs to be moved to the IR
• Limited by node size and number of nodes of the SSIS IR
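A Copy activity is the basic unit behind the Copy Data tool mentioned above. The sketch below builds a minimal ADF pipeline definition as a Python dict and serializes it to the JSON form ADF stores. It assumes a delimited-text (CSV) source dataset and an Azure SQL sink dataset already exist; the pipeline, activity and dataset names are hypothetical placeholders, and property names should be checked against the ADF JSON reference before deploying anything.

```python
import json

def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a minimal ADF pipeline with one Copy activity (CSV dataset -> Azure SQL dataset)."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyCsvToSql",  # hypothetical activity name
                    "type": "Copy",
                    # Datasets are referenced by name; they are defined separately in ADF.
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "AzureSqlSink"},
                    },
                }
            ]
        },
    }

# "CopySalesPipeline", "SalesCsvOnBlob" and "SalesStagingTable" are hypothetical names.
pipeline = copy_pipeline("CopySalesPipeline", "SalesCsvOnBlob", "SalesStagingTable")
print(json.dumps(pipeline, indent=2))
```

Generating definitions like this from code is one way to template many similar copy jobs; the Copy Data tool produces comparable JSON interactively.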
STORE DATA
ADLS Gen2
• PaaS
• Best features of Blob storage
• Not all features are available yet
• Some products do not support it yet
• 5 TB file size limit
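Spark workloads (for example in Azure Databricks) reach ADLS Gen2 through the ABFS driver, using `abfss://` URIs against the account's `dfs.core.windows.net` endpoint. A small helper that builds such a URI is sketched below; the account and container names in the example are hypothetical, and the account-name check covers only the basic rule of 3 to 24 lowercase letters and digits.

```python
import re

def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ABFS URI for a path in an ADLS Gen2 file system (container).

    Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    """
    # Storage account names are 3-24 characters: lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# "contosolake" and "raw" are hypothetical account and container names.
print(abfss_uri("contosolake", "raw", "/sales/2019/"))
# prints abfss://raw@contosolake.dfs.core.windows.net/sales/2019/
```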
Blob Storage
• PaaS
• Original storage
• Most popular
• Don’t use for new projects
• Account capacity limit of 2 PB for US and Europe
• 4.75 TB file size limit
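The 4.75 TB figure follows from the block blob limits of that period: a blob holds at most 50,000 blocks, and each block could be at most 100 MiB. Microsoft's documentation quoted the product as "approximately 4.75 TiB"; the arithmetic below shows the exact value is slightly higher. Newer service versions allow much larger blocks, so treat the 100 MiB constant as the limit at the time of this talk.

```python
MAX_BLOCKS_PER_BLOB = 50_000
MAX_BLOCK_SIZE_MIB = 100  # per-block limit at the time; newer service versions allow larger blocks

max_blob_mib = MAX_BLOCKS_PER_BLOB * MAX_BLOCK_SIZE_MIB
max_blob_tib = max_blob_mib / 1024 / 1024  # MiB -> GiB -> TiB

print(f"{max_blob_mib:,} MiB = {max_blob_tib:.2f} TiB")
# prints 5,000,000 MiB = 4.77 TiB
```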
SQL Server 2019 Big Data Cluster
• IaaS
• Combines the SQL Server database engine, Spark and HDFS (ADLS Gen2) into a unified data platform
• Deployed as containers on Kubernetes
• PolyBase
• Hybrid cloud
• Data virtualization
• AI platform
PREP DATA
Azure Databricks
• PaaS
• Processing massive amounts of data
• Training and deploying models
• Managing workflows
• Spark and notebooks
• Integration with ADLS, SQL DW, Power BI
• Requires writing code
• Steep learning curve
Azure HDInsight
• PaaS
• Deploys and provisions Apache Hadoop clusters
• No integration with SQL DW
• Always running and incurring cost
• Hortonworks merged with Cloudera
PolyBase & Stored Procedures in SQL DW
• PaaS
• T-SQL queries via external tables
• Tuning queries
• Increase storage space
Power BI Dataflow
• Power BI service
• Power Query
• Self-service data prep
• Individual solution
• Small workloads
• Don’t use this to replace a DW or ADF
MODEL & SERVE DATA
Azure SQL DW
• PaaS
• Fully managed, petabyte-scale cloud DW
• Can scale compute and storage independently
• Can be paused
• MPP
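MPP in Azure SQL DW means every table is spread across 60 distributions, and compute nodes process their distributions in parallel. With a hash-distributed table, each row goes to a distribution chosen by hashing its distribution column, so a skewed column turns one distribution into a bottleneck. The sketch below illustrates the idea only: `zlib.crc32` is a stand-in and is not the hash function SQL DW actually uses, and the customer IDs are made up.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # Azure SQL DW divides every table into 60 distributions

def distribution_for(key: str) -> int:
    """Map a distribution-column value to a distribution number.

    zlib.crc32 is a stand-in hash for illustration, NOT the one SQL DW uses.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS

# Hypothetical distribution-column values (e.g. a CustomerId column).
customer_ids = [f"C{n:05d}" for n in range(10_000)]
rows_per_distribution = Counter(distribution_for(c) for c in customer_ids)

# A roughly even spread means no single distribution dominates query time.
print(len(rows_per_distribution), min(rows_per_distribution.values()), max(rows_per_distribution.values()))
```

Because the same value always hashes to the same distribution, joins and aggregations on the distribution column can run without moving data between nodes; a column with few distinct values would instead pile rows into a handful of distributions.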
Azure Analysis Services
• PaaS
• Tabular model
• Fast queries
• High concurrency
• Semantic layer
• Vertical scale-up and query scale-out
• High availability
• Advanced time calculations
• Time to process the cube
Azure SQL Database
• PaaS
• Suitable for a small DW
• Size limits per tier
• Optimized for OLTP
SQL Server in VM
• IaaS
• Multidimensional (MDX) models
Cosmos DB
• PaaS
• Globally distributed
• Multi-model database service
• Spark to Cosmos DB connector for DW aggregations
ETL vs ELT
Time (load)
• ETL: Uses a staging area and system; extra time to load data
• ELT: All in one system; data is loaded only once
Time (transformation)
• ETL: Must wait, especially for large data sizes; as data grows, transformation time increases
• ELT: All in one system; speed does not depend on data size
Time (maintenance)
• ETL: High maintenance; you choose which data to load and transform, and must do it again if data is deleted or you want to enhance the main data repository
• ELT: Low maintenance; all data is always available
Implementation complexity
• ETL: At an early stage, requires less space and the result is clean
• ELT: Requires in-depth knowledge of the tools and expert design of the main large repository
Analysis and processing style
• ETL: Based on multiple scripts that create the views; deleting a view means deleting data
• ELT: Ad hoc views; low cost to build and maintain
Data limitation or restriction
• ETL: By presuming and choosing data a priori
• ELT: By hardware (none) and data retention policy
DW support
• ETL: Prevalent legacy model used for on-premises, relational, structured data
• ELT: Tailored to scalable cloud infrastructure; supports structured, unstructured and big data sources
Data lake support
• ETL: Not part of the approach
• ELT: Enables use of a data lake, with unstructured data supported
Usability
• ETL: Fixed tables, fixed timeline, used mainly by IT
• ELT: Ad hoc, agile, flexible; usable by everyone from developers to citizen integrators
Cost-effectiveness
• ETL: Not cost-effective for small and medium businesses
• ELT: Scalable and available to businesses of all sizes through online SaaS solutions
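The core difference in the comparison above is where the transformation happens and whether raw data survives. The toy sketch below uses Python's built-in `sqlite3` as a stand-in for the warehouse (an assumption purely for illustration; it says nothing about performance at scale). ETL cleans and aggregates in application code and loads only the result; ELT loads the raw rows once and transforms them with SQL inside the database, so the raw data stays available for new views. The order records are made up.

```python
import sqlite3

# Hypothetical raw order records: (order_id, order_day, amount as text)
raw_orders = [
    ("o1", "2019-01-05", "120.50"),
    ("o2", "2019-01-05", "80.00"),
    ("o3", "2019-01-06", "n/a"),    # dirty record
    ("o4", "2019-01-06", "42.25"),
]

def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the result."""
    totals: dict[str, float] = {}
    for _, day, amount in raw_orders:
        try:
            totals[day] = totals.get(day, 0.0) + float(amount)
        except ValueError:
            continue  # dropped before loading; the warehouse never sees this row
    conn.execute("CREATE TABLE daily_sales_etl (order_day TEXT PRIMARY KEY, total_amount REAL)")
    conn.executemany("INSERT INTO daily_sales_etl VALUES (?, ?)", sorted(totals.items()))

def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw data once, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, order_day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)
    conn.execute("""
        CREATE VIEW daily_sales_elt AS
        SELECT order_day, SUM(CAST(amount AS REAL)) AS total_amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'      -- skip non-numeric amounts at query time
        GROUP BY order_day
    """)

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT order_day, total_amount FROM daily_sales_etl ORDER BY order_day").fetchall()
elt_rows = conn.execute("SELECT order_day, total_amount FROM daily_sales_elt ORDER BY order_day").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(etl_rows == elt_rows, raw_count)  # prints True 4
```

Both paths produce the same daily totals, but only the ELT path still holds all four raw rows, which is what the "all data is always available" and "ad hoc views" rows of the comparison refer to.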
LAMBDA ARCHITECTURE
LAMBDA ARCHITECTURE IN AZURE
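In a Lambda architecture, a batch layer periodically recomputes views from the complete, immutable master dataset, a speed layer incrementally updates a real-time view for events that arrived since the last batch run, and a serving layer merges the two at query time. In Azure the batch layer is typically built on services such as Data Lake Storage and Databricks, and the speed layer on services such as Event Hubs and Stream Analytics. The in-memory sketch below shows only the merge logic; the click events and page names are hypothetical.

```python
from collections import Counter

# Hypothetical click events: (timestamp_seconds, page)
master_dataset = [(100, "home"), (105, "cart"), (110, "home")]  # covered by the last batch run
recent_events = [(200, "home"), (205, "checkout")]              # arrived after that run

def batch_view(events) -> Counter:
    """Batch layer: recompute the full view from the immutable master dataset."""
    return Counter(page for _, page in events)

class SpeedLayer:
    """Speed layer: incrementally update a real-time view as events stream in."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event) -> None:
        _, page = event
        self.view[page] += 1

def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch[page] + speed.view[page]

batch = batch_view(master_dataset)
speed = SpeedLayer()
for event in recent_events:
    speed.on_event(event)

print(query("home", batch, speed))      # prints 3 (2 from batch + 1 from speed)
print(query("checkout", batch, speed))  # prints 1 (only seen by the speed layer)
```

When the next batch run absorbs the recent events into the master dataset, the speed layer's view for that period can be discarded, which keeps the real-time path small.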
COMMON DATA MODEL
Antonios Chatzipavlis
Data Solutions Consultant & Trainer
fb/sqlschoolgr - fb/groups/sqlschool
@antoniosch - @sqlschool
yt/c/SqlschoolGr
SQLschool.gr Group
Thank you!
A community for Greek professionals who use the Microsoft Data Platform
Copyright © 2018 SQLschool.gr. All rights reserved. PRESENTER MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION