
Analytics in a Day Virtual Workshop


  1. 1. Analytics in a Day Cloud analytics in the age of self-service and data science
  2. 2. Housekeeping Please message Sami with any questions, concerns or if you need assistance during this workshop. Please mute your line! We will be applying mute. This session will be recorded. If you do not want to be recorded, please disconnect at this time. Links: See chat window. Worksheet: See handouts. To make presentation larger, draw the bottom half of screen ‘up’.
  3. 3. Agenda 9:00 – 10:00 The heart of analytics 10:00 – 11:00 Optimizing analytics with Azure Synapse Analytics 11:00 – 11:30 Data Integration in Azure Synapse Analytics 11:30 – 12:00 Insights for all with Power BI + Azure Synapse Times are approximate and will be fluid with the workshop
  4. 4. James McAuliffe, Cloud Solution Architect James McAuliffe is a Cloud Solution Architect with over 20 years of technology industry experience. During this journey into data and analytics, he’s held all of the traditional Business Intelligence Solution project roles, ranging from design and development to complete life cycle BI implementations. He is a Microsoft Preferred Partner Solutions expert and has worked with clients of all sizes, from local businesses to Fortune 500 companies. And I like old Italian cars. linkedin.com/in/jamesmcauliffesql/
  5. 5. My Spare Time
  6. 6. CCG At A Glance What we value Learning – We are curious learners with a thirst for knowledge Serving – We are passionate about serving our customers, colleagues and community Teaming – We accomplish great things through teamwork Who we are CCG is an award-winning analytics solutions & services firm that values our Culture, Our Community, and our Client’s Success What we do Our core services capabilities are rooted in data and analytics solution development. This includes all services necessary to help clients turn their data into actionable insights What we achieve Microsoft Gold Partner • Certified Great Place to Work • Tampa Bay Business Journal Fastest Growing Companies • Florida Companies to Watch • Inc. 5000 • Florida State Seminole 100
  7. 7. Strategy & Management • Rapid Data Governance Solution • Rapid Analytics Roadmap Solution Services • Health Assessments • Strategic Roadmaps • Master Data Management • Meta Data Management • Data Governance Information Management • Platform Modernization Solution • Cloud Migration Solution Services • Data Integration • Data Architecture • Data Warehouses and Lakes • PowerApps • Cloud Management • Cloud Migration • DR/BC through Azure • Azure Governance/Security Analytics • Leadership Development • Customer Analytics Services • Dashboards and Visualizations • Operational Reporting • Self-Service • Training • Data Exploration • Location Intelligence (GIS) Data Science and AI • RapidInsight with Machine Learning Prototype Solution Services • Model as a Service • Data Science as a Service • Predictive Analytics • Natural Language Processing Machine Learning • Artificial Intelligence • Machine Learning Ops CCG – Solutions and Services
  8. 8. Virtual Introductions ▸ Name, Company & Title ▸ What do you hope to get out of today’s workshop?
  9. 9. Why Modern Data Estate? Why Unified Data Platform?
  10. 10. 10% of organizations are expected to have a highly profitable business unit specifically for productizing and commercializing data by 2020 $100M The most digitally transformed enterprises generate on average $100 million in additional operating income each year 5,247GB Approximate amount of data for every man, woman and child on earth in 2020 Data is a key strategic asset
  11. 11. Data Landscape – Volume and Pressure IDC Data Age 2025 - The Digitization of the World
  12. 12. Data Landscape - Different Types of Data • Mobile • Social • Scanners • Sensors • RFID • Devices - IoT • Feeds/APIs • Other, non-traditional sources 85%
  13. 13. DATA AI CLOUD
  14. 14. The heart of analytics Section 1 Data businesses need data warehouses Section 2 Data warehouses & data lakes come together Section 3 BI & DW come together Section 4 The cloud for modern analytics Section 5 A new class of analytics
  15. 15. Section 1 Data businesses need data warehouses
  16. 16. Is the data warehouse still relevant? What’s changed since 1988? A 30-year-old architecture, still going strong Commerce and technology The data warehouse itself
  17. 17. Today, all businesses are data businesses Data is the lifeblood of modern work
  18. 18. All data businesses need to be analytic businesses Without analytics data is a cost center, not a resource
  19. 19. Analytic businesses need to evolve data science Every business has opportunities to make analytics faster, easier, and more insightful
  20. 20. Store Data Ingestion Big Data Data Warehousing The cloud data warehouse in the data-driven business
  21. 21. Data Ingestion Big Data Data Warehousing Store The cloud data warehouse in the data-driven business
  22. 22. Store Data Ingestion Big Data Data Warehousing Cloud data SaaS data On-premises data Devices data The cloud data warehouse in the data-driven business
  25. 25. Workflow Architecture
  26. 26. Azure Synapse Analytics Store The cloud data warehouse in the data-driven business Data Ingestion Big Data Data Warehousing
  27. 27. Store Azure Synapse Analytics Data Ingestion Big Data Data Warehousing The cloud data warehouse in the data-driven business
  28. 28. Section 2 Data warehouses & data lakes come together
  29. 29. 80% report struggling to become mature users of data* 55% report data silos and data management difficulties as roadblocks* * Harvard Business Review (2019), Understanding why analytics strategies fall short for some, but not for others Analytics & AI is the #1 investment for business leaders, however they struggle to maximize ROI
  30. 30. Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse Businesses are forced to maintain two critical, yet independent analytics systems
  31. 31. ©Microsoft Corporation Azure It’s a challenge to integrate these areas with security Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse Securing a business-critical environment needs enterprise-class features such as row and column security Relying on views only increases the number of artifacts to be managed
  32. 32. ©Microsoft Corporation Azure It’s a challenge to integrate these areas with security Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse Securing a business-critical environment needs enterprise-class features such as row and column security Relying on views only increases the number of artifacts to be managed Securing a platform used for both experimentation and production demands the ability to discover sensitive data You can’t secure what you don’t know
  33. 33. ©Microsoft Corporation Azure It’s a challenge to integrate these areas with security Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse Securing a business-critical environment needs enterprise-class features such as row and column security Relying on views only increases the number of artifacts to be managed Securing a platform used for both experimentation and production demands the ability to discover sensitive data You can’t secure what you don’t know User management in an environment with diverse use cases requires integrated enterprise authentication
  34. 34. ©Microsoft Corporation Azure It’s a challenge to manage these diverse workloads Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse In this mixed environment, not all workloads have the same priority Mission-critical warehouse jobs are demanding and predictable Exploratory analytics and data science are important but unpredictable Executive queries are high-profile, although rare
  35. 35. ©Microsoft Corporation Azure It’s a challenge to manage these diverse workloads Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse In this mixed environment, not all workloads have the same priority Mission-critical warehouse jobs are demanding and predictable Exploratory analytics and data science are important but unpredictable Executive queries are high-profile, although rare Throwing multiple clusters at these scenarios is easy, especially in lightweight scenarios, however: Each cluster has a cost Bringing a cluster online has a lag, which can impact important, if rare, workloads It takes compute to maintain large caches
  36. 36. ©Microsoft Corporation Azure It’s a challenge to build integrated lifecycle management Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse Data Scientists need a real data lake Query over open file formats such as Parquet and ORC natively without loading the data to a proprietary cluster Enable data science platforms and the EDW to share a common data set
  37. 37. ©Microsoft Corporation Azure It’s a challenge to build integrated lifecycle management Big data Experimentation Fast exploration Semi-structured Data science OR Relational data Proven security & privacy Dependable performance Structured Business analytics Data lake Data warehouse Developers need real development tools Version control Continuous integration and deployment Unit testing, integration testing and load testing Data Scientists need a real data lake Query over open file formats such as Parquet and ORC natively without loading the data to a proprietary cluster Enable data science platforms and the EDW to share a common data set
  38. 38. ©Microsoft Corporation Azure Welcome to limitless Ease of use Fast exploration Quick to start Proven security Airtight privacy Dependable performance Data warehousing & big data analytics—all in one service Azure meets these challenges, with a single service to provide limitless analytics
  39. 39. Section 3 BI & DW come together Azure Synapse Analytics Azure meets these challenges, with a single service to provide limitless analytics
  40. 40. Section 3 BI & DW come together
  41. 41. • “Relational” stores. • Most work is on gathering from other disparate stores, and known, structured files, from 3NF into Dimensional (star) • Typically there is an OLAP (cube, semantic) solution in the mix, consumed by a reporting layer • Typically these are on-premises, but not always, and can be cloud based • Technologies vary, but are usually OLEDB, ODBC, File connections. Typically interacting with some form of • LOE varies, and tools can be disparate Traditional RDBMS Approach to Data Warehouse Reporting
  42. 42. Advanced, Mature (Legacy) Hub and Spoke Architecture
  43. 43. DATA AI CLOUD
  44. 44. Basic Reporting Unit - Star “Platform” or Star “Fabric”
  45. 45. The new economy thrives on data literacy Communicating with data is a critical skill in the new economy
  46. 46. Users and IT must come together in the new enterprise Get over the IT / business divide
  47. 47. Governance and self-service enhance decision-making Governance is not about making the right decisions, it is about making decisions the right way
  48. 48. The importance of data models BI models Power BI • Built and maintained by business users or BI developers • Use enterprise models, departmental data, and external sources • Focused on a single subject area, but often widely shared Machine Learning models Azure Synapse Analytics • Built and maintained by data scientists • Mostly developed from raw sources in the data lake • Often experimental, needing a data engineer for production use Enterprise models Azure Synapse Analytics • Built and maintained by IT architects • Consolidated data from many systems • Centralized as an authoritative source for reporting and analysis
  49. 49. Enterprise models in the self-service environment If business users are tech-smart and data literate, why do they need enterprise models? Consistency Some business processes can be built once and shared as a corporate standard Governance Certain data sets need complex security and privacy controls Efficiency No need to repeat design, preparing, and loading or securing Line-of-business sources Data ingestion & transformation Enterprise models Azure Synapse Analytics Power BI
  50. 50. BI models in the enterprise environment If enterprise models are so important, why do users need self-service BI models? Flexibility Some data sets are temporary, external, or ad hoc and don’t need to be consolidated Efficiency Tech-smart business users have fresh and innovative ideas they need to explore with agility Ad-hoc, departmental and external sources Line-of-business sources Data ingestion & transformation Power BI Enterprise models Azure Synapse Analytics BI models
  51. 51. Section 4 The cloud for modern analytics Data science models in the enterprise environment What is the role of the data warehouse with data science? Integrating results with enterprise models Making the results of data science easily available for business functions Serving enterprise data for data scientists Helps ensure consistency across diverse analyses Power BI Azure Synapse Analytics Azure Databricks Enterprise models Azure Synapse Analytics Data science results
  52. 52. Section 4 The cloud for modern analytics
  53. 53. DATA AI CLOUD
  54. 54. Cloud Statistics • Cloud data centers will process 94% of workloads in 2021 (Source: Cisco) • Main reason for cloud adoption (Source: Sysgroup) o Access to data anytime (42%) o Disaster recovery (38%) o Flexibility (37%) • The US is the most significant public cloud market with an expected spending of $124.6 billion in 2019 (Source: IDC) 1. United States – $124.6 billion 2. China – $10.5 billion 3. UK – $10 billion 4. Germany – $9.5 billion 5. Japan – $7.4 billion
  55. 55. Management Responsibilities
  56. 56. Modern businesses succeed in the cloud The cloud is the default environment for new technology initiatives
  57. 57. Cloud security offers a new level of protection Businesses benefit from built-in security found only in the cloud
  58. 58. Price, performance, and agility A cloud analytics platform is an economic breakthrough
  59. 59. Structured, unstructured, and streaming data integrated in a single, scalable, environment A cloud analytics platform is the hub for all data models
  60. 60. BI Bring together the best of both worlds with the market-leading BI service and the industry-leading analytics platform Power BI can analyze and visualize massive volumes of data Azure Synapse Analytics provides a scalable platform to enable real-time BI Analytics
  61. 61. Section 5 A new class of analytics Power BI can analyze and visualize massive volumes of data Azure Synapse Analytics provides a scalable platform to enable real-time BI Azure Machine Learning natively integrates with Azure Synapse & Power BI to democratize AI across your business BI Analytics Machine learning Bring together the best of both worlds with the market-leading BI service and the industry-leading analytics platform
  62. 62. Section 5 A new class of analytics
  63. 63. DATA AI CLOUD
  64. 64. Is the data warehouse still relevant? The data warehouse itself Commerce and technology What’s changed since 1988? A 30-year-old architecture, still going strong
  65. 65. Unified experience Azure Synapse Studio Integration Management Monitoring Security Analytics runtimes SQL Azure Data Lake Storage Azure Machine Learning On-premises data Cloud data SaaS data Streaming data Power BI Azure Synapse lies at the heart of business, AI, and BI Azure Synapse Analytics
  66. 66. Unified experience Azure Synapse Studio Integration Management Monitoring Security SQL Azure Data Lake Storage Azure Machine Learning On-premises data Cloud data SaaS data Streaming data Cloud analytics has taken a leap forward with a unified, unmatched platform Azure Synapse Analytics Power BI
  67. 67. Break
  68. 68. Azure Synapse Analytics Limitless analytics service with unmatched time to insight
  69. 69. Introducing Azure Synapse Analytics A limitless analytics service with unmatched time to insight, that delivers insights from all your data, across data warehouses and big data analytics systems, with blazing speed Simply put, Azure Synapse is Azure SQL Data Warehouse evolved We have taken the same industry leading data warehouse and elevated it to a whole new level of performance and capabilities
  70. 70. A breakthrough in the cost of enterprise analytics With the best price-performance in the business Up to 14x faster and costs 94% less than other cloud providers TPC-H benchmark comparison, Price-performance | Lower is better: Azure Synapse Analytics $33; Snowflake Standard $103; Amazon Redshift $48; Google BigQuery per byte …$564 (94% less) * GigaOm TPC-H benchmark report, January 2019, “GigaOm report: Data Warehouse in the Cloud Benchmark”
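The "94% less" claim on this slide can be checked with simple arithmetic. The sketch below assumes the four dollar figures pair with the four vendor names in the order they appear in the transcript; that pairing is an inference from the slide layout, not something the transcript states outright.

```python
# Price-performance figures as they appear on the slide (USD, lower is better).
# Assumption: the figures pair with the vendor names in the order listed.
price_performance = {
    "Azure Synapse Analytics": 33,
    "Snowflake Standard": 103,
    "Amazon Redshift": 48,
    "Google BigQuery (per byte)": 564,
}


def percent_less(ours: float, theirs: float) -> float:
    """Return how much lower `ours` is than `theirs`, as a percentage."""
    return (theirs - ours) / theirs * 100


synapse = price_performance["Azure Synapse Analytics"]
savings = {
    vendor: round(percent_less(synapse, cost), 1)
    for vendor, cost in price_performance.items()
    if vendor != "Azure Synapse Analytics"
}

for vendor, pct in savings.items():
    print(f"{vendor}: the Synapse figure is {pct}% lower")
```

Under that pairing, the 94% figure corresponds to the comparison against the highest-priced entry only; the gap to the other two vendors is smaller.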
  71. 71. Data consolidation using Azure Synapse Analytics Migration to the cloud for efficient business operations Using Azure Synapse Analytics for predictive analytics Organizations that fully harness their data outperform
  72. 72. At the core of all use cases is…Azure Synapse Analytics Real-time analytics Modern data warehousing Advanced analytics "We want to analyze data coming from multiple sources and in varied formats" "We want to leverage the analytics platform for advanced fraud detection" “We’re trying to get insights from our devices in real-time” Cloud-scale analytics
  73. 73. Store Ingest Transform Model & serve Visualize Modern Data Warehouse
  74. 74. Store Azure Synapse Analytics
  75. 75. Synapse SQL Apache Spark for Synapse Synapse Pipelines Synapse Studio Azure Synapse Analytics
  76. 76. Query and analyze data with T-SQL using both provisioned and serverless models Quickly create notebooks with your choice of Python, Scala, SparkSQL, and .NET for Apache Spark Build end-to-end workflows for your data movement and data processing scenarios Execute all data tasks with a simple UI and unified environment Azure Synapse Analytics Synapse SQL Apache Spark for Synapse Synapse Pipelines Synapse Studio
  77. 77. Integrated analytics platform for AI, BI, and continuous intelligence Platform Azure Data Lake Storage Common Data Model Enterprise Security Optimized for Analytics Data lake integrated and Common Data Model aware METASTORE SECURITY MANAGEMENT MONITORING Integrated platform services for management, security, monitoring, and metastore DATA INTEGRATION Analytics Runtimes Integrated analytics runtimes available provisioned and serverless Synapse SQL offering T-SQL for batch, streaming, and interactive processing Synapse Spark for big data processing with Python, Scala, R and .NET PROVISIONED (DW) SERVERLESS Form Factors SQL Languages Python .NET Java Scala R Multiple languages suited to different analytics workloads Experience Synapse Studio SaaS developer experiences for code free and code first Artificial Intelligence / Machine Learning / Internet of Things Intelligent Apps / Business Intelligence Designed for analytics workloads at any scale Azure Synapse Analytics
  78. 78. Integrated analytics platform for AI, BI, and continuous intelligence Platform Azure Data Lake Storage Common Data Model Enterprise Security Optimized for Analytics METASTORE SECURITY MANAGEMENT MONITORING DATA INTEGRATION Analytics Runtimes PROVISIONED (DW) SERVERLESS Form Factors SQL Languages Python .NET Java Scala R Experience Synapse Studio Artificial Intelligence / Machine Learning / Internet of Things Intelligent Apps / Business Intelligence Azure Synapse Analytics Connected Services Azure Data Catalog Azure Data Lake Storage Azure Data Share Azure Databricks Azure HDInsight Azure Machine Learning Power BI 3rd Party Integration
  79. 79. Synapse SQL Apache Spark for Synapse Synapse Pipelines Synapse Studio Azure Synapse Analytics
  80. 80. Azure Synapse Analytics Synapse Studio
  81. 81. Synapse Studio is divided into Activity hubs Hubs organize the tasks needed for building analytics solutions Synapse Studio Overview: Quick access to common gestures, most-recently used items, and links to tutorials and documentation. Data: Explore structured and unstructured data. Develop: Write code and define the business logic of the pipeline via notebooks, SQL scripts, Data flows, etc. Orchestrate: Design pipelines that move and transform data. Monitor: Centralized view of all resource usage and activities in the workspace. Manage: Configure the workspace, pool, access to artifacts.
  82. 82. Overview hub
  83. 83. Start coding immediately Begin with SQL scripts, notebook, data flow and more Overview hub
  84. 84. Synapse Studio Data hub
  85. 85. Explore data inside the workspace and in linked storage accounts Data Hub
  86. 86. Explore data inside the workspace and in linked storage accounts Data Hub ADLS Gen2 Account Container (filesystem) Filepath
  87. 87. Preview a sample of your data Data Hub – Storage accounts
  88. 88. Manage access and configure standard POSIX ACLs on files and folders Data Hub – Storage accounts
  89. 89. Analyze SQL scripts or notebooks with two simple actions Autogenerate T-SQL or PySpark Data Hub – Storage accounts
  90. 90. SQL pool SQL serverless Apache Spark Explore workspace databases Databases
  91. 91. Synapse Studio Develop Hub
  92. 92. Author SQL Scripts Execute SQL script on provisioned SQL Pool or SQL Serverless Publish individual SQL script or multiple SQL scripts through Publish all feature Support for languages and Intellisense Develop hub - SQL scripts
  93. 93. View results in table or chart form and export results in several popular formats Develop hub - SQL scripts
  94. 94. Data flows are a visual way of specifying how to transform data, providing a code-free experience Develop hub - Data flows
  95. 95. Develop hub – Power BI Create Power BI reports in the workspace Provide access to published reports in the workspace Update reports in real time from Synapse workspace and show on Power BI service Visually explore and analyze data
  96. 96. Azure Synapse Analytics Synapse SQL
  97. 97. Best-in-class Price-performance is calculated by GigaOm as the TPC-H metric of cost of ownership divided by composite query. Results based on GigaOm’s TPC-H results, published in January 2019 Leader in price per performance
  98. 98. Best-in-class [Bar chart: Price-performance @ 30TB, lower is better, comparing Azure Synapse Analytics, Amazon Redshift, Google BigQuery, and Snowflake] Price-performance is calculated by GigaOm as the TPC-H metric of cost of ownership divided by composite query. Results based on GigaOm’s TPC-H results, published in January 2019
  99. 99. Best-in-class [Bar chart: Price-performance @ 30TB, lower is better, comparing Azure Synapse Analytics, Amazon Redshift, Google BigQuery Flat Rate, and Snowflake Standard] Price-performance is calculated by GigaOm as the TPC-H metric of cost of ownership divided by composite query. Results based on GigaOm’s TPC-H results, published in January 2019
  100. 100. Benchmark Data Warehouse in the Cloud Benchmark
  101. 101. --T-SQL syntax for scoring data in SQL DW SELECT d.*, p.Score FROM PREDICT(MODEL = @onnx_model, DATA = dbo.mytable AS d) WITH (Score float) AS p; Upload models Machine learning enabled DW Native PREDICT-ion T-SQL based experience (interactive/batch scoring) Interoperability with other models built elsewhere Scoring executed where the data lives T-SQL Language Data Warehouse Data + Score models Model Predictions = Synapse SQL Create models
  102. 102. Event Hubs IoT Hub T-SQL language Built-in streaming ingestion & analytics Streaming Ingestion Data Warehouse Synapse SQL Heterogeneous data preparation and ingestion Native SQL streaming High throughput ingestion (up to 200MB/sec) Delivery latencies in seconds Ingestion throughput scales with compute scale Analytics capabilities
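To put the "up to 200MB/sec" ingestion figure in perspective, the arithmetic below converts a sustained rate into a daily ceiling. It is a back-of-the-envelope sketch only: it assumes decimal units (1 TB = 1,000,000 MB) and a rate held constant for 24 hours, neither of which the slide specifies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_ceiling_tb(mb_per_second: float) -> float:
    """Upper bound on one day's ingest in terabytes (decimal: 1 TB = 1,000,000 MB)."""
    return mb_per_second * SECONDS_PER_DAY / 1_000_000


# The slide's stated maximum rate, sustained for a full day.
print(f"{daily_ceiling_tb(200):.2f} TB/day")
```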
  103. 103. Empower more users per data warehouse Leverage up to 128 concurrent slots, simultaneously, on a single data warehouse Number of simultaneous workloads increases with data warehouse capacity Utilize preset functions to allocate resources that need them the most
  104. 104. Intra cluster workload isolation (Scale in) Marketing CREATE WORKLOAD GROUP Sales WITH ( [ MIN_PERCENTAGE_RESOURCE = 60 ] [ CAP_PERCENTAGE_RESOURCE = 100 ] [ MAX_CONCURRENCY = 6 ] ) 40% Data warehouse Local In-Memory + SSD Cache Compute 1000c DWU 60% Sales 60% 100% Workload aware query execution Workload isolation Multiple workloads share deployed resources Reservation or shared resource configuration Online changes to workload policies
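The reservation idea on this slide can be modelled in a few lines. The sketch below is a simplified illustration, not the Synapse engine: it assumes each group has a reserved minimum and a configured cap, and that a group's usable ceiling shrinks by whatever other groups have reserved. The `WorkloadGroup` class, the `effective_caps` function, and the Marketing settings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkloadGroup:
    name: str
    min_pct: int  # share of the warehouse reserved exclusively for this group
    cap_pct: int  # configured upper limit for this group


def effective_caps(groups):
    """Return each group's usable ceiling once other groups' reservations are honoured."""
    total_reserved = sum(g.min_pct for g in groups)
    if total_reserved > 100:
        raise ValueError("reservations exceed 100% of the warehouse")
    caps = {}
    for g in groups:
        reserved_by_others = total_reserved - g.min_pct
        caps[g.name] = min(g.cap_pct, 100 - reserved_by_others)
    return caps


groups = [
    WorkloadGroup("Sales", min_pct=60, cap_pct=100),      # values from the slide
    WorkloadGroup("Marketing", min_pct=0, cap_pct=100),   # hypothetical settings
]
print(effective_caps(groups))
```

With Sales reserving 60%, Marketing can use at most the remaining 40%, while Sales can still grow to 100% when Marketing is idle. This matches the 60%/40% split drawn on the slide.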
  105. 105. Cluster N Multi-clusters (Scale out) Sales Marketing Finance Data Warehouses Workload Management Scale-out Clusters Independent elasticity, pause, and resume Highest performance Physical workload isolation Highest concurrency Chargeback per cluster
  106. 106. Benefits: • Most predictable cost • Most efficient for unpredictable workloads • No cache eviction for scaling (no performance cliff) • Workload isolation • Single endpoint (auto isolation with classification) Benefits: • Maximize cluster throughput • Workload aware query scheduling • Fine grained cluster scaling Benefits: • Best performance • Physical workload isolation • Chargeback • Highest concurrency Intra-cluster workload isolation (scale in) Marketing Sales 60% 40% Data Warehouse Autonomous workload balancing Cluster 1 Cluster 2 Cluster 3 Data Warehouse Cluster N Multi-clusters (scale out) Data Warehouse
  107. 107. CREATE MATERIALIZED VIEW vw_ProductSales WITH (DISTRIBUTION = HASH(ProductKey)) AS SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales FROM FactSales fs INNER JOIN DimProduct dp ON fs.prodkey = dp.prodkey GROUP BY ProductName, ProductKey See more by scaling to petabytes
  108. 108. ProductName ProductKey TotalSales Product A 5453 784,943.00 Product B 763 48,723.00 … … … FactSales Table 10B Records DimProduct Table 1,000 Records Materialized View (1000 Records) See more by scaling to petabytes FactInventory Table mvw_ProductSales 1,000 Records CREATE MATERIALIZED VIEW mvw_ProductSales WITH (DISTRIBUTION = HASH(ProductKey)) AS SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales FROM FactSales fs INNER JOIN DimProduct dp ON fs.prodkey = dp.prodkey GROUP BY ProductName, ProductKey SELECT <COLUMNS> FROM FactSales fs INNER JOIN ( SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales FROM FactSales fs INNER JOIN DimProduct dp GROUP BY ProductName, ProductKey ) ps INNER JOIN FactInventory GROUP BY …
  109. 109. Execution 2 Cache Hit ~.2 seconds Execution 1 Cache Miss Regular Execution SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales FROM FactSales INNER JOIN DimProduct GROUP BY ProductName, ProductKey Build confidence in your data with result set cache Data Warehouse Resultset Cache
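The benefit of a materialized view is that the expensive join and aggregation run once and later queries read the small precomputed result. The sketch below imitates that with plain Python dictionaries. The rows are made-up stand-ins for the slide's `FactSales` and `DimProduct` tables, and `build_product_sales` and `total_sales` are hypothetical helper names. It does not show how Synapse stores or refreshes the view.

```python
from collections import defaultdict

# Tiny stand-ins for the slide's tables (hypothetical rows).
dim_product = {5453: "Product A", 763: "Product B"}
fact_sales = [  # (product key, amount)
    (5453, 100.0),
    (5453, 250.0),
    (763, 40.0),
    (763, 60.0),
    (5453, 50.0),
]


def build_product_sales(fact_rows, products):
    """Join and aggregate once, like the SELECT inside the view definition."""
    totals = defaultdict(float)
    for product_key, amount in fact_rows:
        totals[product_key] += amount
    return {key: (products[key], total) for key, total in totals.items()}


# "Materialize" the result: one row per product instead of one per sale.
mvw_product_sales = build_product_sales(fact_sales, dim_product)


def total_sales(product_key):
    """Answer from the precomputed rows without rescanning fact_sales."""
    return mvw_product_sales[product_key][1]


print(total_sales(5453), total_sales(763))
```

The fact table here has five rows and the materialized result has two; on the slide the same reduction is from 10B records to 1,000.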
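The cache-miss then cache-hit sequence on this slide can be illustrated with a small wrapper around any query function. This is a conceptual sketch under stated assumptions: the `ResultSetCache` class is hypothetical, it keys on the exact query text, and `invalidate` stands in for whatever the warehouse does when underlying data changes.

```python
class ResultSetCache:
    """Return stored results for repeated, identical query text."""

    def __init__(self, run_query):
        self._run_query = run_query  # the slow path: really executes the query
        self._results = {}
        self.hits = 0
        self.misses = 0

    def execute(self, sql):
        if sql in self._results:      # Execution 2: cache hit
            self.hits += 1
            return self._results[sql]
        self.misses += 1              # Execution 1: cache miss, regular execution
        result = self._run_query(sql)
        self._results[sql] = result
        return result

    def invalidate(self):
        """Drop every cached result, e.g. after the underlying data changes."""
        self._results.clear()


calls = []


def slow_query(sql):
    """Stand-in for the warehouse; records each real execution."""
    calls.append(sql)
    return [("Product A", 5453, 784943.00), ("Product B", 763, 48723.00)]


cache = ResultSetCache(slow_query)
sql = (
    "SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales "
    "FROM FactSales INNER JOIN DimProduct GROUP BY ProductName, ProductKey"
)
first = cache.execute(sql)   # miss: runs slow_query
second = cache.execute(sql)  # hit: served from the cache
print(cache.misses, cache.hits, len(calls))
```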
  110. 110. Most secure data warehouse in the cloud Multiple levels of security between the user and the data warehouse ...at no additional cost Threat Protection Network Security Authentication Access Control Data Protection Customer Data
  111. 111. Enterprise-grade security
  112. 112. HIPAA / HITECH IRS 1075 Section 508 VPAT ISO 27001 PCI DSS Level 1 SOC 1 Type 2 SOC 2 Type 2 ISO 27018 Cloud Controls Matrix Content Delivery and Security Association Singapore MTCS Level 3 United Kingdom G-Cloud China Multi Layer Protection Scheme China CCCPPF China GB 18030 European Union Model Clauses EU Safe Harbor ENISA IAF Shared Assessments ITAR-ready Japan Financial Services FedRAMP JAB P-ATO FIPS 140-2 21 CFR Part 11 DISA Level 2 FERPA CJIS Australian Signals Directorate New Zealand GCIO Industry-leading compliance
  113. 113. Threat Protection Threat Protection - Business requirements Network Security Authentication Access Control Data Protection How do we enumerate and track potential SQL vulnerabilities? To mitigate any security misconfigurations before they become a serious issue. How do we discover and alert on suspicious database activity? To detect and resolve any data exfiltration or SQL injection attacks.
  114. 114. ✓ Automatic discovery of columns with sensitive data ✓ Add persistent sensitive data labels ✓ Audit and detect access to the sensitive data ✓ Manage labels for your entire Azure tenant using Azure Security Center SQL Data Discovery & Classification Discover, classify, protect and track access to sensitive data
  115. 115. SQL Data Discovery & Classification - setup Step 1: Enable Advanced Data Security on the logical SQL Server Step 2: Use recommendations and/or manual classification to classify all the sensitive columns in your tables
  116. 116. SQL Data Discovery & Classification – audit sensitive data access Step 1: Configure auditing for your target Data warehouse. This can be configured for just a single data warehouse or all databases on a server. Step 2: Navigate to audit logs in storage account and download ‘xel’ log files to local machine. Step 3: Open logs using extended events viewer in SSMS. Configure viewer to include ‘data_sensitivity_information’ column
  117. 117. Single Sign-On Implicit authentication - User provides login credentials once to access Azure Synapse Workspace AAD authentication - Azure Synapse Studio will request a token to access each linked service as the user. A separate token is acquired for each of the below services: 1. ADLS Gen2 2. Azure Synapse Analytics 3. Power BI 4. Spark – Spark Livy API 5. management.azure.com – resource provisioning 6. Develop artifacts – dev.workspace.net 7. Graph endpoints MSI authentication - Orchestration uses MSI auth for automation
  118. 118. Comprehensive security (Category: Feature) Data protection: Data in transit, Data encryption at rest, Data discovery and classification. Access control: Object level security (tables/views), Row level security, Column level security, Dynamic data masking. Authentication: SQL login, Azure Active Directory, Multi-factor authentication. Network security: Virtual networks, Firewall, Azure ExpressRoute. Threat protection: Threat detection, Auditing, Vulnerability assessment
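Two of the access-control features listed here, row level security and dynamic data masking, are easier to picture with a toy example. The sketch below is a conceptual model in plain Python, not the Synapse implementation. The `User` class, the `CUSTOMERS` rows, the region-based rule, and the `"****"` mask are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    region: str          # rows are visible only for the user's own region
    may_see_email: bool  # whether the email column is shown unmasked


CUSTOMERS = [
    {"customer": "Contoso", "region": "East", "email": "ap@contoso.example"},
    {"customer": "Fabrikam", "region": "West", "email": "ops@fabrikam.example"},
    {"customer": "Northwind", "region": "East", "email": "hq@northwind.example"},
]


def select_customers(user, rows=CUSTOMERS):
    """Apply a row-level filter, then mask the sensitive column if required."""
    visible = []
    for row in rows:
        if row["region"] != user.region:  # row level security: drop other regions
            continue
        shown = dict(row)                 # copy so the stored data is never altered
        if not user.may_see_email:
            shown["email"] = "****"       # masking: value hidden at query time
        visible.append(shown)
    return visible


analyst = User("analyst", region="East", may_see_email=False)
steward = User("steward", region="West", may_see_email=True)

print(select_customers(analyst))
print(select_customers(steward))
```

Both users run the same query function against the same rows; the policy decides what each one gets back. The point of building this into the warehouse is that no per-audience views need to be created and maintained.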
  119. 119. Azure Synapse Analytics Synapse SQL (serverless)
  120. 120. Discovery and exploration What’s in this file? How many rows are there? What’s the max value? SQL serverless reduces data lake exploration to the right-click Data transformation How to convert CSVs to Parquet quickly? How to transform the raw data? Use the full power of T-SQL to transform the data in the data lake
  121. 121. Overview An interactive query service that provides T-SQL queries over high scale data in Azure Storage. Benefits Serverless No infrastructure Pay only for query execution No ETL Offers security Data integration with Databricks, HDInsight T-SQL syntax to query data Supports data in various formats (Parquet, CSV, JSON) Support for BI ecosystem Azure Storage SQL Serverless Query Power BI Azure Data Studio SSMS DW Read and write data files Curate and transform data Sync table definitions Read and write data files Azure Synapse Analytics > SQL > SQL serverless
  122. 122. Azure Synapse Analytics Apache Spark for Synapse
  123. 123. Allows multiple languages in one notebook %%<Name of language> Offers use of temporary tables across languages Support for syntax highlight, syntax error, syntax code completion, smart indent, and code folding Export results Quickly create & configure notebooks
  124. 124. As notebook cells run, the underlying Apache Spark application status is shown, providing immediate feedback and progress tracking. Quickly create & configure notebooks
  125. 125. Break
  126. 126. Azure Synapse Analytics Data Integration and Synapse Pipelines
  127. 127. The data warehouse in the data-driven business Azure Synapse Analytics Azure Databricks Azure Data Lake Storage Business services Power BI Transform and enrich Prepare Ingest Azure Data Factory
  128. 128. ADF’s execution engine • Data movement • Pipeline activity execution • SSIS package execution Azure Integration runtime Self-hosted Integration runtime Cloud services Apps & Data Pipeline SSIS package Command and control LEGEND Data Integration Runtime (IR) Azure Data Factory v2 Service Scheduling | Orchestration | Monitoring UX & SDK Authoring | Monitoring/Management
  129. 129. Serverless, scalable, hybrid data integration service Lift existing SQL Server ETL to Azure Use existing tools (SSMS, SSDT) Azure Data Factory Cloud and hybrid w/ 80+ connectors Up to 2 GB/s ETL/ELT in the cloud Seamlessly span on-prem, Azure, other clouds, SaaS Run on-demand, scheduled, or on-event data-availability Programmability with multi-language SDK Visual tools Data movement and transformation at scale Hybrid pipeline model Author and monitor SSIS package execution
  130. 130. Overview Linked services define the connection information needed for pipelines to connect to external resources Benefits Offers 85+ pre-built connectors Allows easy cross-platform data migration Represents data store or compute resources
  131. 131. No-code data transformation at scale Focus on building business logic and transforming data • Data cleansing, transformation, aggregation, conversion, etc. • Cloud scale via Spark execution • Resilient data flows with ease
  132. 132. Wrangling dataflow Code-free data preparation @scale
  133. 133. Best-in-class monitoring and management Monitor pipeline and activity runs Query runs with rich language Operational lineage between parent-child pipelines Azure Monitor Integration • Diagnostics logging • Metrics and alerts • Events Restart pipeline and activities
  134. 134. Use templates to quickly get started Quickly build data integration solutions Avoid rebuilding workflows; instantiate a template Improve developer productivity and reduce development time for repeat processes
  135. 135. Pipelines Overview Pipelines provide the ability to load data from a storage account into the desired linked service. Load data by running a pipeline manually or through orchestration Benefits Supports common loading patterns Fully parallel loading into data lake or SQL tables Graphical development experience
  136. 136. Triggers Overview Triggers represent a unit of processing that determines when a pipeline execution needs to be kicked off. Data Integration offers three trigger types: 1. Schedule: fires on a schedule defined by a start date, recurrence, and end date 2. Event: fires on a specified event 3. Tumbling window: fires at a periodic time interval from a specified start date, while retaining state. It also provides the ability to monitor pipeline runs and control trigger execution.
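To make the tumbling window trigger concrete, here is a minimal sketch of how fixed, non-overlapping windows can be derived from a start date and an interval, and how "retaining state" can be pictured as remembering which windows have already run. It illustrates the concept only and is not how the service implements triggers; the function names `tumbling_windows` and `pending_windows` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple

Window = Tuple[datetime, datetime]


def tumbling_windows(start: datetime, interval: timedelta,
                     now: datetime) -> List[Window]:
    """Return every complete, contiguous window between start and now."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows: List[Window] = []
    window_start = start
    # A window is only emitted once it has fully elapsed.
    while window_start + interval <= now:
        window_end = window_start + interval
        windows.append((window_start, window_end))
        window_start = window_end
    return windows


def pending_windows(windows: List[Window],
                    completed: Set[Window]) -> List[Window]:
    """Windows that still need a pipeline run, given remembered state."""
    return [w for w in windows if w not in completed]


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 0, 0)
    all_windows = tumbling_windows(start, timedelta(hours=1),
                                   datetime(2020, 1, 1, 3, 30))
    done = {all_windows[0]}  # pretend the first window already ran
    for w_start, w_end in pending_windows(all_windows, done):
        print(w_start.isoformat(), "->", w_end.isoformat())
```

Unlike a schedule trigger, each window here carries its own start and end time, which is what lets a pipeline process exactly one slice of data per run and backfill slices that were missed.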
  137. 137. Handle upserts, updates, deletes on SQL sinks Add new partition methods Add schema drift support Add file handling (move files after read, write files to file names described in rows, etc.) New inventory of functions (e.g. hash functions for row comparison) Commonly used ETL patterns (Sequence generator/Lookup transformation/SCD…) Data lineage – Capturing sink column lineage & impact analysis (invaluable if this is for enterprise deployment) Implement commonly used ETL patterns as templates (SCD type 1, type 2, data vault) Data flow Capabilities
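The slide mentions hash functions for row comparison alongside upserts, updates, and deletes. The sketch below shows the general idea in plain Python: hash the tracked columns of each row, then compare source and sink by key to decide which rows to insert, update, or delete. It is an illustration under stated assumptions, not the data flow engine's implementation; `row_hash`, `classify_rows`, and the sample column names are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence

Row = Dict[str, object]


def row_hash(row: Row, columns: Sequence[str]) -> str:
    """Hash the tracked columns of a row into one comparable value."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def classify_rows(source: List[Row], sink: List[Row], key: str,
                  columns: Sequence[str]) -> Dict[str, List[Row]]:
    """Split rows into inserts, updates, and deletes by key and hash."""
    sink_hashes = {r[key]: row_hash(r, columns) for r in sink}
    source_keys = {r[key] for r in source}
    inserts = [r for r in source if r[key] not in sink_hashes]
    updates = [r for r in source
               if r[key] in sink_hashes
               and row_hash(r, columns) != sink_hashes[r[key]]]
    deletes = [r for r in sink if r[key] not in source_keys]
    return {"insert": inserts, "update": updates, "delete": deletes}


if __name__ == "__main__":
    sink = [{"id": 1, "name": "Ann", "city": "Tampa"},
            {"id": 2, "name": "Bob", "city": "Miami"}]
    source = [{"id": 1, "name": "Ann", "city": "Orlando"},
              {"id": 3, "name": "Cy", "city": "Tampa"}]
    changes = classify_rows(source, sink, "id", ["name", "city"])
    for action, rows in changes.items():
        print(action, [r["id"] for r in rows])
```

Comparing one hash per row avoids a column-by-column comparison, which is also the usual starting point for SCD type 1 (overwrite) and type 2 (versioned) loads.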
  138. 138. Prep and transform data Mapping dataflow Code free data transformation at scale Wrangling dataflow Code free data preparation at scale
  139. 139. Insights for all with Power BI + Azure Power up your BI with Azure Synapse
  140. 140. 2020 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms
  141. 141. Where do you find yourself on the curve? Hindsight Insight Foresight Value Difficulty What happened? Descriptive Analysis Why did it happen? Diagnostic Analysis What will happen? Predictive Analysis How can we make it happen? Prescriptive Analysis
  142. 142. Where do you find yourself on the curve? Hindsight Insight Foresight Value Difficulty What happened? Descriptive Analysis Why did it happen? Diagnostic Analysis What will happen? Predictive Analysis How can we make it happen? Prescriptive Analysis BI
  143. 143. BI + Analytics unlock the door to AI, machine learning, and real-time insights Hindsight Insight Foresight Value Difficulty What happened? Descriptive Analysis Why did it happen? Diagnostic Analysis What will happen? Predictive Analysis How can we make it happen? Prescriptive Analysis Analytics BI
  144. 144. BI Bring together the best of both worlds with the market-leading BI service and the industry-leading analytics platform Power BI can analyze and visualize massive volumes of data Azure Synapse Analytics provides a scalable platform to enable real-time BI Analytics
  145. 145. Power BI can analyze and visualize massive volumes of data Azure Synapse Analytics provides a scalable platform to enable real-time BI Azure Machine Learning natively integrates with Azure Synapse & Power BI to democratize AI across your business BI Analytics Machine learning Bring together the best of both worlds with the market-leading BI service and the industry-leading analytics platform
  146. 146. Accelerate business value with a powerful analytics platform Business analysts IT professionals Data scientists Frictionless collaboration Unified analytics platform Advanced analytics and AI Powerful visualization and reporting Unmatched capabilities Business value Common Data Model on Azure Data Lake Storage Unified data Azure Synapse Analytics Power BI Powerful and integrated tooling Azure Machine Learning
  147. 147. Visualize and report Power BI Model & serve Azure Synapse Analytics CDM folders Azure Data Lake Storage Respond instantly Enable instant response times with Power BI Aggregations on massive datasets when querying at the aggregated level Get granular with your data Queries at the granular level are sent to Azure Synapse Analytics with DirectQuery leveraging its industry-leading performance Save money with industry-leading performance Azure Synapse Analytics is up to 14x faster and 94% cheaper than other cloud providers View reports with a single pane of glass Skip the configuration when connecting to Power BI with integrated Power BI-authoring directly in the Azure Synapse Studio Accelerate business value with a powerful analytics platform
  148. 148. Customers using Azure Synapse & Power BI today are transforming their business with purpose 27% Faster time to insights 271% Average ROI 26% Lower total cost of ownership 60% Increased customer satisfaction * Forrester, October 2019, “The Total Economic Impact of Microsoft Azure Analytics with Power BI”
  149. 149. Build Power BI dashboards directly from Azure Synapse Azure Synapse + Power BI integration
  150. 150. View published reports in Power BI workspace Azure Synapse + Power BI
  151. 151. Edit reports in Synapse workspace Azure Synapse + Power BI
  152. 152. Real-time publish on save Azure Synapse + Power BI
  153. 153. Power BI On Common Data Model
  154. 154. Coming Later This Summer Synapse will collect query patterns in order to create materialized views Composite Models Microsoft Information Protection improvements
  155. 155. Power BI Product Portfolio
  156. 156. Power BI service Cloud-based SaaS solutions Get started quickly Secure, live connection to your data sources, on-premises and in the cloud Auto insights and intuitive data exploration using natural language query Deliver insights through other services such as SharePoint, PowerApps & Teams Pre-built dashboards and reports for popular SaaS solutions Sharing and collaboration of dashboards, reports & datasets Live, real-time dashboard updates
  157. 157. Deliver insights through other services Collaborate and share insights with teams in your organization using existing services Fully interactive reports integrated into your service
  158. 158. Data Connectivity Modes in Power BI Desktop With Power BI Desktop, you can connect to your data in three ways: • Import • DirectQuery • LiveConnect (Live/Exploration). Overview: Import: ETL, data download; DirectQuery: select specific tables, no data download, queries triggered from Report visuals; Live/Exploration: explore source objects from Report surface, no data download, queries triggered from Report visuals. Supported Data Sources: Import: all sources (>80 sources); DirectQuery: SQL Server, Azure SQL Database, Azure SQL Data Warehouse, SAP HANA, Oracle, Teradata; Live/Exploration: SQL Server Analysis Services (Tabular & Multidimensional). Max # of data sources per report: Import: unlimited; DirectQuery: one; Live/Exploration: one. Data Transformations: Import: all transformations (100’s); DirectQuery: partial support (varies by data source); Live/Exploration: none. Mashup Capabilities: Import: Merge (Joins), Append (Union), parameterized queries; DirectQuery: Merge (Joins), Append (Union); Live/Exploration: none. Modeling Capabilities: Import: Relationships, Calculated Columns & Tables, Measures, Hierarchies; DirectQuery: Calculated Columns, Measures, Change Column Types; Live/Exploration: none.
  159. 159. Dedicated resources in the cloud Flexibility to license by capacity Greater scale and performance Extending on-premises capabilities Premium capacity – P3 Premium capacity – P2 Premium capacity – P1 My workspace User 2 My workspace User 3 App workspace Marketing App workspace Sales My workspace User 1 APIs Custom app Power BI service – Contoso organization Power BI Premium
  160. 160. Power BI Capacity Tiers
  161. 161. Collaboration vs. Consumption
  162. 162. Compare Reporting Options
  163. 163. A Walk Around Azure Synapse Studio My Azure Synapse Studio
  164. 164. Hands-on lab – Coming Soon! Build an end-to-end analytics solution in the Azure Synapse Studio
  165. 165. Exercise 1 - Explore the data lake with Azure Synapse SQL On-demand and Azure Synapse Spark Exercise 2 - Build a Modern Data Warehouse with Azure Synapse Pipelines Exercise 3 - Power BI integration Exercise 4 - High Performance Analysis with Azure Synapse SQL Pools Exercise 5 - Data Science with Azure Synapse Spark
  166. 166. Hands On Workshop Lab Sample
  167. 167. Analytics in a Day Thank You! James McAuliffe jmcauliffe@ccganalytics.com https://www.linkedin.com/in/jamesmcauliffesql/ https://ccganalytics.com/
  168. 168. Get Started Today Create a free Azure account and get started with Azure Synapse Analytics: https://azure.microsoft.com/en-us/free/synapse-analytics/ Get in touch with us: https://info.microsoft.com/ww-landing-contact-me-azure-analytics.html Learn more: https://aka.ms/synapse Get the Azure Synapse Analytics Toolkit
  169. 169. Power BI COVID Crisis Response Resources Power BI & COVID-19 Keeping citizens informed Find out more at: https://aka.ms/pbicovid19 Crisis Communications App https://aka.ms/crisis-communication-app-docs Emergency Response Solution https://aka.ms/emergency-response-doc
  170. 170. The Ignite Book of News https://news.microsoft.com/wp-content/uploads/prod/sites/563/2019/11/Ignite-2019-Book-of-News-2.pdf
  171. 171. Azure Synapse Analytics Get the Azure Synapse Analytics Toolkit Azure Synapse is Azure SQL Data Warehouse evolved Analytics Primer in 60 minutes with Microsoft Azure Accelerate Time to Analytics with Azure Synapse Analytics Build 2020 Data Warehouse in the Cloud Benchmark Overview of Microsoft Azure compliance Microsoft Compliance Offerings 2020 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms The Digitization of the World from Edge to Core The Total Economic Impact of Microsoft Azure Analytics with Power BI Azure Data Factory Overview Power BI Governance Admin References and Links
  172. 172. Learning Links
