Se ha denunciado esta presentación.
Se está descargando tu SlideShare. ×

DAC4B 2015 - Polybase

Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Cargando en…3
×

Eche un vistazo a continuación

1 de 37 Anuncio
Anuncio

Más Contenido Relacionado

Presentaciones para usted (20)

Similares a DAC4B 2015 - Polybase (20)

Anuncio

Más de Łukasz Grala (20)

Más reciente (20)

Anuncio

DAC4B 2015 - Polybase

  1. 1. Polybase for SQL Server 2016 Lukasz Grala Architect Data Platform & BI Solutions | MVP SQL Server
  2. 2. Łukasz Grala MVP SQL Server | MCT | MCSE • Architekt i trener - Data Platform & Business Intelligence Solutions • Prelegent na licznych konferencjach • Wykładowca i autor publikacji • Posiada liczne certyfikaty • Od 2010 roku wyróżniony nagrodą MVP w kategorii SQL Server • Lider PLSSUG Poznań (lukasz.grala@plssug.org.pl) • Doktorant na Politechnice Poznańskiej Łukasz Grala - Architekt i Trener - MVP SQL Server - Kontakt: lukasz@grala.it lub lukasz@sqlexpert.pl
  3. 3. There’s an opportunity to drive smarter decisions with data The explosion of data sources... …drives an explosion of data …which drives businesses to learn more 2013-2020 CAGR = 41% 25B 4.0B1.3B 2010 2013 2020 Source: Forecast: Internet of Things, Endpoints and Associated Services, Worldwide, 2014. Gartner. Oct 20 2014 Source: IDC “Digital Universe”, Dec. 2012
  4. 4. Operational Database Management Systems Data Warehouse Database Management Systems Business Intelligence and Analytics Platforms x86 Server Virtualization Cloud Infrastructure as a Service Enterprise Application Platform as a Service Public Cloud Storage Leader in 2014 for Gartner Magic Quadrants Microsoft platform leads the way on-premises and cloud
  5. 5. The Microsoft data platform Internal & external DashboardsReports Ask Mobile Information managementOrchestration Extract, transform, load Prediction Relational Non-relational Analytical Apps Streaming 
  6. 6. Do more. Achieve more.
  7. 7. Evolving Approaches to Analytics ETL Tool (SSIS, etc) EDW (SQL Svr, Teradata, etc) Extract Original Data Load Transformed Data Transform BI Tools Data Marts Data Lake(s) Dashboards Apps
  8. 8. ETL Tool (SSIS, etc) EDW (SQL Svr, Teradata, etc) Extract Original Data Load Transformed Data Transform BI Tools Ingest (EL) Original Data Data Marts Data Lake(s) Dashboards Apps Evolving Approaches to Analytics
  9. 9. ETL Tool (SSIS, etc) EDW (SQL Svr, Teradata, etc) Extract Original Data Load Transformed Data Transform BI Tools Ingest (EL) Original Data Scale-out Storage & Compute (HDFS, Blob Storage, etc) Transform & Load Data Marts Data Lake(s) Dashboards Apps Streaming data Evolving Approaches to Analytics
  10. 10. ETL Tool (SSIS, etc) EDW (SQL Svr, Teradata, etc) Extract Original Data Load Transformed Data Transform BI Tools Ingest (EL) Original Data Scale-out Storage & Compute (HDFS, Blob Storage, etc) Transform & Load Data Marts Data Lake(s) Dashboards Apps Streaming data Evolving Approaches to Analytics
  11. 11. SQL Server and Azure DocumentDB Schema-free NoSQL document store Scalable transactional processing for rapidly changing apps Premium relational DB capable to exchange data with modern apps & services Derives unified insights from structured/unstructured data JSON JS JSJSON
  12. 12. PolyBase for SQL Server 2016
  13. 13. The Hadoop Ecosystem Management & Monitoring (Ambari) Coordination (ZooKeeper) Workflow&Scheduling (Oozie) Scripting (Pig) Machine Learning (Mahout) Query (Hive) Distributed Processing (MapReduce) Distributed Storage (HDFS) NoSQLDatabase (HBase) DataIntegration (Sqoop/REST/ODBC)
  14. 14. Query relational and non-relational data, on-premises and in Azure Apps T-SQL query SQL Server Hadoop PolyBase Query relational and non-relational data with T-SQL
  15. 15. Prerequisites for installing PolyBase 64-bit SQL Server Evaluation edition Microsoft .NET Framework 4.0. Oracle Java SE RunTime Environment (JRE) version 7.51 or higher NOTE: Java JRE version 8 does not work. Minimum memory: 4GB Minimum hard disk space: 2GB
  16. 16. Using the installation wizard for PolyBase Run SQL Server Installation Center. (Insert SQL Server installation media and double-click Setup.exe.) Click Installation, then click New Standalone SQL Server installation or add features On the feature selection page, select PolyBase Query Service for External Data. On the Server Configuration Page, configure the PolyBase Engine Service and PolyBase Data Movement Service to run under the same account.
  17. 17. -- Run sp_configure ‘hadoop connectivity’ -- and set an appropriate value sp_configure @configname = 'hadoop connectivity', @configvalue = 7; GO RECONFIGURE GO -- List the configuration settings for -- one configuration name sp_configure @configname='hadoop connectivity'; GO Option values 0: Disable Hadoop connectivity 1: Hortonworks HDP 1.3 on Windows Server Azure blob storage (WASB[S]) 2: Hortonworks HDP 1.3 on Linux 3: Cloudera CDH 4.3 on Linux 4: Hortonworks HDP 2.0 on Windows Server Azure blob storage (WASB[S]) 5: Hortonworks HDP 2.0 on Linux 6: Cloudera 5.1 on Linux 7: Hortonworks 2.1 and 2.2 on Linux Hortonworks 2.2 on Windows Server Azure blob storage (WASB[S]) Choose Hadoop data source with sp_configure
  18. 18. Start the PolyBase services After running for sp_configure, you must stop and restart the SQL Server engine service Run services.msc Find the services shown below and stop each one Restart the services
  19. 19. -- Using credentials on database requires enabling -- traceflag DBCC TRACEON(4631,-1) -- Create a master key CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'; CREATE CREDENTIAL WASBSecret ON DATABASE WITH IDENTITY = 'pdw_user', Secret = 'mykey=='; -- Create an external data source (Azure Blob Storage) -- with the credential CREATE EXTERNAL DATA SOURCE Azure_Storage WITH ( TYPE = HADOOP, LOCATION ='wasb[s]://mycontainer@test.blob.core.windows.net/pat h’, CREDENTIAL = WASBSecret ) Type methods for providing credentials Core-site.xml in installation path of SQL Server - <SqlBinRoot>PolybaseHadoopConf Credential object in SQL Server for higher security NOTE: The syntax for a database-scoped credential (CREATE CREDENTIAL … ON DATABASE) is temporary and will change in the next release. This new feature is documented only in the examples in the CTP2 content, and will be fully documented in the next release. Configure PolyBase for Azure blob storage
  20. 20. -- Create an external data source (Hadoop) CREATE EXTERNAL DATA SOURCE hdp2 with ( TYPE = HADOOP, LOCATION ='hdfs://10.xxx.xx.xxx:xxxx', RESOURCE_MANAGER_LOCATION='10.xxx.xx.xxx:xxxx') CTP2 supports the following Hadoop distributions Hortonworks HDP 1.3, 2.0, 2.1, 2.2 for both Windows and Linux Cloudera CDH 4.3, 5.1 on Linux Create a reference to a Hadoop cluster
  21. 21. -- Create an external file format -- (delimited text file) CREATE EXTERNAL FILE FORMAT ff2 WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR ='|', USE_TYPE_DEFAULT = TRUE)) CTP2 supports the following file formats Delimited text Hive RCFile Hive ORC Define the external file format
  22. 22. -- Create an external table pointing to file stored in Hadoop CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] ( [SensorKey] int NOT NULL, [CustomerKey] int NOT NULL, [GeographyKey] int NULL, [Speed] float NOT NULL, [YearMeasured] int NOT NULL ) WITH (LOCATION='/Demo/car_sensordata.tbl', DATA_SOURCE = hdp2, FILE_FORMAT = ff2, REJECT_TYPE = VALUE, REJECT_VALUE = 0 The external table provides a T- SQL reference to the data source used to: Query Hadoop or Azure blob storage data with Transact-SQL statements Import and store data from Hadoop or Azure blob storage into your SQL Server database Create an external table to the data source
  23. 23. -- Create statistics on an external table. CREATE STATISTICS StatsForSensors ON CarSensor_Data(CustomerKey, Speed) In CTP2, you can optimize query execution against the external tables using statistics Optimize queries by adding statistics
  24. 24. SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus FROM Insurance_Customer_SQL INNER JOIN ( SELECT * FROM SensorData_ExternalHDP WHERE Speed > 35 UNION ALL SELECT * FROM SensorData_ExternalHDP2 WHERE Speed > 35 ) AS SensorD ON C.CustomerKey = SensorD.CustomerKey External tables referring to data in 2 HDP Hadoop clusters SQL Server table Query Capabilities (1) Joining relational and external data
  25. 25. SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus FROM Insurance_Customer_SQL -- table in SQL Server … OPTION (FORCE EXTERNALPUSHDOWN) – push-down computation CREATE EXTERNAL DATA SOURCES ds-hdp WITH .( TYPE = Hadoop, LOCATION = “hdfs://10.193.27.52:8020”, Resources_Manager_Location = ‘10.193.27.52:8032’); Query Capabilities (2) Push-Down Computation
  26. 26. Compute Nodes Engine Service PDW Appliance DMS DB DMS DB DMS DB DMS Control Node Namenode (HDFS) File System File System File System File System File System File System Linux ClusterWindows Cluster Hadoop cluster can be either on premise or in the cloudHadoop cluster can be either: Many popular HDFS file formats supported (text, RC, ORC, … ) Hadoop Cluster Hortonworks or Cloudera Hadoop distributions Typical PolyBase Setup Text Format RCFile Format ORCFile Format Parquet Format (future)…
  27. 27. Hadoop Cluster #2 CREATE EXTERNAL DATA SOURCE AV_DFS_CLUSTER WITH (TYPE= HADOOP, …) CREATE EXTERNAL DATA SOURCE GSL_HDFS_CLUSTER WITH (TYPE= HADOOP, …) Azure Storage Volume Azure Storage Volume Azure Storage Volume Azure Attach a Hadoop Cluster
  28. 28. 1) Use MR job to find Orders > $1000 A Better Query Plan 2) Import result into SQL vNext instances on Compute nodes 4) Perform join in parallel on compute nodes 3) Redistribute Customers table from Head Node to Compute Nodes Select C.Name from Customers C, Orders O where C.id = O.CustId and P.price > $1000
  29. 29. e.g., cleansing data before loading it e.g., joining relational tables w/ streams of tweets e.g., mine sensor data for predictive analytics
  30. 30. Example #1: Price policies Structured Customer data (Non-Relational Sensor data Sensor data from cars (kept in Hadoop) Relational data (kept in SQL Server PDW/APS) (based on driver behavior)
  31. 31. Example #2: Basket Analysis of online shoppers Structured Product data (Non-Relational Social Media (kept in Hadoop) Product data (kept in SQL Server PDW/APS) (based on social media behavior)
  32. 32. Example #3: Drilling rig analysis Rig new (‘hot’) data Rig old sensor data Monitoring & functioning of rig (kept in Hadoop) More recent data (kept in SQL Server PDW/APS)
  33. 33. Mobile BI apps for SQL Server – formally Datazen For on-premises implementations - optimized for SQL Server. Rich, interactive data visualization on all major mobile platforms No additional cost for SQL Server Enterprise Edition customers 2008 or later and Software Assurance
  34. 34. Data visualization and publishing Create beautiful visualizations and KPIs with a touch-based designer Connect to the Mobile BI for SQL Server server to access SQL Server data Publish for access by others
  35. 35. Datazen architecture overview Server, Publisher and Mobile apps SQL Server Analysis Services File Data sources Authentication Internet boundary Viewer Apps Publisher App Web browser Datazen Enterprise Server Enterprise environment
  36. 36. Mobile BI for SQL Server summary Beautiful visualizations KPI repository Create once - publish to any device Native apps for all platforms Perfect scaling to any form factor Custom branding Collaborate on the go
  37. 37. © 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Notas del editor

  • Some key trends that are impacting how we are designing the Microsoft data platform. One of the most important is how fast data is growing, which has surprised many with a rate of 41% CAGR. But why? Its cyclical as there is an explosion of data sources through more devices combined with a new hunger business have for more data to better understand and predict their customers needs, so they can make smarter decisions with data.
  • Magic Quadrant leader in Operational Database Management Systems http://www.datastax.com/gartner-magic-quadrant-odbms
    https://www.gartner.com/doc/2877117/magic-quadrant-operational-database-management (paywall)

    Magic Quadrant leader in Data Warehouse Database Management Systems
    http://www.odbms.org/2014/03/2014-gartner-magic-quadrant-data-warehouse-database-management-systems/
    https://www.gartner.com/doc/2678018/magic-quadrant-data-warehouse-database (paywall)

    Magic Quadrant leader in Business Intelligence and Analytics Platforms
    http://www.tableausoftware.com/gartner-magic-quadrant-2014
    https://www.gartner.com/doc/2668318/magic-quadrant-business-intelligence-analytics (paywall)

    Magic Quadrant for x86 Server Virtualization
    http://tcwd.net/vblog/wp-content/uploads/2014/07/2014-3year.png
    http://www.gartner.com/technology/reprints.do?id=1-1WR6HLK&ct=140703&st=sb

    Magic Quadrant for Cloud Infrastructure as a Service
    http://www.gartner.com/technology/reprints.do?id=1-1UM941C&ct=140529&st=sb

    Magic Quadrant for Enterprise Application Platform as a Service
    http://www.gartner.com/technology/reprints.do?id=1-1P502BX&ct=140108&st=sb

    Gartner Magic Quadrant for Public Cloud Storage
    http://www.gartner.com/technology/reprints.do?id=1-1WWSLMM&ct=140709&st=sb


  • With SQL Server v-next we are continuing to invest in three major areas. We are going to continue push the envelope on mission critical performance, with SQL Server 2014 we began to pull away from the Tier 1 competitors with innovative in-memory technology built-in. Next release we will further our lead with new innovations across many mission critical components. When it comes to providing insights on data, we are making significant investment both on premises and complimentary services via Azure to help you gain deeper insights across your data. Finally we are adding new hybrid capabilities that will compliment your on-prem investments and give you the ability to take advantage of our hyperscale cloud.
  • See TechNet Virtual Lab “Exploring SQL Server 2016 support for JSON data” - http://go.microsoft.com/?linkid=9898458
  • When it comes to key BI investments we are making it much easier to manage relational and non-relational data with Polybase technology that allows you to query Hadoop data and SQL Server relational data through single T-SQL query. One of the challenges we see with Hadoop is there are not enough people out there with Hadoop and Map Reduce skillset and this technology simplifies the skillset needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.
  • Source: https://msdn.microsoft.com/en-us/library/mt163689(v=sql.130).aspx
  • Source: https://msdn.microsoft.com/en-us/library/mt163689(v=sql.130).aspx
  • Source: https://msdn.microsoft.com/en-us/library/mt163689(v=sql.130).aspx
  • Source: https://msdn.microsoft.com/en-us/library/mt163689(v=sql.130).aspx
  • Source: https://msdn.microsoft.com/en-us/library/mt163689(v=sql.130).aspx

  • Source: https://msdn.microsoft.com/en-us/library/dn935022(v=sql.130).aspx

    Creates a PolyBase external data source for data stored in Hadoop File System (HDFS) or Azure blob storage (WASB). Use this for PolyBase scenarios that integrate SQL-based products with Hadoop or Azure blob storage (WASB).
  • Source: https://msdn.microsoft.com/en-us/library/dn935026(v=sql.130).aspx

    Creates a PolyBase external file format definition for external data stored in Hadoop or Azure blob storage. Like the external data source, creating an external file format is a prerequisite for creating an external table. By creating an external file format, you specify the actual layout of the data referenced by an external table.
    PolyBase supports these file formats:
    Delimited text,
    Hive RCFile, and
    Hive ORC
  • Source: https://msdn.microsoft.com/en-us/library/dn935021(v=sql.130).aspx

    Creates a PolyBase external table that references data stored in a Hadoop cluster or Azure blob storage.

    Use an external table to:
    Query Hadoop or Azure blob storage data with Transact-SQL statements.
    Import and store data from Hadoop or Azure blob storage into your SQL Server database.
  • Source: https://msdn.microsoft.com/en-us/library/ms188038(v=sql.130).aspx

    Creates query optimization statistics on one or more columns of a table, an indexed view, or an external table. For most queries, the query optimizer already generates the necessary statistics for a high-quality query plan; in a few cases, you need to create additional statistics with CREATE STATISTICS or modify the query design to improve query performance.

    To learn more, see Statistics.

  • Source: http://blogs.technet.com/b/dbtechresource/archive/2015/04/26/business-intelligence-reporting-through-datazen.aspx

    Introduction
    Datazen Software – industry leader in mobile business intelligence and data analytics – was acquired by Microsoft in April of 2015. The acquisition accelerates Microsoft's strategy to help every company create a data culture and ensure insights reach every individual in every organization. Datazen’s product team has joined Microsoft as a part of this acquisition and continues to develop and evolve this technology. Datazen Windows 8 app enables dashboard creation and publishing based on Excel, cloud and enterprise data sources. After publishing to a Datazen Server, dashboards and KPIs are accessible on any device via its native app, or through any major browser.

×