MS SQL 2019:
Big Data Processing
Andrii Zrobok
Chief Database Developer, EPAM
azrobok@gmail.com
Agenda
• MS SQL 2019 overview
• PolyBase: History, What, Why, Demo
• Big Data Cluster
• Scenarios
About me
25+ years of experience in database development: building data-centric
applications from scratch, supporting legacy databases and applications, data migration,
performance tuning, SSIS/ETL, consulting, database training, etc.
Databases: FoxPro 2.0 for DOS (Fox Software), MS SQL Server (from version 6.5,
1996), Oracle, Sybase ASE, MySQL, PostgreSQL
Co-leader of Lviv Data Platform UG (PASS Local Chapter) (http://lvivsqlug.pass.org/)
Speaker at:
• PASS SQLSaturday conferences (Lviv, Kyiv, Dnipro, Odessa, Kharkiv; since 2013)
• PASS L’viv/Vinnitsa/Virtual SQL Server User Groups;
• EPAM IT Week 2015-2017
Today's challenges
• Unified access to all your data with unparalleled performance
• Easily and securely manage data big and small
• Build intelligent apps and AI with all your data
MS SQL 2019 Preview
Windows: Standard version with PolyBase
Linux: Linux version without PolyBase
Docker: Database Engine Container Image (Ubuntu, Red Hat)
Big Data Analytics: Linux container on Kubernetes
https://www.microsoft.com/en-us/sql-server/sql-server-2019#Install
PolyBase: What?
[Diagram: Applications and Analytics -> T-SQL -> SQL Server -> PolyBase external tables / external data source]
Microsoft's newest technology for connecting to remote servers.
https://docs.microsoft.com/uk-ua/sql/relational-databases/polybase/polybase-guide?view=sqlallproducts-allversions
PolyBase: History
• Introduced in SQL Server Parallel Data Warehouse (PDW) edition, back in 2010.
• Expanded in SQL Server Analytics Platform System (APS) in 2012.
• Released to the "general public" in SQL Server 2016, with most support being in Enterprise Edition.
• Extended support for additional technologies (like Oracle, MongoDB, etc.) will be available in SQL Server 2019.
PolyBase: Why?
• Without PolyBase
  - Transfer half of your data so that all of it is in one format or the other
  - Or query both sources of data, then write custom logic to join and integrate the data at the client level
• With PolyBase
  - Use T-SQL to join the data (external tables, statistics on external tables)
• Usage
  - Querying / Import (into a table) / Export (into data storage)
• Performance
  - Push computation down to the remote server (OPTION (FORCE EXTERNALPUSHDOWN))
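As a rough sketch of the usage and performance points above, the T-SQL below imports an external table into a local one and then requests pushdown for an aggregation. It reuses the pb_oracle.countries external table from the demo; the local table name dbo.countries_local is hypothetical.

```sql
-- Import: copy the remote rows into a new local table
-- (dbo.countries_local is a hypothetical name chosen for this sketch).
SELECT country_id, country_name, region_id
INTO dbo.countries_local
FROM pb_oracle.countries;

-- Performance: ask PolyBase to evaluate the aggregation on the remote server
-- instead of pulling every row across first.
SELECT region_id, COUNT(*) AS country_count
FROM pb_oracle.countries
GROUP BY region_id
OPTION (FORCE EXTERNALPUSHDOWN);
```

Whether pushdown actually helps depends on the remote source and the query shape, so it is worth comparing plans with and without the hint.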
PolyBase: Demo - tools
1) PolyBase should be installed and enabled
2) Using Management Studio (scripts, no visibility)
OR
3) Using Azure Data Studio + SQL Server 2019 (Preview) Extension
https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-2017
https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019-extension?view=sqlallproducts-allversions
PolyBase: Demo - steps
• Create master key (needed for password encryption)
• Create database scoped credential (access to remote database server)
• Create external data source (address of remote database server)
• Create schema for external data (optional)
• Create external tables / statistics on external tables
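The slides that follow show the credential, data source, and external table. The preparatory steps might look like the sketch below, assuming the PolyBase feature is already installed; the password is a placeholder, and pb_oracle is the schema name used later in the demo.

```sql
-- Enable PolyBase on the instance (assumes the feature was selected during setup).
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
GO

-- Master key: protects the secrets stored in database scoped credentials.
-- The password below is a placeholder, not a value from the talk.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
GO

-- Optional: a separate schema keeps external tables apart from local ones.
-- CREATE SCHEMA has to be the first statement in its batch, hence the GO above.
CREATE SCHEMA pb_oracle;
GO
```

GO is a batch separator understood by Management Studio, Azure Data Studio, and sqlcmd, not a T-SQL statement.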
PolyBase: Demo – external tables
-- Credential holding the Oracle login used by PolyBase
CREATE DATABASE SCOPED CREDENTIAL OracleCredentials
WITH IDENTITY = 'system', SECRET = '0x7ORA18c';

-- External data source: address of the remote Oracle server
CREATE EXTERNAL DATA SOURCE OracleInstance
WITH (
    LOCATION = 'oracle://192.168.1.103:1521',
    CREDENTIAL = OracleCredentials);

-- External table mapped to the Oracle table XE.EDU.COUNTRIES
CREATE EXTERNAL TABLE pb_oracle.countries
(   country_id   CHAR(2) NOT NULL
,   country_name VARCHAR(40)
,   region_id    INTEGER )
WITH ( LOCATION = 'XE.EDU.COUNTRIES',
       DATA_SOURCE = OracleInstance);
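The later slides also query a pb_sqlserver schema, whose definition is not shown in the deck. A minimal sketch of what a SQL Server external source could look like follows; the credential name, host, remote three-part name, and column types are all assumptions made for illustration, and the pb_sqlserver schema is assumed to exist already.

```sql
-- Hypothetical credential for the remote SQL Server login.
CREATE DATABASE SCOPED CREDENTIAL SqlServerCredentials
WITH IDENTITY = '<remote login>', SECRET = '<remote password>';

-- Hypothetical remote instance; PUSHDOWN = ON lets eligible computation run remotely.
CREATE EXTERNAL DATA SOURCE SqlServerInstance
WITH (
    LOCATION = 'sqlserver://<host>:1433',
    PUSHDOWN = ON,
    CREDENTIAL = SqlServerCredentials);

-- Column list and remote object name are assumed; they must match the remote table.
CREATE EXTERNAL TABLE pb_sqlserver.regions
(   region_id   INT NOT NULL
,   region_name VARCHAR(25) )
WITH ( LOCATION = '<database>.<schema>.regions',
       DATA_SOURCE = SqlServerInstance);
```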
PolyBase: select from remote servers
SELECT
    e.employee_id
   ,e.first_name
   ,e.last_name
   ,d.department_name
   ,l.city
   ,c.country_name
   ,r.region_name
FROM dbo.employees e
INNER JOIN dbo.departments d ON e.department_id = d.department_id
INNER JOIN dbo.locations l ON d.location_id = l.location_id
INNER JOIN pb_oracle.countries c ON c.country_id = l.country_id
INNER JOIN pb_sqlserver.regions r ON r.region_id = c.region_id;
PolyBase: Remote Query
PolyBase: statistics
CREATE STATISTICS CustomerCustKeyStatistics
ON pb_sqlserver.address (stateprovinceid) WITH FULLSCAN;

SELECT DISTINCT a.city
FROM [pb_sqlserver].[address] a
WHERE a.stateprovinceid = 9;
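To confirm that the statistics object was created, one option is to look it up in the catalog views. This is only a sketch: it assumes statistics on external tables are listed in sys.stats like those on local tables.

```sql
-- List statistics objects and their columns for the external table.
SELECT s.name AS stats_name,
       c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
     ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
     ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('pb_sqlserver.address');
```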
PolyBase: externalpushdown
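-- Default: the optimizer decides whether to push the aggregation to the remote server
SELECT stateprovinceid, COUNT(*)
FROM pb_sqlserver.address GROUP BY stateprovinceid;
-- Pushdown disabled: rows are transferred first and aggregated locally
SELECT stateprovinceid, COUNT(*)
FROM pb_sqlserver.address GROUP BY stateprovinceid
OPTION (DISABLE EXTERNALPUSHDOWN);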
PolyBase: Scale-out groups
One node: up to 8 readers
PolyBase extends the idea of Massively Parallel Processing (MPP) to SQL Server.
SQL Server is a classic "scale-up" technology: if you want more power, add more RAM, CPUs, or other resources to the single server.
Hadoop is a great example of an MPP system: if you want more power, add more servers; the system will coordinate processing.
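For illustration, adding a SQL Server instance to a scale-out group as a compute node might look like the sketch below. The head node machine name and instance name are hypothetical, and 16450 is assumed to be the DMS control channel port of the head node.

```sql
-- Run on the instance that should become a compute node.
-- 'HEADNODE01' and 'MSSQLSERVER' are hypothetical head node machine and instance names.
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';

-- After the PolyBase services on the compute node are restarted,
-- run this on the head node to list the nodes in the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```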
Kubernetes Concepts
https://medium.com/@tsuyoshiushio/kubernetes-in-three-diagrams-6aba8432541c
Big data cluster architecture
Big data cluster components
Component: Description
Control plane: The control plane provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver.
Compute plane: The compute plane provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources, such as HDFS, Oracle, MongoDB, or Teradata.
Data plane: The data plane is used for data persistence and caching. The SQL data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprising SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
Management
• Easy to deploy and manage thanks to containers and Kubernetes
• Fast to deploy
• Self-contained (no installation required; everything ships as images)
• Easy upgrade: upload a new image
• Scalable, multi-tenant
Scenarios: Data virtualization
By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data.
Scenarios: Data Lake
A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
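A hedged sketch of querying HDFS data in the storage pool from the SQL Server master instance follows. The data source location string, the HDFS directory, the table name, and the columns are assumptions made for illustration and may differ between preview builds and the final release.

```sql
-- Data source pointing at the cluster's storage pool (location string is an assumption).
CREATE EXTERNAL DATA SOURCE SqlStoragePool
WITH (LOCATION = 'sqlhdfs://controller-svc/default');

-- Describes how the CSV files in HDFS are laid out.
CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2));

-- Hypothetical external table over a hypothetical HDFS directory.
CREATE EXTERNAL TABLE dbo.web_clickstreams_hdfs
(   wcs_user_sk       BIGINT
,   wcs_click_date_sk BIGINT )
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION = '/clickstream_data',
    FILE_FORMAT = csv_file);
```

Once defined, such a table can be joined with local relational tables in ordinary T-SQL, in the same way as the Oracle table in the PolyBase demo.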
Scenarios: Scale-out data mart
SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
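As an illustration of the caching idea, the sketch below creates an external table in the data pool and fills it from another source. The location string, table name, and column types are assumptions, and the INSERT assumes a build in which the master instance can write to data pool tables.

```sql
-- Data source pointing at the cluster's data pool (location string is an assumption).
CREATE EXTERNAL DATA SOURCE SqlDataPool
WITH (LOCATION = 'sqldatapool://controller-svc/default');

-- Hypothetical cache table; rows are spread across the data pool instances.
CREATE EXTERNAL TABLE dbo.address_cache
(   stateprovinceid INT
,   city            NVARCHAR(60) )
WITH (
    DATA_SOURCE = SqlDataPool,
    DISTRIBUTION = ROUND_ROBIN);

-- Ingest rows from the remote source used earlier in the demo.
INSERT INTO dbo.address_cache (stateprovinceid, city)
SELECT stateprovinceid, city
FROM pb_sqlserver.address;
```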
Scenarios: Integrated AI and ML
SQL Server big data clusters enable AI and machine learning tasks on the data
stored in HDFS storage pools and the data pools. You can use Spark as well as
built-in AI tools in SQL Server, using R, Python, Scala, or Java.
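One of the built-in options is running Python inside SQL Server through sp_execute_external_script. The sketch below assumes that Machine Learning Services is installed and that the 'external scripts enabled' option is turned on; the input query reuses the pb_oracle.countries table from the demo purely as an example.

```sql
-- Pass a result set to Python as a pandas DataFrame and return its first rows.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet.head(5)',
    @input_data_1 = N'SELECT country_id, country_name FROM pb_oracle.countries'
WITH RESULT SETS ((country_id CHAR(2), country_name VARCHAR(40)));
```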
MS SQL Server 2019 & Big Data Processing
The end
Q&A
THANK YOU

Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"

  • 1. MS SQL 2019: Big Data Processing Andrii Zrobok Chief Database Developer, EPAM azrobok@gmail.com
  • 2. Agenda  MS SQL 2019 overview  PolyBase: History, What, Why, Demo  Big Data Cluster  Scenarios
  • 3. About me 25 + years of experience in database development: development data-centric applications from scratch, support of legacy databases/applications, data migration tasks, performance tuning, SSIS/ETL tasks, consulting, database trainer, etc. Databases: FoxPro 2.0 for DOS (Fox Software), MS SQL Server (from version 6.5, 1996), Oracle, Sybase ASE, MySQL, PostgreSQL Co-leader of Lviv Data Platform UG (PASS Local Chapter) (http://lvivsqlug.pass.org/) Speaker at: • PASS SQLSaturday conferences (Lviv, Kyiv, Dnipro, Odessa, Kharkiv; since 2013) • PASS L’viv/Vinnitsa/Virtual SQL Server User Groups; • EPAM IT Week 2015-2017
  • 4. Nowadays challenges  Unified access to all your data with unparalleled performance  Easily and securely manage data big and small  Build intelligent Apps and AI with all your data
  • 5. MS SQL 2019 Preview Windows: Standard version with PolyBase Linux: Linux version without PolyBase Docker: Database Engine Container Image (Ubuntu, Red Hat) Big Data Analytics: Linux container on Kubernetes https://www.microsoft.com/en-us/sql-server/sql-server-2019#Install
  • 6. PolyBase: What? SQL Server PolyBase external tables / external data source T-SQLApplications Analytics Microsoft's newest technology for connecting to remote servers. https://docs.microsoft.com/uk-ua/sql/relational-databases/polybase/polybase- guide?view=sqlallproducts-allversions
  • 7. PolyBase: History  Introduced in SQL Server Parallel Data Warehouse (PDW) edition, back in 2010  Expanded in SQL Server Analytics Platform System (APS) in 2012.  Released to the "general public" in SQL Server 2016, with most support being in Enterprise Edition.  Extended support for additional technologies (like Oracle, MongoDB, etc.) will be available in SQL Server 2019.
  • 8. PolyBase: Why?  Without PolyBase  Transfer half your data so that all your data was in one format or the other  Query both sources of data, then write custom query logic to join and integrate the data at the client level.  With PolyBase  using T-SQL to join the data (external table, statistics for external table)  Usage  Querying / Import (into table) / Export (into data storage)  Performance  Use computation on Target server (OPTION (FORCE EXTERNALPUSHDOWN))
  • 9. PolyBase: Demo - tools 1) PolyBase should be installed and enabled 2) Using Management Studio (scripts, no visibility) OR 3) Using Azure Data Studio + SQL Server 2019 (Preview) Extension https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql- server-2017 https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019- extension?view=sqlallproducts-allversions
  • 10. PolyBase: Demo - steps  Create master key (needed for password encryption)  Create database scoped credential (access to remote database server)  Create external data source (address of remote database server)  Create schema for external data (optional)  Create external tables / statistics on external tables
  • 11. PolyBase: Demo – external tables CREATE DATABASE SCOPED CREDENTIAL OracleCredentials WITH IDENTITY = 'system', Secret = '0x7ORA18c'; CREATE EXTERNAL DATA SOURCE OracleInstance WITH ( LOCATION = 'oracle://192.168.1.103:1521', CREDENTIAL = OracleCredentials); CREATE EXTERNAL TABLE pb_oracle.countries ( country_id CHAR(2) NOT NULL , country_name VARCHAR(40) , region_id INTEGER ) WITH ( LOCATION='XE.EDU.COUNTRIES', DATA_SOURCE=OracleInstance);
  • 12. PolyBase: select from remote servers SELECT e.employee_id, e.first_name, e.last_name ,d.department_name ,l.city ,c.country_name ,r.region_name FROM dbo.employees e INNER JOIN dbo.departments d ON e.department_id = d.department_id INNER JOIN dbo.locations l ON d.location_id = l.location_id INNER JOIN pb_oracle.countries c ON c.country_id = l.country_id INNER JOIN pb_sqlserver.regions r ON r.region_id = c.region_id
  • 14. PolyBase: statistics
      CREATE STATISTICS CustomerCustKeyStatistics
      ON pb_sqlserver.address (stateprovinceid)
      WITH FULLSCAN;

      SELECT DISTINCT a.city
      FROM [pb_sqlserver].[address] a
      WHERE a.stateprovinceid = 9
  • 15. PolyBase: externalpushdown
      SELECT stateprovinceid, COUNT(*)
      FROM pb_sqlserver.address
      GROUP BY stateprovinceid

      SELECT stateprovinceid, COUNT(*)
      FROM pb_sqlserver.address
      GROUP BY stateprovinceid
      OPTION (DISABLE EXTERNALPUSHDOWN)
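For comparison, the same aggregate can be written with the opposite hint, which asks the optimizer to run the computation on the remote source rather than pulling the rows locally. This is a sketch using the table from the slide; whether pushdown actually occurs depends on the external source and the query shape.

```sql
-- Request that the GROUP BY be evaluated on the remote server.
SELECT stateprovinceid, COUNT(*) AS address_count
FROM pb_sqlserver.address
GROUP BY stateprovinceid
OPTION (FORCE EXTERNALPUSHDOWN);
```

Comparing the execution plans of the forced and disabled variants is one way to see how much data crosses the network in each case.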
  • 16. PolyBase: Scale-out groups
      One node: up to 8 readers.
      PolyBase extends the idea of Massively Parallel Processing (MPP) to SQL Server. SQL Server is a classic "scale-up" technology: if you want more power, add more RAM/CPUs/resources to the single server. Hadoop is a great example of an MPP system: if you want more power, add more servers; the system will coordinate processing.
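As a rough sketch of how a scale-out group is assembled on Windows, a compute node can be joined to a head node with the `sp_polybase_join_group` procedure. The machine name, port, and instance name below are placeholders, not values from the deck, and the PolyBase services on the compute node typically need a restart afterwards.

```sql
-- Run on each compute node. All three arguments are placeholders:
-- head node machine name, DMS control channel port, head node instance name.
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';

-- Run on the head node to list the nodes that are part of the group.
SELECT * FROM sys.dm_exec_compute_nodes;
```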
  • 18. Big data cluster architecture
  • 19. Big data cluster components
      Component: Description
      Control plane: The control plane provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver.
      Compute plane: The compute plane provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources, such as HDFS, Oracle, MongoDB, or Teradata.
      Data plane: The data plane is used for data persistence and caching. The SQL data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
  • 20. Management
      - Easy to deploy and manage thanks to containers and Kubernetes
      - Fast to deploy
      - Self-contained (images; no installation required)
      - Easy upgrade: upload a new image
      - Scalable, multi-tenant
  • 21. Scenarios: Data virtualization By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data.
  • 22. Scenarios: Data Lake A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
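A data lake query of this kind might be sketched as follows. The data source name and `sqlhdfs://` location follow the pattern used in Microsoft's big data cluster tutorials and may differ between preview builds; the table name, columns, and HDFS path are hypothetical.

```sql
-- Assumption: storage pool endpoint as used in Microsoft's tutorials.
CREATE EXTERNAL DATA SOURCE SqlStoragePool
WITH (LOCATION = 'sqlhdfs://controller-svc/default');

-- Describes how the CSV files in HDFS are laid out.
CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
);

-- Hypothetical table over a hypothetical HDFS directory.
CREATE EXTERNAL TABLE dbo.web_clickstreams_hdfs (
    user_id       BIGINT,
    item_id       BIGINT,
    click_date_id BIGINT
)
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION = '/clickstream_data',
    FILE_FORMAT = csv_file
);

-- Combine the HDFS data with ordinary relational queries.
SELECT user_id, COUNT(*) AS clicks
FROM dbo.web_clickstreams_hdfs
GROUP BY user_id;
```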
  • 23. Scenarios: Scale-out data mart SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
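Ingesting into the data pool could look roughly like the sketch below. The `sqldatapool://` location follows Microsoft's big data cluster tutorials and is an assumption here; the table names are hypothetical, and the source table is assumed to be an existing external table over HDFS.

```sql
-- Assumption: data pool endpoint as used in Microsoft's tutorials.
CREATE EXTERNAL DATA SOURCE SqlDataPool
WITH (LOCATION = 'sqldatapool://controller-svc/default');

-- Hypothetical data mart table, spread evenly across data pool nodes.
CREATE EXTERNAL TABLE dbo.clicks_data_pool (
    user_id     BIGINT,
    click_count BIGINT
)
WITH (
    DATA_SOURCE = SqlDataPool,
    DISTRIBUTION = ROUND_ROBIN
);

-- Cache an aggregate in the data pool for further analysis.
-- dbo.web_clickstreams_hdfs is a hypothetical external table over HDFS.
INSERT INTO dbo.clicks_data_pool (user_id, click_count)
SELECT user_id, COUNT(*)
FROM dbo.web_clickstreams_hdfs
GROUP BY user_id;
```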
  • 24. Scenarios: Integrated AI and ML SQL Server big data clusters enable AI and machine learning tasks on the data stored in HDFS storage pools and the data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, or Java.
  • 25. MS SQL Server 2019 & Big Data Processing The end Q&A THANK YOU

Editor's notes

  1. Big Data Clusters: The latest version simplifies big data analytics for SQL Server users. The new SQL Server combines HDFS (the Hadoop Distributed File System) and Apache Spark into one integrated system. It provides data virtualization by integrating data without extracting, transforming, and loading it. Big data clusters are difficult to deploy, but if you have a Kubernetes infrastructure, a single command will deploy your big data cluster in about half an hour.
  2. PolyBase is Microsoft's newest technology for connecting to remote servers. It started by letting you connect to Hadoop and has since expanded to include Azure Blob Storage. PolyBase is also the best method to load data into Azure SQL Data Warehouse. PolyBase, which was already present in earlier versions, has been expanded: SQL Server can now query external sources such as Oracle, Teradata, and MongoDB, which increases its flexibility.
  3. Polybase lets SQL Server compute nodes talk directly to Hadoop data nodes, perform aggregations, and then return results to the head node. This removes the classic SQL Server single point of contention.
  4. Kubernetes lets you use the cluster as if it were a single PC. You do not need to care about the details of the infrastructure: declare what you want in a YAML file, and Kubernetes provides it.
     Cluster: A Kubernetes cluster is a set of machines, known as nodes. One node controls the cluster and is designated the master node; the remaining nodes are worker nodes. The Kubernetes master is responsible for distributing work between the workers, and for monitoring the health of the cluster.
     Node: A node runs containerized applications. It can be either a physical machine or a virtual machine. A Kubernetes cluster can contain a mixture of physical machine and virtual machine nodes.
     Pod: A pod is the atomic deployment unit of Kubernetes. A pod is a logical group of one or more containers, and associated resources, needed to run an application. Each pod runs on a node; a node can run one or more pods. The Kubernetes master automatically assigns pods to nodes in the cluster.
     In SQL Server big data clusters, Kubernetes is responsible for the state of the cluster: it builds and configures the cluster nodes, assigns pods to nodes, and monitors the health of the cluster.
  5. Big Data Clusters: The latest version simplifies big data analytics for SQL Server users. The new SQL Server combines HDFS (the Hadoop Distributed File System) and Apache Spark into one integrated system. It provides data virtualization by integrating data without extracting, transforming, and loading it. Big data clusters are difficult to deploy, but if you have a Kubernetes infrastructure, a single command will deploy your big data cluster in about half an hour.
     A SQL Server big data cluster is a cluster of Linux containers orchestrated by Kubernetes. Starting with SQL Server 2019 preview, SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side to enable you to read, write, and process big data from Transact-SQL or Spark, allowing you to easily combine and analyze your high-value relational data with high-volume big data.
     Control plane: The control plane provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver.
     Compute plane: The compute plane provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources, such as HDFS, Oracle, MongoDB, or Teradata.
     Data plane: The data plane is used for data persistence and caching. It contains the SQL data pool and the storage pool. The SQL data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprised of SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
  6. Data virtualization: By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data.
  7. Data lake: A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
  8. Data virtualization: By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data. SQL Server 2019 preview introduces new connectors to data sources.
     Data lake: A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
     Scale-out data mart: SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
  9. Integrated AI and machine learning: SQL Server big data clusters enable AI and machine learning tasks on the data stored in HDFS storage pools and the data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, or Java.