It takes two to tango! Is SQL-on-Hadoop the next big step?
Big Data Crunching: A Retrospective
Three Phases
What was it like before Hadoop?
The Phylogenetic Tree of Elephants
Partitioned or Sharded RDBMSs
Data Warehouses
Massively Parallel Databases
Tech before Hadoop
Massively Parallel Databases
Shared Nothing Architecture
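A minimal sketch of the shared-nothing idea, assuming a toy in-memory setup: rows are hash-partitioned so that each node owns a disjoint slice, every node aggregates only its own slice, and a final step merges the partial results. The function names (`partition`, `local_sum`, `merge`, `shared_nothing_sum`) and the row layout are hypothetical, chosen only for illustration; no real parallel database exposes this API.

```python
from collections import defaultdict


def partition(rows, key, num_nodes):
    """Hash-partition rows so that each node owns a disjoint slice."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes


def local_sum(rows, group_key, value_key):
    """Aggregate on one node, touching only that node's rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


def merge(partials):
    """Combine the per-node partial sums into the final answer."""
    merged = defaultdict(float)
    for partial in partials:
        for group, total in partial.items():
            merged[group] += total
    return dict(merged)


def shared_nothing_sum(rows, num_nodes):
    """Partition by customer key, sum balances per segment, then merge."""
    slices = partition(rows, "custkey", num_nodes)
    partials = [local_sum(s, "segment", "acctbal") for s in slices]
    return merge(partials)
```

Calling `shared_nothing_sum(rows, 3)` on a list of dicts with `custkey`, `segment` and `acctbal` keys returns one total per segment. The answer does not depend on the number of nodes, because no node ever reads another node's slice.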
Hadoop - Early days
Acceptance Life Cycle
Acceptance
Exploration
Resistance
Complementary over Competitive
Split by Structure
What’s the best way to answer questions that span these
two worlds?
Can we interface SQL atop Hadoop?
Can we combine the strengths of parallel databases with
those of Hadoop?
SQL-on-Hadoop : Technology
Distributed Query Processing
Cloudera’s Impala
MapR-supported Apache Drill, and more
Split Query Processing
Microsoft Polybase
Hadapt
SQL-on-Hadoop : Technical Approaches
Faster Hive
Hortonworks’ Stinger initiative
Qubole’s Hive-on-the-Cloud
Distributed Query Processing
Cloudera Impala : Architecture
Clients
Impala Shell JDBC/ODBC Client SQL Tools
Data Node Data Node
Impala Daemon Impala Daemon Impala Daemon
Data Node
Query Execution
Query Planning
Query Coordination
Query Execution
Query Planning
Query Coordination
Query Execution
Query Planning
Query Coordination
State Store Metadata Catalog HDFS Name Node
Unified Metadata Store
Life Cycle of an Impala Query
Clients
Impala Shell JDBC/ODBC Client SQL Tools
Impala Daemon
Data Node
State Store Metadata Catalog HDFS Name Node
Impala Daemon
Data Node
Impala Daemon
Data Node
Impala Daemon
Data Node
Coordinate Execution
Plan and Optimize
Parse Query
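The three steps named on this slide (parse the query, plan and optimize, coordinate execution) can be sketched as a toy model. This is not Impala's API: the single supported query shape, the `CountQuery` type, and the `block_locations` and `node_data` dictionaries are all assumptions made for illustration. The daemon that receives the query acts as coordinator, sends one fragment to each node that holds data for the table, and sums the partial counts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CountQuery:
    table: str
    threshold: int


# Toy grammar: only "SELECT COUNT(*) FROM <table> WHERE value >= <n>".
PATTERN = re.compile(
    r"SELECT COUNT\(\*\) FROM (\w+) WHERE value >= (\d+)$", re.IGNORECASE
)


def parse_query(sql):
    """Step 1: parse the query text into a structured form."""
    match = PATTERN.match(sql.strip())
    if match is None:
        raise ValueError(f"unsupported query: {sql!r}")
    return CountQuery(table=match.group(1), threshold=int(match.group(2)))


def plan_fragments(query, block_locations):
    """Step 2: plan one fragment per node that stores the table's data."""
    return {
        node: query
        for node, tables in block_locations.items()
        if query.table in tables
    }


def execute_fragment(query, local_rows):
    """Runs on a data node against local rows only."""
    return sum(1 for value in local_rows if value >= query.threshold)


def coordinate(sql, block_locations, node_data):
    """Step 3: run every fragment and merge the partial counts."""
    query = parse_query(sql)
    fragments = plan_fragments(query, block_locations)
    partial_counts = [
        execute_fragment(fragment, node_data[node][fragment.table])
        for node, fragment in fragments.items()
    ]
    return sum(partial_counts)
```

Nodes that hold no data for the queried table receive no fragment, which is the point of planning against block locations before execution starts.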
Split Query Processing
Polybase + PDW : Architecture
Clients
ADO.NET JDBC/ODBC Client OLEDB
PDW Engine Service DMS Controller Loader Manager SQL Server
HDFS Bridge
Compute Node
Data Move Service
SQL Server
Job Tracker
Hadoop Cluster
Name Node
Data Node
Task Tracker
Data Node
Task Tracker
Data Node
Task Tracker
PDW Cluster
SQL Server
Compute Node
Data Move Service
HDFS Bridge
Compute Node
Data Move Service
SQL Server
SQL Server
Compute Node
Data Move Service
SQL Server PDW : Architecture
Control Node
Register the Hadoop Cluster with PDW
CREATE HADOOP_CLUSTER GSL_CLUSTER WITH
(namenode='hadoop-head', namenode_port=9000,
jobtracker='hadoop-head', jobtracker_port=9010);
Map HDFS File to External Tables in PDW
CREATE EXTERNAL TABLE hdfsCustomer
( c_custkey    bigint not null,
  c_name       varchar(25) not null,
  c_address    varchar(40) not null,
  c_nationkey  integer not null,
  c_phone      char(15) not null,
  c_acctbal    decimal(15,2) not null,
  c_mktsegment char(10) not null,
  c_comment    varchar(117) not null)
WITH (LOCATION='/tpch1gb/customer.tbl',
FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
EXTERNAL_FILEFORMAT = TEXT_FORMAT));
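To make the mapping concrete, here is a rough sketch of how one line of the text file could be read into the eight typed columns declared above. It assumes the file is pipe-delimited with a trailing delimiter, as the TPC-H data generator writes its .tbl files; the slide itself only says TEXT_FORMAT. The `Customer` tuple and `parse_customer_line` helper are hypothetical names, not part of PDW or Polybase.

```python
from decimal import Decimal
from typing import NamedTuple


class Customer(NamedTuple):
    """Mirrors the columns of the hdfsCustomer external table."""
    c_custkey: int
    c_name: str
    c_address: str
    c_nationkey: int
    c_phone: str
    c_acctbal: Decimal
    c_mktsegment: str
    c_comment: str


def parse_customer_line(line, delimiter="|"):
    """Split one text row and convert each field to its declared type."""
    fields = line.rstrip("\n").split(delimiter)
    # Assumed format: each row ends with a trailing delimiter, which
    # produces one extra empty field after the comment column.
    if len(fields) == 9 and fields[-1] == "":
        fields.pop()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    custkey, name, address, nationkey, phone, acctbal, segment, comment = fields
    return Customer(
        c_custkey=int(custkey),
        c_name=name,
        c_address=address,
        c_nationkey=int(nationkey),
        c_phone=phone,
        c_acctbal=Decimal(acctbal),
        c_mktsegment=segment,
        c_comment=comment,
    )
```

`Decimal` is used for `c_acctbal` so that the decimal(15,2) column keeps exact cents rather than a binary floating-point approximation.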
Life Cycle of a Split Query
Clients
ADO.NET JDBC/ODBC Client OLEDB
Loader Manager
Control Node
DMS Controller
Engine Service SQL Server
HDFS Bridge
Compute Node
Data Move Service
SQL Server
Hadoop Cluster
Data Node
Task Tracker
Data Node
Task Tracker
Data Node
Task Tracker
PDW Cluster
HDFS Bridge
Compute Node
Data Move Service
SQL Server
Plan
Job Tracker
Name Node
Data Node
Task Tracker
SQL-on-Hadoop : The Technology
Faster Hive
Distributed Query Processors
Split Query Processors
SQL-on-Hadoop or MapReduce?
</presentation>
More on
www.systemswemake.com
Follow : @systems_we_make
