SlideShare una empresa de Scribd logo
1 de 18
Cloudera	
  Impala	
  
Real	
  Time	
  Query	
  for	
  HDFS	
  and	
  HBase	
  
Beyond	
  Batch	
  
    What	
  is	
  Impala	
  
    Capability	
  
    Architecture	
  
    Demo	
  




2
Beyond	
  Batch	
  

        For	
  some	
  things	
  MapReduce	
  is	
  just	
  too	
  slow	
  
        Apache	
  Hive:	
  
            MapReduce	
  execuHon	
  engine	
  
            High-­‐latency,	
  low	
  throughput	
  
            High	
  runHme	
  overhead	
  
        Google	
  realized	
  this	
  early	
  on	
  
           	
  Analysts	
  wanted	
  fast,	
  interacHve	
  results	
  

3	
  
Dremel	
  

        Google	
  paper	
  (2010)	
  
           “scalable,	
  interac.ve	
  ad-­‐hoc	
  query	
  system	
  for	
  
           analysis	
  of	
  read-­‐only	
  nested	
  data”	
  
        Columnar	
  storage	
  format	
  
        Distributed	
  scalable	
  aggregaHon	
  
           “capable	
  of	
  running	
  aggrega.on	
  queries	
  over	
  
           trillion-­‐row	
  tables	
  in	
  seconds”	
  
                                       hUp://research.google.com/pubs/pub36632.html	
  

4	
  
Impala:	
  Goals	
  

        General-­‐purpose	
  SQL	
  query	
  engine	
  for	
  Hadoop	
  
        For	
  analyHcal	
  and	
  transacHonal	
  workloads	
  
        Support	
  queries	
  that	
  take	
  μs	
  to	
  hours	
  
        Run	
  directly	
  with	
  Hadoop	
  
            Collocated	
  daemons	
  
            Same	
  file	
  formats	
  
            Same	
  storage	
  managers	
  (NN,	
  metastore)	
  

5	
  
Impala:	
  Goals	
  

        High	
  performance	
  
            C++	
  
            runHme	
  code	
  generaHon	
  (LLVM)	
  
            direct	
  access	
  to	
  data	
  (no	
  MapReduce)	
  
        Retain	
  user	
  experience	
  
            	
  easy	
  for	
  Hive	
  users	
  to	
  migrate	
  
        100%	
  open-­‐source	
  

6	
  
Impala:	
  Capability	
  

        HiveQL	
  (subset	
  of	
  SQL92)	
  
           select,	
  project,	
  join,	
  union,	
  subqueries,	
  
           aggregaHon,	
  insert,	
  order	
  by	
  (with	
  limit)	
  
           DDL	
  
        Directly	
  queries	
  data	
  in	
  HDFS	
  &	
  HBase	
  
           Text	
  files	
  (compressed)	
  
           Sequence	
  files	
  (snappy/gzip)	
  
           Avro	
  &	
  Trevni	
                                    GA	
  features	
  
7	
  
Impala:	
  Capability	
  

        Familiar	
  and	
  unified	
  plagorm	
  
          Uses	
  Hive’s	
  metastore	
  
          Submit	
  queries	
  via	
  ODBC	
  |	
  Beeswax	
  Thril	
  API	
  
        Query	
  is	
  distributed	
  to	
  nodes	
  with	
  relevant	
  data	
  
        Process-­‐to-­‐process	
  data	
  exchange	
  
        Kerberos	
  authenHcaHon	
  
        No	
  fault	
  tolerance	
  


8	
  
Impala:	
  Performance	
  

        Greater	
  disk	
  throughput	
  
           ~100MB/sec/disk	
  
           I/O-­‐bound	
  workloads	
  faster	
  by	
  3-­‐4x	
  
        Queries	
  that	
  require	
  mulHple	
  map-­‐reduce	
  
        phases	
  in	
  Hive	
  are	
  significantly	
  faster	
  in	
  Impala	
  
        (up	
  to	
  45x)	
  
        Queries	
  that	
  run	
  against	
  in-­‐memory	
  cached	
  data	
  
        see	
  a	
  significant	
  speedup	
  (up	
  to	
  90x)	
  
9	
  
Impala:	
  Architecture	
  

         impalad	
  
            runs	
  on	
  every	
  node	
  
            handles	
  client	
  requests	
  (ODBC,	
  thril)	
  
            handles	
  query	
  planning	
  &	
  execuHon	
  
         statestored	
  
            provides	
  name	
  service	
  
            metadata	
  distribuHon	
  
            used	
  for	
  finding	
  data	
  
10	
  
Impala:	
  Architecture	
  




11	
  
Impala:	
  Architecture	
  




12	
  
Impala:	
  Architecture	
  




13	
  
Impala:	
  Architecture	
  




14	
  
Current	
  limitaHons	
  

         Public	
  Beta	
  (available	
  since	
  24	
  Oct	
  2012)	
  
           No	
  SerDes	
  
           No	
  User	
  Defined	
  FuncHons	
  (UDF’s)	
  
           Joins	
  are	
  done	
  in	
  memory	
  space	
  no	
  larger	
  
           than	
  that	
  of	
  smallest	
  node	
  
           impalad’s	
  only	
  read	
  statestored	
  metadata	
  at	
  
           startup	
  	
  


15	
  
Futures	
  

         GA	
  Q1	
  2013	
  
           DDL	
  support	
  (CREATE,	
  ALTER)	
  
           Rudimentary	
  cost-­‐based	
  opHmizer	
  (CBO)	
  
           Joins	
  done	
  in	
  aggregate	
  memory	
  
           metadata	
  distribuHon	
  through	
  statestored	
  
           Doug	
  Curng’s	
  Trevni	
  
                Columnar	
  storage	
  format	
  like	
  Dremel’s	
  
                Impala	
  +	
  Trevni	
  =	
  Dremel	
  superset	
  
16	
  
Demo	
  

                    impala-­‐user@cloudera.com	
  
                         kinley@cloudera.com	
  
                                      @jrkinley	
  




17	
  
Cloudera impala

Más contenido relacionado

La actualidad más candente

How Impala Works
How Impala WorksHow Impala Works
How Impala WorksYue Chen
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Scott Leberknight
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017larsgeorge
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLliuknag
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impalaNAVER D2
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARNWangda Tan
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLArseny Chernov
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 

La actualidad más candente (20)

How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Hive vs. Impala
Hive vs. ImpalaHive vs. Impala
Hive vs. Impala
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 

Destacado

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep divehuguk
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Cloudera, Inc.
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Hadoopとその周辺の紹介
Hadoopとその周辺の紹介Hadoopとその周辺の紹介
Hadoopとその周辺の紹介Shinya Okano
 
Database Fundamental Concepts- Series 1 - Performance Analysis
Database Fundamental Concepts- Series 1 - Performance AnalysisDatabase Fundamental Concepts- Series 1 - Performance Analysis
Database Fundamental Concepts- Series 1 - Performance AnalysisDAGEOP LTD
 
SQL-on-Hadoop with Apache Tajo, and application case of SK Telecom
SQL-on-Hadoop with Apache Tajo,  and application case of SK TelecomSQL-on-Hadoop with Apache Tajo,  and application case of SK Telecom
SQL-on-Hadoop with Apache Tajo, and application case of SK TelecomGruter
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Yukinori Suda
 
Impala データサイエンティストのための 高速大規模分散基盤 #tokyowebmining
Impala データサイエンティストのための 高速大規模分散基盤 #tokyowebminingImpala データサイエンティストのための 高速大規模分散基盤 #tokyowebmining
Impala データサイエンティストのための 高速大規模分散基盤 #tokyowebminingSho Shimauchi
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And HdfsCloudera, Inc.
 
Introduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibIntroduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibpumaranikar
 
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014StampedeCon
 
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateApache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateCloudera, Inc.
 

Destacado (18)

The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep dive
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hadoopとその周辺の紹介
Hadoopとその周辺の紹介Hadoopとその周辺の紹介
Hadoopとその周辺の紹介
 
Database Fundamental Concepts- Series 1 - Performance Analysis
Database Fundamental Concepts- Series 1 - Performance AnalysisDatabase Fundamental Concepts- Series 1 - Performance Analysis
Database Fundamental Concepts- Series 1 - Performance Analysis
 
Failing gracefully
Failing gracefullyFailing gracefully
Failing gracefully
 
SQL-on-Hadoop with Apache Tajo, and application case of SK Telecom
SQL-on-Hadoop with Apache Tajo,  and application case of SK TelecomSQL-on-Hadoop with Apache Tajo,  and application case of SK Telecom
SQL-on-Hadoop with Apache Tajo, and application case of SK Telecom
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Impala データサイエンティストのための 高速大規模分散基盤 #tokyowebmining
Impala データサイエンティストのための 高速大規模分散基盤 #tokyowebminingImpala データサイエンティストのための 高速大規模分散基盤 #tokyowebmining
Impala データサイエンティストのための 高速大規模分散基盤 #tokyowebmining
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
 
Introduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlibIntroduction to Apache Spark and MLlib
Introduction to Apache Spark and MLlib
 
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
 
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance UpdateApache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
 

Similar a Cloudera impala

Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoopveeracynixit
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache HadoopOleksiy Krotov
 
BDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaBDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaDavid Lauzon
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedDouglas Bernardini
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfssusere05ec21
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming DataGeoffrey Fox
 

Similar a Cloudera impala (20)

Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Apache drill
Apache drillApache drill
Apache drill
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
BDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaBDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using Impala
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 

Más de Swiss Big Data User Group

Making Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to useMaking Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to useSwiss Big Data User Group
 
A real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operatorA real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operatorSwiss Big Data User Group
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisSwiss Big Data User Group
 
Big Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesBig Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesSwiss Big Data User Group
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningSwiss Big Data User Group
 
Unleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data WarehouseUnleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data WarehouseSwiss Big Data User Group
 
Project "Babelfish" - A data warehouse to attack complexity
 Project "Babelfish" - A data warehouse to attack complexity Project "Babelfish" - A data warehouse to attack complexity
Project "Babelfish" - A data warehouse to attack complexitySwiss Big Data User Group
 
Brainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density ChoiceBrainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density ChoiceSwiss Big Data User Group
 
Urturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maketUrturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maketSwiss Big Data User Group
 
The World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridThe World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridSwiss Big Data User Group
 
New opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph databaseNew opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph databaseSwiss Big Data User Group
 
Technology Outlook - The new Era of computing
Technology Outlook - The new Era of computingTechnology Outlook - The new Era of computing
Technology Outlook - The new Era of computingSwiss Big Data User Group
 

Más de Swiss Big Data User Group (20)

Making Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to useMaking Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to use
 
A real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operatorA real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operator
 
Data Analytics – B2B vs. B2C
Data Analytics – B2B vs. B2CData Analytics – B2B vs. B2C
Data Analytics – B2B vs. B2C
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data Analysis
 
Big Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesBig Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companies
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time Learning
 
Educating Data Scientists of the Future
Educating Data Scientists of the FutureEducating Data Scientists of the Future
Educating Data Scientists of the Future
 
Unleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data WarehouseUnleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data Warehouse
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
 
Project "Babelfish" - A data warehouse to attack complexity
 Project "Babelfish" - A data warehouse to attack complexity Project "Babelfish" - A data warehouse to attack complexity
Project "Babelfish" - A data warehouse to attack complexity
 
Brainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density ChoiceBrainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density Choice
 
Urturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maketUrturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maket
 
The World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridThe World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC Datagrid
 
New opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph databaseNew opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph database
 
Technology Outlook - The new Era of computing
Technology Outlook - The new Era of computingTechnology Outlook - The new Era of computing
Technology Outlook - The new Era of computing
 
In-Store Analysis with Hadoop
In-Store Analysis with HadoopIn-Store Analysis with Hadoop
In-Store Analysis with Hadoop
 
Big Data Visualization With ParaView
Big Data Visualization With ParaViewBig Data Visualization With ParaView
Big Data Visualization With ParaView
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Oracle's BigData solutions
Oracle's BigData solutionsOracle's BigData solutions
Oracle's BigData solutions
 

Último

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Cloudera impala

  • 1. Cloudera  Impala   Real  Time  Query  for  HDFS  and  HBase  
  • 2. Beyond  Batch   What  is  Impala   Capability   Architecture   Demo   2
  • 3. Beyond  Batch   For  some  things  MapReduce  is  just  too  slow   Apache  Hive:   MapReduce  execuHon  engine   High-­‐latency,  low  throughput   High  runHme  overhead   Google  realized  this  early  on    Analysts  wanted  fast,  interacHve  results   3  
  • 4. Dremel   Google  paper  (2010)   “scalable,  interac.ve  ad-­‐hoc  query  system  for   analysis  of  read-­‐only  nested  data”   Columnar  storage  format   Distributed  scalable  aggregaHon   “capable  of  running  aggrega.on  queries  over   trillion-­‐row  tables  in  seconds”   hUp://research.google.com/pubs/pub36632.html   4  
  • 5. Impala:  Goals   General-­‐purpose  SQL  query  engine  for  Hadoop   For  analyHcal  and  transacHonal  workloads   Support  queries  that  take  μs  to  hours   Run  directly  with  Hadoop   Collocated  daemons   Same  file  formats   Same  storage  managers  (NN,  metastore)   5  
  • 6. Impala:  Goals   High  performance   C++   runHme  code  generaHon  (LLVM)   direct  access  to  data  (no  MapReduce)   Retain  user  experience    easy  for  Hive  users  to  migrate   100%  open-­‐source   6  
  • 7. Impala:  Capability   HiveQL  (subset  of  SQL92)   select,  project,  join,  union,  subqueries,   aggregaHon,  insert,  order  by  (with  limit)   DDL   Directly  queries  data  in  HDFS  &  HBase   Text  files  (compressed)   Sequence  files  (snappy/gzip)   Avro  &  Trevni   GA  features   7  
  • 8. Impala:  Capability   Familiar  and  unified  plagorm   Uses  Hive’s  metastore   Submit  queries  via  ODBC  |  Beeswax  Thril  API   Query  is  distributed  to  nodes  with  relevant  data   Process-­‐to-­‐process  data  exchange   Kerberos  authenHcaHon   No  fault  tolerance   8  
  • 9. Impala:  Performance   Greater  disk  throughput   ~100MB/sec/disk   I/O-­‐bound  workloads  faster  by  3-­‐4x   Queries  that  require  mulHple  map-­‐reduce   phases  in  Hive  are  significantly  faster  in  Impala   (up  to  45x)   Queries  that  run  against  in-­‐memory  cached  data   see  a  significant  speedup  (up  to  90x)   9  
  • 10. Impala:  Architecture   impalad   runs  on  every  node   handles  client  requests  (ODBC,  thril)   handles  query  planning  &  execuHon   statestored   provides  name  service   metadata  distribuHon   used  for  finding  data   10  
  • 15. Current  limitaHons   Public  Beta  (available  since  24  Oct  2012)   No  SerDes   No  User  Defined  FuncHons  (UDF’s)   Joins  are  done  in  memory  space  no  larger   than  that  of  smallest  node   impalad’s  only  read  statestored  metadata  at   startup     15  
  • 16. Futures   GA  Q1  2013   DDL  support  (CREATE,  ALTER)   Rudimentary  cost-­‐based  opHmizer  (CBO)   Joins  done  in  aggregate  memory   metadata  distribuHon  through  statestored   Doug  Curng’s  Trevni   Columnar  storage  format  like  Dremel’s   Impala  +  Trevni  =  Dremel  superset   16  
  • 17. Demo   impala-­‐user@cloudera.com   kinley@cloudera.com   @jrkinley   17