Cloudera Impala
Justin Erickson | Senior Product Manager
May 2013
  
Agenda
•  Why Impala?
•  Architectural Overview
•  Real-World Use Cases
•  Alternative Approaches
•  The Platform for Big Data
  
©2013 Cloudera, Inc. All Rights Reserved.
Why Hadoop?
•  Scalability
   •  Scales simply by adding nodes
   •  Local processing to avoid network bottlenecks
•  Flexibility
   •  All kinds of data (blobs, documents, records, etc.)
   •  In all forms (structured, semi-structured, unstructured)
   •  Store anything, then later analyze what you need
•  Efficiency
   •  Cost efficiency (<$1k/TB) on commodity hardware
   •  Unified storage, metadata, and security (no duplication or synchronization)
  
What’s Impala?
•  Interactive SQL
   •  Typically 5-65x faster than Hive (observed up to 100x faster)
   •  Responses in seconds instead of minutes (sometimes sub-second)
•  Nearly ANSI-92 standard SQL queries, compatible with Hive SQL
   •  Compatible SQL interface for existing Hadoop/CDH applications
   •  Based on industry-standard SQL
•  Runs natively on Hadoop/HBase storage and metadata
   •  Flexibility, scale, and cost advantages of Hadoop
   •  No duplication/synchronization of data and metadata
   •  Local processing to avoid network bottlenecks
•  Separate runtime from MapReduce
   •  MapReduce is designed for, and great at, batch processing
   •  Impala is purpose-built for low-latency SQL queries on Hadoop
  
Benefits of Impala
More & Faster Value from “Big Data”
§  BI tools were impractical on Hadoop before Impala
§  Move from 10s of Hadoop users per cluster to 100s of SQL users
§  No delays from data migration
Flexibility
§  Query across existing data
§  Select best-fit file formats (Parquet, Avro, etc.)
§  Run multiple frameworks on the same data at the same time
Cost Efficiency
§  Reduce data movement and duplicate storage & compute
§  1% to 10% of the cost of an analytic DBMS
Full Fidelity Analysis
§  No loss from aggregations or fixed schemas
  
Impala Query Execution
[Diagram: a SQL App sends a SQL request via ODBC to one of three impalad nodes, each running a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DataNode and HBase; the cluster also runs the Hive Metastore, HDFS NameNode, and Statestore.]
1)  Request arrives via ODBC/JDBC/Beeswax/Shell
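Step 1 can be sketched as a toy model. In the diagram every node runs a Query Planner, Query Coordinator, and Query Executor, so a client can send its request to any impalad, and the daemon that receives it coordinates that query. The `Impalad` class, the `route_request` helper, and the `web_logs` table are hypothetical illustrations, not Impala APIs; random node choice is an assumption standing in for whatever routing a real client or load balancer uses.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Impalad:
    """Hypothetical stand-in for one impalad. Each daemon bundles a planner,
    a coordinator and an executor, so any of them can accept a query."""
    host: str
    coordinated: List[str] = field(default_factory=list)

    def submit(self, sql: str) -> str:
        # The daemon that receives the request acts as coordinator for it.
        self.coordinated.append(sql)
        return self.host


def route_request(cluster: List[Impalad], sql: str, rng: random.Random) -> str:
    """Model a client (ODBC/JDBC/Beeswax/shell) connecting to an arbitrary
    impalad; returns the host that will coordinate the query."""
    return rng.choice(cluster).submit(sql)


# Hypothetical three-node cluster and query.
cluster = [Impalad(f"node{i}") for i in range(1, 4)]
coordinator = route_request(cluster, "SELECT COUNT(*) FROM web_logs", random.Random(42))
print(coordinator)
```

Because every node carries the full planner/coordinator/executor stack, no single node has to be a dedicated query master in this model.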
  
Impala Query Execution
[Diagram: the same cluster of impalad nodes (Query Planner, Query Coordinator, Query Executor, HDFS DataNode, HBase), with the SQL App/ODBC client, Hive Metastore, HDFS NameNode, and Statestore.]
2)  Planner turns the request into a collection of plan fragments
3)  Coordinator initiates execution on the impalad(s) local to the data
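Steps 2 and 3 can be sketched for one simplified query shape, `SELECT key, COUNT(*) FROM t GROUP BY key`. The sketch assumes a two-level plan: one scan-plus-partial-aggregation fragment per HDFS block, placed on a host that stores a replica of that block, and one merge fragment on the coordinator. `Fragment`, `plan_and_schedule`, and the block map are hypothetical; Impala's actual planner also uses cost estimates and may repartition rows by grouping key across several nodes before the final merge.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fragment:
    name: str  # which part of the plan this fragment runs
    host: str  # impalad chosen to execute it


def plan_and_schedule(block_locations: Dict[str, List[str]], coordinator: str) -> List[Fragment]:
    """Split `SELECT key, COUNT(*) ... GROUP BY key` into two levels of
    plan fragments and assign each one to an impalad.

    block_locations maps a hypothetical HDFS block id to the hosts holding a
    replica of it, i.e. the kind of answer a NameNode lookup provides.
    """
    fragments: List[Fragment] = []
    load: Dict[str, int] = {}  # scan fragments already assigned per host
    for block, hosts in sorted(block_locations.items()):
        # Only replica-holding hosts are candidates, so every scan reads local
        # data; ties go to the least-loaded host, then by host name.
        host = min(hosts, key=lambda h: (load.get(h, 0), h))
        load[host] = load.get(host, 0) + 1
        fragments.append(Fragment(f"scan+partial_agg({block})", host))
    # A final fragment merges the partial aggregates on the coordinator.
    fragments.append(Fragment("merge_agg", coordinator))
    return fragments


# Hypothetical layout: three blocks, each replicated on two of three nodes.
blocks = {"b1": ["node1", "node2"], "b2": ["node2", "node3"], "b3": ["node1", "node3"]}
for fragment in plan_and_schedule(blocks, coordinator="node1"):
    print(fragment.host, fragment.name)
```

Restricting candidates to replica holders is what keeps reads local; the load-based tie-break only spreads scans across those local hosts.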
  
Impala Query Execution
[Diagram: the same cluster of impalad nodes (Query Planner, Query Coordinator, Query Executor, HDFS DataNode, HBase), with query results flowing back to the SQL App via ODBC.]
4)  Intermediate results are streamed between impalad(s)
5)  Query results are streamed back to the client
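Steps 4 and 5 can be illustrated with Python generators, which pass data along incrementally instead of materializing it. This is a hypothetical model, not Impala code, and it assumes a query of the form `SELECT key, COUNT(*) ... GROUP BY key`: `scan_partial_agg` stands in for an executor fragment, `exchange` for the network transfer between impalads, and `merge_agg` for the coordinator's final aggregation. The batch size of 2 is an arbitrary assumption to make batching visible.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Batch = Dict[str, int]


def scan_partial_agg(rows: Iterable[str], batch_size: int = 2) -> Iterator[Batch]:
    """Executor side: count keys from a local block and stream partial counts
    downstream in small batches rather than materializing the whole block."""
    partial: Counter = Counter()
    for i, key in enumerate(rows, start=1):
        partial[key] += 1
        if i % batch_size == 0:
            yield dict(partial)
            partial.clear()
    if partial:
        yield dict(partial)


def exchange(streams: List[Iterator[Batch]]) -> Iterator[Batch]:
    """Forward batches from several executors as they become available
    (modelled here as round-robin over the upstream streams)."""
    active = list(streams)
    while active:
        for stream in list(active):
            batch = next(stream, None)
            if batch is None:
                active.remove(stream)
            else:
                yield batch


def merge_agg(batches: Iterable[Batch]) -> Iterator[Tuple[str, int]]:
    """Coordinator side: combine partial counts. An aggregate can only emit
    final totals after all input has arrived; rows are then streamed out."""
    total: Counter = Counter()
    for batch in batches:
        total.update(batch)
    for key in sorted(total):
        yield key, total[key]


# Hypothetical data: three local blocks of grouping keys.
blocks = [["a", "b", "a"], ["b", "c"], ["a"]]
for row in merge_agg(exchange([scan_partial_agg(b) for b in blocks])):
    print(row)  # each result row reaches the "client" as soon as it is yielded
```

Nothing is computed until the client starts pulling rows, which mirrors the pipelined, streaming style described in the slide; a blocking operator such as the final aggregation is the only point where data waits.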
  
Impala and Hive
Shares Everything Client-Facing
§  Metadata (table definitions)
§  ODBC/JDBC drivers
§  SQL syntax (Hive SQL)
§  Flexible file formats
§  Machine pool
§  Hue GUI
But Built for Different Purposes
§  Hive: runs on MapReduce; ideal for batch processing
§  Impala: native MPP query engine; ideal for interactive SQL
[Diagram: Hive (SQL syntax + MapReduce compute framework, for batch processing) and Impala (SQL syntax + its own compute framework, for interactive SQL) side by side on shared storage (HDFS, HBase; Text, RCFile, Parquet, Avro, etc., records), integration, resource management, and metadata layers.]
  
Impala Use Cases
Cost-effective, ad hoc query environment that offloads the data warehouse for:
•  Interactive BI/analytics on more data
•  Asking new questions
•  Query-able archive with full fidelity
•  Data processing with tight SLAs
  
Global Financial Services Company
Saving 90% on incremental EDW spend & improving performance by 5x
Offload data warehouse for query-able archive
Store decades of data cost-effectively
Process & analyze on the same system
Improve capabilities through interactive query on more data
  
Six3 Systems
Boosting performance by 20x for mission-critical, real-time cyber security
Analyze unstructured data with flexibility & real-time response
Integrate with existing desktop & BI tools
Deploy in minutes with Cloudera Manager
  
Expedia
Implementing self-service BI on big data, reducing data latency by 50%
Offload data warehouse for archiving, ETL & analytics
Unify IT environment
Continuously ingest & analyze at scale
Drive greater usability & adoption of the big data stack
  
Our Design Strategy
[Diagram: engines for Batch Processing (MapReduce, Hive & Pig, …), Interactive SQL (Impala), and Machine Learning (Mahout, DataFu) on shared storage (HDFS, HBase; Text, RCFile, Parquet, Avro, etc., records), integration, resource management, and metadata layers.]
One pool of data
One metadata model
One security framework
One set of system resources
An Integrated Part of the Hadoop System
  
Not All SQL on Hadoop is Created Equal
•  Batch MapReduce: make MapReduce faster (slow, still batch)
•  Remote Query: pull data from HDFS over the network to the DW compute layer (slow, expensive)
•  Siloed DBMS: load data into a proprietary database file (rigid, siloed data, slow ETL)
•  Impala: native MPP query engine that's integrated into Hadoop (fast, flexible, cost-effective)
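The network cost that separates the "Remote Query" approach from Impala's local processing can be estimated with back-of-the-envelope arithmetic for a single `GROUP BY` aggregate. Every figure below (row count, row size, node count, number of distinct keys, partial-row size) is a hypothetical assumption chosen for illustration, not a measurement, and the local figure is an upper bound that leaves out the final results returned to the client.

```python
def remote_query_bytes(num_rows: int, row_bytes: int) -> int:
    """Remote query: every row is pulled from HDFS to the external compute layer."""
    return num_rows * row_bytes


def local_aggregate_bytes(num_nodes: int, distinct_keys: int, agg_row_bytes: int) -> int:
    """Local processing: each node scans its own blocks and ships at most one
    partial (key, count) row per distinct key to be merged."""
    return num_nodes * distinct_keys * agg_row_bytes


# Hypothetical workload: 1 billion 100-byte rows on 20 nodes, 1,000 distinct keys,
# and 16-byte partial-aggregate rows.
remote = remote_query_bytes(1_000_000_000, 100)
local = local_aggregate_bytes(20, 1_000, 16)
print(f"remote query: {remote / 1e9:.0f} GB over the network")
print(f"local partial aggregation: at most {local / 1e3:.0f} KB over the network")
```

The gap depends on how much the query reduces its input; a query that returns most of its rows would narrow it considerably.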

The Impala Advantage
BI Partners: Building on the Enterprise Standard
POWERED BY IMPALA

It’s Not Just About SQL on Hadoop
The Platform for Big Data
[Diagram: Management | Support spanning engines for Batch Processing (MapReduce, Hive & Pig, …), Interactive SQL (Impala), and Machine Learning (Mahout, DataFu) on shared storage (HDFS, HBase; Text, RCFile, Parquet, Avro…, records), integration, resource management, and metadata layers.]
Single platform for processing & analytics
Scales to thousands of servers
No upfront schema
10% of the cost per TB
Open source platform
Cloudera Impala Technical Overview

Take control of your SAP testing with UiPath Test Suite
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Cloudera Impala Technical Overview

  • 1. Cloudera Impala
    Justin Erickson | Senior Product Manager
    May 2013
  • 2. Agenda
    •  Why Impala?
    •  Architectural Overview
    •  Real-World Use Cases
    •  Alternative Approaches
    •  The Platform for Big Data
    ©2013 Cloudera, Inc. All Rights Reserved.
  • 3. Why Hadoop?
    •  Scalability
       •  Scales simply by adding nodes
       •  Local processing to avoid network bottlenecks
    •  Flexibility
       •  All kinds of data (blobs, documents, records, etc.)
       •  In all forms (structured, semi-structured, unstructured)
       •  Store anything, then later analyze what you need
    •  Efficiency
       •  Cost efficiency (<$1k/TB) on commodity hardware
       •  Unified storage, metadata, and security (no duplication or synchronization)
  • 4. What’s Impala?
    •  Interactive SQL
       •  Typically 5-65x faster than Hive (observed up to 100x faster)
       •  Responses in seconds instead of minutes (sometimes sub-second)
    •  Nearly ANSI-92 standard SQL queries with Hive SQL
       •  Compatible SQL interface for existing Hadoop/CDH applications
       •  Based on industry-standard SQL
    •  Runs natively on Hadoop/HBase storage and metadata
       •  Flexibility, scale, and cost advantages of Hadoop
       •  No duplication/synchronization of data and metadata
       •  Local processing to avoid network bottlenecks
    •  Runtime separate from MapReduce
       •  MapReduce is designed for, and great at, batch processing
       •  Impala is purpose-built for low-latency SQL queries on Hadoop
  • 5. Benefits of Impala
    •  More & Faster Value from “Big Data”
       •  BI tools were impractical on Hadoop before Impala
       •  Move from 10s of Hadoop users per cluster to 100s of SQL users
       •  No delays from data migration
    •  Flexibility
       •  Query across existing data
       •  Select best-fit file formats (Parquet, Avro, etc.)
       •  Run multiple frameworks on the same data at the same time
    •  Cost Efficiency
       •  Reduce data movement and duplicate storage & compute
       •  1% to 10% of the cost of an analytic DBMS
    •  Full Fidelity Analysis
       •  No loss from aggregations or fixed schemas
  • 6. Impala Query Execution
    [Diagram: a SQL app connects over ODBC to one of three impalad nodes. Each node runs a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DataNode (DN) and HBase; the Hive Metastore, HDFS NameNode (NN), and Statestore are shared services.]
    1) Request arrives via ODBC/JDBC/Beeswax/Shell
  • 7. Impala Query Execution
    [Diagram: same cluster layout as slide 6.]
    2) Planner turns request into collections of plan fragments
    3) Coordinator initiates execution on impalad(s) local to data
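Step 3 says the coordinator starts work on impalads "local to data", but the slides do not describe how that placement is chosen. The following is only a minimal sketch of a locality-aware assignment, assuming a hypothetical `assign_scan_ranges` function, a made-up input shape that maps each HDFS block to its size and replica hosts, and a greedy least-loaded tie-break. It is not Impala's actual scheduler.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical input shape: block id -> (size in bytes, hosts holding a replica).
Blocks = Dict[str, Tuple[int, List[str]]]


def assign_scan_ranges(blocks: Blocks, live_nodes: Sequence[str]) -> Dict[str, List[str]]:
    """Greedily assign each HDFS block to one impalad, preferring a node that
    stores a replica (local read) and, among those, the least-loaded one."""
    assigned_bytes = {node: 0 for node in live_nodes}
    plan: Dict[str, List[str]] = defaultdict(list)
    # Place large blocks first so later, smaller blocks can even out the load.
    for block_id, (size, replicas) in sorted(blocks.items(), key=lambda kv: -kv[1][0]):
        local = [h for h in replicas if h in assigned_bytes]
        # No co-located impalad: fall back to any live node (a remote read).
        candidates = local or list(live_nodes)
        # Least-loaded candidate; ties broken by hostname for determinism.
        target = min(candidates, key=lambda h: (assigned_bytes[h], h))
        assigned_bytes[target] += size
        plan[target].append(block_id)
    return dict(plan)


# Hypothetical cluster: impalads run on node-a, node-b, node-c only.
blocks: Blocks = {
    "blk_1": (128, ["node-a", "node-b"]),
    "blk_2": (128, ["node-a", "node-c"]),
    "blk_3": (64, ["node-b", "node-c"]),
    "blk_4": (64, ["node-d"]),  # only replica is on a host with no impalad
}
plan = assign_scan_ranges(blocks, ["node-a", "node-b", "node-c"])
print(plan)
```

In this run, `blk_4` has no replica on a node running an impalad, so it falls back to the least-loaded live node (node-b) and becomes a remote read; the other three blocks are read locally.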
  • 8. Impala Query Execution
    [Diagram: same cluster layout as slide 6, with query results flowing back to the SQL app.]
    4) Intermediate results are streamed between impalad(s)
    5) Query results are streamed back to client
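Steps 4 and 5 describe results streaming between impalads and back to the client. A common MPP pattern that fits this description is two-phase aggregation, sketched below in plain Python for a hypothetical `SELECT country, COUNT(*) ... GROUP BY country`: each node pre-aggregates its local rows, a hash exchange routes each key to one merging node, and the coordinator forwards the final rows. The function names, the CRC32 partitioning, and the data are assumptions for illustration, not Impala's actual plan fragments or wire protocol.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, count)


def partial_count(local_rows: Iterable[str]) -> Iterator[Row]:
    """Fragment 1, run on every impalad: pre-aggregate COUNT(*) per key over local rows."""
    yield from Counter(local_rows).items()


def exchange(streams: Iterable[Iterator[Row]], n_receivers: int) -> List[List[Row]]:
    """Hash-partition partial rows so all rows for one key reach the same receiver.
    Buffered into lists here for simplicity; a real exchange forwards batches as they arrive."""
    buckets: List[List[Row]] = [[] for _ in range(n_receivers)]
    for stream in streams:
        for key, cnt in stream:
            buckets[zlib.crc32(key.encode()) % n_receivers].append((key, cnt))
    return buckets


def merge_count(partials: Iterable[Row]) -> Iterator[Row]:
    """Fragment 2: sum the partial counts for the keys this receiver owns."""
    totals: Counter = Counter()
    for key, cnt in partials:
        totals[key] += cnt
    yield from totals.items()


def coordinator(result_streams: Iterable[Iterator[Row]]) -> Iterator[Row]:
    """Forward final rows to the client as each merge fragment yields them."""
    for stream in result_streams:
        yield from stream


# Hypothetical data: values of a `country` column stored on three impalads.
node_rows = [["us", "uk", "us"], ["uk", "de"], ["us"]]
buckets = exchange((partial_count(rows) for rows in node_rows), n_receivers=2)
result = dict(coordinator(merge_count(b) for b in buckets))
print(sorted(result.items()))  # [('de', 1), ('uk', 2), ('us', 3)]
```

Pre-aggregating before the exchange means each node sends at most one row per distinct key instead of every scanned row, which is what keeps the intermediate stream small.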
  • 9. Impala and Hive
    •  Shares Everything Client-Facing
       •  Metadata (table definitions)
       •  ODBC/JDBC drivers
       •  SQL syntax (Hive SQL)
       •  Flexible file formats
       •  Machine pool
       •  Hue GUI
    •  But Built for Different Purposes
       •  Hive: runs on MapReduce and is ideal for batch processing
       •  Impala: native MPP query engine ideal for interactive SQL
    [Diagram: Hive (SQL syntax + MapReduce compute framework, for batch processing) and Impala (SQL syntax + its own compute framework, for interactive SQL) share the metadata, resource management, and integration layers over HDFS and HBase storage of TEXT, RCFILE, PARQUET, AVRO, etc. records.]
  • 10. Impala Use Cases
    Cost-effective, ad hoc query environment that offloads the data warehouse for:
    •  Interactive BI/analytics on more data
    •  Asking new questions
    •  Query-able archive with full fidelity
    •  Data processing with tight SLAs
  • 11. Global Financial Services Company
    Saving 90% on incremental EDW spend & improving performance by 5x
    •  Offload data warehouse for query-able archive
    •  Store decades of data cost-effectively
    •  Process & analyze on the same system
    •  Improve capabilities through interactive query on more data
  • 12. Six3 Systems
    Boosting performance by 20x for mission-critical, real-time cyber security
    •  Analyze unstructured data with flexibility & real-time response
    •  Integrate with existing desktop & BI tools
    •  Deploy in minutes with Cloudera Manager
  • 13. Expedia
    Implementing self-service BI on big data, reducing data latency by 50%
    •  Offload data warehouse for archiving, ETL & analytics
    •  Unify IT environment
    •  Continuously ingest & analyze at scale
    •  Drive greater usability & adoption of big data stack
  • 14. Our Design Strategy
    [Diagram: engines for batch processing (MapReduce, Hive & Pig), interactive SQL (Impala), machine learning (Mahout, DataFu), and others (…) share the metadata, resource management, and integration layers over HDFS and HBase storage of TEXT, RCFILE, PARQUET, AVRO, etc. records.]
    An Integrated Part of the Hadoop System
    •  One pool of data
    •  One metadata model
    •  One security framework
    •  One set of system resources
  • 15. Not All SQL on Hadoop is Created Equal
    •  Batch MapReduce: Make MapReduce faster → Slow, still batch
    •  Remote Query: Pull data from HDFS over the network to the DW compute layer → Slow, expensive
    •  Siloed DBMS: Load data into a proprietary database file → Rigid, siloed data, slow ETL
    •  Impala: Native MPP query engine that’s integrated into Hadoop → Fast, flexible, cost-effective
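The "Remote Query" row argues that shipping raw HDFS data to a separate compute layer is slow and expensive. A back-of-the-envelope comparison for an aggregation query makes the reason concrete. The workload numbers and function names below are hypothetical, and the estimate ignores HDFS replication, compression, and remote block reads, so treat it as an illustration rather than a measurement.

```python
TB = 10**12  # decimal terabyte, in bytes


def remote_query_bytes(table_bytes: int) -> int:
    """Remote query: the whole scanned table crosses the network to the DW compute layer."""
    return table_bytes


def local_aggregation_bytes(n_nodes: int, n_groups: int, bytes_per_row: int) -> int:
    """Co-located engine: each node aggregates its local data and ships at most
    one partial row per group to the next stage (an upper bound)."""
    return n_nodes * n_groups * bytes_per_row


# Hypothetical workload: 10 TB table, 20 nodes, GROUP BY with 1,000 groups,
# 100-byte partial rows.
remote = remote_query_bytes(10 * TB)
local = local_aggregation_bytes(n_nodes=20, n_groups=1_000, bytes_per_row=100)
print(f"remote query moves {remote:,} bytes")
print(f"local aggregation moves at most {local:,} bytes")
print(f"ratio: {remote // local:,}x")
```

The gap shrinks for queries that must return most of their input (for example, an unfiltered `SELECT *`), where both approaches move similar volumes to the client.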
  • 16. The Impala Advantage
    BI Partners: Building on the Enterprise Standard
    [Partner logos: POWERED BY IMPALA]
  • 17. It’s Not Just About SQL on Hadoop
    The Platform for Big Data
    [Diagram: batch processing (MapReduce, Hive & Pig), interactive SQL (Impala), machine learning (Mahout, DataFu), and other (…) engines over shared metadata, resource management, and integration layers on HDFS and HBase storage of TEXT, RCFILE, PARQUET, AVRO… records, plus Management | Support.]
    •  Single platform for processing & analytics
    •  Scales to thousands of servers
    •  No upfront schema
    •  10% of the cost per TB
    •  Open source platform