Impala: A Modern, Open-Source SQL Engine for Hadoop
Mark Grover
Software Engineer, Cloudera
May 22nd, 2014
Twitter: mark_grover
Slides at slideshare.net/markgrover/introduction-to-impala
  
Agenda
•  What is Hadoop?
•  What is Impala?
•  Use cases for Impala
•  Architecture of Impala
•  Impala comparisons and performance
•  Demo (time permitting)
  
Intro to Hadoop

What is Apache Hadoop?
Apache Hadoop is an open source platform for data storage and processing that is:
•  Distributed
•  Fault tolerant
•  Scalable

Core Hadoop system components
•  Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage
•  MapReduce: distributed computing framework

Has the flexibility to store and mine any type of data
•  Ask questions across structured and unstructured data that were previously impossible to ask or solve
•  Not bound by a single schema

Excels at processing complex data
•  Scale-out architecture divides workloads across multiple nodes
•  Flexible file system eliminates ETL bottlenecks

Scales economically
•  Can be deployed on commodity hardware
•  Open source platform guards against vendor lock-in

MapReduce: the good and the bad
The Good
•  Versatile
•  Flexible
•  Scalable
The Bad
•  High latency
•  Batch oriented
•  Not all paradigms fit very well
•  Only for developers
  
What are Hive and Pig?
•  MR is hard and only for developers
•  Higher-level platforms for converting declarative syntax to MapReduce
   •  SQL: Hive
   •  Workflow language: Pig
•  Built on top of MapReduce (although they are being made more pluggable now)
•  But jobs are still as slow as MapReduce
  
Impala

What is Impala?
•  General-purpose SQL engine
•  Real-time queries in Apache Hadoop
•  Beta version released in October 2012
•  General availability (v1.0) release out since April 2013
•  Open source under the Apache license
•  Latest release (v1.3.1) released on May 1st, 2014
  
Impala Overview: Goals
•  General-purpose SQL query engine:
   •  Works for both analytical and transactional/single-row workloads
   •  Supports queries that take from milliseconds to hours
•  Runs directly within Hadoop:
   •  Reads widely used Hadoop file formats
   •  Talks to widely used Hadoop storage managers
   •  Runs on the same nodes that run Hadoop processes
•  High performance:
   •  C++ instead of Java
   •  Runtime code generation
   •  Completely new execution engine, no MapReduce
  
User View of Impala: Overview
•  Runs as a distributed service in the cluster: one Impala daemon on each node with data
•  Highly available: no single point of failure
  
User View of Impala: Overview
•  There is no 'Impala format'!
•  Supported file formats:
   •  Uncompressed/LZO-compressed text files
   •  Sequence files and RCFile with Snappy/gzip compression
   •  Avro data files
   •  Parquet columnar format (more on that later)
   •  HBase
  
User View of Impala: SQL
•  SQL support:
   •  Essentially SQL-92, minus correlated subqueries
   •  Only equi-joins; no non-equi joins, no cross products
   •  ORDER BY requires LIMIT
   •  (Limited) DDL support
   •  SQL-style authorization via Apache Sentry (incubating)
   •  UDFs and UDAFs are supported
  
User View of Impala: SQL
•  Functional limitations:
   •  No custom file formats or SerDes
   •  Nothing beyond SQL (buckets, samples, transforms, arrays, structs, maps, xpath, json)
•  Broadcast joins and partitioned hash joins supported
   •  Smaller table has to fit in aggregate memory of all executing nodes
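The two join strategies named on this slide can be sketched in a few lines of Python. This is a toy, single-process illustration and not Impala's C++ implementation: the function names (`hash_join`, `broadcast_join`, `partitioned_join`), the "nodes" modeled as plain lists, and the sample rows are all hypothetical. It shows why the smaller table must fit in memory: in a broadcast join every node builds a hash table from a full copy of it, while in a partitioned join each node only holds the slice that hashes to it.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows: build a hash table, then probe it."""
    table = defaultdict(list)
    for row in build_rows:                 # build phase on the smaller input
        table[row[build_key]].append(row)
    out = []
    for row in probe_rows:                 # probe phase streams the larger input
        for match in table.get(row[probe_key], []):
            out.append({**row, **match})
    return out

def broadcast_join(big_by_node, small, big_key, small_key):
    """Every node receives a full copy of the small table."""
    out = []
    for local_big in big_by_node:
        out.extend(hash_join(small, local_big, small_key, big_key))
    return out

def partitioned_join(big_by_node, small_by_node, big_key, small_key):
    """Both inputs are reshuffled by hash(key) so matching rows meet on one node."""
    n = len(big_by_node)
    big_parts = [[] for _ in range(n)]
    small_parts = [[] for _ in range(n)]
    for rows in big_by_node:
        for r in rows:
            big_parts[hash(r[big_key]) % n].append(r)
    for rows in small_by_node:
        for r in rows:
            small_parts[hash(r[small_key]) % n].append(r)
    out = []
    for big, small in zip(big_parts, small_parts):
        out.extend(hash_join(small, big, small_key, big_key))
    return out

# Hypothetical sample data spread over two "nodes".
orders_by_node = [
    [{"order": 1, "cust": 10}, {"order": 2, "cust": 20}],
    [{"order": 3, "cust": 10}, {"order": 4, "cust": 30}],
]
customers_by_node = [
    [{"cid": 10, "name": "ann"}],
    [{"cid": 20, "name": "bob"}],
]
all_customers = [c for part in customers_by_node for c in part]

broadcast_result = broadcast_join(orders_by_node, all_customers, "cust", "cid")
partitioned_result = partitioned_join(orders_by_node, customers_by_node, "cust", "cid")
```

Both strategies return the same rows; they differ only in how much data moves and how much memory each node needs for the build side.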
  
Use Cases of Impala

Impala Use Cases
Cost-effective, ad hoc query environment that offloads/replaces the data warehouse for:
•  Interactive BI/analytics on more data
•  Asking new questions: exploration, ML
•  Data processing with tight SLAs
•  Query-able archive with full fidelity

Global Financial Services Company
Saved 90% on incremental EDW spend and improved performance by 5x
•  Offload data warehouse for query-able archive
•  Store decades of data cost-effectively
•  Process and analyze on the same system
•  Improved capabilities through interactive query on more data

Digital Media Company
20x performance improvement for exploration and data discovery
•  Easily identify new data sets for modeling
•  Interact with raw data directly to test hypotheses
•  Avoid expensive DW schema changes
•  Accelerate 'time to answer'

Architecture of Impala

Impala Architecture
•  Three binaries: impalad, statestored, catalogd
•  Impala daemon (impalad): N instances
   •  Handles client requests and all internal requests related to query execution
•  State store daemon (statestored): 1 instance
   •  Provides name service and metadata distribution
•  Catalog daemon (catalogd): 1 instance
   •  Relays metadata changes to all impalads
  
Impala Architecture: Query Execution
Request arrives via ODBC/JDBC
[Diagram: a SQL app sends a SQL request over ODBC to one of three impalad nodes. Each node has a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. Shared services: Hive Metastore, HDFS NN, Statestore + Catalogd.]
  
Impala Architecture: Query Execution
Planner turns request into collections of plan fragments
Coordinator initiates execution on remote impalads
[Diagram: SQL app connected over ODBC to one of three impalad nodes. Each node has a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. Shared services: Hive Metastore, HDFS NN, Statestore + Catalogd.]
  
Impala Architecture: Query Execution
Intermediate results are streamed between impalads
Query results are streamed back to client
[Diagram: query results flow back to the SQL app over ODBC from one of three impalad nodes. Each node has a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. Shared services: Hive Metastore, HDFS NN, Statestore + Catalogd.]
  
Query Planning: Overview
•  2-phase planning process:
   •  Single-node plan: left-deep tree of plan operators
   •  Plan partitioning: partition single-node plan to maximize scan locality, minimize data movement
•  Parallelization of operators:
   •  All query operators are fully distributed
  
Query Planning: Single-Node Plan
•  Plan operators: Scan, HashJoin, HashAggregation, Union, TopN, Exchange
  
Single-Node Plan: Example Query
SELECT t1.custid,
       SUM(t2.revenue) AS revenue
FROM LargeHdfsTable t1
JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online'
GROUP BY t1.custid
ORDER BY revenue DESC LIMIT 10;
  
Query Planning: Single-Node Plan
•  Single-node plan for example:
[Diagram: operator tree containing TopN, Agg, two HashJoin nodes, and Scan nodes for t1, t2, and t3.]
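To make the single-node plan concrete, here is a minimal Python sketch that evaluates the example query as a tree of operators. Several things are assumptions made for illustration: the tree shape (t1 joined with t2 first, that result joined with t3, then Agg, then TopN), the helper names (`scan`, `hash_join`, `aggregate`, `top_n`), the in-memory lists standing in for tables, and the sample rows. In the query, t1 and t2 are both aliases of LargeHdfsTable; the sketch uses two separate toy lists to keep column names distinct.

```python
import heapq
from collections import defaultdict

def scan(rows, predicate=lambda r: True):
    """Scan operator: yield rows that pass an optional filter."""
    return (r for r in rows if predicate(r))

def hash_join(left, right, left_key, right_key):
    """HashJoin operator: build on the right input, probe with the left."""
    table = defaultdict(list)
    for r in right:
        table[r[right_key]].append(r)
    for l in left:
        for r in table.get(l[left_key], ()):
            yield {**l, **r}

def aggregate(rows, group_key, sum_col):
    """Agg operator: SUM(sum_col) GROUP BY group_key."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_key]] += r[sum_col]
    return ({group_key: k, "revenue": v} for k, v in totals.items())

def top_n(rows, n, col):
    """TopN operator: ORDER BY col DESC LIMIT n."""
    return heapq.nlargest(n, rows, key=lambda r: r[col])

# Hypothetical sample data with table-qualified column names.
t1 = [
    {"t1.custid": "a", "t1.id1": 1, "t1.id2": 100},
    {"t1.custid": "a", "t1.id1": 2, "t1.id2": 100},
    {"t1.custid": "b", "t1.id1": 3, "t1.id2": 200},
    {"t1.custid": "c", "t1.id1": 4, "t1.id2": 100},
]
t2 = [
    {"t2.id": 1, "t2.revenue": 5},
    {"t2.id": 2, "t2.revenue": 7},
    {"t2.id": 3, "t2.revenue": 50},
    {"t2.id": 4, "t2.revenue": 9},
]
t3 = [
    {"t3.id": 100, "t3.category": "Online"},
    {"t3.id": 200, "t3.category": "Retail"},
]

# Left-deep tree: ((t1 JOIN t2) JOIN t3) -> Agg -> TopN
t1_t2 = hash_join(scan(t1), scan(t2), "t1.id1", "t2.id")
online = scan(t3, lambda r: r["t3.category"] == "Online")
joined = hash_join(t1_t2, online, "t1.id2", "t3.id")
result = top_n(aggregate(joined, "t1.custid", "t2.revenue"), 10, "revenue")
```

Customer "b" drops out because its t3 row is not in the 'Online' category, leaving "a" with revenue 12 and "c" with revenue 9.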
  	
  
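Query Planning: Distributed Plans
[Diagram: distributed plan for the example query. Operators: Scan: t1, Scan: t2, Scan: t3, two HashJoin nodes, Pre-Agg, MergeAgg, and two TopN nodes. Exchanges: two Broadcast edges, hash t1.id1, hash t2.id, hash t1.custid. Placement labels: at HDFS DN, at HBase RS, at coordinator.]
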
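The Pre-Agg / MergeAgg / TopN pattern in the distributed plan can be sketched as a two-phase aggregation. This is a single-process Python illustration under stated assumptions: each inner list stands for the joined rows available on one node, rows are simplified to (custid, revenue) pairs, and the function names are hypothetical rather than Impala's.

```python
import heapq
from collections import Counter

def pre_aggregate(rows):
    """Pre-Agg: each node sums revenue per custid over its local rows."""
    partial = Counter()
    for custid, revenue in rows:
        partial[custid] += revenue
    return partial

def exchange_by_hash(partials, num_nodes):
    """Exchange: route each partial sum to the node that owns hash(custid)."""
    inboxes = [[] for _ in range(num_nodes)]
    for partial in partials:
        for custid, subtotal in partial.items():
            inboxes[hash(custid) % num_nodes].append((custid, subtotal))
    return inboxes

def merge_aggregate(inbox):
    """MergeAgg: combine the partial sums that arrived for each custid."""
    total = Counter()
    for custid, subtotal in inbox:
        total[custid] += subtotal
    return total

def top_n(pairs, n):
    """TopN by revenue, descending."""
    return heapq.nlargest(n, pairs, key=lambda kv: kv[1])

def distributed_top_revenue(rows_by_node, n):
    partials = [pre_aggregate(rows) for rows in rows_by_node]
    inboxes = exchange_by_hash(partials, len(rows_by_node))
    # Each node keeps only its local top n; the coordinator merges them.
    local_tops = [top_n(merge_aggregate(inbox).items(), n) for inbox in inboxes]
    return top_n([kv for top in local_tops for kv in top], n)

# Hypothetical (custid, revenue) rows on three nodes.
rows_by_node = [
    [(1, 10), (2, 5), (1, 3)],
    [(2, 20), (3, 1)],
    [(1, 2), (3, 4)],
]
top2 = distributed_top_revenue(rows_by_node, 2)
```

Pre-aggregating before the exchange means each node ships at most one partial sum per customer instead of every row, which is the "minimize data movement" goal of plan partitioning.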
Metadata Handling
•  Impala metadata:
   •  Hive's metastore: logical metadata (table definitions, columns, CREATE TABLE parameters)
   •  HDFS NameNode: directory contents and block replica locations
   •  HDFS DataNode: block replicas' volume IDs
  
Impala Execution Engine
•  Written in C++ for minimal execution overhead
•  Internal in-memory tuple format puts fixed-width data at fixed offsets
•  Uses intrinsics/special CPU instructions for text parsing, crc32 computation, etc.
•  Runtime code generation for "big loops"

Impala Execution Engine
•  More on runtime code generation
   •  Example of "big loop": insert batch of rows into hash table
   •  Known at query compile time: # of tuples in a batch, tuple layout, column types, etc.
   •  Generate at compile time: unrolled loop that inlines all function calls, contains no dead code, minimizes branches
   •  Code generated using LLVM
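A rough feel for what runtime code generation buys can be had in Python, with the caveat that this is only an analogy: Impala emits native code through LLVM, whereas this sketch builds Python source and compiles it with `exec`. The toy hash function, the column descriptors, and the function names are assumptions for illustration. The generic version loops over column descriptors and branches on the type for every row; the generated version is specialized once per query, with the loop unrolled and the type branches already resolved.

```python
def hash_row_interpreted(row, key_columns):
    """Generic path: per-row loop over columns with a type branch each time."""
    h = 17
    for idx, kind in key_columns:
        value = row[idx]
        if kind == "int":
            h = h * 31 + value
        elif kind == "str":
            h = h * 31 + len(value)   # toy stand-in for hashing a string
        else:
            raise TypeError(kind)
        h &= 0xFFFFFFFF
    return h

def generate_hash_fn(key_columns):
    """Emit a function specialized for one query's key columns."""
    lines = ["def hash_row(row):", "    h = 17"]
    for idx, kind in key_columns:     # this loop runs once, at "compile time"
        if kind == "int":
            lines.append(f"    h = (h * 31 + row[{idx}]) & 0xFFFFFFFF")
        elif kind == "str":
            lines.append(f"    h = (h * 31 + len(row[{idx}])) & 0xFFFFFFFF")
        else:
            raise TypeError(kind)
    lines.append("    return h")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the specialized source
    return namespace["hash_row"]

# Hypothetical query: hash on column 0 (int) and column 2 (string).
key_columns = [(0, "int"), (2, "str")]
hash_row_generated = generate_hash_fn(key_columns)
rows = [(1, "x", "ab"), (42, "y", "hello")]
```

The generated function contains no loop, no descriptor lookups, and no branches, which mirrors the "unrolled, no dead code, minimal branches" properties listed on the slide.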
  
Comparing Impala to Dremel
•  What is Dremel?
   •  Columnar storage for data with nested structures
   •  Distributed scalable aggregation on top of that
•  Columnar storage in Hadoop: Parquet
   •  Stores data in appropriate native/binary types
   •  Can also store nested structures similar to Dremel's ColumnIO
   •  Parquet is open source: github.com/parquet
•  Distributed aggregation: Impala
•  Impala plus Parquet: a superset of the published version of Dremel (which didn't support joins)
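Why columnar storage helps aggregation can be shown with a small Python sketch. It is a conceptual illustration only: plain lists stand in for on-disk column chunks, the sample records are hypothetical, and Parquet's encoding, compression, and nested-structure support are not modeled.

```python
# Hypothetical flat records.
rows = [
    {"custid": 1, "country": "US", "revenue": 10},
    {"custid": 2, "country": "DE", "revenue": 25},
    {"custid": 3, "country": "US", "revenue": 5},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_revenue_row_store(rows):
    # Row layout: every full record is visited to extract one field.
    return sum(row["revenue"] for row in rows)

def sum_revenue_column_store(columns):
    # Column layout: only the "revenue" column is read.
    return sum(columns["revenue"])

columns = to_columns(rows)
```

A query such as SUM(revenue) touches one of three columns in the columnar layout, so the amount of data read shrinks roughly in proportion to the number of columns skipped.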
  
Performance

Impala Performance Results
•  Impala's latest milestone:
   •  Comparable commercial MPP DBMS speed
   •  Natively on Hadoop
•  Three result sets:
   •  Impala vs Hive 0.12 (Impala 6-70x faster)
   •  Impala vs "DBMS-Y" (Impala average of 2x faster)
   •  Impala scalability (Impala achieves linear scale)
•  Background
   •  20 pre-selected, diverse TPC-DS queries (modified to remove unsupported language)
   •  Sufficient data scale for realistic comparison (3 TB, 15 TB, and 30 TB)
   •  Realistic nodes (e.g. 8-core CPU, 96GB RAM, 12x2TB disks)
   •  Methodical testing (multiple runs, reviewed fairness for competition, etc.)
•  Details: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
  
Impala vs Hive 0.12 (lower bars are better)
[Chart]

Impala vs "DBMS-Y" (lower bars are better)
[Chart]

Impala Scalability: 2x the Hardware (Expectation: Cut Response Times in Half)
[Chart]

Impala Scalability: 2x the Hardware and 2x Users/Data (Expectation: Constant Response Times)
[Charts: "2x the Users, 2x the Hardware" and "2x the Data, 2x the Hardware"]

Demo
•  Uses Cloudera's Quickstart VM: http://tiny.cloudera.com/quick-start
•  Dataset/queries from https://github.com/markgrover/cloudcon-hive
  
I am co-authoring an O'Reilly book
Hadoop Application Architectures
How to build end-to-end solutions using Apache Hadoop and related tools
@hadooparchbook
www.hadooparchitecturebook.com
  
Try it out!
•  Open source! Available at cloudera.com, AWS EMR!
•  Packages for many different Linux flavours
•  Questions/comments? community.cloudera.com
•  My Twitter handle: mark_grover
•  Slides at: slideshare.net/markgrover/introduction-to-impala
  

More Related Content

What's hot

Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 

What's hot (20)

Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Sqoop
SqoopSqoop
Sqoop
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 

Viewers also liked

An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerVertiCloud Inc
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)Steve Min
 
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and HueHadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Huegethue
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
August 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache HadoopAugust 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache HadoopYahoo Developer Network
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonCaserta
 
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014gethue
 
Apache hadoop hue overview and introduction
Apache hadoop hue overview and introductionApache hadoop hue overview and introduction
Apache hadoop hue overview and introductionBigClasses Com
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
An Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue GuiAn Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue GuiMike Frampton
 
Solr+Hadoop = Big Data Search
Solr+Hadoop = Big Data SearchSolr+Hadoop = Big Data Search
Solr+Hadoop = Big Data SearchCloudera, Inc.
 

Viewers also liked (16)

An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource Manager
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)
 
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and HueHadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
August 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache HadoopAugust 2013 HUG: Hue: the UI for Apache Hadoop
August 2013 HUG: Hue: the UI for Apache Hadoop
 
Big Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive ComparisonBig Data Warehousing: Pig vs. Hive Comparison
Big Data Warehousing: Pig vs. Hive Comparison
 
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
 
Apache hadoop hue overview and introduction
Apache hadoop hue overview and introductionApache hadoop hue overview and introduction
Apache hadoop hue overview and introduction
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
An Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue GuiAn Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue Gui
 
Solr+Hadoop = Big Data Search
Solr+Hadoop = Big Data SearchSolr+Hadoop = Big Data Search
Solr+Hadoop = Big Data Search
 

Similar to Introduction to Impala

Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentationhadooparchbook
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfssusere05ec21
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
SQL and Machine Learning on Hadoop
SQL and Machine Learning on HadoopSQL and Machine Learning on Hadoop
SQL and Machine Learning on HadoopMukund Babbar
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)VMware Tanzu
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 

Similar to Introduction to Impala (20)

Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdfimpalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
impalapresentation-130130105033-phpapp02 (1)_221220_235919.pdf
 
SQL Engines for Hadoop - The case for Impala
Introduction to Impala

  • 1. Impala: A Modern, Open-Source SQL Engine for Hadoop
      Mark Grover, Software Engineer, Cloudera
      May 22nd, 2014
      Twitter: mark_grover
      Slides at slideshare.net/markgrover/introduction-to-impala
  • 2. Agenda
      • What is Hadoop?
      • What is Impala?
      • Use cases for Impala
      • Architecture of Impala
      • Impala comparisons and performance
      • Demo (time permitting)
  • 4. What is Apache Hadoop?
      Apache Hadoop is an open source platform for data storage and processing that is:
      • Distributed
      • Fault tolerant
      • Scalable
      Core Hadoop system components:
      • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage
      • MapReduce: distributed computing framework
      Has the flexibility to store and mine any type of data:
      • Ask questions across structured and unstructured data that were previously impossible to ask or solve
      • Not bound by a single schema
      Excels at processing complex data:
      • Scale-out architecture divides workloads across multiple nodes
      • Flexible file system eliminates ETL bottlenecks
      Scales economically:
      • Can be deployed on commodity hardware
      • Open source platform guards against vendor lock-in
  • 5. MapReduce: the good and the bad
      The good:
      • Versatile
      • Flexible
      • Scalable
      The bad:
      • High latency
      • Batch oriented
      • Not all paradigms fit very well
      • Only for developers
  • 6. What are Hive and Pig?
      • MR is hard and only for developers
      • Higher-level platforms for converting declarative syntax to MapReduce:
          • SQL: Hive
          • Workflow language: Pig
      • Built on top of MapReduce (although they are being made more pluggable now)
      • But jobs are still as slow as MapReduce
  • 8. What is Impala?
      • General-purpose SQL engine
      • Real-time queries in Apache Hadoop
      • Beta version released in October 2012
      • General availability (v1.0) release out since April 2013
      • Open source under Apache license
      • Latest release (v1.3.1) came out on May 1st, 2014
  • 9. Impala Overview: Goals
      • General-purpose SQL query engine:
          • Works for both analytical and transactional/single-row workloads
          • Supports queries that take from milliseconds to hours
      • Runs directly within Hadoop:
          • reads widely used Hadoop file formats
          • talks to widely used Hadoop storage managers
          • runs on same nodes that run Hadoop processes
      • High performance:
          • C++ instead of Java
          • runtime code generation
          • completely new execution engine: no MapReduce
  • 10. User View of Impala: Overview
      • Runs as a distributed service in cluster: one Impala daemon on each node with data
      • Highly available: no single point of failure
  • 11. User View of Impala: Overview
      • There is no ‘Impala format’!
      • Supported file formats:
          • uncompressed/LZO-compressed text files
          • sequence files and RCFile with Snappy/gzip compression
          • Avro data files
          • Parquet columnar format (more on that later)
          • HBase
  • 12. User View of Impala: SQL
      • SQL support:
          • essentially SQL-92, minus correlated subqueries
          • only equi-joins; no non-equi joins, no cross products
          • ORDER BY requires LIMIT
          • (Limited) DDL support
          • SQL-style authorization via Apache Sentry (incubating)
          • UDFs and UDAFs are supported
  • 13. User View of Impala: SQL
      • Functional limitations:
          • No custom file formats or SerDes
          • Nothing beyond SQL (buckets, samples, transforms, arrays, structs, maps, xpath, json)
      • Broadcast joins and partitioned hash joins supported
          • Smaller table has to fit in aggregate memory of all executing nodes
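The two join strategies named on slide 13 can be sketched in a few lines. This is only an illustration of the general techniques, not Impala's implementation: the function names, the list-of-dicts row format, and the use of Python's built-in `hash` are all assumptions made for the example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join: build a hash table on one input, probe it with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    return [
        {**build, **probe}
        for probe in probe_rows
        for build in table.get(probe[probe_key], [])
    ]


def broadcast_join(small, big_partitions, small_key, big_key):
    """Every node gets a full copy of the small table and joins its local rows."""
    out = []
    for local_rows in big_partitions:  # one list per (hypothetical) node
        out.extend(hash_join(small, local_rows, small_key, big_key))
    return out


def partitioned_join(left, right, left_key, right_key, num_nodes):
    """Both inputs are hash-partitioned on the join key, then joined per node."""
    left_parts = [[] for _ in range(num_nodes)]
    right_parts = [[] for _ in range(num_nodes)]
    for row in left:
        left_parts[hash(row[left_key]) % num_nodes].append(row)
    for row in right:
        right_parts[hash(row[right_key]) % num_nodes].append(row)
    out = []
    for left_rows, right_rows in zip(left_parts, right_parts):
        out.extend(hash_join(left_rows, right_rows, left_key, right_key))
    return out


orders = [
    {"order_id": 1, "cust": 10},
    {"order_id": 2, "cust": 20},
    {"order_id": 3, "cust": 10},
]
customers = [{"cust_id": 10, "name": "a"}, {"cust_id": 20, "name": "b"}]

via_broadcast = broadcast_join(customers, [orders[:2], orders[2:]], "cust_id", "cust")
via_partition = partitioned_join(customers, orders, "cust_id", "cust", 3)
```

Both strategies return the same rows. In the broadcast case the small table is copied to every node, which is why it has to fit in memory; in the partitioned case each node only holds the slice of both inputs that hashes to it.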
  • 14. Use Cases of Impala
  • 15. Impala Use Cases
      Cost-effective, ad hoc query environment that offloads/replaces the data warehouse for:
      • Interactive BI/analytics on more data
      • Asking new questions: exploration, ML
      • Data processing with tight SLAs
      • Query-able archive with full fidelity
  • 16. Global Financial Services Company
      Saved 90% on incremental EDW spend and improved performance by 5x
      • Offload data warehouse for query-able archive
      • Store decades of data cost-effectively
      • Process and analyze on the same system
      • Improved capabilities through interactive query on more data
  • 17. Digital Media Company
      20x performance improvement for exploration and data discovery
      • Easily identify new data sets for modeling
      • Interact with raw data directly to test hypotheses
      • Avoid expensive DW schema changes
      • Accelerate ‘time to answer’
  • 19. Impala Architecture
      • Three binaries: impalad, statestored, catalogd
      • Impala daemon (impalad): N instances
          • handles client requests and all internal requests related to query execution
      • State store daemon (statestored): 1 instance
          • Provides name service and metadata distribution
      • Catalog daemon (catalogd): 1 instance
          • Relays metadata changes to all impalad's
  • 20. Impala Architecture: Query Execution
      Request arrives via ODBC/JDBC
      [Diagram: a SQL app sends a SQL request over ODBC to one of several impalad nodes. Each node has a Query Planner, Query Coordinator, and Query Executor alongside an HDFS DN and HBase. Shared services: Hive Metastore, HDFS NN, Statestore + Catalogd.]
  • 21. Impala Architecture: Query Execution
      Planner turns request into collections of plan fragments
      Coordinator initiates execution on remote impalad's
      [Diagram: same cluster layout as slide 20.]
  • 22. Impala Architecture: Query Execution
      Intermediate results are streamed between impalad's
      Query results are streamed back to client
      [Diagram: same cluster layout as slide 20, with query results returned to the SQL app.]
  • 23. Query Planning: Overview
      • 2-phase planning process:
          • single-node plan: left-deep tree of plan operators
          • plan partitioning: partition single-node plan to maximize scan locality, minimize data movement
      • Parallelization of operators:
          • All query operators are fully distributed
  • 24. Query Planning: Single-Node Plan
      • Plan operators: Scan, HashJoin, HashAggregation, Union, TopN, Exchange
  • 25. Single-Node Plan: Example Query
      SELECT t1.custid,
             SUM(t2.revenue) AS revenue
      FROM LargeHdfsTable t1
      JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
      JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
      WHERE t3.category = 'Online'
      GROUP BY t1.custid
      ORDER BY revenue DESC LIMIT 10;
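To see what the example query on slide 25 computes, it can be run against a few toy rows. The sketch below uses SQLite from the Python standard library purely as a stand-in for Impala; the column types and the sample data are assumptions, since the slides only show the column names used in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Column types and rows are made up for illustration.
    CREATE TABLE LargeHdfsTable (
        id INTEGER, id1 INTEGER, id2 INTEGER, custid INTEGER, revenue REAL
    );
    CREATE TABLE SmallHbaseTable (id INTEGER, category TEXT);

    INSERT INTO LargeHdfsTable VALUES
        (1, 2, 100, 7, 10.0),
        (2, 1, 100, 8, 25.0),
        (3, 1, 200, 7, 40.0);
    INSERT INTO SmallHbaseTable VALUES (100, 'Online'), (200, 'Store');
    """
)

# The query text is the one from the slide, unchanged apart from layout.
rows = conn.execute(
    """
    SELECT t1.custid,
           SUM(t2.revenue) AS revenue
    FROM LargeHdfsTable t1
    JOIN LargeHdfsTable t2 ON (t1.id1 = t2.id)
    JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
    WHERE t3.category = 'Online'
    GROUP BY t1.custid
    ORDER BY revenue DESC LIMIT 10
    """
).fetchall()

for custid, revenue in rows:
    print(custid, revenue)
```

The self-join on `LargeHdfsTable` pairs each row with the row its `id1` points at, the join to `SmallHbaseTable` keeps only the 'Online' category, and the result is the top 10 customers by summed revenue.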
  • 26. Query Planning: Single-Node Plan
      • Single-node plan for example:
      [Diagram: plan tree. Operators shown: TopN, Agg, HashJoin, HashJoin, Scan: t1, Scan: t2, Scan: t3.]
  • 27. Query Planning: Distributed Plans
      [Diagram: distributed plan. Operators shown: Scan: t1, Scan: t2, Scan: t3, HashJoin, HashJoin, Pre-Agg, MergeAgg, TopN, TopN. Exchange labels: Broadcast, Broadcast, hash t1.id1, hash t2.id, hash t1.custid. Fragment locations: at HDFS DN, at HBase RS, at coordinator.]
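The Pre-Agg, hash exchange, MergeAgg, and TopN steps named in the distributed plan can be sketched as follows. This is a simplified single-process illustration of the general two-phase aggregation technique, not Impala's code; the function names and the `(custid, revenue)` row shape are assumptions.

```python
import heapq
from collections import Counter


def pre_aggregate(rows):
    """Each node sums revenue per custid over its local rows only."""
    partial = Counter()
    for custid, revenue in rows:
        partial[custid] += revenue
    return partial


def exchange_by_hash(partials, num_nodes):
    """Route each partial sum to the node that owns hash(custid)."""
    inboxes = [[] for _ in range(num_nodes)]
    for partial in partials:
        for custid, revenue in partial.items():
            inboxes[hash(custid) % num_nodes].append((custid, revenue))
    return inboxes


def top_n(totals, n):
    """Keep the n (custid, revenue) pairs with the largest revenue."""
    return heapq.nlargest(n, totals, key=lambda pair: pair[1])


def distributed_top_n(node_rows, n):
    partials = [pre_aggregate(rows) for rows in node_rows]
    inboxes = exchange_by_hash(partials, len(node_rows))
    # Merge aggregation: after the exchange, each custid lives on exactly one node.
    merged = [pre_aggregate(inbox) for inbox in inboxes]
    local_tops = [top_n(totals.items(), n) for totals in merged]
    # The coordinator combines the per-node top-n lists into the final answer.
    return top_n([pair for top in local_tops for pair in top], n)


node_rows = [
    [(1, 5), (2, 3)],
    [(1, 4), (3, 10)],
    [(2, 1)],
]
result = distributed_top_n(node_rows, 2)
```

Pre-aggregating before the exchange reduces the data sent over the network, and because each `custid` ends up on a single node after the exchange, the union of the per-node top-n lists is guaranteed to contain the global top n.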
  • 28. Metadata Handling
      • Impala metadata:
          • Hive's metastore: logical metadata (table definitions, columns, CREATE TABLE parameters)
          • HDFS NameNode: directory contents and block replica locations
          • HDFS DataNode: block replicas' volume IDs
  • 29. Impala Execution Engine
      • Written in C++ for minimal execution overhead
      • Internal in-memory tuple format puts fixed-width data at fixed offsets
      • Uses intrinsics/special CPU instructions for text parsing, crc32 computation, etc.
      • Runtime code generation for “big loops”
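The idea of putting fixed-width data at fixed offsets can be illustrated with Python's `struct` module. The layout below (a 4-byte id, an 8-byte revenue, and an offset/length pair pointing into a separate buffer for string data) is a made-up example and not Impala's actual tuple format.

```python
import struct

# Hypothetical tuple layout, little-endian with no padding:
#   bytes 0-3   int32   id
#   bytes 4-11  float64 revenue
#   bytes 12-19 int32 offset + int32 length of the name in a separate buffer
TUPLE = struct.Struct("<idii")
REVENUE = struct.Struct("<d")
NAME_REF = struct.Struct("<ii")
REVENUE_OFFSET = 4
NAME_REF_OFFSET = 12


def pack_rows(rows):
    """Pack (id, revenue, name) rows into a fixed-width buffer plus a string buffer."""
    fixed = bytearray()
    strings = bytearray()
    for row_id, revenue, name in rows:
        data = name.encode("utf-8")
        fixed += TUPLE.pack(row_id, revenue, len(strings), len(data))
        strings += data
    return bytes(fixed), bytes(strings)


def read_revenue(fixed, row_index):
    """Jump straight to one field: row start plus the field's fixed offset."""
    return REVENUE.unpack_from(fixed, row_index * TUPLE.size + REVENUE_OFFSET)[0]


def read_name(fixed, strings, row_index):
    offset, length = NAME_REF.unpack_from(fixed, row_index * TUPLE.size + NAME_REF_OFFSET)
    return strings[offset:offset + length].decode("utf-8")


fixed, strings = pack_rows([(1, 2.5, "ab"), (2, 7.0, "xyz")])
```

Because every tuple has the same size and every field sits at a known offset, reading a column value is a single address computation with no per-row parsing.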
  • 30. Impala Execution Engine
      • More on runtime code generation:
          • example of "big loop": insert batch of rows into hash table
          • known at query compile time: # of tuples in a batch, tuple layout, column types, etc.
          • generate at compile time: unrolled loop that inlines all function calls, contains no dead code, minimizes branches
          • code generated using LLVM
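As a rough analogy for runtime code generation, the sketch below contrasts a generic loop that checks the column type on every row with a function generated once the column index and type are known. It uses Python's `exec` only to show the idea; Impala generates native code with LLVM, and all names here are hypothetical.

```python
def interpreted_sum(rows, column_index, column_type):
    """Generic loop: re-checks the column type for every single row."""
    total = 0
    for row in rows:
        value = row[column_index]
        if column_type == "int":
            total += int(value)
        elif column_type == "float":
            total += float(value)
        else:
            raise ValueError(column_type)
    return total


def generate_sum(column_index, column_type):
    """Emit a loop specialized for one column; the type branch disappears."""
    if column_type not in ("int", "float"):
        raise ValueError(column_type)
    source = (
        "def specialized_sum(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        total += {column_type}(row[{column_index}])\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source once per query
    return namespace["specialized_sum"]


rows = [("1", "2.5"), ("3", "4.5")]
sum_ints = generate_sum(0, "int")
sum_floats = generate_sum(1, "float")
```

The generated function does the same work as the generic one, but the decisions that depend only on the query (which column, which type) are made once at generation time instead of once per row.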
  • 31. Comparing Impala to Dremel
      • What is Dremel?
          • columnar storage for data with nested structures
          • distributed scalable aggregation on top of that
      • Columnar storage in Hadoop: Parquet
          • stores data in appropriate native/binary types
          • can also store nested structures similar to Dremel's ColumnIO
          • Parquet is open source: github.com/parquet
      • Distributed aggregation: Impala
      • Impala plus Parquet: a superset of the published version of Dremel (which didn't support joins)
  • 33. Impala Performance Results
      • Impala's latest milestone:
          • Speed comparable to a commercial MPP DBMS
          • Natively on Hadoop
      • Three result sets:
          • Impala vs Hive 0.12 (Impala 6-70x faster)
          • Impala vs “DBMS-Y” (Impala on average 2x faster)
          • Impala scalability (Impala achieves linear scale)
      • Background:
          • 20 pre-selected, diverse TPC-DS queries (modified to remove unsupported language)
          • Sufficient data scale for realistic comparison (3 TB, 15 TB, and 30 TB)
          • Realistic nodes (e.g. 8-core CPU, 96 GB RAM, 12x2 TB disks)
          • Methodical testing (multiple runs, reviewed fairness for competition, etc.)
      • Details: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
  • 34. Impala vs Hive 0.12 (lower bars are better)
      [Chart]
  • 35. Impala vs “DBMS-Y” (lower bars are better)
      [Chart]
  • 36. Impala Scalability: 2x the Hardware (Expectation: Cut Response Times in Half)
      [Chart]
  • 37. Impala Scalability: 2x the Hardware and 2x Users/Data (Expectation: Constant Response Times)
      [Charts: "2x the Users, 2x the Hardware" and "2x the Data, 2x the Hardware"]
  • 38. Demo
      • Uses Cloudera's Quickstart VM: http://tiny.cloudera.com/quick-start
      • Dataset/queries from https://github.com/markgrover/cloudcon-hive
  • 39. I am co-authoring the O'Reilly book Hadoop Application Architectures
      How to build end-to-end solutions using Apache Hadoop and related tools
      @hadooparchbook
      www.hadooparchitecturebook.com
  • 40. Try it out!
      • Open source! Available at cloudera.com, AWS EMR!
      • Packages for many different Linux flavours
      • Questions/comments? community.cloudera.com
      • My Twitter handle: mark_grover
      • Slides at: slideshare.net/markgrover/introduction-to-impala