Apache Accumulo and Cloudera
Hadoop-DC, July 2013
Joey Echeverria | Director, Federal FTS
joey@cloudera.com | @fwiffo
©2013 Cloudera, Inc. All Rights Reserved.
  
HADOOP 101
  
Operating Systems

•  Manage and schedule machine resources
    •  CPU
    •  RAM
    •  Memory
•  Provide abstractions and APIs
    •  Files = stream of bytes
    •  Process = instructions + private memory space
  
Distributed Operating System

•  Same thing, but over a cluster of networked servers
•  Additional concerns:
    •  Inter-process and inter-machine communication
    •  Data locality
    •  Data availability
    •  Data processing availability
  
Hadoop

•  De facto Distributed Operating System
    •  Apache HDFS
    •  Apache MapReduce and Apache YARN
  
Ecosystem

[Figure: ecosystem diagram. Recoverable labels: Key Value Stores, High Level Batch Languages, Low Latency SQL Engine, Graph Processing]

Cloudera
  
CDH History

CDH1: *HDFS, *MR, *Hive, *Pig
CDH2: *HDFS, *MR, *Hive, *Pig
CDH3: *HDFS, *MR, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro
CDH4: *HDFS, *MR, *YARN, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro, DataFu, HCatalog, Impala, *Solr, *BigTop, Sentry
  
ACCUMULO 101 AND 201
  
BigTable	
  
Accumulo Data Model

•  Multi-dimensional sorted map

row id -> [
  family -> [
    qualifier -> [
      visibility -> [
        timestamp -> value
      ]
    ]
  ]
]
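
The nested map on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the Accumulo client API: the helper names `put` and `scan` and the sample rows are hypothetical. It also assumes that versions of a cell come back newest first, which is how Accumulo orders timestamps.

```python
# Hypothetical in-memory model of the five-level map; not the Accumulo API.

def put(table, row, family, qualifier, visibility, timestamp, value):
    """Store a value under row -> family -> qualifier -> visibility -> timestamp."""
    (table.setdefault(row, {})
          .setdefault(family, {})
          .setdefault(qualifier, {})
          .setdefault(visibility, {}))[timestamp] = value


def scan(table):
    """Yield (key, value) pairs in sorted order, one dimension at a time."""
    for row in sorted(table):
        for family in sorted(table[row]):
            for qualifier in sorted(table[row][family]):
                for visibility in sorted(table[row][family][qualifier]):
                    versions = table[row][family][qualifier][visibility]
                    # Assumption: newest timestamp first.
                    for timestamp in sorted(versions, reverse=True):
                        key = (row, family, qualifier, visibility, timestamp)
                        yield key, versions[timestamp]


table = {}
put(table, "row2", "fam", "qual", "public", 1, "b")
put(table, "row1", "fam", "qual", "public", 1, "old")
put(table, "row1", "fam", "qual", "public", 2, "new")
entries = list(scan(table))
```

Entries come back ordered by row id regardless of insertion order, with the two versions of the `row1` cell adjacent to each other.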
Accumulo Storage Model

•  key -> value
•  key = <row id><column><timestamp>
•  column = <family><qualifier><visibility>
  
Key
    Row ID
    Column
        Family
        Qualifier
        Visibility
    Timestamp
Value
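
One way to see why the flattened key matters is a sorted list of (key, value) pairs: every entry for a row is contiguous, so a row lookup is a binary search plus a short walk. The sketch below is a hedged illustration only. `make_key`, `put`, `scan_row`, and the sample data are hypothetical names, Python tuples stand in for Accumulo's byte-wise key comparison, and the timestamp is negated so that newer versions sort first.

```python
import bisect

# Hypothetical flat store: a list of (key, value) pairs kept in key order.
store = []


def make_key(row, family, qualifier, visibility, timestamp):
    # Negating the timestamp makes newer versions sort before older ones.
    return (row, family, qualifier, visibility, -timestamp)


def put(key, value):
    """Insert while keeping the list sorted by key."""
    bisect.insort(store, (key, value))


def scan_row(row):
    """Return every entry for one row id using binary search on the sorted list."""
    # (row,) is a prefix of every key in that row, so it sorts just before them.
    start = bisect.bisect_left(store, ((row,),))
    result = []
    for key, value in store[start:]:
        if key[0] != row:
            break  # keys are sorted, so the row's entries have ended
        result.append((key, value))
    return result


put(make_key("row2", "f", "q", "", 5), "x")
put(make_key("row1", "f", "q", "", 1), "old")
put(make_key("row1", "f", "q", "", 2), "new")
row1_values = [value for _, value in scan_row("row1")]
```

Here `row1_values` holds the newest version first, and the scan stops as soon as it reaches a key belonging to another row.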
  
Other Concerns

•  Write-ahead log
•  Tablet server failure handling
•  Versioning
•  Iterators
•  Cell-level security
  
PROJECT HISTORY
  
Pre-Apache

Apache
  
Relationship to Hadoop Releases

•  1.3.x -> Hadoop 0.20.2
•  1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
•  1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
  
Accumulo and Cloudera Releases

•  Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
•  Accumulo 1.5.x should work with CDH4…
    •  Limited testing
  
ANNOUNCEMENT

CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4

DEMO
  
System Logs

•  Id
    •  Unique id for an action
•  Timestamp
    •  Time the action occurred
•  Actor
    •  User or system performing the action
•  Action
    •  The action taken
•  Object
    •  The object of the action
•  Info
    •  Free form information (e.g. success/failure, attribute value, etc.)
  
Actions

•  created_user
•  deleted_user
•  set_password
•  logged_in
•  logged_out
•  read
•  modified
  
Roles

•  system
    •  Any user on the system
•  admin
    •  Administrators
•  audit
    •  Auditors
  
Accumulo Data Model

Key = Row ID, Column (Family, Qualifier, Visibility), Timestamp

Row ID      Family     Qualifier            Visibility   Timestamp   Value
<ts>-<id>   <actor>    <action>:<object>                             <info>
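
The mapping from a log record to an Accumulo entry can be sketched as follows. This is an illustration under assumptions rather than the demo's actual code: `LogRecord` and `to_entry` are hypothetical names, and the visibility label is passed in by the caller because the slide leaves that cell blank.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One system log entry, with the six fields listed on the System Logs slide."""
    id: int
    timestamp: str  # e.g. "201307241535"
    actor: str
    action: str
    object: str
    info: str


def to_entry(record, visibility):
    """Map a log record to ((row id, family, qualifier, visibility), value).

    Assumption: the caller chooses the visibility label for each record.
    """
    row_id = f"{record.timestamp}-{record.id}"
    family = record.actor
    qualifier = f"{record.action}:{record.object}"
    return (row_id, family, qualifier, visibility), record.info


record = LogRecord(1, "201307241535", "root", "created_user", "sean", "succeeded")
key, value = to_entry(record, "audit")
```

Because the row id starts with the timestamp, entries sort chronologically, and the id suffix keeps actions that share a timestamp apart.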
  
DEMO

Logs Demo
  
Row key          Column                    Visibility    Value
201307241535-1   root:created_user:sean    audit         succeeded
201307241535-1   root:set_password:sean    admin&audit   password
201307241537-2   sean:logged_in:host       system        succeeded
201307241538-3   sean:read:/tmp/a          audit         succeeded
201307241539-4   sean:modified:/tmp/a      audit         failed
201307241540-5   sean:logged_out:host      system        succeeded
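
To make the effect of the visibility column concrete, the sketch below filters the demo rows by a set of authorizations. It is a simplified stand-in for Accumulo's visibility evaluation, not its implementation: `visible` and `scan` are hypothetical helpers, the parser handles only terms, `&`, `|`, and parentheses, and it evaluates operators left to right (Accumulo itself requires parentheses when `&` and `|` are mixed). Which authorizations each kind of user holds is also an assumption.

```python
import re

# Terms, or one of the operators & | ( )
TOKEN = re.compile(r"[A-Za-z0-9_\-.:]+|[&|()]")


def visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression."""
    if not expression:
        return True  # an empty visibility is readable by everyone
    tokens = TOKEN.findall(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    return parse_expr()


# Rows from the demo table: (row key, column, visibility, value)
rows = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def scan(rows, authorizations):
    """Keep only the rows whose visibility the authorizations satisfy."""
    return [row for row in rows if visible(row[2], authorizations)]


user_view = scan(rows, {"system"})
auditor_view = scan(rows, {"system", "audit"})
admin_view = scan(rows, {"system", "admin", "audit"})
```

With these assumed authorization sets, an ordinary user sees only the login and logout rows, an auditor sees everything except the `set_password` row, and a reader holding both `admin` and `audit` sees all six.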
  
VERSIONS REDUX
  
Recap

•  Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
•  Accumulo 1.5.x should work with CDH4
  
Cloudera Support

•  Naturally, Cloudera has tested and packaged Accumulo 1.5…
•  But 1.5 is rather bleeding edge…
•  So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
  
ECOSYSTEM INTEGRATION

Apache Nutch

Apache Pig

DEMO

NEXT STEPS
  
Recap

•  What’s available today
    •  Beta release of Accumulo 1.4.3 on CDH4.3
    •  Beta release of Accumulo 1.4.3 Pig integration
•  Semi-private beta
    •  Contact me (joey@cloudera.com) if you’re interested in trying out the bits
  
Future Ideas (not promises ;)

•  Cloudera Manager integration
•  Flume integration
•  Sqoop integration
•  Hive integration
•  Impala integration
  
What next?

•  Download Hadoop!
    •  CDH available at www.cloudera.com
•  Cloudera provides pre-loaded VMs
    •  https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
•  Reach out to me (joey@cloudera.com) if you want to try out the Accumulo beta
•  Instructions to replicate the demos pending
  
My personal preference

•  Cloudera Manager
    •  https://ccp.cloudera.com/display/SUPPORT/Downloads
    •  Free up to unlimited nodes!
  
Shout Out

•  Jason Trost
    •  @jason_trost
    •  covert.io blog posts
        •  http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
        •  http://www.covert.io/post/18605091231/accumulo-and-pig
  
Questions?

•  Contact me!
    •  Joey Echeverria
    •  joey@cloudera.com
    •  @fwiffo
•  We’re hiring!
  
©2013	
  Cloudera,	
  Inc.	
  All	
  Rights	
  Reserved.	
  
43

Más contenido relacionado

La actualidad más candente

August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieYahoo Developer Network
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupCloudera, Inc.
 
Hivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache HiveHivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache HiveDataWorks Summit
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDataWorks Summit
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014alanfgates
 
Functional Programming and Big Data
Functional Programming and Big DataFunctional Programming and Big Data
Functional Programming and Big DataDataWorks Summit
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigDataWorks Summit/Hadoop Summit
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureSkillspeed
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst TrainingCloudera, Inc.
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization modelsThejas Nair
 

La actualidad más candente (20)

August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User Group
 
Hivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache HiveHivemail: Scalable Machine Learning Library for Apache Hive
Hivemail: Scalable Machine Learning Library for Apache Hive
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kite
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Functional Programming and Big Data
Functional Programming and Big DataFunctional Programming and Big Data
Functional Programming and Big Data
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Apache Hive authorization models
Apache Hive authorization modelsApache Hive authorization models
Apache Hive authorization models
 

Destacado

Stupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloStupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloCloudera, Inc.
 
AnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business IntelligenceAnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business IntelligenceJUNWEI GUAN
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation Mahantesh Angadi
 
Unit testing Agile OpenSpace
Unit testing Agile OpenSpaceUnit testing Agile OpenSpace
Unit testing Agile OpenSpaceAndrei Savu
 
CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013Cloudera Japan
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutAmbarish Hazarnis
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashAndrei Savu
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Extending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIExtending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIClouderaUserGroups
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformCloudera, Inc.
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo OverviewBill Havanki
 
Cloudera Manager 5 (hadoop運用) #cwt2013
Cloudera Manager 5 (hadoop運用)  #cwt2013Cloudera Manager 5 (hadoop運用)  #cwt2013
Cloudera Manager 5 (hadoop運用) #cwt2013Cloudera Japan
 
Хмарні технології (доповідач Ткачук Г.В.)
Хмарні технології (доповідач Ткачук Г.В.)Хмарні технології (доповідач Ткачук Г.В.)
Хмарні технології (доповідач Ткачук Г.В.)galanet82
 

Destacado (20)

YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High Availability
 
Opc
OpcOpc
Opc
 
чоповський Open scada_2014
чоповський Open scada_2014чоповський Open scada_2014
чоповський Open scada_2014
 
Stupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloStupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache Accumulo
 
AnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business IntelligenceAnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business Intelligence
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Unit testing Agile OpenSpace
Unit testing Agile OpenSpaceUnit testing Agile OpenSpace
Unit testing Agile OpenSpace
 
CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache Mahout
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Extending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIExtending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via API
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo Overview
 
Cloudera Manager 5 (hadoop運用) #cwt2013
Cloudera Manager 5 (hadoop運用)  #cwt2013Cloudera Manager 5 (hadoop運用)  #cwt2013
Cloudera Manager 5 (hadoop運用) #cwt2013
 
Хмарні технології (доповідач Ткачук Г.В.)
Хмарні технології (доповідач Ткачук Г.В.)Хмарні технології (доповідач Ткачук Г.В.)
Хмарні технології (доповідач Ткачук Г.В.)
 

Similar a Apache Accumulo and Cloudera

Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Cloudera, Inc.
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Belmiro Moreira
 
“Containerizing” applications with Docker: Ecosystem and Tools
“Containerizing” applications with Docker: Ecosystem and Tools“Containerizing” applications with Docker: Ecosystem and Tools
“Containerizing” applications with Docker: Ecosystem and ToolsFrancisco Javier Ramírez Urea
 
Oracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture PerformanceOracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture PerformanceEnkitec
 
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)Bobby Curtis
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
OpenContrail Implementations
OpenContrail ImplementationsOpenContrail Implementations
OpenContrail ImplementationsJakub Pavlik
 
OGG Architecture Performance
OGG Architecture PerformanceOGG Architecture Performance
OGG Architecture PerformanceEnkitec
 
Cloud Standards and CloudStack
Cloud Standards and CloudStackCloud Standards and CloudStack
Cloud Standards and CloudStackSebastien Goasguen
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopNeo4j
 
Vijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-featuresVijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-featuresmkorremans
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance DrupalChapter Three
 

Similar a Apache Accumulo and Cloudera (20)

Instant hadoop of your own
Instant hadoop of your ownInstant hadoop of your own
Instant hadoop of your own
 
YARN
YARNYARN
YARN
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
OWF12/Open Standards for Cloud - Cs owf
OWF12/Open Standards for Cloud - Cs owfOWF12/Open Standards for Cloud - Cs owf
OWF12/Open Standards for Cloud - Cs owf
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
“Containerizing” applications with Docker: Ecosystem and Tools
“Containerizing” applications with Docker: Ecosystem and Tools“Containerizing” applications with Docker: Ecosystem and Tools
“Containerizing” applications with Docker: Ecosystem and Tools
 
Oracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture PerformanceOracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture Performance
 
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
OpenContrail Implementations
OpenContrail ImplementationsOpenContrail Implementations
OpenContrail Implementations
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
OGG Architecture Performance
OGG Architecture PerformanceOGG Architecture Performance
OGG Architecture Performance
 
Cloud Standards and CloudStack
Cloud Standards and CloudStackCloud Standards and CloudStack
Cloud Standards and CloudStack
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Vijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-featuresVijfhart thema-avond-oracle-12c-new-features
Vijfhart thema-avond-oracle-12c-new-features
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
 

Más de Joey Echeverria

Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applicationsJoey Echeverria
 
Embeddable data transformation for real time streams
Embeddable data transformation for real time streamsEmbeddable data transformation for real time streams
Embeddable data transformation for real time streamsJoey Echeverria
 
The Future of Apache Hadoop Security
The Future of Apache Hadoop SecurityThe Future of Apache Hadoop Security
The Future of Apache Hadoop SecurityJoey Echeverria
 
Analyzing twitter data with hadoop
Analyzing twitter data with hadoopAnalyzing twitter data with hadoop
Analyzing twitter data with hadoopJoey Echeverria
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use casesJoey Echeverria
 
Scratching your own itch
Scratching your own itchScratching your own itch
Scratching your own itchJoey Echeverria
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Hadoop and h base in the real world
Hadoop and h base in the real worldHadoop and h base in the real world
Hadoop and h base in the real worldJoey Echeverria
 

Más de Joey Echeverria (11)

Debugging Apache Spark
Debugging Apache SparkDebugging Apache Spark
Debugging Apache Spark
 
Building production spark streaming applications
Building production spark streaming applicationsBuilding production spark streaming applications
Building production spark streaming applications
 
Streaming ETL for All
Streaming ETL for AllStreaming ETL for All
Streaming ETL for All
 
Embeddable data transformation for real time streams
Embeddable data transformation for real time streamsEmbeddable data transformation for real time streams
Embeddable data transformation for real time streams
 
The Future of Apache Hadoop Security
The Future of Apache Hadoop SecurityThe Future of Apache Hadoop Security
The Future of Apache Hadoop Security
 
Analyzing twitter data with hadoop
Analyzing twitter data with hadoopAnalyzing twitter data with hadoop
Analyzing twitter data with hadoop
 
Big data security
Big data securityBig data security
Big data security
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use cases
 
Scratching your own itch
Scratching your own itchScratching your own itch
Scratching your own itch
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Hadoop and h base in the real world
Hadoop and h base in the real worldHadoop and h base in the real world
Hadoop and h base in the real world
 

Apache Accumulo and Cloudera

  • 1. Apache Accumulo and Cloudera
      Hadoop-DC, July 2013
      Joey Echeverria | Director, Federal FTS
      joey@cloudera.com | @fwiffo
      ©2013 Cloudera, Inc. All Rights Reserved.
  • 2. Apache Accumulo and Cloudera: HADOOP 101
  • 3. Operating Systems
      • Manage and schedule machine resources
          • CPU
          • RAM
          • Memory
      • Provide abstractions and APIs
          • Files = stream of bytes
          • Process = instructions + private memory space
  • 4. Distributed Operating System
      • Same thing, but over a cluster of networked servers
      • Additional concerns:
          • Inter-process and inter-machine communication
          • Data locality
          • Data availability
          • Data processing availability
  • 5. Hadoop
      • De facto Distributed Operating System
      • Apache HDFS
      • Apache MapReduce and Apache YARN
  • 6. Ecosystem
      • Key Value Stores
      • High Level Batch Languages
      • Low Latency SQL Engine
      • Graph Processing
  • 8. CDH History
      • CDH1: *HDFS, *MR, *Hive, *Pig
      • CDH2: *HDFS, *MR, *Hive, *Pig
      • CDH3: *HDFS, *MR, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro
      • CDH4: *HDFS, *MR, *YARN, *Hive, *Pig, *Flume, *HBase, Hue, *Mahout, *Oozie, *Sqoop, *Whirr, *Zookeeper, *Avro, DataFu, HCatalog, Impala, *Solr, *BigTop, Sentry
  • 9. Apache Accumulo and Cloudera: ACCUMULO 101 AND 201
  • 11. Accumulo Data Model
      • Multi-dimensional sorted map
        row id -> [
          family -> [
            qualifier -> [
              visibility -> [
                timestamp -> value
              ]
            ]
          ]
        ]
  • 12. Accumulo Storage Model
      • key -> value
      • key = <row id><column><timestamp>
      • column = <family><qualifier><visibility>
      Key:   Row ID | Column (Family, Qualifier, Visibility) | Timestamp
      Value
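The sorted-map model on slides 11 and 12 can be sketched in a few lines of Python. This is only an illustration, not Accumulo's API: the `Key` tuple, `sort_order`, and `sorted_entries` are hypothetical names, and Python strings stand in for the byte arrays Accumulo actually compares. It assumes the usual Accumulo ordering, where keys sort ascending by row, family, qualifier, and visibility, and descending by timestamp so the newest version of a cell comes first.

```python
from typing import Dict, List, NamedTuple, Tuple


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo key (not the real client class)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_order(key: Key) -> Tuple[str, str, str, str, int]:
    # Text fields sort ascending; the timestamp is negated so that
    # newer versions of the same cell sort before older ones.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def sorted_entries(table: Dict[Key, bytes]) -> List[Tuple[Key, bytes]]:
    """Return the key/value pairs in the order a scan would see them."""
    return sorted(table.items(), key=lambda entry: sort_order(entry[0]))


table = {
    Key("row1", "fam", "qual", "public", 100): b"old",
    Key("row1", "fam", "qual", "public", 200): b"new",
    Key("row0", "fam", "qual", "public", 50): b"first",
}

entries = sorted_entries(table)
for key, value in entries:
    print(key, value)
```

Because every dimension is folded into one sortable key, a range scan over row ids touches a contiguous run of entries, which is the property the flat key -> value storage model relies on.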
  • 14. Other Concerns
      • Write-ahead log
      • Tablet server failure handling
      • Versioning
      • Iterators
      • Cell-level security
  • 15. Apache Accumulo and Cloudera: PROJECT HISTORY
  • 18. Relationship to Hadoop Releases
      • 1.3.x -> Hadoop 0.20.2
      • 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
      • 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
  • 19. Accumulo and Cloudera Releases
      • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
      • Accumulo 1.5.x should work with CDH4…
          • Limited testing
  • 20. Apache Accumulo and Cloudera: ANNOUNCEMENT
  • 21. Apache Accumulo and Cloudera: CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4
  • 22. Apache Accumulo and Cloudera: DEMO
  • 23. System Logs
      • Id
          • Unique id for an action
      • Timestamp
          • Time the action occurred
      • Actor
          • User or system performing the action
      • Action
          • The action taken
      • Object
          • The object of the action
      • Info
          • Free form information (e.g. success/failure, attribute value, etc.)
  • 24. Actions
      • created_user
      • deleted_user
      • set_password
      • logged_in
      • logged_out
      • read
      • modified
  • 25. Roles
      • system
          • Any user on the system
      • admin
          • Administrators
      • audit
          • Auditors
  • 26. Accumulo Data Model
      Key:
          Row ID                 <ts>-<id>
          Column Family          <actor>
          Column Qualifier       <action>:<object>
          Column Visibility      (blank on the slide)
          Timestamp              (blank on the slide)
      Value                      <info>
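The mapping on slide 26 from a log record to an Accumulo cell can be written out as a small function. This is a minimal sketch under stated assumptions: `log_entry_to_cell` and the dictionary layout are hypothetical names used for illustration, not part of any Accumulo client library, and the visibility label is passed in by the caller because the slide leaves that field blank.

```python
from typing import Dict


def log_entry_to_cell(ts: str, event_id: int, actor: str, action: str,
                      obj: str, info: str, visibility: str) -> Dict[str, str]:
    """Lay out one system-log record as the fields of a single cell."""
    return {
        "row": f"{ts}-{event_id}",        # <ts>-<id>
        "family": actor,                   # <actor>
        "qualifier": f"{action}:{obj}",    # <action>:<object>
        "visibility": visibility,          # e.g. one of the roles: system, admin, audit
        "value": info,                     # <info>
    }


# Values taken from the first row of the "Logs Demo" slide.
cell = log_entry_to_cell("201307241535", 1, "root", "created_user",
                         "sean", "succeeded", "audit")
print(cell)
```

Leading the row id with the timestamp keeps log entries in time order under Accumulo's sorted keys, so a scan over a time window is a single contiguous range.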
  • 27. Apache Accumulo and Cloudera: DEMO
  • 28. Logs Demo
      Row key          Column                   Visibility    Value
      201307241535-1   root:created_user:sean   audit         succeeded
      201307241535-1   root:set_password:sean   admin&audit   password
      201307241537-2   sean:logged_in:host      system        succeeded
      201307241538-3   sean:read:/tmp/a         audit         succeeded
      201307241539-4   sean:modified:/tmp/a     audit         failed
      201307241540-5   sean:logged_out:host     system        succeeded
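The effect of the Visibility column in the demo table can be approximated with a toy filter. This is a simplified sketch, not Accumulo's implementation: `is_visible` and `scan` are hypothetical helpers, and the evaluator only handles flat expressions built from `&` and `|` without parentheses, whereas real Accumulo visibility expressions also allow parenthesised grouping. The rows are the six entries from the slide.

```python
from typing import List, Set, Tuple

Row = Tuple[str, str, str, str]  # (row key, column, visibility, value)

ROWS: List[Row] = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def is_visible(expression: str, authorizations: Set[str]) -> bool:
    """Evaluate a flat visibility expression against a set of authorizations.

    Simplification: '|' separates alternatives and '&' joins required labels;
    parentheses are not supported in this sketch.
    """
    if not expression:
        return True  # an empty visibility is readable by everyone
    return any(
        all(label.strip() in authorizations for label in clause.split("&"))
        for clause in expression.split("|")
    )


def scan(rows: List[Row], authorizations: Set[str]) -> List[Row]:
    """Return only the rows whose visibility the authorizations satisfy."""
    return [row for row in rows if is_visible(row[2], authorizations)]


for row in scan(ROWS, {"system", "audit"}):
    print(row)
```

With the authorizations `system` and `audit`, the scan returns every row except `root:set_password:sean`, whose `admin&audit` label requires both roles at once.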
  • 29. Apache Accumulo and Cloudera: VERSIONS REDUX
  • 30. Recap
      • Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
      • Accumulo 1.5.x should work with CDH4
  • 31. Cloudera Support
      • Naturally, Cloudera has tested and packaged Accumulo 1.5…
      • But 1.5 is rather bleeding edge…
      • So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
  • 32. Apache Accumulo and Cloudera: ECOSYSTEM INTEGRATION
  • 35. Apache Accumulo and Cloudera: DEMO
  • 36. Apache Accumulo and Cloudera: NEXT STEPS
  • 37. Recap
      • What’s available today
          • Beta release of Accumulo 1.4.3 on CDH4.3
          • Beta release of Accumulo 1.4.3 Pig integration
      • Semi-private beta
          • Contact me (joey@cloudera.com) if you’re interested in trying out the bits
  • 38. Future Ideas (not promises ;)
      • Cloudera Manager integration
      • Flume integration
      • Sqoop integration
      • Hive integration
      • Impala integration
  • 39. What next?
      • Download Hadoop!
          • CDH available at www.cloudera.com
      • Cloudera provides pre-loaded VMs
          • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
      • Reach out to me (joey@cloudera.com) if you want to try out the Accumulo beta
      • Instructions to replicate the demos pending
  • 40. My personal preference
      • Cloudera Manager
          • https://ccp.cloudera.com/display/SUPPORT/Downloads
      • Free up to unlimited nodes!
  • 41. Shout Out
      • Jason Trost
          • @jason_trost
      • covert.io blog posts
          • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
          • http://www.covert.io/post/18605091231/accumulo-and-pig
  • 42. Questions?
      • Contact me!
          • Joey Echeverria
          • joey@cloudera.com
          • @fwiffo
      • We’re hiring!