Introduction to Apache HBase Training
Jesse Anderson
Curriculum Developer and Instructor
Agenda
• Why Cloudera Training?
• Target Audience and Prerequisites
• Course Outline
• Short Presentation Based on Actual Course Material
- Using Scans to Access Data
• Q&A
Cloudera Trains the Top Companies
• Demand for Big Data and analytics experts is rising, but a deficiency of talent will result in a shortfall of 32,000 trained professionals by 2015.
  (Source: Accenture, "Analytics in Action," March 2013.)
• 55% of the Fortune 100 have attended live Cloudera training.
  (Source: Fortune, "Fortune 500" and "Global 500," May 2012.)
• Cloudera has trained employees from 100% of the top 20 global technology firms to use Hadoop.
Learning Path: Developers
• Developer Training
  – Learn to code and write MapReduce programs for production
  – Master advanced API topics required for real-world data analysis
• Data Analyst Training
  – Run full analyses natively on Big Data without BI software
  – Eliminate complexity to perform ad hoc queries in real time
• HBase Training
  – Design schemas to minimize latency on massive data sets
  – Scale to hundreds of thousands of operations per second
• Intro to Data Science
  – Implement recommenders and data experiments
  – Draw actionable insights from analysis of disparate data

Learning Path: Administrators
• Administrator Training
  – Configure, install, and monitor clusters for optimal performance
  – Implement security measures and multi-user functionality
• Enterprise Training
  – Use Cloudera Manager to speed deployment and scale the cluster
  – Learn which tools and techniques improve cluster performance
• HBase Training
  – Implement massively distributed, columnar storage at scale
  – Enable random, real-time read/write access to all data
• Data Analyst Training
  – Vertically integrate basic analytics into data management
  – Transform and manipulate data to drive high-value utilization
Why Cloudera Training?
1 Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science
2 Most Experienced Instructors: Over 15,000 students trained since 2009
3 Leader in Certification: Over 5,000 accredited Cloudera professionals
4 State of the Art Curriculum: Classes updated regularly as Hadoop evolves
5 Widest Geographic Coverage: Most classes offered, in 50 cities worldwide plus online
6 Most Relevant Platform & Community: CDH deployed more than all other distributions combined
7 Depth of Training Material: Hands-on labs and VMs support live instruction
8 Ongoing Learning: Video tutorials and e-learning complement training
"Cloudera is the best vendor evangelizing the Big Data movement and is doing a great service promoting Hadoop in the industry. Developer training was a great way to get started on my journey."
Cloudera Training for Apache HBase
About the Course
Intended Audience
• This course was created for people in developer and operations roles, including:
  – Developers
  – DevOps Engineers
  – Database Administrators
  – Data Warehouse Engineers
  – Administrators
• It is also useful for others who want to access HBase, such as:
  – Business Intelligence Developers
  – ETL Developers
  – Quality Assurance Engineers
Who Should Not Take this Course
• Developers who want to learn the details of MapReduce programming
  – We recommend Cloudera Developer Training for Apache Hadoop instead
• System administrators who want to learn how to install and configure tools
  – We recommend Cloudera Administrator Training for Apache Hadoop instead
Course Prerequisites
• No prior knowledge of Hadoop is required
• What is required is an understanding of:
  – Basic end-user UNIX commands
• Optional, but helpful:
  – Basic relational database concepts
  – Basic knowledge of SQL

For example, you should be comfortable reading statements and commands like these:

SELECT id, first_name, last_name
FROM customers
ORDER BY last_name;

$ mkdir /data
$ cd /data
$ rm /home/tomwheeler/salesreport.txt
Course Objectives
During this course, you will learn:
• The core technologies of Apache HBase
• How HBase and HDFS work together
• How to work with the HBase shell, Java API, and Thrift API
• The HBase storage and cluster architecture
• The fundamentals of HBase administration
• Best practices for installing and configuring HBase
• Advanced features of the HBase API
• The importance of schema design in HBase
• How to work with HBase ecosystem projects
Course Outline
• Hadoop Introduction
  – Hands-On Exercise: Using HDFS
• Introduction to HBase
• HBase Concepts
  – Hands-On Exercise: HBase Data Import
• The HBase Administration API
  – Hands-On Exercise: Using the HBase Shell
• Accessing Data with the HBase API Part 1
  – Hands-On Exercise: Data Access in the HBase Shell
• Accessing Data with the HBase API Part 2
  – Hands-On Exercise: Using the Developer API
• Accessing Data with the HBase API Part 3
  – Hands-On Exercise: Filters
• HBase Architecture Part 1
  – Hands-On Exercise: Exploring HBase
• HBase Architecture Part 2
  – Hands-On Exercise: Flushes and Compactions
• Installation and Configuration Part 1
• Installation and Configuration Part 2
  – Hands-On Exercise: Administration
• Row Key Design in HBase
• Schema Design in HBase
  – Hands-On Exercise: Detecting Hot Spots
• The HBase Ecosystem
  – Hands-On Exercise: Hive and HBase
Scans
• A Scan can be used when:
  – The exact row key is not known
  – A group of rows needs to be accessed
• Scans can be bounded by a start row key and a stop row key
  – The start row key is inclusive: that row is included in the results
  – The stop row key is exclusive: that row is not included, and the Scan ends as soon as it reaches the stop row key
• Scans can be limited to certain column families or to specific columns (column qualifiers)
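The bound semantics can be modeled in a few lines of Python. This is a minimal in-memory sketch, not HBase client code: `scan_rows` and the dict-of-dicts table layout are hypothetical, the `info` column family is an assumed name, and row keys are assumed to be ASCII strings so that Python's string ordering matches HBase's byte ordering. Because rows are kept sorted by row key, a start and stop row select one contiguous range, which binary search finds directly.

```python
from bisect import bisect_left


def scan_rows(table, start_row=None, stop_row=None, columns=None):
    """Hypothetical in-memory model of an HBase scan (not the HBase API).

    table:   dict mapping row key -> dict of "family:qualifier" -> value
    columns: optional list of "family" or "family:qualifier" entries
    """
    keys = sorted(table)  # HBase keeps rows sorted by row key
    # Start row is inclusive: begin at the first key >= start_row.
    lo = 0 if start_row is None else bisect_left(keys, start_row)
    # Stop row is exclusive: end before the first key >= stop_row.
    hi = len(keys) if stop_row is None else bisect_left(keys, stop_row)

    results = []
    for key in keys[lo:hi]:
        cells = table[key]
        if columns is not None:
            # Keep a cell if its full column name or its family was requested.
            cells = {
                col: val
                for col, val in cells.items()
                if col in columns or col.split(":", 1)[0] in columns
            }
            if not cells:
                continue  # rows with no requested cells are not returned
        results.append((key, cells))
    return results


users = {
    "turnerb": {"info:fname": "Brian", "info:lname": "Turner"},
    "jordena": {"info:fname": "Adam", "info:lname": "Jorden"},
    "laytonb": {"info:fname": "Bennie", "info:lname": "Layton"},
}

for key, cells in scan_rows(users, start_row="jordena", stop_row="turnerb",
                            columns=["info:fname"]):
    print(key, cells)
# jordena {'info:fname': 'Adam'}
# laytonb {'info:fname': 'Bennie'}
```

Note that the start row does not have to exist: the scan begins at the first row key that sorts at or after it.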
Scanning
• A scan without a start and stop row will scan the entire table
• With a start row of "jordena" and a stop row of "turnerb":
  – The scan will return all rows starting at "jordena", and will not include "turnerb"

Users table
Row key     Columns
aaronsona   fname: Aaron     lname: Aaronson
harrise     fname: Ernest    lname: Harris
jordena     fname: Adam      lname: Jorden
laytonb     fname: Bennie    lname: Layton
millerb     fname: Billie    lname: Miller
nununezw    fname: Willam    lname: Nunez
rossw       fname: William   lname: Ross
sperberp    fname: Phyllis   lname: Sperber
turnerb     fname: Brian     lname: Turner
walkerm     fname: Martin    lname: Walker
zykowskiz   fname: Zeph      lname: Zykowski
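The half-open range can be checked against the Users table with a few lines of plain Python. This is an illustration rather than an HBase client: keeping only the keys k with start_row <= k < stop_row reproduces which rows the scan returns. It assumes the row keys are plain ASCII, so Python's string ordering matches HBase's byte ordering.

```python
# Row keys from the Users table.
row_keys = ["aaronsona", "harrise", "jordena", "laytonb", "millerb", "nununezw",
            "rossw", "sperberp", "turnerb", "walkerm", "zykowskiz"]

start_row, stop_row = "jordena", "turnerb"

# Rows come back in sorted row-key order; start is inclusive, stop is exclusive.
returned = [key for key in sorted(row_keys) if start_row <= key < stop_row]

print(returned)
# ['jordena', 'laytonb', 'millerb', 'nununezw', 'rossw', 'sperberp']
```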
Scanning Rows With scan in HBase Shell
• Retrieve a group of rows with scan
• General form:
hbase> scan 'tablename' [,options]
• Examples:
hbase> scan 'table1'
hbase> scan 'table1', {LIMIT => 10}
hbase> scan 'table1', {STARTROW => 'start', STOPROW => 'stop'}
hbase> scan 'table1', {COLUMNS => ['fam1:col1', 'fam2:col2']}
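Scan Java API: Complete Code
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
// table is an open HBase table; FAMILY_BYTES and COLUMN_BYTES are byte[] constants
Scan s = new Scan();                    // no start or stop row, so every row is scanned
ResultScanner rs = table.getScanner(s);
try {
    for (Result r : rs) {
        String rowKey = Bytes.toString(r.getRow());
        byte[] b = r.getValue(FAMILY_BYTES, COLUMN_BYTES);
        String user = Bytes.toString(b);
    }
} finally {
    rs.close();                         // close the ResultScanner; Scan has no close() method
}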
Scan Java API: Scan and ResultScanner
A new Scan object with no start or stop row scans all rows. Passing it to table.getScanner() executes the scan and returns a ResultScanner object.
Scan Java API: Iterating
A for loop iterates through every Result object in the ResultScanner. Each Result holds one row: getRow() returns its row key, and getValue(family, qualifier) returns the value of a cell.
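Python Scan Code: Complete Code
# client is a connected HBase Thrift (Thrift1) client; columnwithcf is e.g. "fam1:col1"
# scannerOpen args: table, start row ("" = first row), columns, attributes map
scannerId = client.scannerOpen("tablename", "", [columnwithcf], {})
try:
    rows = client.scannerGet(scannerId)       # a list with the next row, or [] when done
    while rows:
        for row in rows:
            columnvalue = row.columns.get(columnwithcf).value
        rows = client.scannerGet(scannerId)
finally:
    client.scannerClose(scannerId)            # always release the scanner on the server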
Python Scan Code: Open Scanner
Call scannerOpen to create a scanner on the Thrift server. It returns a scanner ID that uniquely identifies that scanner on the server.
Python Scan Code: Get the List
Call scannerGet with that scanner ID to fetch the next row. In the Thrift1 API it returns a list of row results (TRowResult objects); an empty list means the scan is exhausted.
Python Scan Code: Iterating Through
The while loop continues as long as scannerGet returns a new row. Columns are addressed by column family, ":", and column qualifier (for example, "fam1:col1"). Each pass ends with another call to scannerGet, and the loop repeats until no rows remain.
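The scannerGet loop can be wrapped in a Python generator so that calling code simply iterates over rows. This is a sketch: `iter_scanner_rows` is a hypothetical helper, and it assumes the Thrift1 behavior that `scannerGet(scannerId)` returns a list holding the next row and an empty list once the scanner is exhausted. `FakeClient` is a hypothetical stand-in for a Thrift client so the example runs without an HBase server, and the `info` column family is an assumed name.

```python
from types import SimpleNamespace


def iter_scanner_rows(client, scanner_id):
    """Yield rows one at a time until scannerGet returns an empty list."""
    rows = client.scannerGet(scanner_id)
    while rows:
        for row in rows:
            yield row
        rows = client.scannerGet(scanner_id)


class FakeClient:
    """Stand-in for a Thrift client: serves pre-built rows, one per call."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.get_calls = 0

    def scannerGet(self, scanner_id):
        self.get_calls += 1
        return [self._rows.pop(0)] if self._rows else []


# Rows shaped like Thrift TRowResult objects: a row key plus a columns map.
rows = [
    SimpleNamespace(row="jordena", columns={"info:fname": SimpleNamespace(value="Adam")}),
    SimpleNamespace(row="laytonb", columns={"info:fname": SimpleNamespace(value="Bennie")}),
]

client = FakeClient(rows)
for row in iter_scanner_rows(client, scanner_id=1):
    print(row.row, row.columns.get("info:fname").value)
# jordena Adam
# laytonb Bennie
```

The generator makes one extra scannerGet call at the end (the one that returns the empty list), so two rows cost three calls.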
Python Scan Code: Closing the Scanner
The scannerClose call is very important: it closes the scanner on the Thrift server. Not calling it can leak scanner objects on the server.
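One way to make sure scannerClose always runs is a context manager built on try/finally. In this sketch, `open_scanner` is a hypothetical helper that forwards whatever arguments your client's scannerOpen expects, and `RecordingClient` is a hypothetical stand-in that tracks which scanner IDs are still open, so the example runs without a Thrift server.

```python
from contextlib import contextmanager


@contextmanager
def open_scanner(client, *open_args):
    """Open a Thrift scanner and always close it, even if the caller raises."""
    scanner_id = client.scannerOpen(*open_args)
    try:
        yield scanner_id
    finally:
        client.scannerClose(scanner_id)  # release the scanner on the server


class RecordingClient:
    """Stand-in for a Thrift client that records which scanners are open."""

    def __init__(self):
        self.open_scanners = set()
        self._next_id = 0

    def scannerOpen(self, *args):
        self._next_id += 1
        self.open_scanners.add(self._next_id)
        return self._next_id

    def scannerClose(self, scanner_id):
        self.open_scanners.discard(scanner_id)


client = RecordingClient()
try:
    with open_scanner(client, "tablename") as scanner_id:
        raise RuntimeError("simulated failure while reading rows")
except RuntimeError:
    pass

print(client.open_scanners)  # set(): the scanner was closed despite the error
```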
Scanner Caching
• Scan results can be retrieved in batches to improve performance
  – Fewer round trips to the server improve performance, but client memory usage increases with the batch size
• Java API:
Scan s = new Scan();
s.setCaching(20);   // fetch 20 rows per round trip to the server
• Python with Thrift:
rowsArray = client.scannerGetList(scannerId, 10)   # fetch up to 10 rows per call
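The trade-off can be made concrete by counting round trips. The sketch below drains a scanner with scannerGetList, as in the Thrift example, assuming (as in the Thrift1 API) that an empty list signals the end of the scan. `fetch_all_in_batches` and `CountingClient` are hypothetical names; the client is a stand-in that serves integers instead of real rows. Larger batches need fewer calls, but each response holds more rows in memory at once.

```python
def fetch_all_in_batches(client, scanner_id, batch_size):
    """Drain a scanner with scannerGetList, batch_size rows per call."""
    rows = []
    batch = client.scannerGetList(scanner_id, batch_size)
    while batch:
        rows.extend(batch)
        batch = client.scannerGetList(scanner_id, batch_size)
    return rows


class CountingClient:
    """Stand-in for a Thrift client that counts round trips to the server."""

    def __init__(self, total_rows):
        self._remaining = list(range(total_rows))
        self.round_trips = 0

    def scannerGetList(self, scanner_id, nb_rows):
        self.round_trips += 1
        batch, self._remaining = self._remaining[:nb_rows], self._remaining[nb_rows:]
        return batch


for batch_size in (1, 10, 100):
    client = CountingClient(total_rows=100)
    rows = fetch_all_in_batches(client, scanner_id=1, batch_size=batch_size)
    print(batch_size, len(rows), client.round_trips)
# 1 100 101
# 10 100 11
# 100 100 2
```

Each count includes the final call that returns an empty list.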
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Introduction to Apache HBase Training

  • 1. Introduction to Apache HBase Training Jesse Anderson Curriculum Developer and Instructor
  • 2. Agenda • Why Cloudera Training? • Target Audience and Prerequisites • Course Outline • Short Presentation Based on Actual Course Material - Using Scans to Access Data • Q&A
  • 3. Rising demand for Big Data and analytics experts, combined with a DEFICIENCY OF TALENT, will result in a shortfall of 32,000 trained professionals by 2015. Source: Accenture, “Analytics in Action,” March 2013.
  • 4. Cloudera Trains the Top Companies. 55% of the Fortune 100 have attended live Cloudera training, and Cloudera has trained employees from 100% of the top 20 global technology firms to use Hadoop. Source: Fortune, “Fortune 500” and “Global 500,” May 2012.
  • 5. Learning Path: Developers. Developer Training: learn to code and write MapReduce programs for production; master advanced API topics required for real-world data analysis. HBase Training: design schemas to minimize latency on massive data sets; scale to hundreds of thousands of operations per second. Data Analyst Training: run full analyses natively on Big Data without BI software; eliminate complexity to perform ad hoc queries in real time. Intro to Data Science: implement recommenders and data experiments; draw actionable insights from analysis of disparate data.
  • 6. Learning Path: Administrators. Administrator Training: configure, install, and monitor clusters for optimal performance; implement security measures and multi-user functionality. Enterprise Training: use Cloudera Manager to speed deployment and scale the cluster; learn which tools and techniques improve cluster performance. HBase Training: implement massively distributed, columnar storage at scale; enable random, real-time read/write access to all data. Data Analyst Training: vertically integrate basic analytics into data management; transform and manipulate data to drive high-value utilization.
  • 7. Why Cloudera Training? Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science. Most Experienced Instructors: over 15,000 students trained since 2009. Leader in Certification: over 5,000 accredited Cloudera professionals. State of the Art Curriculum: classes updated regularly as Hadoop evolves. Widest Geographic Coverage: the most classes offered, in 50 cities worldwide plus online. Most Relevant Platform & Community: CDH deployed more than all other distributions combined. Depth of Training Material: hands-on labs and VMs support live instruction. Ongoing Learning: video tutorials and e-learning complement training.
  • 8. Cloudera is the best vendor evangelizing the Big Data movement and is doing a great service promoting Hadoop in the industry. Developer training was a great way to get started on my journey.
  • 9. Cloudera Training for Apache HBase About the Course
  • 10. Intended Audience. This course was created for people in developer and operations roles, including developers, DevOps engineers, database administrators, data warehouse engineers, and administrators. It is also useful for others who want to access HBase, such as business intelligence developers, ETL developers, and quality assurance engineers.
  • 11. Who Should Not Take This Course. Developers who want to learn the details of MapReduce programming: we recommend Cloudera Developer Training for Apache Hadoop. System administrators who want to learn how to install and configure tools: we recommend Cloudera Administrator Training for Apache Hadoop.
  • 12. Course Prerequisites. No prior knowledge of Hadoop is required. What is required is an understanding of basic end-user UNIX commands, for example:
      $ mkdir /data
      $ cd /data
      $ rm /home/tomwheeler/salesreport.txt
    Optional: an understanding of basic relational database concepts and basic knowledge of SQL, for example:
      SELECT id, first_name, last_name FROM customers ORDER BY last_name;
  • 13. Course Objectives. During this course, you will learn: the core technologies of Apache HBase; how HBase and HDFS work together; how to work with the HBase shell, Java API, and Thrift API; the HBase storage and cluster architecture; the fundamentals of HBase administration; best practices for installing and configuring HBase; advanced features of the HBase API; the importance of schema design in HBase; and how to work with HBase ecosystem projects.
  • 14. Course Outline. Hadoop Introduction (Hands-On Exercise: Using HDFS); Introduction to HBase; HBase Concepts (Hands-On Exercise: HBase Data Import); The HBase Administration API (Hands-On Exercise: Using the HBase Shell); Accessing Data with the HBase API Part 1 (Hands-On Exercise: Data Access in the HBase Shell); Accessing Data with the HBase API Part 2 (Hands-On Exercise: Using the Developer API).
  • 15. Course Outline (cont’d). Accessing Data with the HBase API Part 3 (Hands-On Exercise: Filters); HBase Architecture Part 1 (Hands-On Exercise: Exploring HBase); HBase Architecture Part 2 (Hands-On Exercise: Flushes and Compactions); Installation and Configuration Part 1; Installation and Configuration Part 2 (Hands-On Exercise: Administration); Row Key Design in HBase.
  • 16. Course Outline (cont’d). Schema Design in HBase (Hands-On Exercise: Detecting Hot Spots); The HBase Ecosystem (Hands-On Exercise: Hive and HBase).
  • 17. Scans. Use a Scan when the exact row key is not known or when a group of rows needs to be accessed. A Scan can be bounded by a start row key and a stop row key. The start row key is included in the results. The stop row key is not included: the scan ends when it reaches the stop row key. A Scan can also be limited to certain column families or to specific columns (column family plus column qualifier).
  • 18. Scanning. A scan without a start and stop row scans the entire table. With a start row of "jordena" and a stop row of "turnerb", the scan returns every row from "jordena" up to, but not including, "turnerb": jordena, laytonb, millerb, nununezw, rossw, and sperberp. Users table:
      Row key     fname     lname
      aaronsona   Aaron     Aaronson
      harrise     Ernest    Harris
      jordena     Adam      Jorden
      laytonb     Bennie    Layton
      millerb     Billie    Miller
      nununezw    William   Nunez
      rossw       William   Ross
      sperberp    Phyllis   Sperber
      turnerb     Brian     Turner
      walkerm     Martin    Walker
      zykowskiz   Zeph      Zykowski
  • 19. Scanning Rows With scan in HBase Shell. Retrieve a group of rows with scan.
    General form:
      hbase> scan 'tablename' [, options]
    Examples:
      hbase> scan 'table1'
      hbase> scan 'table1', {LIMIT => 10}
      hbase> scan 'table1', {STARTROW => 'start', STOPROW => 'stop'}
      hbase> scan 'table1', {COLUMNS => ['fam1:col1', 'fam2:col2']}
  • 20. Scan Java API: Complete Code
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;
      // table, FAMILY_BYTES, and COLUMN_BYTES are assumed to be defined elsewhere
      Scan s = new Scan();                     // no start/stop row: scans every row
      ResultScanner rs = table.getScanner(s);  // executes the scan on the table
      try {
        for (Result r : rs) {
          String rowKey = Bytes.toString(r.getRow());
          byte[] b = r.getValue(FAMILY_BYTES, COLUMN_BYTES);
          String user = Bytes.toString(b);
        }
      } finally {
        rs.close();                            // close the ResultScanner; a Scan has no close()
      }
  • 21. Scan Java API: Scan and ResultScanner. The Scan object is created and will scan all rows. The scan is executed on the table, and a ResultScanner object is returned.
  • 22. Scan Java API: Iterating. Using a for loop, you iterate through all Result objects in the ResultScanner. Each Result can be used to get the values. When iteration is finished, close the ResultScanner so the server can release the scanner.
  • 23. Python Scan Code: Complete Code
      # The generated Thrift (v1) client's scannerOpen also expects startRow, a list of
      # columns and, in newer HBase versions, an attributes map; check Hbase.thrift for yours.
      # "" starts at the first row; [] returns all columns.
      scannerId = client.scannerOpen("tablename", "", [], {})
      try:
          row = client.scannerGet(scannerId)   # a list holding at most one TRowResult
          while row:
              # columnwithcf is the column name as "family:qualifier"
              columnvalue = row[0].columns.get(columnwithcf).value
              row = client.scannerGet(scannerId)
      finally:
          client.scannerClose(scannerId)       # always release the server-side scanner
  • 24. Python Scan Code: Open Scanner. Call scannerOpen to create a scanner on the Thrift server. It returns a scanner id that uniquely identifies the scanner on the server.
  • 25. Python Scan Code: Get the List. Call scannerGet with that id. It returns a list containing the next row of results, or an empty list when the scanner is exhausted.
  • 26. Python Scan Code: Iterating Through. The while loop continues as long as the scanner returns a new row. Columns must be addressed as the column family, ":", and the column qualifier. row is repopulated by another call to scannerGet, and the loop repeats.
  • 27. Python Scan Code: Closing the Scanner. The scannerClose call is very important: it closes the scanner on the Thrift server. Not calling it can leak scanners on the server. Placing it in a finally block ensures it runs even if an exception is raised while reading rows.
  • 28. Scanner Caching. Scan results can be retrieved in batches of several rows per request. Fewer round trips to the server improve performance, but more memory is used to hold each batch.
    Java API:
      Scan s = new Scan();
      s.setCaching(20);   // fetch 20 rows per round trip
    Python with Thrift:
      rowsArray = client.scannerGetList(scannerId, 10)   # fetch up to 10 rows per call
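
The start-inclusive, stop-exclusive rule from the Scans slides (17 to 19) can be checked without a cluster. The sketch below is a plain-Python simulation, not HBase client code: `range_scan` and `USERS` are hypothetical names, and it assumes ASCII row keys, so that Python's string ordering matches the byte-wise ordering HBase uses for row keys. The bounds are located with a binary search over the sorted keys, which mirrors how a bounded scan starts at the first key at or after the start row and stops before the stop row.

```python
from bisect import bisect_left

# Row keys from the Users table on the slides. HBase keeps rows sorted by row key;
# for these ASCII keys, Python's string order matches that byte-wise order.
USERS = sorted([
    "aaronsona", "harrise", "jordena", "laytonb", "millerb", "nununezw",
    "rossw", "sperberp", "turnerb", "walkerm", "zykowskiz",
])


def range_scan(keys, start=None, stop=None, limit=None):
    """Simulate a bounded scan over sorted row keys (hypothetical helper).

    start is inclusive, stop is exclusive, and omitting both returns every key.
    limit caps the number of rows returned, like LIMIT in the HBase shell.
    """
    lo = 0 if start is None else bisect_left(keys, start)      # first key >= start
    hi = len(keys) if stop is None else bisect_left(keys, stop)  # first key >= stop
    result = keys[lo:hi]
    return result if limit is None else result[:limit]


if __name__ == "__main__":
    print(range_scan(USERS, start="jordena", stop="turnerb"))
    # → ['jordena', 'laytonb', 'millerb', 'nununezw', 'rossw', 'sperberp']
```

Because the bounds are found by search rather than by lookup, neither the start key nor the stop key has to exist in the table. For example, `range_scan(USERS, "j", "m")` returns `['jordena', 'laytonb']`.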

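The trade-off on the Scanner Caching slide (28) shows up when you count round trips. This minimal simulation assumes that each request returns up to `caching` rows, the role played by `Scan.setCaching(20)` in Java or the row-count argument of `scannerGetList` in Thrift. It also assumes one final request that comes back empty. `FakeScanner` and `scan_all` are hypothetical names, and no HBase code is involved.

```python
class FakeScanner:
    """Hypothetical stand-in for a server-side scanner. Each fetch() models one
    client-server round trip that returns up to `caching` rows."""

    def __init__(self, rows, caching):
        if caching < 1:
            raise ValueError("caching must be at least 1")
        self._rows = list(rows)
        self._pos = 0
        self.caching = caching
        self.round_trips = 0

    def fetch(self):
        """Return the next batch; an empty list means the scan is exhausted."""
        self.round_trips += 1
        batch = self._rows[self._pos:self._pos + self.caching]
        self._pos += len(batch)
        return batch


def scan_all(rows, caching):
    """Drain the scanner batch by batch and report how many round trips it took."""
    scanner = FakeScanner(rows, caching)
    out = []
    batch = scanner.fetch()
    while batch:
        out.extend(batch)
        batch = scanner.fetch()
    return out, scanner.round_trips


if __name__ == "__main__":
    rows = [f"row{i:03d}" for i in range(100)]
    for caching in (1, 20):
        _, trips = scan_all(rows, caching)
        print(f"caching={caching}: {trips} round trips")
```

Under these assumptions, draining 100 rows takes 101 requests with a caching value of 1 and only 6 with a value of 20. The cost is that each response can now hold up to 20 rows, which must be buffered in memory.
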
Speaker Notes

  1. scan 'table1' scans the entire table. scan 'table1', {LIMIT => 10} scans the first 10 rows in the table. scan 'table1', {STARTROW => 'start', STOPROW => 'stop'} scans from the start row up to, but not including, the stop row. scan 'table1', {COLUMNS => ['fam1:col1', 'fam2:col2']} scans the entire table but returns only those 2 columns.
  2. The full code listing. A line-by-line discussion follows.
  3. Note that for the Python code, the row comes back as an array.
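
The note above points out that `scannerGet` hands back a list rather than a single row, and slide 27 stresses that the scanner must always be closed. Both points can be tested without a Thrift server. In this sketch, `InMemoryClient`, `Cell`, `RowResult`, and `read_column` are hypothetical stand-ins whose scanner method names mirror the slides. They are not the generated HBase Thrift bindings, whose real `scannerOpen` signature differs. The `try`/`finally` makes `scannerClose` run even when reading a row raises.

```python
from collections import namedtuple

# Hypothetical stand-ins shaped like the Thrift TCell / TRowResult structs used on
# the slides; they are NOT the generated HBase Thrift bindings.
Cell = namedtuple("Cell", ["value"])
RowResult = namedtuple("RowResult", ["row", "columns"])


class InMemoryClient:
    """Hypothetical in-memory client whose scanner methods mirror the slides."""

    def __init__(self, tables):
        self._tables = tables   # {table: {row_key: {"family:qualifier": value}}}
        self._scanners = {}     # scanner id -> iterator of RowResult in row-key order
        self._next_id = 1

    def scannerOpen(self, table_name, start_row=""):
        table = self._tables[table_name]
        keys = sorted(k for k in table if k >= start_row)
        scanner_id = self._next_id
        self._next_id += 1
        self._scanners[scanner_id] = iter(
            [RowResult(k, {c: Cell(v) for c, v in table[k].items()}) for k in keys]
        )
        return scanner_id

    def scannerGet(self, scanner_id):
        # As the note says: a list holding one row, or [] when the scan is exhausted.
        nxt = next(self._scanners[scanner_id], None)
        return [] if nxt is None else [nxt]

    def scannerClose(self, scanner_id):
        self._scanners.pop(scanner_id, None)

    @property
    def open_scanners(self):
        return len(self._scanners)


def read_column(client, table_name, column):
    """Collect one column from every row, closing the scanner even on error."""
    scanner_id = client.scannerOpen(table_name)
    values = []
    try:
        row = client.scannerGet(scanner_id)
        while row:
            cell = row[0].columns.get(column)   # column is "family:qualifier"
            if cell is not None:                # skip rows that lack the column
                values.append(cell.value)
            row = client.scannerGet(scanner_id)
    finally:
        client.scannerClose(scanner_id)         # runs on success and on exceptions
    return values
```

For example, with a hypothetical `users` table whose rows `harrise` and `jordena` hold an `info:fname` column, `read_column(client, "users", "info:fname")` returns the first names in row-key order and leaves `client.open_scanners` at 0.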