Scott Miao 2012/7/12
HBase Admin API & Available Clients
Agenda
• Course Credit
• HBaseAdmin APIs
  • HTableDescriptor
  • HColumnDescriptor
  • HBaseAdmin
• Available Clients
  • Interactive Clients
  • Batch Clients
• Shell
• Web-based UI
Course Credit
• Attendance: 30 points
• Asking questions: 5 points per question
• Hands-on exercises: 40 points
• 70 points are required to pass this course
• Course credit is calculated once for each course you finish
• The course credit will be sent to you and your supervisor by mail
Hadoop RPC framework
• Writable interface
  • void write(DataOutput out) throws IOException;
    • Serializes the object's data so that it can be sent to the remote side
  • void readFields(DataInput in) throws IOException;
    • Fills a newly created instance by deserializing the remote data for subsequent operations
• Parameterless constructor
  • Hadoop instantiates an empty object
  • It then calls the readFields method to deserialize the remote data
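The write/readFields round trip can be sketched in plain Java. This is a minimal illustration, not Hadoop code: `SketchWritable` is a local stand-in that mirrors the two methods of Hadoop's Writable interface so that the example compiles without Hadoop on the classpath, and `TableInfo`, its fields, and `WritableRoundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in (assumption) mirroring the two methods of Hadoop's Writable.
interface SketchWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical payload type used only for this illustration.
class TableInfo implements SketchWritable {
    private String name = "";
    private long maxFileSize;

    // Parameterless constructor: the receiving side creates an empty object first.
    TableInfo() {}

    TableInfo(String name, long maxFileSize) {
        this.name = name;
        this.maxFileSize = maxFileSize;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the fields in a fixed order.
        out.writeUTF(name);
        out.writeLong(maxFileSize);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize the fields in the same order they were written.
        name = in.readUTF();
        maxFileSize = in.readLong();
    }

    String getName() { return name; }
    long getMaxFileSize() { return maxFileSize; }
}

class WritableRoundTrip {
    // Simulates sender and receiver with an in-memory byte buffer.
    static TableInfo roundTrip(TableInfo original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();

            TableInfo copy = new TableInfo();  // empty instance
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important detail is the order: the receiver needs the parameterless constructor because the object must exist before readFields can fill it.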
HTableDescriptor
• Constructors
  • HTableDescriptor();
  • HTableDescriptor(String name);
  • HTableDescriptor(byte[] name);
  • HTableDescriptor(HTableDescriptor desc);
  • Example: ch05/admin.CreateTableExample
• Can be used to fine-tune the table's performance
HTableDescriptor – Logical vs. physical views
HTableDescriptor - Properties

Property: Name
Description: Specifies the table name.
    byte[] getName();
    String getNameAsString();
    void setName(byte[] name);

Property: Column families
Description: Specifies the column families of the table.
    void addFamily(HColumnDescriptor family);
    boolean hasFamily(byte[] c);
    HColumnDescriptor[] getColumnFamilies();
    HColumnDescriptor getFamily(byte[] column);
    HColumnDescriptor removeFamily(byte[] column);

Property: Maximum file size
Description: Specifies the maximum size a region within the table can grow to. It is really about the maximum size of each store, so a better name would be maxStoreSize. By default, the size is 256 MB; a larger value may be required when you have a lot of data.
    long getMaxFileSize();
    void setMaxFileSize(long maxFileSize);
HTableDescriptor - Properties

Property: Read-only
Description: By default, all tables are writable. If the flag is set to true, you can only read from the table and cannot modify it at all.
    boolean isReadOnly();
    void setReadOnly(boolean readOnly);

Property: Memstore flush size
Description: The memstore is an in-memory store that buffers values before they are written to disk as a new storage file. This property sets the size at which that happens; the default is 64 MB.
    long getMemStoreFlushSize();
    void setMemStoreFlushSize(long memstoreFlushSize);

Property: Deferred log flush
Description: Controls how write-ahead-log entries are saved to disk. By default, it is set to false.
    synchronized boolean isDeferredLogFlush();
    void setDeferredLogFlush(boolean isDeferredLogFlush);

Property: Miscellaneous options
Description: Key/value options that are stored with the table definition and can be retrieved if necessary.
    byte[] getValue(byte[] key)
    String getValue(String key)
    Map<ImmutableBytesWritable,ImmutableBytesWritable> getValues()
    void setValue(byte[] key, byte[] value)
    void setValue(String key, String value)
    void remove(byte[] key)
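As a rough illustration of the memstore flush size, the following toy buffer collects values in memory and writes them out as a new "store file" once a size threshold is reached. It is a simplified sketch, not HBase's implementation: the class `MemStoreSketch`, its approximate size accounting, and the in-memory list standing in for store files are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a memstore (hypothetical, not HBase code): buffers sorted
// key/value pairs and flushes them as one "store file" when the buffered
// size reaches the configured flush size.
class MemStoreSketch {
    private final long flushSizeBytes;
    private TreeMap<String, byte[]> buffer = new TreeMap<>();
    private long bufferedBytes = 0;
    private final List<TreeMap<String, byte[]>> storeFiles = new ArrayList<>();

    MemStoreSketch(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String rowKey, byte[] value) {
        // Simplification: a later put for the same row key replaces the earlier one.
        byte[] previous = buffer.put(rowKey, value);
        if (previous != null) {
            bufferedBytes -= rowKey.length() + previous.length;
        }
        // Approximate size: key characters plus value bytes.
        bufferedBytes += rowKey.length() + value.length;
        if (bufferedBytes >= flushSizeBytes) {
            flush();  // threshold reached: write out a new store file
        }
    }

    // Explicit flush, independent of the threshold.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        storeFiles.add(buffer);
        buffer = new TreeMap<>();
        bufferedBytes = 0;
    }

    int storeFileCount() { return storeFiles.size(); }
    long bufferedBytes() { return bufferedBytes; }
}
```

A larger flush size means fewer, larger store files at the cost of more data held in memory between flushes.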
HColumnDescriptor
• A more appropriate name would be HColumnFamilyDescriptor
• The family name must be printable
  • You cannot simply rename a family later
• Constructors
  • HColumnDescriptor();
  • HColumnDescriptor(String familyName);
  • HColumnDescriptor(byte[] familyName);
  • HColumnDescriptor(HColumnDescriptor desc);
  • HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
      boolean inMemory, boolean blockCacheEnabled, int timeToLive,
      String bloomFilter);
  • HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
      boolean inMemory, boolean blockCacheEnabled, int blocksize,
      int timeToLive, String bloomFilter, int scope);
HColumnDescriptor – Column families vs. store files
HColumnDescriptor – Properties

Property: Name
Description: Specifies the column family name. A column family cannot be renamed; instead, create a new family with the desired name and copy the data over using the API.
    byte[] getName();
    String getNameAsString();

Property: Maximum versions
Description: Predicate deletion. Specifies how many versions of each value you want to keep. The default value is 3.
    int getMaxVersions();
    void setMaxVersions(int maxVersions);

Property: Compression
Description: HBase has pluggable compression algorithm support. The default value is NONE.
HColumnDescriptor – Properties

Property: Block size
Description: All stored files are divided into smaller blocks that are loaded during a get or scan operation. The default value is 64 KB. This is not the HDFS block size, which is 64 MB by default.
    synchronized int getBlocksize();
    void setBlocksize(int s);

Property: Block cache
Description: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true. If your use case only ever has sequential reads on a particular column family, it is advisable to disable the block cache.
    boolean isBlockCacheEnabled();
    void setBlockCacheEnabled(boolean blockCacheEnabled);

Property: Time-to-live (TTL)
Description: Predicate deletion. A threshold based on the timestamp of a value; internal housekeeping automatically checks whether a value exceeds its TTL. By default, values are kept forever (the TTL is set to Integer.MAX_VALUE).
    int getTimeToLive();
    void setTimeToLive(int timeToLive);
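The two "predicate deletion" settings, maximum versions and TTL, can be pictured with a small filter over the timestamps of one cell's versions. This is only a sketch of the idea under stated assumptions, not HBase's housekeeping code: the class `VersionRetentionSketch` and its method are hypothetical, timestamps are taken to be in milliseconds, and the TTL is taken to be in seconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: decides which versions of a value would survive
// given a maximum version count and a time-to-live.
class VersionRetentionSketch {
    // Returns the surviving version timestamps (milliseconds), newest first.
    static List<Long> retained(List<Long> versionTimestampsMs,
                               int maxVersions, int ttlSeconds, long nowMs) {
        List<Long> newestFirst = new ArrayList<>(versionTimestampsMs);
        newestFirst.sort(Comparator.reverseOrder());

        List<Long> kept = new ArrayList<>();
        for (long timestamp : newestFirst) {
            if (kept.size() >= maxVersions) {
                break;  // only the newest maxVersions versions are kept
            }
            // Integer.MAX_VALUE means "keep forever", as on the slide.
            boolean expired = ttlSeconds != Integer.MAX_VALUE
                    && nowMs - timestamp > ttlSeconds * 1000L;
            if (!expired) {
                kept.add(timestamp);
            }
        }
        return kept;
    }
}
```

With five versions, a maximum of 3 and a TTL that only the oldest version exceeds, the three newest versions remain; raising the maximum to 10 would keep four.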
HColumnDescriptor – Properties

Property: In-memory
Description: Builds on the block cache, which HBase uses to keep entire blocks of data in memory for efficient sequential access to values. The in-memory flag defaults to false. Setting it to true is good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.
    boolean isInMemory();
    void setInMemory(boolean inMemory);

Property: Bloom filter
Description: Allows you to improve lookup times, provided you have a specific access pattern. Since Bloom filters add overhead in terms of storage and memory, they are turned off by default.

Property: Replication scope
Description: Enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0.
HBaseAdmin
• Works like DDL in an RDBMS
  • Create tables with specific column families
  • Check for table existence
  • Alter table and column family definitions
  • Drop tables
  • And more…
HBaseAdmin – Basic Operations
• boolean isMasterRunning()
• HConnection getConnection()
• Configuration getConfiguration()
• close()
HBaseAdmin – Table Operations
• Table-related admin API
  • The operations are asynchronous in nature
    • createTable() vs. createTableAsync(), etc.
• Create table
  • ch05/admin.CreateTableExample
  • ch05/admin.CreateTableWithRegionsExample
  • numRegions must be at least 3; otherwise, the call returns with an exception
    • This ensures that you end up with at least a minimum set of regions
HBaseAdmin – Table Operations
• Does a table exist?
  • ch05/admin.ListTablesExample
  • Use the names of existing tables
    • Otherwise, org.apache.hadoop.hbase.TableNotFoundException is thrown
• Delete table
  • ch05/admin.TableOperationsExample
  • A table must be disabled before it can be deleted, and disabling a table can take a very long time, up to several minutes
    • It depends on how much data still resides in the server's memory and has not yet been persisted to disk
    • Undeploying a region requires all of its data to be written to disk first
  • isTableAvailable() vs. isTableEnabled()/isTableDisabled()
HBaseAdmin – Table Operations
• Modify table
  • ch05/admin.ModifyTableExample
• HTableDescriptor.equals()
  • Compares the current instance with the specified one
  • Returns true if they match in all properties
  • This includes the contained column families and their respective settings
HBaseAdmin – Schema Operations
• Besides the modifyTable() call, HBaseAdmin provides dedicated methods for schema changes
  • Make sure the table to be modified is disabled first
  • All of these calls are asynchronous
  • void addColumn(String tableName, HColumnDescriptor column)
  • void addColumn(byte[] tableName, HColumnDescriptor column)
  • void deleteColumn(String tableName, String columnName)
  • void deleteColumn(byte[] tableName, byte[] columnName)
  • void modifyColumn(String tableName, HColumnDescriptor descriptor)
  • void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)
HBaseAdmin – Cluster Operations

Methods in the HBaseAdmin class, each followed by its description:

• static void checkHBaseAvailable(Configuration conf)
  Checks that the client application can communicate with the remote HBase cluster. The call either silently succeeds or throws an exception.
• ClusterStatus getClusterStatus()
  Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status.
• void closeRegion(String regionname, String hostAndPort)
• void closeRegion(byte[] regionname, String hostAndPort)
  Close regions that have previously been deployed to region servers. This bypasses any master notification: the region is closed directly by the region server, unseen by the master node.
• void flush(String tableNameOrRegionName)
• void flush(byte[] tableNameOrRegionName)
  Ask the MemStore instances of the region or table to flush the cached modifications to disk. Otherwise, the data is written only once the memstore flush size is reached.

These APIs are intended for advanced users, so check them in the documentation and handle them with care.
HBaseAdmin – Cluster Operations

Methods in the HBaseAdmin class, each followed by its description:

• void compact(String tableNameOrRegionName)
• void compact(byte[] tableNameOrRegionName)
  Trigger a minor compaction. Compactions can take a long time to complete. They are executed in the background by the server hosting the named region, or by all servers hosting any region of the given table.
• void majorCompact(String tableNameOrRegionName)
• void majorCompact(byte[] tableNameOrRegionName)
  Trigger a major compaction.
• void split(String tableNameOrRegionName)
• void split(byte[] tableNameOrRegionName)
• …
  These calls allow you to split a specific region or table.
HBaseAdmin – Cluster Operations

Methods in the HBaseAdmin class, each followed by its description:

• void assign(byte[] regionName, boolean force)
• void unassign(byte[] regionName, boolean force)
  A client can invoke these calls when it requires a region to be deployed to, or undeployed from, the region servers.
• void move(byte[] encodedRegionName, byte[] destServerName)
  Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random.
• boolean balanceSwitch(boolean b)
• boolean balancer()
  balanceSwitch() allows you to switch the region balancer on or off. A call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer.
• void shutdown()
  Shuts down the entire cluster.
• void stopMaster()
  Stops the master server.
• void stopRegionServer(String hostnamePort)
  Stops a particular region server only.
  Once any of these three calls is invoked, the affected servers are stopped; there is no delay and no way to revert the process.
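To make the effect of balancer() concrete, here is a toy model of "moving regions from the servers with more deployed regions to those with fewer". It is a hedged sketch and not HBase's actual balancing algorithm: the class `BalancerSketch`, the one-region-at-a-time strategy, and the server names in the usage note are all assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy balancer: works on region counts per server only.
class BalancerSketch {
    // Moves one region at a time from the most loaded server to the least
    // loaded one until the region counts differ by at most one.
    static Map<String, Integer> balance(Map<String, Integer> regionsPerServer) {
        Map<String, Integer> load = new TreeMap<>(regionsPerServer);
        if (load.size() < 2) {
            return load;  // nothing to balance
        }
        while (true) {
            String most = null;
            String least = null;
            for (Map.Entry<String, Integer> entry : load.entrySet()) {
                if (most == null || entry.getValue() > load.get(most)) {
                    most = entry.getKey();
                }
                if (least == null || entry.getValue() < load.get(least)) {
                    least = entry.getKey();
                }
            }
            if (load.get(most) - load.get(least) <= 1) {
                return load;  // balanced enough
            }
            load.put(most, load.get(most) - 1);    // "undeploy" one region here
            load.put(least, load.get(least) + 1);  // "deploy" it there
        }
    }
}
```

For example, starting from 7, 1 and 1 regions on three servers named rs1, rs2 and rs3, the sketch ends with 3 regions on each.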
HBaseAdmin – Cluster Status Information
• You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()
• Related classes
  • ClusterStatus
  • ServerName => HServerInfo
  • HServerLoad
  • RegionLoad
• ch05/admin.ClusterStatusExample
Available Clients
• HBase comes with a variety of clients that can be used from various programming languages
• Interactive Clients
  • Native Java API
  • REST
  • Thrift
  • Avro
• Batch Clients
  • MapReduce
  • Hive
  • Pig
• Shell
• Web-based UI
Available Clients
• Interactive Clients
  • Native Java API
  • REST
  • Thrift
  • Avro
• Batch Clients
  • MapReduce
  • Hive
  • Pig
• Shell
• Web-based UI
We've already done
Batch Clients – MapReduce framework
• HDFS: a distributed filesystem
• MapReduce: a distributed algorithm
Batch Clients - MapReduce
• InputFormat and TableInputFormat
• Mapper and TableMapper
• Reducer and TableReducer
• OutputFormat and TableOutputFormat
Batch Clients - MapReduce
• Sample
  • ch07/mapreduce.Driver
• How to run
  // in root account
  • In hbase shell
    • create 'testtable_mr', 'data'
  // in hbase-user account
  • cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07
  • Copy the test data with hadoop fs -copyFromLocal
    • hadoop fs -copyFromLocal test-data.txt /tmp
  • hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable -i /tmp/test-data.txt -c data:json
• How to use
  • hadoop jar target/hbase-book-ch07-1.0.jar // will show usage
Batch Clients - Pig
• Apache Pig project
  • A platform for analyzing large amounts of data
  • It has its own high-level query language, called Pig Latin
  • Pig Latin uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
    • This is the opposite of Hive's declarative approach, which emulates SQL (HiveQL)
  • Combined with the power of Hadoop and the MapReduce framework
Batch Clients – Pig Latin Sample
-- Load data from a file and write it to HBase
raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t')
    AS (user, time, query);
T = FOREACH raw GENERATE
    CONCAT(CONCAT(user, '\u0000'), time), query;
STORE T INTO 'excite' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');
-- Load the records that were just written, back from HBase
R = LOAD 'excite' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query',
    '-loadKey') AS (key: chararray, query: chararray);
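The Pig script builds its row key by concatenating the user, a zero character, and the time. The same composite key can be sketched in Java to show how it is assembled and taken apart again. This is an illustration only: the class `CompositeRowKey` and its methods are hypothetical, and the sample values in the usage note are made up.

```java
// Hypothetical helper mirroring CONCAT(CONCAT(user, <zero char>), time).
class CompositeRowKey {
    // The zero character used as the separator between user and time.
    private static final char SEPARATOR = (char) 0;

    static String join(String user, String time) {
        return user + SEPARATOR + time;
    }

    // Splits a composite key back into { user, time }.
    static String[] split(String rowKey) {
        int index = rowKey.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("row key has no separator");
        }
        return new String[] {
            rowKey.substring(0, index),
            rowKey.substring(index + 1)
        };
    }
}
```

For instance, `CompositeRowKey.join("user1", "970916105432")` yields an 18-character key. Because the zero character compares lower than any printable character, all keys of "user1" sort before those of "user10" in a plain string comparison, which keeps one user's rows together.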
Shell
• We already used it in course #1
  • hbase shell
• The majority of commands have a direct match with a method provided by either the client or the administrative API
• Grouped into five different categories, representing their semantic relationships
Shell – General
Shell – Data definition
Shell – Data manipulation
Shell – Tools
Shell – Replication
Web-based UI
• Master UI (http://${your_host}:8110/master.jsp)
  • Main page
  • User Table page
  • ZooKeeper page
• Region Server UI
• Shared pages
  • Local logs
  • Thread dump
  • Log level
Phew, finally finished… Orz

003 admin featuresandclients

  • 1. Scott Miao 2012/7/12 HBase Admin API & Available Clients 1
  • 2. Agenda  Course Credit  HBaseAdmin APIs  HTableDescriptor  HColumnDescriptor  HBaseAdmin  Available Clients  Interactive Clients  Batch Clients  Shell  Web-based UI 2
  • 3. Course Credit  Show up, 30 scores  Ask question, each question earns 5 scores  Hands-on, 40 scores  70 scores will pass this course  Each course credit will be calculated once for each course finished  The course credit will be sent to you and your supervisor by mail 3
  • 4. Hadoop RPC framework  Writable interface  void write(DataOutput out) throws IOException;  Serialize the Object data and send to remote  void readFields(DataInput in) throws IOException;  New an instance and deserialize the remote-data for subsequent operations  Parameterless Constructor  Hadoop will instantiate a empty Object  Call the readFields method to deserialize the remote data 4
  • 5. HTableDescriptor  Constructor  HTableDescriptor();  HTableDescriptor(String name);  HTableDescriptor(byte[] name);  HTableDescriptor(HTableDescriptor desc);  ch05/admin.CreateTableExample  Can be used to fine-tune the table’s performance 5
  • 6. HTableDescriptor – Logical V.S. physical views 6
  • 7. HTableDescriptor - Properties Property Description Name SpecifyTable Name byte[] getName(); String getNameAsString(); void setName(byte[] name); Column Families Specify column family void addFamily(HColumnDescriptor family); boolean hasFamily(byte[] c); HColumnDescriptor[] getColumnFamilies(); HColumnDescriptor getFamily(byte[]column); HColumnDescriptor removeFamily(byte[] column); Maximum File Size Specify maximum size a region within the table can grow to long getMaxFileSize(); void setMaxFileSize(long maxFileSize); It really about the maximum size of each store, the better name would be maxStoreSize; By default, it’s size is 256 MB, a larger value may be required when you have a lot of data.7
  • 8. HTableDescriptor - Properties Property Description Read-only By default, all tables are writable, If the flag is set to true, you can only read from the table and not modify it at all. boolean isReadOnly(); void setReadOnly(boolean readOnly); Memstore flush size An in-memory store to buffer values before writing them to disk as a new storage file. default 64 MB. long getMemStoreFlushSize(); void setMemStoreFlushSize(long memstoreFlushSize); Deferred log flush Save write-ahead-log entries to disk, by default, set to false. synchronized boolean isDeferredLogFlush(); void setDeferredLogFlush(boolean isDeferredLogFlush); Miscellaneous options Stored with the table definition and can be retrieved if necessary. byte[] getValue(byte[] key) String getValue(String key) Map<ImmutableBytesWritable,ImmutableBytesWritable> getValues() void setValue(byte[] key,byte[] value) void setValue(String key,String value) void remove(byte[] key) 8
  • 9. HColumnDescriptor  A more appropriate name would be HColumnFamilyDescriptor  The family name must be printable  You cannot simply rename them later  Constructor  HColumnDescriptor();  HColumnDescriptor(String familyName),  HColumnDescriptor(byte[] familyName);  HColumnDescriptor(HColumnDescriptor desc);  HColumnDescriptor(byte[] familyName,int maxVersions,String compression,  boolean inMemory,boolean blockCacheEnabled,int timeToLive,  String bloomFilter);  HColumnDescriptor(byte [] familyName,int maxVersions,String compression,  boolean inMemory,boolean blockCacheEnabled,int blocksize,  int timeToLive,String bloomFilter,int scope); 9
  • 11. HColumnDescriptor – Properties
     Name: specifies the column family name. A column family cannot be renamed; instead, create a new family with the desired name and copy the data over using the API.
      byte[] getName();
      String getNameAsString();
     Maximum versions: predicate deletion. How many versions of each value you want to keep. The default value is 3.
      int getMaxVersions();
      void setMaxVersions(int maxVersions);
     Compression: HBase has pluggable compression algorithm support. The default value is NONE.
  • 12. HColumnDescriptor – Properties
     Block size: all store files are divided into smaller blocks that are loaded during a get or scan operation. The default value is 64 KB.
      synchronized int getBlocksize();
      void setBlocksize(int s);
      HDFS, by contrast, uses a default block size of 64 MB.
     Block cache: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true.
      boolean isBlockCacheEnabled();
      void setBlockCacheEnabled(boolean blockCacheEnabled);
      If your use case only ever has sequential reads on a particular column family, it is advisable to disable it.
     Time-to-live (TTL): predicate deletion. A threshold based on the timestamp of a value; internal housekeeping automatically checks whether a value exceeds its TTL.
      int getTimeToLive();
      void setTimeToLive(int timeToLive);
      By default, values are kept forever (set to Integer.MAX_VALUE).
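The two predicate-deletion properties, maximum versions and time-to-live, can be sketched together. The following is a simplified model in plain Java rather than HBase's housekeeping code; the class `RetentionSketch` and its method are hypothetical, and it assumes that timestamps are in milliseconds while the TTL is given in seconds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified model of version and TTL retention; hypothetical, not HBase code. */
class RetentionSketch {
    /**
     * Returns the timestamps (in ms) of the versions that survive, newest first.
     * A ttlSeconds of Integer.MAX_VALUE is treated as "keep forever".
     */
    static List<Long> retain(List<Long> timestampsMs, int maxVersions, int ttlSeconds, long nowMs) {
        List<Long> sorted = new ArrayList<>(timestampsMs);
        sorted.sort(Collections.reverseOrder());   // newest version first
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.size() >= maxVersions) {
                break;   // older versions beyond the limit are dropped
            }
            boolean expired = ttlSeconds != Integer.MAX_VALUE
                    && nowMs - ts > ttlSeconds * 1000L;
            if (!expired) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With the default of three versions and no TTL, four versions of a value are reduced to the three newest; adding a short TTL removes the older survivors as well.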
  • 13. HColumnDescriptor – Properties
     In-memory: relates to the block cache, which HBase uses to keep entire blocks of data in memory for efficient sequential access to values. The in-memory flag defaults to false.
      boolean isInMemory();
      void setInMemory(boolean inMemory);
      Enabling it is good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.
     Bloom filter: allows you to improve lookup times, given that you have a specific access pattern. Since Bloom filters add overhead in terms of storage and memory, they are turned off by default.
     Replication scope: enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0.
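To show why a Bloom filter can improve lookup times, here is a minimal generic Bloom filter in plain Java. It is a sketch of the data structure only, not the filter HBase stores with its files; the class name, the sizing parameters, and the hashing scheme (double hashing on `String.hashCode()`) are assumptions chosen for brevity.

```java
import java.util.BitSet;

/** Minimal generic Bloom filter; hypothetical sketch, not HBase's implementation. */
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;   // odd second hash so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records a row key by setting all of its bit positions. */
    void add(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(rowKey, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(rowKey, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A reader that keeps one such filter per file can skip every file whose filter answers false for the requested row key, which pays off for random gets and costs the extra storage and memory the slide mentions.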
  • 14. HBaseAdmin
     Just like DDL in an RDBMS
     Create tables with specific column families
     Check for table existence
     Alter table and column family definitions
     Drop tables
     And more…
  • 15. HBaseAdmin – Basic Operations
     boolean isMasterRunning()
     HConnection getConnection()
     Configuration getConfiguration()
     close()
  • 16. HBaseAdmin – Table Operations
     Table-related admin API
      The calls are asynchronous in nature
      createTable() vs. createTableAsync(), etc.
     Create table
      ch05/admin.CreateTableExample
      ch05/admin.CreateTableWithRegionsExample
      numRegions must be at least 3; otherwise, the call returns with an exception
      This ensures that you end up with at least a minimum set of regions
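One way to read the minimum of three regions: the first region ends at the start key and the last region begins at the end key, so at least one region must lie between them. The sketch below works under that assumption and uses plain `long` keys instead of byte arrays; `SplitKeysSketch` is a hypothetical helper and its spacing arithmetic is not claimed to match HBase's own key-splitting code.

```java
/** Evenly spaced region boundaries over long keys; hypothetical sketch, not HBase code. */
class SplitKeysSketch {
    /**
     * Returns numRegions - 1 boundary keys, running from startKey to endKey inclusive.
     * Region 0 covers everything below startKey, the last region everything from endKey up.
     */
    static long[] splitKeys(long startKey, long endKey, int numRegions) {
        if (numRegions < 3) {
            throw new IllegalArgumentException("Must create at least three regions");
        }
        if (startKey >= endKey) {
            throw new IllegalArgumentException("Start key must be smaller than end key");
        }
        int boundaries = numRegions - 1;
        long[] keys = new long[boundaries];
        for (int i = 0; i < boundaries; i++) {
            // Integer arithmetic: the first boundary is startKey, the last is endKey.
            keys[i] = startKey + (endKey - startKey) * i / (boundaries - 1);
        }
        return keys;
    }
}
```

With exactly three regions the result is just the start and end key, which is the smallest layout that still uses both of them as boundaries.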
  • 17. HBaseAdmin – Table Operations
     Does table exist
      ch05/admin.ListTablesExample
      You should be using existing table names
      Otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown
     Delete table
      ch05/admin.TableOperationsExample
      Disabling a table can potentially take a very long time, up to several minutes
      This depends on how much data remains in the server's memory and is not yet persisted to disk
      Undeploying a region requires all the data to be written to disk first
      isTableAvailable() vs. isTableEnabled()/isTableDisabled()
  • 18. HBaseAdmin – Table Operations
     Modify table
      ch05/admin.ModifyTableExample
      HTableDescriptor.equals()
      Compares the current instance with the specified one
      Returns true if they match in all properties
      This includes the contained column families and their respective settings
  • 19. HBaseAdmin – Schema Operations
     Besides the modifyTable() call, HBaseAdmin provides dedicated methods
     Make sure the table to be modified is disabled first
     All of these calls are asynchronous
      void addColumn(String tableName, HColumnDescriptor column)
      void addColumn(byte[] tableName, HColumnDescriptor column)
      void deleteColumn(String tableName, String columnName)
      void deleteColumn(byte[] tableName, byte[] columnName)
      void modifyColumn(String tableName, HColumnDescriptor descriptor)
      void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)
  • 20. HBaseAdmin – Cluster Operations
     static void checkHBaseAvailable(Configuration conf)
      Checks that the client application can communicate with the remote HBase cluster; the call either silently succeeds or throws an error
     ClusterStatus getClusterStatus()
      Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status
     void closeRegion(String regionname, String hostAndPort)
     void closeRegion(byte[] regionname, String hostAndPort)
      Closes regions that have previously been deployed to region servers. This bypasses any master notification: the region is closed directly by the region server, unseen by the master node.
     void flush(String tableNameOrRegionName)
     void flush(byte[] tableNameOrRegionName)
      Asks the MemStore instances of the region or table to flush their cached modifications to disk. Otherwise, the data is written only when the memstore flush size is reached.
     These calls are for advanced users, so please check the API documentation and handle them with care
  • 21. HBaseAdmin – Cluster Operations
     void compact(String tableNameOrRegionName)
     void compact(byte[] tableNameOrRegionName)
      Minor compaction. Compactions can potentially take a long time to complete. They are executed in the background by the server hosting the named region, or by all servers hosting any region of the given table.
     void majorCompact(String tableNameOrRegionName)
     void majorCompact(byte[] tableNameOrRegionName)
      Major compaction
     void split(String tableNameOrRegionName)
     void split(byte[] tableNameOrRegionName)
     …
      These calls allow you to split a specific region or table
  • 22. HBaseAdmin – Cluster Operations
     void assign(byte[] regionName, boolean force)
     void unassign(byte[] regionName, boolean force)
      When a client requires a region to be deployed to or undeployed from the region servers, it can invoke these calls.
     void move(byte[] encodedRegionName, byte[] destServerName)
      Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random.
     boolean balanceSwitch(boolean b)
      Allows you to switch the region balancer on or off.
     boolean balancer()
      Starts the process of moving regions from the servers with more deployed regions to those with fewer.
     void shutdown()
      Shuts down the entire cluster
     void stopMaster()
      Stops the master server
     void stopRegionServer(String hostnamePort)
      Stops a particular region server only
     Once invoked, the affected servers are stopped immediately; there is no delay and no way to revert the process
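The idea behind balancer(), moving regions from servers with more deployed regions to those with fewer, can be sketched as a simple loop. This is a toy model in plain Java, not the balancing algorithm HBase actually uses; `BalancerSketch` and its greedy one-region-at-a-time strategy are assumptions for illustration.

```java
import java.util.Collections;
import java.util.Map;

/** Greedy toy balancer over region counts; hypothetical, not HBase's balancer. */
class BalancerSketch {
    /**
     * Moves one region at a time from the busiest to the least busy server
     * until the counts differ by at most one. Returns the number of moves.
     */
    static int balance(Map<String, Integer> regionsPerServer) {
        int moves = 0;
        if (regionsPerServer.size() < 2) {
            return moves;
        }
        while (true) {
            String most = Collections.max(regionsPerServer.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            String least = Collections.min(regionsPerServer.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            if (regionsPerServer.get(most) - regionsPerServer.get(least) <= 1) {
                return moves;   // already as even as whole regions allow
            }
            regionsPerServer.merge(most, -1, Integer::sum);
            regionsPerServer.merge(least, 1, Integer::sum);
            moves++;
        }
    }
}
```

Each move in this model corresponds to what a single move() call does for one region; a real balancer also has to weigh the cost of briefly taking each moved region offline.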
  • 23. HBaseAdmin – Cluster Status Information
     You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()
     Related classes
      ClusterStatus
      ServerName => HServerInfo
      HServerLoad
      RegionLoad
     ch05/admin.ClusterStatusExample
  • 24. Available Clients
     HBase comes with a variety of clients that can be used from various programming languages
     Interactive Clients
      Native Java API
      REST
      Thrift
      Avro
     Batch Clients
      MapReduce
      Hive
      Pig
     Shell
     Web-based UI
  • 25. Available Clients
     Interactive Clients
      Native Java API
      REST
      Thrift
      Avro
     Batch Clients
      MapReduce
      Hive
      Pig
     Shell
     Web-based UI
    We've already done
  • 26. Batch Clients – MapReduce framework
     HDFS: a distributed filesystem
     MapReduce: a distributed algorithm
  • 27. Batch Clients - MapReduce framework
  • 28. Batch Clients - MapReduce
     InputFormat and TableInputFormat
  • 29. Batch Clients - MapReduce
     Mapper and TableMapper
  • 30. Batch Clients - MapReduce
     Reducer and TableReducer
  • 31. Batch Clients - MapReduce
     OutputFormat and TableOutputFormat
  • 32. Batch Clients - MapReduce
     Sample
      ch07/mapreduce.Driver
     How to run
      In hbase shell //in root account
        create 'testtable_mr', 'data'
      cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07 //in hbase-user account
      hadoop fs -copyFromLocal
        hadoop fs -copyFromLocal test-data.txt /tmp
      hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable -i /tmp/test-data.txt -c data:json
     How to use
      hadoop jar target/hbase-book-ch07-1.0.jar //will show usage
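The per-line work of such an import job can be sketched without the MapReduce or HBase classes. The helper below is hypothetical plain Java: it assumes that the -c option names a column as family:qualifier (as in data:json) and that each input line is stored under a row key derived from an MD5 hash of the line. The hashing choice in particular is an assumption of this sketch, not something the slide states.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Per-line logic of a file import, without Hadoop or HBase types; hypothetical sketch. */
class ImportLineSketch {
    /** Splits a "family:qualifier" column spec such as "data:json". */
    static String[] parseColumn(String spec) {
        int colon = spec.indexOf(':');
        if (colon < 0) {
            return new String[] { spec, "" };   // family only, empty qualifier
        }
        return new String[] { spec.substring(0, colon), spec.substring(colon + 1) };
    }

    /** Hex-encoded MD5 of the line, used here as the row key (an assumption of this sketch). */
    static String rowKey(String line) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(line.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

In a real job, a TableMapper-style map() method would turn each line into one Put built from this row key, the parsed family and qualifier, and the line itself as the value.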
  • 33. Batch Clients - Pig
     Apache Pig project
      A platform to analyze large amounts of data
      It has its own high-level query language, called Pig Latin
      Uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
      The opposite of Hive's declarative approach, which emulates SQL (HiveQL)
      Combined with the power of Hadoop and the MapReduce framework
  • 34. Batch Clients – Pig Latin Sample
    --Load data from a file and write to HBase
    raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t') AS (user, time, query);
    T = FOREACH raw GENERATE CONCAT(CONCAT(user, '\u0000'), time), query;
    STORE T INTO 'excite' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');
    --Load the records that have just been written from HBase
    R = LOAD 'excite' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query', '-loadKey') AS (key: chararray, query: chararray);
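The nested CONCAT in the sample builds a composite row key from the user and the time, separated by a zero character. The same idea is shown below in plain Java as a minimal sketch; `CompositeKeySketch` and its methods are hypothetical names, and the sample values are made up.

```java
/** Composite row key of the form user + NUL + time; hypothetical sketch. */
class CompositeKeySketch {
    private static final char SEPARATOR = '\u0000';   // sorts before every printable character

    /** Joins the two parts into one row key. */
    static String compose(String user, String time) {
        return user + SEPARATOR + time;
    }

    /** Splits a row key back into { user, time }. */
    static String[] split(String rowKey) {
        int i = rowKey.indexOf(SEPARATOR);
        if (i < 0) {
            throw new IllegalArgumentException("Row key has no separator");
        }
        return new String[] { rowKey.substring(0, i), rowKey.substring(i + 1) };
    }
}
```

Because the separator sorts lowest, all rows of one user stay adjacent in sorted order and are not interleaved with rows of a user whose name merely starts with the same characters.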
  • 35. Shell
     We already used it in course #1
      hbase shell
     The majority of commands have a direct match with a method provided by either the client or the administrative API
     Grouped into five different categories, representing their semantic relationships
  • 37. Shell – Data definition
  • 38. Shell – Data manipulation
  • 41. Web-based UI
     Master UI (http://${your_host}:8110/master.jsp)
      Main page
      User Table page
      Zookeeper page
     Region Server UI
     Shared pages
      Local logs
      Thread Dump
      Log level