003 admin featuresandclients

Scott Miao 2012/7/12
HBase Admin API & Available Clients

Agenda
 Course Credit
 HBaseAdmin APIs
 HTableDescriptor
 HColumnDescriptor
 HBaseAdmin
 Available Clients
 Interactive Clients
 Batch Clients
 Shell
 Web-based UI

Course Credit
 Show up: 30 points
 Ask a question: each question earns 5 points
 Hands-on: 40 points
 70 points are required to pass this course
 Course credit is calculated once for each course finished
 The course credit will be sent to you and your supervisor by mail

Hadoop RPC framework
 Writable interface
 void write(DataOutput out) throws IOException;
 Serialize the object data and send it to the remote side
 void readFields(DataInput in) throws IOException;
 Deserialize the remote data into a new instance for subsequent operations
 Parameterless constructor
 Hadoop instantiates an empty object
 Then calls the readFields method to deserialize the remote data
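The write/readFields contract above can be tried out without a Hadoop installation. The following is a minimal sketch: the nested Writable interface is a local stand-in that mirrors the two method signatures from this slide, and TableInfo, roundTrip, and the in-memory byte buffer are hypothetical names used only for illustration, not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableRoundTrip {

    // Local stand-in for Hadoop's Writable interface (assumption: same two methods).
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical payload type, used only for this illustration.
    static class TableInfo implements Writable {
        String name;
        long maxFileSize;

        // Parameterless constructor: the receiver creates an empty object first.
        TableInfo() { }

        TableInfo(String name, long maxFileSize) {
            this.name = name;
            this.maxFileSize = maxFileSize;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeLong(maxFileSize);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order in which they were written.
            name = in.readUTF();
            maxFileSize = in.readLong();
        }
    }

    // Serializes a value into a byte buffer and rebuilds it the way a receiver would.
    static TableInfo roundTrip(TableInfo original) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            original.write(new DataOutputStream(wire));

            TableInfo copy = new TableInfo(); // empty instance
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(wire.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TableInfo copy = roundTrip(new TableInfo("testtable", 256L * 1024 * 1024));
        System.out.println(copy.name + " " + copy.maxFileSize);
    }
}
```

The byte buffer plays the role of the network here; in a real RPC call the framework, not the caller, would create the empty instance and invoke readFields.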

HTableDescriptor
 Constructor
 HTableDescriptor();
 HTableDescriptor(String name);
 HTableDescriptor(byte[] name);
 HTableDescriptor(HTableDescriptor desc);
 ch05/admin.CreateTableExample
 Can be used to fine-tune the table’s performance

HTableDescriptor – Logical vs. physical views

HTableDescriptor - Properties
Property: Description
Name: Specifies the table name.
byte[] getName();
String getNameAsString();
void setName(byte[] name);
Column Families: Specifies the column families.
void addFamily(HColumnDescriptor family);
boolean hasFamily(byte[] c);
HColumnDescriptor[] getColumnFamilies();
HColumnDescriptor getFamily(byte[] column);
HColumnDescriptor removeFamily(byte[] column);
Maximum File Size: Specifies the maximum size a region within the table can grow to.
long getMaxFileSize();
void setMaxFileSize(long maxFileSize);
This is really the maximum size of each store, so a better name would be maxStoreSize. The default is 256 MB; a larger value may be required when you have a lot of data.

HTableDescriptor - Properties
Read-only: By default, all tables are writable. If the flag is set to true, you can only read from the table and not modify it at all.
boolean isReadOnly();
void setReadOnly(boolean readOnly);
Memstore flush size: The memstore is an in-memory store that buffers values before writing them to disk as a new storage file. This property sets the size at which it is flushed; the default is 64 MB.
long getMemStoreFlushSize();
void setMemStoreFlushSize(long memstoreFlushSize);
Deferred log flush: Controls how write-ahead-log entries are saved to disk; when set to true, flushing of log entries is deferred. The default is false.
synchronized boolean isDeferredLogFlush();
void setDeferredLogFlush(boolean isDeferredLogFlush);
Miscellaneous options: Arbitrary key/value pairs stored with the table definition that can be retrieved if necessary.
byte[] getValue(byte[] key)
String getValue(String key)
Map<ImmutableBytesWritable,ImmutableBytesWritable> getValues()
void setValue(byte[] key,byte[] value)
void setValue(String key,String value)
void remove(byte[] key)
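To make the memstore flush size property more concrete, here is a toy model of the buffering behaviour it controls. It is a sketch under simplifying assumptions, not HBase code: MemstoreSketch, its size estimate (key length plus value length), and the in-memory list standing in for store files are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class MemstoreSketch {
    private final long flushSizeBytes;
    // Values are buffered in sorted order, as a store file would hold them.
    private final TreeMap<String, String> buffer = new TreeMap<>();
    private long bufferedBytes = 0;
    // Each flush produces one new, immutable "store file" (modelled as a map).
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    MemstoreSketch(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String rowKey, String value) {
        buffer.put(rowKey, value);
        bufferedBytes += rowKey.length() + value.length(); // rough size estimate
        if (bufferedBytes >= flushSizeBytes) {
            flush(); // threshold reached: write a new store file
        }
    }

    // Explicit flush; a no-op when nothing is buffered.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        storeFiles.add(new TreeMap<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }

    int bufferedEntries() {
        return buffer.size();
    }

    public static void main(String[] args) {
        MemstoreSketch store = new MemstoreSketch(16);
        store.put("row1", "value1");
        store.put("row2", "value2"); // crosses the 16-byte threshold
        store.put("row3", "value3");
        System.out.println(store.storeFiles.size() + " store file(s), "
                + store.bufferedEntries() + " entry buffered");
    }
}
```

A larger flush size means fewer, bigger store files at the cost of more data held only in memory between flushes.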

HColumnDescriptor
 A more appropriate name would be HColumnFamilyDescriptor
 The family name must be printable
 You cannot simply rename them later
 Constructor
 HColumnDescriptor();
 HColumnDescriptor(String familyName);
 HColumnDescriptor(byte[] familyName);
 HColumnDescriptor(HColumnDescriptor desc);
 HColumnDescriptor(byte[] familyName,int maxVersions,String compression,
 boolean inMemory,boolean blockCacheEnabled,int timeToLive,
 String bloomFilter);
 HColumnDescriptor(byte [] familyName,int maxVersions,String compression,
 boolean inMemory,boolean blockCacheEnabled,int blocksize,
 int timeToLive,String bloomFilter,int scope);

HColumnDescriptor – Column families vs. store files

HColumnDescriptor – Properties
Name: Specifies the column family name. A column family cannot be renamed; instead, create a new family with the desired name and copy the data over using the API.
byte[] getName();
String getNameAsString();
Maximum versions: Predicate deletion. How many versions of each value you want to keep. The default is 3.
int getMaxVersions();
void setMaxVersions(int maxVersions);
Compression: HBase has pluggable compression algorithm support. The default is NONE.

Block size: All stored files are divided into smaller blocks that are loaded during a get or scan operation. The default is 64 KB.
synchronized int getBlocksize();
void setBlocksize(int s);
Not to be confused with the HDFS block size, which defaults to 64 MB.
Block cache: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true.
boolean isBlockCacheEnabled();
void setBlockCacheEnabled(boolean blockCacheEnabled);
If your use case only ever has sequential reads on a particular column family, it is advisable to disable the block cache for it.
Time-to-live (TTL): Predicate deletion. A threshold based on the timestamp of a value; internal housekeeping automatically checks whether a value exceeds its TTL.
int getTimeToLive();
void setTimeToLive(int timeToLive);
By default, values are kept forever (the TTL is set to Integer.MAX_VALUE).

In-memory: Builds on the block cache, which HBase uses to keep entire blocks of data in memory for efficient sequential access to values. The in-memory flag defaults to false.
boolean isInMemory();
void setInMemory(boolean inMemory);
Setting it to true is good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.
Bloom filter: Allows you to improve lookup times, given that you have a specific access pattern. Since Bloom filters add overhead in terms of storage and memory, they are turned off by default.
Replication scope: Enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0.
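The two "predicate deletion" properties above, maximum versions and TTL, can be pictured as a filter over the timestamps of one cell. The sketch below is a simplified model and not HBase's housekeeping code: the class and method names are hypothetical, and it assumes the TTL is given in seconds while cell timestamps are in milliseconds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PredicateDeletionSketch {

    // Returns the timestamps (newest first) that survive both predicates.
    static List<Long> retained(List<Long> timestampsMs, int maxVersions,
                               int ttlSeconds, long nowMs) {
        List<Long> newestFirst = new ArrayList<>(timestampsMs);
        newestFirst.sort(Collections.reverseOrder());

        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            // Integer.MAX_VALUE is the "keep forever" default from the slide.
            boolean expired = ttlSeconds != Integer.MAX_VALUE
                    && nowMs - ts > ttlSeconds * 1000L;
            if (!expired && kept.size() < maxVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(1000L, 2000L, 3000L, 4000L, 5000L);
        // Default settings from the slides: 3 versions, no expiry.
        System.out.println(retained(versions, 3, Integer.MAX_VALUE, 5500L));
        // A 2-second TTL additionally drops values older than 2000 ms.
        System.out.println(retained(versions, 3, 2, 5500L));
    }
}
```

Both settings only describe which values remain eligible; when the data is physically removed is up to the server's housekeeping.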

HBaseAdmin
 Comparable to DDL in an RDBMS
 Create tables with specific column families
 Check for table existence
 Alter table and column family definitions
 Drop tables
 And more…

HBaseAdmin – Basic Operations
 boolean isMasterRunning()
 HConnection getConnection()
 Configuration getConfiguration()
 close()

HBaseAdmin – Table Operations
 Table-related admin API
 They are asynchronous in nature
 createTable() vs. createTableAsync(), etc.
 Create table
 ch05/admin.CreateTableExample
 ch05/admin.CreateTableWithRegionsExample
 numRegions must be at least 3; otherwise, the call returns with an exception
 This is to ensure that you end up with at least a minimum set of regions
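One way to see why three is the minimum: with a start key and an end key, the first region ends at the start key, the last region begins at the end key, and everything in between needs at least one more region. The sketch below computes evenly spaced region boundaries for numeric keys. It is an illustration only, not the splitting algorithm HBase uses; RegionSplitSketch and boundaries are hypothetical names, and plain long values stand in for byte[] row keys.

```java
import java.util.Arrays;

class RegionSplitSketch {

    // Returns numRegions - 1 boundaries, from startKey to endKey inclusive.
    // The first region covers everything below startKey and the last region
    // everything from endKey upward, which leaves numRegions - 2 regions between.
    static long[] boundaries(long startKey, long endKey, int numRegions) {
        if (numRegions < 3) {
            throw new IllegalArgumentException("numRegions must be at least 3");
        }
        if (startKey >= endKey) {
            throw new IllegalArgumentException("startKey must be smaller than endKey");
        }
        int count = numRegions - 1;
        long[] keys = new long[count];
        for (int i = 0; i < count; i++) {
            // Linear interpolation; fine for small ranges, may overflow for huge ones.
            keys[i] = startKey + (endKey - startKey) * i / (count - 1);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(boundaries(1L, 100L, 10)));
    }
}
```

With numRegions set to 3 the result is just the start and end key, giving one region below, one between, and one above.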

 Does table exist
 ch05/admin.ListTablesExample
 You should be using existing table names
 Otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown
 Delete table
 ch05/admin.TableOperationsExample
 Disabling a table can potentially take a very long time, up to several minutes
 This depends on how much data is still held in the server’s memory and not yet persisted to disk
 Undeploying a region requires all the data to be written to disk first
 isTableAvailable() vs. isTableEnabled()/isTableDisabled()

 Modify table
 ch05/admin.ModifyTableExample
 HTableDescriptor.equals()
 Compares the current with the specified instance
 Returns true if they match in all properties
 Also including the contained column families and their respective settings

HBaseAdmin – Schema Operations
 Besides using the modifyTable() call, there are dedicated methods provided by HBaseAdmin
 Make sure the table to be modified is disabled first
 All of these calls are asynchronous
 void addColumn(String tableName,HColumnDescriptor column)
 void addColumn(byte[] tableName,HColumnDescriptor column)
 void deleteColumn(String tableName,String columnName)
 void deleteColumn(byte[] tableName,byte[] columnName)
 void modifyColumn(String tableName,HColumnDescriptor descriptor)
 void modifyColumn(byte[] tableName,HColumnDescriptor descriptor)

HBaseAdmin – Cluster Operations
Methods in HBaseAdmin class: Description
• static void checkHBaseAvailable(Configuration conf)
Checks that the client application can communicate with the remote HBase cluster; it either silently succeeds or throws an error.
• ClusterStatus getClusterStatus()
Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status.
• void closeRegion(String regionname, String hostAndPort)
• void closeRegion(byte[] regionname, String hostAndPort)
Close regions that have previously been deployed to region servers. These calls bypass any master notification: the region is closed directly by the region server, unseen by the master node.
• void flush(String tableNameOrRegionName)
• void flush(byte[] …)
Ask the MemStore instances of the region or table to flush the cached modification data to disk. Otherwise, the data is written once the memstore flush size is reached.
These APIs are for advanced users; check them in the documentation and handle them with care.

Methods in HBaseAdmin class: Description
• void compact(String …)
• void compact(byte[] …)
Minor compaction. Compactions can potentially take a long time to complete. The compaction is executed in the background by the server hosting the named region, or by all servers hosting any region of the given table.
• void majorCompact(String …)
• void majorCompact(byte[] …)
Major compaction.
• void split(String …)
• void split(byte[] …)
• …
These calls allow you to split a specific region or table.

Methods in HBaseAdmin class: Description
• void assign(byte[] regionName, boolean force)
• void unassign(byte[] regionName, boolean force)
When a client requires a region to be deployed to or undeployed from the region servers, it can invoke these calls.
• void move(byte[] encodedRegionName, byte[] destServerName)
Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random.
• boolean balanceSwitch(boolean b)
• boolean balancer()
balanceSwitch() allows you to switch the region balancer on or off. A call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer.
• void shutdown()
• void stopMaster()
• void stopRegionServer(String hostnamePort)
Shut down the entire cluster, stop the master server, or stop a particular region server only, respectively. Once invoked, the affected servers are stopped; there is no delay and no way to revert the process.
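The idea behind balancer(), moving regions from servers with more deployed regions to those with fewer, can be illustrated with a small greedy loop. This is a sketch of the concept only and does not reproduce the HBase master's balancing logic; BalancerSketch, balance, and the server names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class BalancerSketch {

    // Moves one region at a time from the fullest to the emptiest server until
    // the counts differ by at most one. Returns the number of regions moved.
    static int balance(Map<String, Integer> regionsPerServer) {
        if (regionsPerServer.size() < 2) {
            return 0; // nothing to balance against
        }
        int moves = 0;
        while (true) {
            String fullest = null;
            String emptiest = null;
            for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
                if (fullest == null || e.getValue() > regionsPerServer.get(fullest)) {
                    fullest = e.getKey();
                }
                if (emptiest == null || e.getValue() < regionsPerServer.get(emptiest)) {
                    emptiest = e.getKey();
                }
            }
            if (regionsPerServer.get(fullest) - regionsPerServer.get(emptiest) <= 1) {
                return moves; // already balanced
            }
            regionsPerServer.put(fullest, regionsPerServer.get(fullest) - 1);
            regionsPerServer.put(emptiest, regionsPerServer.get(emptiest) + 1);
            moves++;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> cluster = new TreeMap<>();
        cluster.put("rs1", 10);
        cluster.put("rs2", 2);
        cluster.put("rs3", 0);
        int moves = balance(cluster);
        System.out.println(moves + " moves -> " + cluster);
    }
}
```

Each move in this model corresponds to what a single move() call does for one region; the real balancer decides which regions to move on the master.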

HBaseAdmin – Cluster Status Information
 You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()
 Related Classes
 ClusterStatus
 ServerName => HServerInfo
 HServerLoad
 RegionLoad
 ch05/admin.ClusterStatusExample

Available Clients
 HBase comes with a variety of clients that can be used from
various programming languages
 Native Java API
 REST
 Thrift
 Avro
 Batch Clients
 MapReduce
 Hive
 Pig
 Shell
 Web-based UI

Available Clients
 Native Java API
 REST
 Thrift
 Avro
 Batch Clients
 MapReduce
 Hive
 Pig
 Shell
 Web-based UI
We’ve already done

Batch Clients – MapReduce framework
 HDFS: A distributed filesystem
 MapReduce: A distributed algorithm

Batch Clients - MapReduce framework

Batch Clients - MapReduce
 InputFormat and TableInputFormat

 Mapper and TableMapper

 Reducer and TableReducer

 OutputFormat and TableOutputFormat

 Sample
 ch07/mapreduce.Driver
 How to run
// in the root account
 In hbase shell
 create 'testtable_mr', 'data'
// in the hbase-user account
 cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07
 Copy the test data into HDFS with hadoop fs -copyFromLocal
 hadoop fs -copyFromLocal test-data.txt /tmp
 hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable -i /tmp/test-data.txt -c data:json
 How to use
 hadoop jar target/hbase-book-ch07-1.0.jar // will show usage

Batch Clients - Pig
 Apache Pig project
 A platform to analyze large amounts of data
 It has its own high-level query language, called Pig Latin
 Uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
 The opposite of Hive’s declarative approach to emulate SQL (HiveQL)
 Combined with the power of Hadoop and the MapReduce framework

Batch Clients – Pig Latin Sample
-- Load data from a file and write it to HBase
raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t')
AS (user, time, query);
T = FOREACH raw GENERATE
CONCAT(CONCAT(user, '\u0000'), time), query;
STORE T INTO 'excite' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');
-- Load the records that were just written from HBase
R = LOAD 'excite' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query',
'-loadKey') AS (key: chararray, query: chararray);

Shell
 We already used it in course #1
 hbase shell
 The majority of commands have a direct match with a method provided by either the client or administrative API
 Grouped into five different categories, representing their semantic relationships

Shell – Data manipulation

Web-based UI
 Master UI (http://${your_host}:8110/master.jsp)
 Main page
 User Table page
 ZooKeeper page
 Region Server UI
 Shared pages
 Local logs
 Thread Dump
 Log level