1. Hands On MapR
CLI only, no GUI☺
Viadea Zhu
http://weibo.com/viadea
March 2012
2. Agenda
• MapR Architecture
• Cluster Management
• Volume
• Mirror
• Schedule
• Snapshot
• NFS
• Managing Data
• Users and Groups
• Troubleshooting and Performance Tuning
3. MapR Architecture
• Basic Services
– CLDB
– FileServer
– Jobtracker
– Tasktracker
– Zookeeper
– NFS
– WebServer
• warden
A process called the warden runs on all nodes to manage,
monitor, and report on the other services on each node.
The warden will not start any services unless ZooKeeper is
reachable and more than half of the configured ZooKeeper
nodes are live.
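A quick way to confirm the ZooKeeper quorum before starting wardens (a minimal check, assuming the stock MapR init script supports the qstatus action):
/etc/init.d/mapr-zookeeper qstatus
Each live ZooKeeper node should report its mode (leader or follower).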
4. Cluster Management
• Bring up cluster:
1. Start ZooKeeper on all nodes where it is installed:
/etc/init.d/mapr-zookeeper start
2. On the CLDB node(s) and the node running the mapr-webserver service, start the warden:
/etc/init.d/mapr-warden start
3. Start the warden on all remaining nodes:
/etc/init.d/mapr-warden start
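To verify the cluster came up, a quick sanity check (a sketch using two standard maprcli calls):
/opt/mapr/bin/maprcli node cldbmaster
/opt/mapr/bin/maprcli node list -columns svc
The first should report the elected CLDB master; the second shows which services the warden has started on each node.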
5. Cluster Management
• Stop cluster(1):
1. Determine which nodes are running the NFS gateway.
[root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc==nfs]" -columns id,h,hn,svc,rp
id                   service                                                          hostname  health  ip
4277269757083023248  tasktracker,webserver,cldb,fileserver,nfs,hoststats,jobtracker   mdw       2       172.28.4.250,10.32.190.66,172.28.8.250,172.28.12.250
3528082726925061986  tasktracker,fileserver,nfs,hoststats                             sdw1      2       172.28.4.1,172.28.8.1,172.28.12.1
5521777324064226112  fileserver,tasktracker,nfs,hoststats                             sdw3      0       172.28.8.3,172.28.12.3,172.28.4.3
3482126520576246764  fileserver,tasktracker,nfs,hoststats                             sdw5      0       172.28.4.5,172.28.8.5,172.28.12.5
4667932985226440135  fileserver,tasktracker,nfs,hoststats                             sdw7      0       172.28.8.7,172.28.12.7,172.28.4.7
6. Cluster Management
• Stop cluster(2):
2. Determine which nodes are running the CLDB.
[root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc==cldb]" -columns id,h,hn,svc,rp
id                   service                                                          hostname  health  ip
4277269757083023248  tasktracker,webserver,cldb,fileserver,nfs,hoststats,jobtracker   mdw       2       172.28.4.250,10.32.190.66,172.28.8.250,172.28.12.250
7. Cluster Management
• Stop cluster(3):
3. List all non-CLDB nodes.
[root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc!=cldb]" -columns id,h,hn,svc,rp
id                   service                               hostname  health  ip
3528082726925061986  tasktracker,fileserver,nfs,hoststats  sdw1      2       172.28.4.1,172.28.8.1,172.28.12.1
5521777324064226112  fileserver,tasktracker,nfs,hoststats  sdw3      0       172.28.8.3,172.28.12.3,172.28.4.3
3482126520576246764  fileserver,tasktracker,nfs,hoststats  sdw5      0       172.28.4.5,172.28.8.5,172.28.12.5
4667932985226440135  fileserver,tasktracker,nfs,hoststats  sdw7      0       172.28.8.7,172.28.12.7,172.28.4.7
8. Cluster Management
• Stop cluster(4):
4. Shut down all NFS instances.
/opt/mapr/bin/maprcli node services -nfs stop -nodes mdw sdw1 sdw3 sdw5 sdw7
5. SSH into each CLDB node and stop the warden.
/etc/init.d/mapr-warden stop
6. SSH into each of the remaining nodes and stop the warden.
/etc/init.d/mapr-warden stop
7. Stop the zookeeper on zookeeper node(s).
/etc/init.d/mapr-zookeeper stop
9. Cluster Management
• Restart Webserver:
/opt/mapr/adminuiapp/webserver stop
/opt/mapr/adminuiapp/webserver start
• Restart services (e.g., tasktracker):
maprcli node services -nodes mdw -tasktracker stop
maprcli node services -nodes mdw -tasktracker start
• Grant full permission to chosen administrator OS user
/opt/mapr/bin/maprcli acl edit -type cluster -user <user>:fc
10. Cluster Management
• Alarm Email
maprcli alarm config save -values "AE_ALARM_AEQUOTA_EXCEEDED,1,test@example.com"
maprcli alarm config save -values "NODE_ALARM_CORE_PRESENT,1,viadea.zhu@emc.com“
• List Alarm
[gpadmin@mdw]$ maprcli alarm list -type cluster
alarm state  description                                             entity   alarm name                             alarm statechange time
1            One or more licenses is about to expire within 28 days  CLUSTER  CLUSTER_ALARM_LICENSE_NEAR_EXPIRATION  1330171978541
[gpadmin@mdw]$ maprcli alarm list -type node
alarm state  description                                                                            entity  alarm name                    alarm statechange time
1            Can not determine if service: cldb is running. Check logs at: /opt/mapr/logs/cldb.log  sdw1    NODE_ALARM_SERVICE_CLDB_DOWN  1324274386763
1            Node has core file(s)                                                                  mdw     NODE_ALARM_CORE_PRESENT       1330145172579
11. Cluster Management
• List Nodes
maprcli node list -columns id,h,hn,br,da,dtotal,dused,davail,fs-heartbeat
maprcli node list -columns id,br,fs-heartbeat,jt-heartbeat
• Remove Nodes
Take sdw5 for example:
1. Stop warden on sdw5:
/etc/init.d/mapr-warden stop
2. Remove on CLDB node:
maprcli node remove -nodes sdw5 -zkconnect sdw1:5181
12. Cluster Management
• Reformat a node
Take sdw5 for example:
1. Stop warden:
/etc/init.d/mapr-warden stop
2. Remove the disktab file:
rm /opt/mapr/conf/disktab
3. Create a text file /tmp/disks.txt that lists all the disks and
partitions to format for use by Greenplum HD EE.
[root@sdw5 ~]# cat /tmp/disks.txt
/data2/hdpee/storagefile
4. Use disksetup to re-format the disks:
/opt/mapr/server/disksetup -F /tmp/disks.txt
5. Start the Warden:
/etc/init.d/mapr-warden start
13. Cluster Management
• Add a new node
/opt/mapr/server/configure.sh -C mdw -Z sdw1 -N ViadeaCluster
/opt/mapr/server/disksetup -F /tmp/disks.txt
/etc/init.d/mapr-warden start
20. Mirror
• Sync Mirrors using “push”
[root@mdw ~]# maprcli volume mirror push -name viadeavol
Starting mirroring of volume viadeavol_mirror2
Starting mirroring of volume viadeavol_mirror1
Mirroring complete for volume viadeavol_mirror1
Mirroring complete for volume viadeavol_mirror2
Successfully completed mirror push to all local mirrors of volume viadeavol
• Sync Mirror using “start”
[root@mdw ~]# maprcli volume mirror start -full false -name viadeavol_mirror1
messages
Started mirror operation for volume(s) 'viadeavol_mirror1'
21. Mirror
• Stop mirror sync
[gpadmin@mdw viadea]$ maprcli volume mirror stop -name viadeavol_mirror1
messages
Stopped mirror operation for 'viadeavol_mirror1'
http://answers.mapr.com/questions/1773/about-stopping-mirror
Answer:
• Both mirror push and mirror start work the same way ... the destination of
the mirror pulls the data. The difference is that mirror push is synchronous
and the command will wait until the mirroring is complete, while mirror
start is asynchronous and only kicks off the mirroring and returns
immediately without waiting.
• mirror stop works in both situations.
22. Schedule
• Create Schedule
maprcli schedule create -schedule '{"name":"Schedule-1","rules":[{"frequency":"once","retain":"1w","time":13,"date":"12/5/2010"}]}'
• List Schedule
[root@mdw binary]# maprcli schedule list -output verbose
id name inuse rules
1 Critical data 0 ...
2 Important data 0 ...
3 Normal data 1 ...
4 mirror_sync 1 ...
5 Schedule-1 0 ...
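A schedule only shows inuse=1 once it is attached to a volume. For example (a sketch, using Schedule-1's id 5 from the listing above), a snapshot schedule can be applied with:
maprcli volume modify -name viadeavol -schedule 5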
26. NFS
• Mount
1. List the NFS shares exported on the server:
[gpadmin@smdw ~]$ /usr/sbin/showmount -e mdw
Export list for mdw:
/mapr *
/mapr/ViadeaCluster *
2. As root, create the mount point on smdw:
mkdir /mapr
3. Mount on smdw:
mount mdw:/mapr /mapr
4. Add an entry to /etc/fstab on smdw so the mount persists across reboots:
mdw:/mapr /mapr nfs rw 0 0
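Once mounted, ordinary file operations go straight into the cluster; a small sketch (assuming the volume viadeavol is mounted at /viadeavol in the cluster namespace):
cp /etc/hosts /mapr/ViadeaCluster/viadeavol/
hadoop fs -ls /viadeavol
The file copied over NFS should appear in the hadoop fs listing.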
27. NFS
• Setting ChunkSize and Compression for a volume
[root@smdw viadeavol]# more .dfs_attributes
# lines beginning with # are treated as comments
Compression=true
ChunkSize=268435456
[root@smdw viadeavol]# hadoop mfs -setchunksize 13107000 /viadeavol
setchunksize: chunksize should be a multiple of 64K
[root@smdw viadeavol]# hadoop mfs -setchunksize 13107200 /viadeavol
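To confirm the change took effect, hadoop mfs can list MapR-specific attributes; files written after the change should show the new chunk size:
hadoop mfs -ls /viadeavol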
32. Users and Groups
• Cluster Permission
login (including cv): Log in to the Greenplum HD EE Control System, use the API and command-line interface, read access on cluster and volumes
ss: Start/stop services
cv: Create volumes
a: Admin access
fc: Full control (administrative access and permission to change the cluster ACL)
33. Users and Groups
• Volume Permission
dump: Dump the volume
restore: Mirror or restore the volume
m: Modify volume properties, create and delete snapshots
d: Delete a volume
fc: Full control (admin access and permission to change the volume ACL)
34. Users and Groups
• List ACL
[root@mdw conf]# maprcli acl show -type cluster
Principal Allowed actions
User root [login, ss, cv, a, fc]
User gpadmin [login, ss, cv, a, fc]
[root@mdw conf]# maprcli acl show -type volume -name viadeavol -user root
Principal Allowed actions
User root [dump, restore, m, d, fc]
35. Users and Groups
• Modify ACL for a user
maprcli acl edit -type cluster -user viadea:cv
maprcli acl edit -type cluster -user viadea:a
maprcli acl edit -type volume -name viadeavol -user viadea:m
• Modify ACL for a whole cluster or volume
maprcli acl set -type volume -name test-volume -user jsmith:dump,restore,m rjones:fc
• Setting volume quota
maprcli volume modify -name viadeavol -quota 2G
• Setting entity quota
maprcli entity modify -type 0 -name viadea -quota 1T
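To verify the quotas, read the settings back (a sketch; both are standard maprcli calls, using the viadeavol and viadea examples above):
maprcli volume info -name viadeavol
maprcli entity list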
36. Troubleshooting&Performance
Tunning
• Small Job(1)
mapred-site.xml:
<property>
<name>mapred.fairscheduler.smalljob.schedule.enable</name>
<value>true</value>
<description>Enable small job fast scheduling inside fair
scheduler.
TaskTrackers should reserve a slot called ephemeral slot which
is used for smalljob if cluster is busy.
</description>
</property>
37. Troubleshooting&Performance
Tunning
• Small Job(2)
<!-- Small job definition. If a job does not satisfy any of following limits
it is not considered as a small job and will be moved out of small job pool.
-->
<property>
<name>mapred.fairscheduler.smalljob.max.maps</name>
<value>10</value>
<description>Small job definition. Max number of maps allowed in small job.
</description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.max.reducers</name>
<value>10</value>
<description>Small job definition. Max number of reducers allowed in small
job. </description>
</property>
38. Troubleshooting&Performance
Tunning
• Small Job(3)
<property>
<name>mapred.fairscheduler.smalljob.max.inputsize</name>
<value>10737418240</value>
<description>Small job definition. Max input size in bytes allowed for a
small job.
Default is 10GB.
</description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.max.reducer.inputsize</name>
<value>1073741824</value>
<description>Small job definition.
Max estimated input size for a reducer allowed in small job.
Default is 1GB per reducer.
</description>
</property>
39. Troubleshooting&Performance
Tunning
• Small Job(4)
<property>
<name>mapred.cluster.ephemeral.tasks.memory.limit.mb</name>
<value>200</value>
<description>Small job definition. Max memory in mbytes reserved
for an ephermal slot.
Default is 200mb. This value must be same on JobTracker and
TaskTracker nodes.
</description>
</property>
40. Troubleshooting&Performance
Tunning
• Memory for Greenplum HD EE Services
/opt/mapr/conf/warden.conf
service.command.tt.heapsize.percent=2  # The percentage of memory reserved for the TaskTracker heap.
service.command.tt.heapsize.max=325    # The maximum heap space (MB) that can be used by the TaskTracker.
service.command.tt.heapsize.min=64     # The minimum heap space (MB) for use by the TaskTracker.
[gpadmin@mdw viadea]$ cat /opt/mapr/conf/warden.conf|grep size|grep percent
service.command.jt.heapsize.percent=10
service.command.tt.heapsize.percent=2
service.command.hbmaster.heapsize.percent=4
service.command.hbregion.heapsize.percent=25
service.command.cldb.heapsize.percent=8
service.command.mfs.heapsize.percent=20
service.command.webserver.heapsize.percent=3
service.command.os.heapsize.percent=3
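Worked example (assuming the percentage applies to the node's physical memory, with min/max as caps): on a 16 GB node, the TaskTracker heap is 16384 MB * 2% ≈ 328 MB, which is then capped by service.command.tt.heapsize.max at 325 MB.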
41. Troubleshooting&Performance
Tunning
• Memory for MapReduce
/opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
<property>
<name>mapreduce.tasktracker.reserved.physicalmemory.mb</name>
<value></value>
<description> Maximum phyiscal memory tasktracker should reserve for
mapreduce tasks.
If tasks use more than the limit, task using maximum memory will be killed.
Expert only: Set this value iff tasktracker should use a certain amount of
memory
for mapreduce tasks. In MapR Distro warden figures this number based
on services configured on a node.
Setting mapreduce.tasktracker.reserved.physicalmemory.mb to -1 will disable
physical memory accounting and task management.
</description>
</property>
42. Troubleshooting&Performance
Tunning
• Memory for MapReduce
Map task memory
Map tasks use memory mainly in two ways:
The application consumes memory to run the map function.
The MapReduce framework uses an intermediate buffer to hold serialized (key, value) pairs.
(io.sort.mb)
/opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
io.sort.mb
Buffer used to hold map outputs in memory before writing final map outputs.
Setting this value very low may cause spills. By default, if left empty, the value is set to 50% of the map task heapsize.
If the average input to a map is "MapIn" bytes, then typically io.sort.mb should be set to 1.25 times MapIn bytes.
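Worked example: if the average map input is 100 MB, io.sort.mb should be set to roughly 1.25 * 100 = 125 MB to avoid spills.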
43. Troubleshooting&Performance
Tunning
• Memory for MapReduce
Reduce task memory
mapred.reduce.child.java.opts
Java opts for the reduce tasks. The default heapsize (-Xmx) is determined by the memory reserved for MapReduce at the TaskTracker.
A reduce task is given more memory than a map task:
Default memory for a reduce task = (Total memory reserved for MapReduce) * (2 * #reduceslots / (#mapslots + 2 * #reduceslots))
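Worked example: with 3072 MB reserved for MapReduce, 6 map slots, and 3 reduce slots, a reduce task gets 3072 * (2*3 / (6 + 2*3)) = 3072 * 6/12 = 1536 MB by default.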
44. Troubleshooting&Performance
Tunning
• Tasks number(1)
Map slots should be based on how many map tasks can fit in memory,
and reduce slots should be based on the number of CPUs
mapred.tasktracker.map.tasks.maximum: (CPUS > 2) ? (CPUS * 0.75) : 1
(At least one map slot, up to 0.75 times the number of CPUs)
mapred.tasktracker.reduce.tasks.maximum: (CPUS > 2) ? (CPUS * 0.50) : 1
(At least one reduce slot, up to 0.50 times the number of CPUs)
variables in formula:
CPUS - number of CPUs present on the node
DISKS - number of disks present on the node
MEM - memory reserved for MapReduce tasks
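Worked example: a node with 8 CPUs gets 8 * 0.75 = 6 map slots and 8 * 0.50 = 4 reduce slots; a 1- or 2-CPU node gets one slot of each.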
45. Troubleshooting&Performance
Tunning
• Tasks number(2)
mapreduce.tasktracker.prefetch.maptasks
How many map tasks should be scheduled in advance on a TaskTracker, given as a percentage of map slots. Default is 1.0, which means the number of tasks overscheduled equals the total map slots on the TT.
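Worked example: with the default of 1.0 on a TaskTracker that has 6 map slots, up to 6 extra map tasks may be scheduled ahead of slot availability.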
46. Troubleshooting&Performance
Tunning
• Final&Important : What needs to collect???
/opt/mapr/support/tools/mapr-support-collect.sh -n support-output.txt
[root@mdw collect]# ls -altr /opt/mapr/support/collect/support-output.txt.tar
-rw-r--r-- 1 root root 27607040 Mar 1 22:34 /opt/mapr/support/collect/support-
output.txt.tar
47. Troubleshooting&Performance
Tunning
• What is in the support dump file?
1. A "cluster" directory
2. A directory for each node
[root@mdw support-output.txt]# ls -altr
total 32
drwxr-xr-x 3 root root 4096 Mar 1 22:19 cluster
drwxr-xr-x 8 root root 4096 Mar 1 22:24 .
drwxr-xr-x 5 root root 4096 Mar 1 22:33 172.28.4.1
drwxr-xr-x 2 root root 4096 Mar 1 22:34 172.28.8.7
drwxr-xr-x 2 root root 4096 Mar 1 22:34 172.28.8.3
drwxr-xr-x 2 root root 4096 Mar 1 22:34 172.28.4.5
drwxr-xr-x 2 root root 4096 Mar 1 22:34 172.28.4.250
drwxr-xr-x 4 root root 4096 Mar 1 22:36 ..
48. Troubleshooting&Performance
Tunning
• What is in the “cluster” directory?
[root@mdw cluster]# cat cluster.txt|grep Output
Output of /opt/mapr/bin/maprcli node list -json
Output of /opt/mapr/bin/maprcli node topo -json
Output of /opt/mapr/bin/maprcli node heatmap -view status -json
Output of /opt/mapr/bin/maprcli volume list -json
Output of /opt/mapr/bin/maprcli dump zkinfo -json
Output of /opt/mapr/bin/maprcli config load -json
Output of /opt/mapr/bin/maprcli alarm list -json
(…)
49. Troubleshooting&Performance
Tunning
• What is in the “node” directory? (1)
“conf” subdirectory: roles, all conf files, disk info, and some other OS command output.
“logs” subdirectory: all logs, /var/log/messages, and some MapR status logs.
[root@mdw logs]# cat mfsState.txt|grep Output
Output of /opt/mapr/server/mrconfig -p 5660 info threads
Output of /opt/mapr/server/mrconfig -p 5660 info containers resync local
Output of /opt/mapr/bin/maprcli trace dump -port 5660
Output of /opt/mapr/bin/maprcli dump fileserverworkinfo -fileserverip 172.28.4.1
“pam.d” subdirectory
50. Troubleshooting&Performance
Tunning
• What is in the “node” directory? (2)
MapRBuildVersion
redhat-release
secure.log
sysinfo.txt: output of various OS commands
[gpadmin@mdw 172.28.4.1]$ cat sysinfo.txt|grep Output
Output of lscpu
Output of ifconfig -a
Output of uname -a
Output of netstat -an
Output of netstat -rn
Output of hostname
Output of cat /etc/hostname
(…)