Hands On MapR
  CLI only, no GUI☺
          Viadea Zhu
          March 2012
• MapR Architecture
• Cluster Management
• Volume
• Mirror
• Schedule
• Snapshot
• Managing Data
• Users and Groups
• Troubleshooting and Performance tuning
MapR Architecture
• Basic Services
   –   CLDB
   –   FileServer
   –   Jobtracker
   –   Tasktracker
   –   Zookeeper
   –   NFS
   –   WebServer
• warden
A process called the warden runs on all nodes to manage,
  monitor, and report on the other services on each node.
The warden will not start any services unless ZooKeeper is
  reachable and more than half of the configured ZooKeeper
  nodes are live.
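The quorum rule above ("more than half of the configured ZooKeeper nodes are live") can be sketched as a small check. This is only an illustration of the arithmetic; zk_quorum_ok is a hypothetical helper, not a MapR tool, and it does not probe any real node.

```shell
# zk_quorum_ok LIVE CONFIGURED
# Prints "yes" when strictly more than half of the configured ZooKeeper
# nodes are live, otherwise "no". Hypothetical helper for illustration.
zk_quorum_ok() {
    live=$1
    configured=$2
    if [ $((live * 2)) -gt "$configured" ]; then
        echo yes
    else
        echo no
    fi
}

zk_quorum_ok 2 3   # 2 of 3 live: a majority
zk_quorum_ok 1 2   # 1 of 2 live: exactly half, not a majority
```

With an even number of ZooKeeper nodes, losing half of them already breaks the majority, which is one reason odd-sized ensembles are commonly used.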
Cluster Management
• Bring up cluster:
1. Start ZooKeeper on all nodes where it is installed by issuing the following command:
  /etc/init.d/mapr-zookeeper start

2. On one of the CLDB nodes and on the node running the mapr-webserver service, start the warden:
  /etc/init.d/mapr-warden start
Cluster Management
• Stop cluster(1):
1. Determine which nodes are running the NFS gateway.

[root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc==nfs]" -columns id,h,hn,svc, rp
id                   service                                                          hostname  health ip
                     tasktracker,webserver,cldb,fileserver,nfs,hoststats,jobtracker   mdw
3528082726925061986  tasktracker,fileserver,nfs,hoststats                             sdw1      2,,
5521777324064226112  fileserver,tasktracker,nfs,hoststats                             sdw3      0,,
3482126520576246764  fileserver,tasktracker,nfs,hoststats                             sdw5      0,,
4667932985226440135  fileserver,tasktracker,nfs,hoststats                             sdw7      0,,
Cluster Management
• Stop cluster(2):
2. Determine which nodes are running the CLDB.

[root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc==cldb]" -columns id,h,hn,svc, rp
id                   service                                                          hostname  health ip
                     tasktracker,webserver,cldb,fileserver,nfs,hoststats,jobtracker   mdw
Cluster Management
• Stop cluster(3):
3. List all non-CLDB nodes.

[root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc!=cldb]" -columns id,h,hn,svc, rp
id                   service                                hostname  health ip
3528082726925061986  tasktracker,fileserver,nfs,hoststats   sdw1      2,,
5521777324064226112  fileserver,tasktracker,nfs,hoststats   sdw3      0,,
3482126520576246764  fileserver,tasktracker,nfs,hoststats   sdw5      0,,
4667932985226440135  fileserver,tasktracker,nfs,hoststats   sdw7      0,,
Cluster Management
• Stop cluster(4):
4. Shut down all NFS instances.

/opt/mapr/bin/maprcli node services -nfs stop -nodes mdw sdw1 sdw3 sdw5 sdw7

5. SSH into each CLDB node and stop the warden.
/etc/init.d/mapr-warden stop

6. SSH into each of the remaining nodes and stop the warden.
/etc/init.d/mapr-warden stop

7. Stop the zookeeper on zookeeper node(s).
/etc/init.d/mapr-zookeeper stop
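The shutdown order in steps 4 to 7 can be sketched as a dry run that only prints the commands, in order, without executing anything. The node lists are assumptions copied from the sample output in this deck, the ZooKeeper node is assumed from the -zkconnect and -Z examples elsewhere in the deck, and running the per-node commands through ssh is just one way to "SSH into each node".

```shell
# Dry run: print the shutdown commands in the documented order.
# Nothing is executed. All node lists below are assumptions.
NFS_NODES="mdw sdw1 sdw3 sdw5 sdw7"
CLDB_NODES="mdw"
OTHER_NODES="sdw1 sdw3 sdw5 sdw7"
ZK_NODES="sdw1"   # assumed ZooKeeper node

print_stop_plan() {
    # Step 4: stop every NFS instance with a single maprcli call.
    echo "/opt/mapr/bin/maprcli node services -nfs stop -nodes $NFS_NODES"
    # Step 5 (CLDB nodes) and step 6 (remaining nodes): stop the warden.
    for node in $CLDB_NODES $OTHER_NODES; do
        echo "ssh $node /etc/init.d/mapr-warden stop"
    done
    # Step 7: ZooKeeper goes down last.
    for node in $ZK_NODES; do
        echo "ssh $node /etc/init.d/mapr-zookeeper stop"
    done
}

print_stop_plan
```

Printing the plan first makes it easy to review the order before anything is stopped.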
Cluster Management
• Restart Webserver:
/opt/mapr/adminuiapp/webserver stop
/opt/mapr/adminuiapp/webserver start

• Restart Services: (e.g., tasktracker)
maprcli node services -nodes mdw -tasktracker stop
maprcli node services -nodes mdw -tasktracker start

• Grant full permission to chosen administrator OS user
/opt/mapr/bin/maprcli acl edit -type cluster -user <user>:fc
Cluster Management
• Alarm Email
maprcli alarm config save -values "AE_ALARM_AEQUOTA_EXCEEDED,1,"
maprcli alarm config save -values "NODE_ALARM_CORE_PRESENT,1,"

• List Alarm
[gpadmin@mdw]$ maprcli alarm list -type cluster
alarm state  description                                              entity   alarm name  alarm statechange time
1            One or more licenses is about to expire within 28 days  CLUSTER
[gpadmin@mdw]$ maprcli alarm list -type node
alarm state  description                                                                            entity  alarm name                    alarm statechange time
1            Can not determine if service: cldb is running. Check logs at: /opt/mapr/logs/cldb.log  sdw1    NODE_ALARM_SERVICE_CLDB_DOWN  1324274386763
1            Node has core file(s)                                                                  mdw     NODE_ALARM_CORE_PRESENT       1330145172579
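The "alarm statechange time" values in the listing look like Unix epoch timestamps in milliseconds. That is an assumption based on their magnitude; the deck does not say so. Under that assumption, a minimal sketch for making them readable follows. Both function names are hypothetical, and alarm_time relies on GNU date.

```shell
# ms_to_seconds MILLIS: integer-divide by 1000 to get epoch seconds.
ms_to_seconds() {
    echo $(( $1 / 1000 ))
}

# alarm_time MILLIS: print the timestamp in UTC.
# Assumes GNU date, which accepts "-d @SECONDS".
alarm_time() {
    date -u -d "@$(ms_to_seconds "$1")" '+%Y-%m-%d %H:%M:%S UTC'
}

ms_to_seconds 1324274386763   # prints 1324274386
```

Calling alarm_time 1324274386763 on a system with GNU date would then print the moment the CLDB alarm changed state.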
Cluster Management
• List Nodes
maprcli node list -columns id,h,hn,br,da,dtotal,dused,davail,fs-heartbeat
maprcli node list -columns id,br,fs-heartbeat,jt-heartbeat

• Remove Nodes
Take sdw5 for example:
1. Stop warden on sdw5:
/etc/init.d/mapr-warden stop
2. Remove on CLDB node:
maprcli node remove -nodes sdw5 -zkconnect sdw1:5181
Cluster Management
• Reformat a node
Take sdw5 for example:
1. Stop warden:
/etc/init.d/mapr-warden stop
2. Remove the disktab file:
rm /opt/mapr/conf/disktab
3. Create a text file /tmp/disks.txt that lists all the disks and
  partitions to format for use by Greenplum HD EE.
[root@sdw5 ~]# cat /tmp/disks.txt
4. Use disksetup to re-format the disks:
/opt/mapr/server/disksetup -F /tmp/disks.txt
5. Start the Warden:
/etc/init.d/mapr-warden start
Cluster Management
• Add a new node
/opt/mapr/server/ -C mdw -Z sdw1 -N ViadeaCluster
/opt/mapr/server/disksetup -F /tmp/disks.txt
/etc/init.d/mapr-warden start
• Turn off compression
[root@mdw ~]# hadoop mfs -ls|grep var
drwxrwxrwx Z   - root      root            1 2011-12-19 13:52  268435456 /var
[root@mdw ~]# hadoop mfs -setcompression off /var
[root@mdw ~]# hadoop mfs -ls|grep var
drwxrwxrwx U   - root      root            1 2011-12-19 13:52  268435456 /var

• Create volume
maprcli volume create -name viadeavol -path /viadeavol -quota 1G -advisoryquota 200M

maprcli volume create -name viadeavol.mirror -source viadeavol@viadeacluster -path /viadeavol_mirror -type 1
• List Volume
maprcli volume list -columns

• Viewing volume properties
maprcli volume info -name viadeavol
maprcli volume info -output terse -name viadeavol

• Modify volume
maprcli volume modify -name viadeavol.mirror -source viadeavol
• Mount/Unmount Volume
maprcli volume unmount -name viadeavol
maprcli volume mount -name viadeavol

• Remove volume
maprcli volume remove -name testvol

• Setting default volume topology
maprcli config save -values

maprcli config save -values
• CLDB only topology(1)
CLDB only nodes: mdw,sdw1
Other nodes: sdw3,sdw5,sdw7

2. Checking node id:
maprcli node list -columns id,hostname,"topo(rack)"

3. Move nodes to topology – “cldbonly”:
maprcli node move -serverids 4277269757083023248,3528082726925061986 -topology /cldbonly

4. Move CLDB volume to topology – “cldbonly”:
maprcli volume move -name mapr.cldb.internal -topology /cldbonly
• CLDB only topology(2)
5. Move non-CLDB nodes to topology – “noncldb”:
maprcli node move -serverids 5521777324064226112,3482126520576246764,4667932985226440135 -topology /noncldb

6. Move non-CLDB volumes to topology – “noncldb”:
maprcli volume move -name mapr.var -topology /noncldb
maprcli volume move -name viadeavol -topology /noncldb
maprcli volume move -name mapr.hbase -topology /noncldb
maprcli volume move -name mapr.jobtracker.volume -topology /noncldb
maprcli volume move -name mapr.cluster.root -topology /noncldb
• Local/Remote mirror
maprcli volume create -name viadeavol_mirror1 -source viadeavol@viadeacluster -path /viadeavol_mirror1 -type 1

maprcli volume create -name viadeavol_mirror2 -source viadeavol@viadeacluster -path /viadeavol_mirror2 -type 1

• Mirror Link
maprcli volume link create -volume viadeavol -type mirror -path
• Sync Mirrors using “push”
[root@mdw ~]# maprcli volume mirror push -name viadeavol
Starting mirroring of volume viadeavol_mirror2
Starting mirroring of volume viadeavol_mirror1
Mirroring complete for volume viadeavol_mirror1
Mirroring complete for volume viadeavol_mirror2
Successfully completed mirror push to all local mirrors of volume

• Sync Mirror using “start”
[root@mdw ~]# maprcli volume mirror start -full false -name
Started mirror operation for volume(s) 'viadeavol_mirror1'
• Stop mirror sync
[gpadmin@mdw viadea]$ maprcli volume mirror stop -name
Stopped mirror operation for 'viadeavol_mirror1
• Both mirror push and mirror start work the same way: the destination of
  the mirror pulls the data. The difference is that mirror push is synchronous,
  so the command waits until the mirroring is complete, while mirror start is
  asynchronous: it only kicks off the mirroring and returns immediately.
• mirror stop works in both situations.
• Create Schedule
maprcli schedule create -schedule '{"name":"Schedule-

• List Schedule
[root@mdw binary]# maprcli schedule list -output verbose
id name             inuse rules
1   Critical data   0      ...
2   Important data 0       ...
3   Normal data     1      ...
4   mirror_sync     1      ...
5   Schedule-1      0      ...
• Remove Schedule

• Modify Schedule
maprcli schedule modify -id 0 -name Newname -rules
• View snapshot of one volume
[gpadmin@mdw viadea]$ hadoop fs -ls /viadeavol_mirror2/.snapshot
Found 5 items
drwxrwxrwx    - root root         7 2012-02-24 18:58
drwxrwxrwx    - root root         8 2012-02-24 22:32
drwxrwxrwx    - root root        10 2012-02-25 10:44
drwxrwxrwx    - root root         9 2012-02-24 23:00
drwxrwxrwx    - root root         0 1970-01-01 08:00
• Create snapshot
maprcli volume snapshot create -snapshotname test-snapshot -volume

• List snapshot
maprcli volume snapshot list -volume viadeavol

• Remove snapshot
maprcli volume snapshot remove -snapshotname test-snapshotc3 -volume

• Preserve snapshot
maprcli volume snapshot preserve -snapshots 256000083
• Mount
1. List the NFS shares exported on the server:
[gpadmin@smdw ~]$ /usr/sbin/showmount -e mdw
Export list for mdw:
/mapr                *
/mapr/ViadeaCluster  *

2. As root, create the mount point on smdw:
mkdir /mapr

3. Mount on smdw:
mount mdw:/mapr /mapr

4. To make the mount persistent, add this line to /etc/fstab on smdw:
mdw:/mapr /mapr nfs rw 0 0
• Setting ChunkSize and Compression for a volume
[root@smdw viadeavol]# more .dfs_attributes
# lines beginning with # are treated as comments

[root@smdw viadeavol]# hadoop mfs -setchunksize 13107000 /viadeavol
setchunksize: chunksize should be a multiple of 64K
[root@smdw viadeavol]# hadoop mfs -setchunksize 13107200 /viadeavol
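The error above ("chunksize should be a multiple of 64K") can be checked before calling hadoop mfs. A minimal sketch follows, taking 64K as 65536 bytes, which is consistent with 13107200 (200 x 65536) being accepted while 13107000 is rejected. chunksize_ok is a hypothetical helper, not part of the MapR tooling.

```shell
# chunksize_ok BYTES
# Prints "yes" if BYTES is a positive multiple of 64 KB (65536 bytes),
# otherwise "no". Hypothetical pre-check for illustration.
chunksize_ok() {
    if [ "$1" -gt 0 ] && [ $(( $1 % 65536 )) -eq 0 ]; then
        echo yes
    else
        echo no
    fi
}

chunksize_ok 13107000   # no: the value rejected above
chunksize_ok 13107200   # yes: 200 * 65536
```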
• Setting file extensions excluded from compression
maprcli config save -values

[gpadmin@mdw viadea]$ maprcli config load -keys mapr.fs.nocompression
Managing Data
• Dump and Restore Volumes
1. Full dump:
maprcli volume dump create -e endstate -dumpfile fulldump1 -name
2. Make changes to viadeavol.
3. Incremental dump:
maprcli volume dump create -s endstate -e endstate2 -name viadeavol -dumpfile incrdump1
4. Full restore:
maprcli volume dump restore -name viadeavol_restore -dumpfile fulldump1 -n
6. Mount viadeavol_restore.
7. Incremental restore:
maprcli volume dump restore -name viadeavol_restore -dumpfile
Managing Data
• List Disks information
[root@mdw]# /opt/mapr/server/mrconfig disk list
ListDisks resp: status 0 count=1
guid 01C7E418-ACC6-4F15-D202-0141CCEE4E00
size 20480MB
ListDisks /data/hdpee/storagefile
        DG 0: Single SingleDisk50218 Online
        DG 1: Concat Concat12 Online
        SP 0: name SP1, Online, size 9874 MB, free 9379 MB, path

[root@mdw]# /opt/mapr/server/mrconfig sp list
ListSPs resp: status 0:1
No. of SPs (1), totalsize 9874 MB, totalfree 9379 MB

SP 0: name SP1, Online, size 9874 MB, free 9379 MB, path
Users and Groups
• List entity usage
[root@mdw]# maprcli entity list
DiskUsage  EntityQuota  EntityType  EntityName  VolumeCount  EntityAdvisoryquota  EntityId  EntityEmail
0          0            0           gpadmin     0            0
212        0            0           root        19           0
0          1048576      0           viadea      1            0
Users and Groups
• Cluster Permission
login (including cv): Log in to the Greenplum HD EE Control System, use the API and
    command-line interface, read access on cluster and volumes
ss: Start/stop services
cv: Create volumes
a: Admin access
fc: Full control (administrative access and permission to change the cluster ACL)
Users and Groups
• Volume Permission
dump: Dump the volume
restore: Mirror or restore the volume
m: Modify volume properties, create and delete snapshots
d: Delete a volume
fc: Full control (admin access and permission to change volume ACL)
Users and Groups
• List ACL
[root@mdw conf]# maprcli acl show -type cluster
Principal     Allowed actions
User root     [login, ss, cv, a, fc]
User gpadmin   [login, ss, cv, a, fc]

[root@mdw conf]# maprcli acl show -type volume -name viadeavol -user root
Principal Allowed actions
User root [dump, restore, m, d, fc]
Users and Groups
• Modify ACL for a user
maprcli acl edit -type cluster -user viadea:cv
maprcli acl edit -type cluster -user viadea:a
maprcli acl edit -type volume -name viadeavol -user viadea:m

• Modify ACL for a whole cluster or volume
maprcli acl set -type volume -name test-volume -user jsmith:dump,restore,m rjones:fc

• Setting volume quota
maprcli volume modify -name viadeavol -quota 2G

• Setting entity quota
maprcli entity modify -type 0 -name viadea -quota 1T
• Small Job(1)
  <description>Enable small job fast scheduling inside fair
  TaskTrackers should reserve a slot called ephemeral slot which
  is used for smalljob if cluster is busy.</description>
• Small Job(2)
<!-- Small job definition. If a job does not satisfy any of following limits
 it is not considered as a small job and will be moved out of small job pool. -->
  <description>Small job definition. Max number of maps allowed in small job.</description>

  <description>Small job definition. Max number of reducers allowed in small
    job.</description>
• Small Job(3)
  <description>Small job definition. Max input size in bytes allowed for a
    small job.
  Default is 10GB.</description>
  <description>Small job definition.
  Max estimated input size for a reducer allowed in small job.
  Default is 1GB per reducer.</description>
• Small Job(4)
  <description>Small job definition. Max memory in mbytes reserved
   for an ephemeral slot.
  Default is 200mb. This value must be same on JobTracker and
   TaskTracker nodes.</description>
• Memory for Greenplum HD EE Services
/opt/mapr/conf/warden.conf
#The percentage of heap space reserved for the TaskTracker.
#The maximum heap space that can be used by the TaskTracker.
#The minimum heap space for use by the TaskTracker.

[gpadmin@mdw viadea]$ cat /opt/mapr/conf/warden.conf|grep size|grep percent
• Memory for MapReduce
  <description>Maximum physical memory tasktracker should reserve for
    mapreduce tasks.
  If tasks use more than the limit, task using maximum memory will be killed.
  Expert only: Set this value iff tasktracker should use a certain amount of
  memory for mapreduce tasks. In MapR Distro warden figures this number based
  on services configured on a node.
  Setting mapreduce.tasktracker.reserved.physicalmemory.mb to -1 will disable
  physical memory accounting and task management.</description>
• Memory for MapReduce
Map tasks Memory
Map tasks use memory mainly in two ways:
The application consumes memory to run the map function.
The MapReduce framework uses an intermediate buffer to hold serialized (key, value) pairs.

Buffer used to hold map outputs in memory before writing final map
Setting this value very low may cause spills. If left empty, the value
   defaults to 50% of the heap size for map.
If the average input to a map is "MapIn" bytes, then typically the value of
   io.sort.mb should be '1.25 times MapIn' bytes.
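The 1.25 x MapIn rule of thumb above can be turned into a quick calculation. The sketch below is only an illustration: io_sort_mb is a hypothetical helper, the conversion to whole megabytes (1 MB = 1048576 bytes, rounded up) is an assumption of this sketch, and the sample input is the 268435456-byte value that appears in the hadoop mfs -ls output earlier in this deck, used here simply as a plausible average map input.

```shell
# io_sort_mb MAPIN_BYTES
# Prints 1.25 * MapIn expressed in MB, rounded up to a whole MB.
io_sort_mb() {
    mapin_bytes=$1
    buffer_bytes=$(( mapin_bytes * 5 / 4 ))          # 1.25 * MapIn
    echo $(( (buffer_bytes + 1048575) / 1048576 ))   # round up to whole MB
}

io_sort_mb 268435456   # prints 320
```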
• Memory for MapReduce
Reduce tasks Memory
Java opts for the reduce tasks. Default heapsize(-Xmx) is determined
   by memory reserved for mapreduce at tasktracker.
Reduce task is given more memory than map task.
Default memory for a reduce task = (Total Memory reserved for
   mapreduce) * (2*#reduceslots / (#mapslots + 2*#reduceslots))
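The expression above can be evaluated directly. The sketch below implements it exactly as written on the slide; read literally, the result looks like the share of the reserved memory that goes to the reduce side, so dividing it by the number of reduce slots to get a per-task figure would be an interpretation, not something the slide states. reduce_memory_mb is a hypothetical helper and the sample numbers are made up.

```shell
# reduce_memory_mb TOTAL_MB MAP_SLOTS REDUCE_SLOTS
# Evaluates TOTAL * (2*R / (M + 2*R)) with integer arithmetic.
# Truncating to whole MB is a choice made for this sketch.
reduce_memory_mb() {
    total_mb=$1
    map_slots=$2
    reduce_slots=$3
    echo $(( total_mb * 2 * reduce_slots / (map_slots + 2 * reduce_slots) ))
}

reduce_memory_mb 4096 6 4   # prints 2340
```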
• Tasks number(1)
Map slots should be based on how many map tasks can fit in memory,
   and reduce slots should be based on the number of CPUs.
mapred.tasktracker.map.tasks.maximum: (CPUS > 2) ? (CPUS * 0.75) : 1
   (At least one map slot, up to 0.75 times the number of CPUs)
mapred.tasktracker.reduce.tasks.maximum: (CPUS > 2) ? (CPUS * 0.50) : 1
   (At least one reduce slot, up to 0.50 times the number of CPUs)

variables in formula:
CPUS - number of CPUs present on the node
DISKS - number of disks present on the node
MEM - memory reserved for MapReduce tasks
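The two ternary expressions above can be tried out for a few CPU counts. This is a minimal sketch: map_slots and reduce_slots are hypothetical helper names, and truncating fractional results to an integer is an assumption, since the slide does not say how the product is rounded.

```shell
# map_slots CPUS: (CPUS > 2) ? CPUS * 0.75 : 1, truncated to an integer.
map_slots() {
    if [ "$1" -gt 2 ]; then echo $(( $1 * 3 / 4 )); else echo 1; fi
}

# reduce_slots CPUS: (CPUS > 2) ? CPUS * 0.50 : 1, truncated to an integer.
reduce_slots() {
    if [ "$1" -gt 2 ]; then echo $(( $1 / 2 )); else echo 1; fi
}

for cpus in 2 8 16; do
    echo "CPUS=$cpus map=$(map_slots "$cpus") reduce=$(reduce_slots "$cpus")"
done
```

For an 8-CPU node this gives 6 map slots and 4 reduce slots; a 2-CPU node falls back to one slot of each kind.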
• Tasks number(2)
How many map tasks should be scheduled in-advance on a
To be given in % of map slots. Default is 1.0 which
  means number of tasks overscheduled = total map
  slots on TT.
• Final & Important: What needs to be collected?

/opt/mapr/support/tools/ -n support-output.txt

[root@mdw collect]# ls -altr /opt/mapr/support/collect/support-output.txt.tar
-rw-r--r-- 1 root root 27607040 Mar 1 22:34 /opt/mapr/support/collect/support-
• What is in the support dump file?
1. “cluster” directory
2. A directory for each node
[root@mdw support-output.txt]# ls -altr
total 32
drwxr-xr-x 3 root root 4096 Mar 1 22:19   cluster
drwxr-xr-x 8 root root 4096 Mar 1 22:24   .
drwxr-xr-x 5 root root 4096 Mar 1 22:33
drwxr-xr-x 2 root root 4096 Mar 1 22:34
drwxr-xr-x 2 root root 4096 Mar 1 22:34
drwxr-xr-x 2 root root 4096 Mar 1 22:34
drwxr-xr-x 2 root root 4096 Mar 1 22:34
drwxr-xr-x 4 root root 4096 Mar 1 22:36   ..
• What is in the “cluster” directory?

[root@mdw cluster]# cat cluster.txt|grep Output
Output of /opt/mapr/bin/maprcli node list -json
Output of /opt/mapr/bin/maprcli node topo -json
Output of /opt/mapr/bin/maprcli node heatmap -view status -json
Output of /opt/mapr/bin/maprcli volume list -json
Output of /opt/mapr/bin/maprcli dump zkinfo -json
Output of /opt/mapr/bin/maprcli config load -json
Output of /opt/mapr/bin/maprcli alarm list -json
• What is in the “node” directory? (1)
“conf” subdirectory: roles, all conf files, disk info, and some other OS
“logs” subdirectory: all logs, /var/log/messages, some mapr status logs.
[root@mdw logs]# cat mfsState.txt|grep Output
Output of /opt/mapr/server/mrconfig -p 5660 info threads
Output of /opt/mapr/server/mrconfig -p 5660 info containers resync local
Output of /opt/mapr/bin/maprcli trace dump -port 5660
Output of /opt/mapr/bin/maprcli dump fileserverworkinfo -fileserverip

“pam.d” subdirectory
• What is in the “node” directory? (2)
sysinfo.txt : some output of OS commands
[gpadmin@mdw]$ cat sysinfo.txt|grep Output
Output of lscpu
Output of ifconfig -a
Output of uname -a
Output of netstat -an
Output of netstat -rn
Output of hostname
Output of cat /etc/hostname

Redis Meetup TLV - K8s Session 28/10/2018
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
SAOUG - Connect 2014 - Flex Cluster and Flex ASM
SAOUG - Connect 2014 - Flex Cluster and Flex ASMSAOUG - Connect 2014 - Flex Cluster and Flex ASM
SAOUG - Connect 2014 - Flex Cluster and Flex ASM
MySQL Cluster Basics
MySQL Cluster BasicsMySQL Cluster Basics
MySQL Cluster Basics


Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825

Último (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx

Hands on MapR -- Viadea

  • 1. Hands On MapR: CLI only, no GUI ☺ (Viadea Zhu, March 2012)
  • 2. Agenda • MapR Architecture • Cluster Management • Volume • Mirror • Schedule • Snapshot • NFS • Managing Data • Users and Groups • Troubleshooting and Performance Tuning
  • 3. MapR Architecture
      • Basic Services
          – CLDB
          – FileServer
          – JobTracker
          – TaskTracker
          – ZooKeeper
          – NFS
          – WebServer
      • warden
        A process called the warden runs on every node to manage, monitor, and report on the other services on that node. The warden will not start any services unless ZooKeeper is reachable and more than half of the configured ZooKeeper nodes are live.
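The warden's ZooKeeper condition is a strict-majority check. A minimal sketch in shell, assuming you already know how many ZooKeeper nodes respond; the function name `zk_quorum_ok` and its two arguments are hypothetical and not part of MapR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when more than half of the configured
# ZooKeeper nodes are live (the condition the slide states for the warden).
zk_quorum_ok() {
  local live=$1 total=$2
  # Strict majority: live > total / 2, written without fractions.
  (( live * 2 > total ))
}

# Example: with 3 configured nodes, 2 live nodes pass and 1 does not.
zk_quorum_ok 2 3 && echo "quorum"
zk_quorum_ok 1 3 || echo "no quorum"
```

Note that an even node count gains nothing here: 2 live nodes out of 4 is exactly half, which does not satisfy "more than half".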
  • 4. Cluster Management
      • Bring up cluster:
        1. Start ZooKeeper on all nodes where it is installed by issuing the following command:
           /etc/init.d/mapr-zookeeper start
        2. On one of the CLDB nodes and on the node running the mapr-webserver service, start the warden:
           /etc/init.d/mapr-warden start
  • 5. Cluster Management
      • Stop cluster (1):
        1. Determine which nodes are running the NFS gateway.
           [root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc==nfs]" -columns id,h,hn,svc,rp
           id                   service                                                         hostname  health ip
           4277269757083023248  tasktracker,webserver,cldb,fileserver,nfs,hoststats,jobtracker  mdw       2,,,
           3528082726925061986  tasktracker,fileserver,nfs,hoststats                            sdw1      2,,
           5521777324064226112  fileserver,tasktracker,nfs,hoststats                            sdw3      0,,
           3482126520576246764  fileserver,tasktracker,nfs,hoststats                            sdw5      0,,
           4667932985226440135  fileserver,tasktracker,nfs,hoststats                            sdw7      0,,
  • 6. Cluster Management
      • Stop cluster (2):
        2. Determine which nodes are running the CLDB.
           [root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc==cldb]" -columns id,h,hn,svc,rp
           id                   service                                                         hostname  health ip
           4277269757083023248  tasktracker,webserver,cldb,fileserver,nfs,hoststats,jobtracker  mdw       2,,,
  • 7. Cluster Management
      • Stop cluster (3):
        3. List all non-CLDB nodes.
           [root@mdw]# /opt/mapr/bin/maprcli node list -filter "[rp==/*]and[svc!=cldb]" -columns id,h,hn,svc,rp
           id                   service                               hostname  health ip
           3528082726925061986  tasktracker,fileserver,nfs,hoststats  sdw1      2,,
           5521777324064226112  fileserver,tasktracker,nfs,hoststats  sdw3      0,,
           3482126520576246764  fileserver,tasktracker,nfs,hoststats  sdw5      0,,
           4667932985226440135  fileserver,tasktracker,nfs,hoststats  sdw7      0,,
  • 8. Cluster Management
      • Stop cluster (4):
        4. Shut down all NFS instances.
           /opt/mapr/bin/maprcli node services -nfs stop -nodes mdw sdw1 sdw3 sdw5 sdw7
        5. SSH into each CLDB node and stop the warden.
           /etc/init.d/mapr-warden stop
        6. SSH into each of the remaining nodes and stop the warden.
           /etc/init.d/mapr-warden stop
        7. Stop ZooKeeper on the ZooKeeper node(s).
           /etc/init.d/mapr-zookeeper stop
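The shutdown order in steps 4 to 7 can be captured as a dry run that only prints the commands in sequence. This is a sketch, not a tested procedure: the function name `print_stop_plan` is hypothetical, running the per-node commands over `ssh` is an assumption, and the host lists in the example call (including `sdw1` as the ZooKeeper node) are illustrative values.

```shell
#!/usr/bin/env bash
# Dry run only: prints the shutdown commands from the slides in order.
# Each argument is a space-separated host list. Prefixing the per-node
# commands with ssh is an assumption, not something the slides specify.
print_stop_plan() {
  local nfs_nodes=$1 cldb_nodes=$2 other_nodes=$3 zk_nodes=$4 host
  # Step 4: stop NFS everywhere with a single maprcli call.
  echo "/opt/mapr/bin/maprcli node services -nfs stop -nodes $nfs_nodes"
  # Step 5: stop the warden on CLDB nodes first.
  for host in $cldb_nodes; do echo "ssh $host /etc/init.d/mapr-warden stop"; done
  # Step 6: then on the remaining nodes.
  for host in $other_nodes; do echo "ssh $host /etc/init.d/mapr-warden stop"; done
  # Step 7: ZooKeeper goes down last.
  for host in $zk_nodes; do echo "ssh $host /etc/init.d/mapr-zookeeper stop"; done
}

print_stop_plan "mdw sdw1 sdw3 sdw5 sdw7" "mdw" "sdw1 sdw3 sdw5 sdw7" "sdw1"
```

Printing rather than executing keeps the ordering reviewable before anything is stopped.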
  • 9. Cluster Management
      • Restart Webserver:
        /opt/mapr/adminuiapp/webserver stop
        /opt/mapr/adminuiapp/webserver start
      • Restart Services (e.g., tasktracker):
        maprcli node services -nodes mdw -tasktracker stop
        maprcli node services -nodes mdw -tasktracker start
      • Grant full permission to the chosen administrator OS user:
        /opt/mapr/bin/maprcli acl edit -type cluster -user <user>:fc
  • 10. Cluster Management
      • Alarm Email
        maprcli alarm config save -values "AE_ALARM_AEQUOTA_EXCEEDED,1,"
        maprcli alarm config save -values "NODE_ALARM_CORE_PRESENT,1,"
      • List Alarm
        [gpadmin@mdw]$ maprcli alarm list -type cluster
        alarm state  description  entity  alarm name  alarm statechange time
        1  One or more licenses is about to expire within 28 days  CLUSTER  CLUSTER_ALARM_LICENSE_NEAR_EXPIRATION  1330171978541
        [gpadmin@mdw]$ maprcli alarm list -type node
        alarm state  description  entity  alarm name  alarm statechange time
        1  Can not determine if service: cldb is running. Check logs at: /opt/mapr/logs/cldb.log  sdw1  NODE_ALARM_SERVICE_CLDB_DOWN  1324274386763
        1  Node has core file(s)  mdw  NODE_ALARM_CORE_PRESENT  1330145172579
  • 11. Cluster Management
      • List Nodes
        maprcli node list -columns id,h,hn,br,da,dtotal,dused,davail,fs-heartbeat
        maprcli node list -columns id,br,fs-heartbeat,jt-heartbeat
      • Remove Nodes
        Take sdw5 for example:
        1. Stop the warden on sdw5:
           /etc/init.d/mapr-warden stop
        2. Remove the node from a CLDB node:
           maprcli node remove -nodes sdw5 -zkconnect sdw1:5181
  • 12. Cluster Management
      • Reformat a node
        Take sdw5 for example:
        1. Stop the warden:
           /etc/init.d/mapr-warden stop
        2. Remove the disktab file:
           rm /opt/mapr/conf/disktab
        3. Create a text file /tmp/disks.txt that lists all the disks and partitions to format for use by Greenplum HD EE.
           [root@sdw5 ~]# cat /tmp/disks.txt
           /data2/hdpee/storagefile
        4. Use disksetup to reformat the disks:
           /opt/mapr/server/disksetup -F /tmp/disks.txt
        5. Start the warden:
           /etc/init.d/mapr-warden start
  • 13. Cluster Management
      • Add a new node
        /opt/mapr/server/ -C mdw -Z sdw1 -N ViadeaCluster
        /opt/mapr/server/disksetup -F /tmp/disks.txt
        /etc/init.d/mapr-warden start
  • 14. Volume
      • Turn off compression
        [root@mdw ~]# hadoop mfs -ls|grep var
        drwxrwxrwx Z - root root 1 2011-12-19 13:52 268435456 /var
        [root@mdw ~]# hadoop mfs -setcompression off /var
        [root@mdw ~]# hadoop mfs -ls|grep var
        drwxrwxrwx U - root root 1 2011-12-19 13:52 268435456 /var
      • Create volume
        maprcli volume create -name viadeavol -path /viadeavol -quota 1G -advisoryquota 200M
        maprcli volume create -name viadeavol.mirror -source viadeavol@viadeacluster -path /viadeavol_mirror -type 1
  • 15. Volume
      • List Volume
        maprcli volume list -columns volumeid,volumetype,volumename,mountdir,mounted,aename,quota,used,totalused,actualreplication,rackpath
      • Viewing volume properties
        maprcli volume info -name viadeavol
        maprcli volume info -output terse -name viadeavol
      • Modify volume
        maprcli volume modify -name viadeavol.mirror -source viadeavol
  • 16. Volume
      • Mount/Unmount Volume
        maprcli volume unmount -name viadeavol
        maprcli volume mount -name viadeavol
      • Remove volume
        maprcli volume remove -name testvol
      • Setting default volume topology
        maprcli config save -values "{"cldb.default.volume.topology":"/default-rack"}"
        maprcli config save -values "{"cldb.default.volume.topology":"/"}"
  • 17. Volume
      • CLDB-only topology (1)
        1. Planning:
           CLDB-only nodes: mdw, sdw1
           Other nodes: sdw3, sdw5, sdw7
        2. Check the node IDs:
           maprcli node list -columns id,hostname,"topo(rack)"
        3. Move the CLDB nodes to the topology "cldbonly":
           maprcli node move -serverids 4277269757083023248,3528082726925061986 -topology /cldbonly
        4. Move the CLDB volume to the topology "cldbonly":
           maprcli volume move -name mapr.cldb.internal -topology /cldbonly
  • 18. Volume
      • CLDB-only topology (2)
        5. Move the non-CLDB nodes to the topology "noncldb":
           maprcli node move -serverids 5521777324064226112,3482126520576246764,4667932985226440135 -topology /noncldb
        6. Move the non-CLDB volumes to the topology "noncldb":
           maprcli volume move -name mapr.var -topology /noncldb
           maprcli volume move -name viadeavol -topology /noncldb
           maprcli volume move -name mapr.hbase -topology /noncldb
           maprcli volume move -name mapr.jobtracker.volume -topology /noncldb
           maprcli volume move -name mapr.cluster.root -topology /noncldb
  • 19. Mirror
      • Local/Remote mirror
        maprcli volume create -name viadeavol_mirror1 -source viadeavol@viadeacluster -path /viadeavol_mirror1 -type 1
        maprcli volume create -name viadeavol_mirror2 -source viadeavol@viadeacluster -path /viadeavol_mirror2 -type 1
      • Mirror Link
        maprcli volume link create -volume viadeavol -type mirror -path /maprfs::mirror::viadeavol
  • 20. Mirror
      • Sync mirrors using "push"
        [root@mdw ~]# maprcli volume mirror push -name viadeavol
        Starting mirroring of volume viadeavol_mirror2
        Starting mirroring of volume viadeavol_mirror1
        Mirroring complete for volume viadeavol_mirror1
        Mirroring complete for volume viadeavol_mirror2
        Successfully completed mirror push to all local mirrors of volume viadeavol
      • Sync a mirror using "start"
        [root@mdw ~]# maprcli volume mirror start -full false -name viadeavol_mirror1
        messages
        Started mirror operation for volume(s) 'viadeavol_mirror1'
  • 21. Mirror
      • Stop mirror sync
        [gpadmin@mdw viadea]$ maprcli volume mirror stop -name viadeavol_mirror1
        messages
        Stopped mirror operation for 'viadeavol_mirror1'
      • Answer:
          – Both mirror push and mirror start work the same way: the destination of the mirror pulls the data. The difference is that mirror push is synchronous, so the command waits until the mirroring is complete, while mirror start is asynchronous: it only kicks off the mirroring and returns immediately.
          – mirror stop works in both situations.
  • 22. Schedule
      • Create Schedule
        maprcli schedule create -schedule '{"name":"Schedule-1","rules":[{"frequency":"once","retain":"1w","time":13,"date":"12/5/2010"}]}'
      • List Schedule
        [root@mdw binary]# maprcli schedule list -output verbose
        id  name            inuse  rules
        1   Critical data   0      ...
        2   Important data  0      ...
        3   Normal data     1      ...
        4   mirror_sync     1      ...
        5   Schedule-1      0      ...
  • 23. Schedule
      • Remove Schedule
      • Modify Schedule
        maprcli schedule modify -id 0 -name Newname -rules '[{"frequency":"weekly","date":"sun","time":7,"retain":"2w"},{"frequency":"daily","time":14,"retain":"1w"}]'
  • 24. Snapshot
      • View the snapshots of one volume
        [gpadmin@mdw viadea]$ hadoop fs -ls /viadeavol_mirror2/.snapshot
        Found 5 items
        drwxrwxrwx - root root 7 2012-02-24 18:58 /viadeavol_mirror2/.snapshot/viadeavol_mirror2.mirrorsnap.24-Feb-2012-22-35-51
        drwxrwxrwx - root root 8 2012-02-24 22:32 /viadeavol_mirror2/.snapshot/viadeavol_mirror2.mirrorsnap.25-Feb-2012-01-48-25
        drwxrwxrwx - root root 10 2012-02-25 10:44 /viadeavol_mirror2/.snapshot/viadeavol_mirror2.mirrorsnap.25-Feb-2012-12-05-43
        drwxrwxrwx - root root 9 2012-02-24 23:00 /viadeavol_mirror2/.snapshot/viadeavol_mirror2.mirrorsnap.25-Feb-2012-11-09-49
        drwxrwxrwx - root root 0 1970-01-01 08:00 /viadeavol_mirror2/.snapshot/viadeavol_mirror2.mirrorsnap.24-Feb-2012-22-26-18
  • 25. Snapshot
      • Create snapshot
        maprcli volume snapshot create -snapshotname test-snapshot -volume viadeavol
      • List snapshot
        maprcli volume snapshot list -volume viadeavol
      • Remove snapshot
        maprcli volume snapshot remove -snapshotname test-snapshotc3 -volume viadeavol
      • Preserve snapshot
        maprcli volume snapshot preserve -snapshots 256000083
  • 26. NFS
      • Mount
        1. List the NFS shares exported on the server:
           [gpadmin@smdw ~]$ /usr/sbin/showmount -e mdw
           Export list for mdw:
           /mapr *
           /mapr/ViadeaCluster *
        2. As root, create the mount directory on smdw:
           mkdir /mapr
        3. Mount on smdw:
           mount mdw:/mapr /mapr
        4. Add the mount to /etc/fstab on smdw:
           mdw:/mapr /mapr nfs rw 0 0
  • 27. NFS
      • Setting ChunkSize and Compression for a volume
        [root@smdw viadeavol]# more .dfs_attributes
        # lines beginning with # are treated as comments
        Compression=true
        ChunkSize=268435456
        [root@smdw viadeavol]# hadoop mfs -setchunksize 13107000 /viadeavol
        setchunksize: chunksize should be a multiple of 64K
        [root@smdw viadeavol]# hadoop mfs -setchunksize 13107200 /viadeavol
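The rejected value 13107000 is not a multiple of 64K (65536 bytes), while 13107200 is exactly 200 × 65536. A small sketch for picking a valid value before calling `hadoop mfs -setchunksize`; the helper name `round_up_chunksize` is hypothetical, and rounding up (rather than down) is a choice made for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: round a byte count up to the next multiple of 64K
# (65536 bytes), the granularity the setchunksize error message asks for.
round_up_chunksize() {
  local bytes=$1 unit=65536
  echo $(( (bytes + unit - 1) / unit * unit ))
}

round_up_chunksize 13107000   # prints 13107200, the value accepted on the slide
```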
  • 28. NFS
      • Setting the extensions of files that should not be compressed
        maprcli config save -values {"mapr.fs.nocompression":"bz2,gz,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png"}
        [gpadmin@mdw viadea]$ maprcli config load -keys mapr.fs.nocompression
        mapr.fs.nocompression
        bz2,gz,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png
  • 29. Managing Data
      • Dump and Restore Volumes
        1. Full dump:
           maprcli volume dump create -e endstate -dumpfile fulldump1 -name viadeavol
        2. Make changes to viadeavol
        3. Incremental dump:
           maprcli volume dump create -s endstate -e endstate2 -name viadeavol -dumpfile incrdump1
        4. Full restore:
           maprcli volume dump restore -name viadeavol_restore -dumpfile fulldump1 -n
        6. Mount viadeavol_restore
        7. Incremental restore:
           maprcli volume dump restore -name viadeavol_restore -dumpfile incrdump1
  • 30. Managing Data
      • List disk information
        [root@mdw]# /opt/mapr/server/mrconfig disk list
        ListDisks resp: status 0 count=1
        guid 01C7E418-ACC6-4F15-D202-0141CCEE4E00
        size 20480MB
        ListDisks /data/hdpee/storagefile
        DG 0: Single SingleDisk50218 Online
        DG 1: Concat Concat12 Online
        SP 0: name SP1, Online, size 9874 MB, free 9379 MB, path /data/hdpee/storagefile
        [root@mdw]# /opt/mapr/server/mrconfig sp list
        ListSPs resp: status 0:1
        No. of SPs (1), totalsize 9874 MB, totalfree 9379 MB
        SP 0: name SP1, Online, size 9874 MB, free 9379 MB, path /data/hdpee/storagefile
  • 31. Users and Groups
      • List entity usage
        [root@mdw]# maprcli entity list
        DiskUsage  EntityQuota  EntityType  EntityName  VolumeCount  EntityAdvisoryquota  EntityId  EntityEmail
        0          0            0           gpadmin     0            0                    500
        212        0            0           root        19           0                    0
        0          1048576      0           viadea      1            0                    666
  • 32. Users and Groups
      • Cluster Permission
          – login (including cv): Log in to the Greenplum HD EE Control System, use the API and command-line interface, read access on cluster and volumes
          – ss: Start/stop services
          – cv: Create volumes
          – a: Admin access
          – fc: Full control (administrative access and permission to change the cluster ACL)
  • 33. Users and Groups
      • Volume Permission
          – dump: Dump the volume
          – restore: Mirror or restore the volume
          – m: Modify volume properties, create and delete snapshots
          – d: Delete a volume
          – fc: Full control (admin access and permission to change the volume ACL)
  • 34. Users and Groups
      • List ACL
        [root@mdw conf]# maprcli acl show -type cluster
        Principal     Allowed actions
        User root     [login, ss, cv, a, fc]
        User gpadmin  [login, ss, cv, a, fc]
        [root@mdw conf]# maprcli acl show -type volume -name viadeavol -user root
        Principal     Allowed actions
        User root     [dump, restore, m, d, fc]
  • 35. Users and Groups
      • Modify ACL for a user
        maprcli acl edit -type cluster -user viadea:cv
        maprcli acl edit -type cluster -user viadea:a
        maprcli acl edit -type volume -name viadeavol -user viadea:m
      • Modify ACL for a whole cluster or volume
        maprcli acl set -type volume -name test-volume -user jsmith:dump,restore,m rjones:fc
      • Setting a volume quota
        maprcli volume modify -name viadeavol -quota 2G
      • Setting an entity quota
        maprcli entity modify -type 0 -name viadea -quota 1T
  • 36. Troubleshooting & Performance Tuning
      • Small Job (1)
        mapred-site.xml:
        <property>
          <name>mapred.fairscheduler.smalljob.schedule.enable</name>
          <value>true</value>
          <description>Enable small job fast scheduling inside fair scheduler. TaskTrackers should reserve a slot called ephemeral slot which is used for smalljob if cluster is busy.</description>
        </property>
  • 37. Troubleshooting & Performance Tuning
      • Small Job (2)
        <!-- Small job definition. If a job does not satisfy any of following limits it is not considered as a small job and will be moved out of small job pool. -->
        <property>
          <name>mapred.fairscheduler.smalljob.max.maps</name>
          <value>10</value>
          <description>Small job definition. Max number of maps allowed in small job.</description>
        </property>
        <property>
          <name>mapred.fairscheduler.smalljob.max.reducers</name>
          <value>10</value>
          <description>Small job definition. Max number of reducers allowed in small job.</description>
        </property>
  • 38. Troubleshooting & Performance Tuning
      • Small Job (3)
        <property>
          <name>mapred.fairscheduler.smalljob.max.inputsize</name>
          <value>10737418240</value>
          <description>Small job definition. Max input size in bytes allowed for a small job. Default is 10GB.</description>
        </property>
        <property>
          <name>mapred.fairscheduler.smalljob.max.reducer.inputsize</name>
          <value>1073741824</value>
          <description>Small job definition. Max estimated input size for a reducer allowed in small job. Default is 1GB per reducer.</description>
        </property>
  • 39. Troubleshooting & Performance Tuning
      • Small Job (4)
        <property>
          <name>mapred.cluster.ephemeral.tasks.memory.limit.mb</name>
          <value>200</value>
          <description>Small job definition. Max memory in mbytes reserved for an ephemeral slot. Default is 200mb. This value must be same on JobTracker and TaskTracker nodes.</description>
        </property>
  • 40. Troubleshooting & Performance Tuning
      • Memory for Greenplum HD EE Services
        /opt/mapr/conf/warden.conf
        #The percentage of heap space reserved for the TaskTracker.
        #The maximum heap space that can be used by the TaskTracker.
        #The minimum heap space for use by the TaskTracker.
        [gpadmin@mdw viadea]$ cat /opt/mapr/conf/warden.conf|grep size|grep percent
        service.command.jt.heapsize.percent=10
        service.command.hbmaster.heapsize.percent=4
        service.command.hbregion.heapsize.percent=25
        service.command.cldb.heapsize.percent=8
        service.command.mfs.heapsize.percent=20
        service.command.webserver.heapsize.percent=3
        service.command.os.heapsize.percent=3
  • 41. Troubleshooting & Performance Tuning
      • Memory for MapReduce
        /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
        <property>
          <name>mapreduce.tasktracker.reserved.physicalmemory.mb</name>
          <value></value>
          <description>Maximum physical memory tasktracker should reserve for mapreduce tasks. If tasks use more than the limit, task using maximum memory will be killed. Expert only: Set this value iff tasktracker should use a certain amount of memory for mapreduce tasks. In MapR Distro warden figures this number based on services configured on a node. Setting mapreduce.tasktracker.reserved.physicalmemory.mb to -1 will disable physical memory accounting and task management.</description>
        </property>
  • 42. Troubleshooting & Performance Tuning
      • Memory for MapReduce: map task memory
        Map tasks use memory mainly in two ways:
          – The application consumes memory to run the map function.
          – The MapReduce framework uses an intermediate buffer to hold serialized (key, value) pairs (io.sort.mb).
        /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
        io.sort.mb: Buffer used to hold map outputs in memory before writing final map outputs. Setting this value very low may cause spills. If left empty, the value defaults to 50% of the heap size for a map task. If the average input to a map is "MapIn" bytes, then io.sort.mb should typically be 1.25 times MapIn bytes.
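The 1.25 × MapIn rule of thumb is easy to turn into a number. A rough sketch, assuming the average map input is known in bytes and that the result is wanted in whole megabytes (io.sort.mb is expressed in MB); the helper name `suggest_io_sort_mb` and the round-up behavior are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest io.sort.mb as 1.25 x the average map input
# size (given in bytes), rounded up to a whole megabyte.
suggest_io_sort_mb() {
  local map_in_bytes=$1 mb=1048576
  local target_bytes=$(( map_in_bytes * 5 / 4 ))   # 1.25 x MapIn
  echo $(( (target_bytes + mb - 1) / mb ))
}

suggest_io_sort_mb 268435456   # 256 MB average input, prints 320
```

The suggested value still has to fit inside the map task heap, which this sketch does not check.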
  • 43. Troubleshooting & Performance Tuning
      • Memory for MapReduce: reduce task memory
        Java opts for the reduce tasks. The default heap size (-Xmx) is determined by the memory reserved for MapReduce at the TaskTracker. A reduce task is given more memory than a map task.
        Default memory for a reduce task = (Total memory reserved for MapReduce) * (2 * #reduceslots / (#mapslots + 2 * #reduceslots))
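To get a feel for the formula, the sketch below evaluates the expression exactly as it is written on the slide, using integer arithmetic in megabytes. The function name `reduce_task_memory_mb` and the sample inputs are hypothetical; truncating the division is a simplification of this example, not documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluates the slide's expression
#   total * (2 * reduce_slots / (map_slots + 2 * reduce_slots))
# The multiplication is done before the division to avoid losing precision
# to integer truncation.
reduce_task_memory_mb() {
  local total_mb=$1 map_slots=$2 reduce_slots=$3
  echo $(( total_mb * 2 * reduce_slots / (map_slots + 2 * reduce_slots) ))
}

reduce_task_memory_mb 4096 6 4   # 4096 * 8 / 14, prints 2340
```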
  • 44. Troubleshooting & Performance Tuning
      • Tasks number (1)
        Map slots should be based on how many map tasks can fit in memory, and reduce slots should be based on the number of CPUs.
        (CPUS > 2) ? (CPUS * 0.75) : 1
        (At least one map slot, up to 0.75 times the number of CPUs)
        mapred.tasktracker.reduce.tasks.maximum:
        (CPUS > 2) ? (CPUS * 0.50) : 1
        (At least one reduce slot, up to 0.50 times the number of CPUs)
        Variables in the formulas:
          CPUS - number of CPUs present on the node
          DISKS - number of disks present on the node
          MEM - memory reserved for MapReduce tasks
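The two ternary expressions can be evaluated for a given CPU count as follows. This is a sketch under assumptions: the function names are hypothetical, the 0.75 expression is taken to be the map-slot rule because the slide's note ties it to map slots, and fractional results are truncated, which the slide does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the slide's expressions:
#   (CPUS > 2) ? (CPUS * 0.75) : 1   and   (CPUS > 2) ? (CPUS * 0.50) : 1
# Integer arithmetic truncates fractional slot counts (an assumption).
map_slots() {
  local cpus=$1
  if (( cpus > 2 )); then echo $(( cpus * 3 / 4 )); else echo 1; fi
}

reduce_slots() {
  local cpus=$1
  if (( cpus > 2 )); then echo $(( cpus / 2 )); else echo 1; fi
}

echo "8 CPUs: $(map_slots 8) map slots, $(reduce_slots 8) reduce slots"
```

For an 8-CPU node this gives 6 map slots and 4 reduce slots; a node with 2 or fewer CPUs gets one of each.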
  • 45. Troubleshooting & Performance Tuning
      • Tasks number (2)
        mapreduce.tasktracker.prefetch.maptasks
        How many map tasks should be scheduled in advance on a TaskTracker, given as a fraction of the map slots. The default is 1.0, which means the number of overscheduled tasks equals the total number of map slots on the TaskTracker.
  • 46. Troubleshooting & Performance Tuning
      • Final and important: what needs to be collected?
        /opt/mapr/support/tools/ -n support-output.txt
        [root@mdw collect]# ls -altr /opt/mapr/support/collect/support-output.txt.tar
        -rw-r--r-- 1 root root 27607040 Mar 1 22:34 /opt/mapr/support/collect/support-output.txt.tar
  • 47. Troubleshooting & Performance Tuning
      • What is in the support dump file?
        1. A "cluster" directory
        2. A directory for each node
        [root@mdw support-output.txt]# ls -altr
        total 32
        drwxr-xr-x 3 root root 4096 Mar 1 22:19 cluster
        drwxr-xr-x 8 root root 4096 Mar 1 22:24 .
        drwxr-xr-x 5 root root 4096 Mar 1 22:33
        drwxr-xr-x 2 root root 4096 Mar 1 22:34
        drwxr-xr-x 2 root root 4096 Mar 1 22:34
        drwxr-xr-x 2 root root 4096 Mar 1 22:34
        drwxr-xr-x 2 root root 4096 Mar 1 22:34
        drwxr-xr-x 4 root root 4096 Mar 1 22:36 ..
  • 48. Troubleshooting & Performance Tuning
      • What is in the "cluster" directory?
        [root@mdw cluster]# cat cluster.txt|grep Output
        Output of /opt/mapr/bin/maprcli node list -json
        Output of /opt/mapr/bin/maprcli node topo -json
        Output of /opt/mapr/bin/maprcli node heatmap -view status -json
        Output of /opt/mapr/bin/maprcli volume list -json
        Output of /opt/mapr/bin/maprcli dump zkinfo -json
        Output of /opt/mapr/bin/maprcli config load -json
        Output of /opt/mapr/bin/maprcli alarm list -json
        (…)
  • 49. Troubleshooting & Performance Tuning
      • What is in the "node" directory? (1)
          – "conf" subdirectory: roles, all conf files, disk info, and the output of some other OS commands.
          – "logs" subdirectory: all logs, /var/log/messages, and some MapR status logs.
            [root@mdw logs]# cat mfsState.txt|grep Output
            Output of /opt/mapr/server/mrconfig -p 5660 info threads
            Output of /opt/mapr/server/mrconfig -p 5660 info containers resync local
            Output of /opt/mapr/bin/maprcli trace dump -port 5660
            Output of /opt/mapr/bin/maprcli dump fileserverworkinfo -fileserverip
          – "pam.d" subdirectory
  • 50. Troubleshooting & Performance Tuning
      • What is in the "node" directory? (2)
          – MapRBuildVersion
          – redhat-release
          – secure.log
          – sysinfo.txt: output of some OS commands
            [gpadmin@mdw]$ cat sysinfo.txt|grep Output
            Output of lscpu
            Output of ifconfig -a
            Output of uname -a
            Output of netstat -an
            Output of netstat -rn
            Output of hostname
            Output of cat /etc/hostname
            (…)