Red Hat Storage Cluster
(GFS, CLVM, GNBD)

GFS allows multiple nodes to share storage at a block level as if the
storage were connected locally to each cluster node.

Schubert Zhang, Guangxian Liao
Jul, 2008

Deployment

["Economy and Performance" deployment diagram]
  - GFS node (x3): Applications -> GFS mounted at /mnt/gfs -> CLVM/LVM (vg-lv) -> GNBD client (imports /dev/gnbd/x)
  - A TCP/IP network connects the GFS nodes to the storage level below
  - GNBD server node (x2): GNBD server exports the local block devices (disks)

Architecture




• Pay attention to the relationships among the cluster software
  components.
   –   Cluster Infrastructure (Common): CMAN, DLM, CCS, Fencing
   –   GNBD: Server and Client
   –   CLVM
   –   GFS

Cluster Infrastructure
•   We are only concerned with the storage cluster type here.
•   Necessary infrastructure components:
     – Cluster management (CMAN)
         •   It’s a distributed cluster manager and runs in each cluster node.
         •   Keeps track of cluster quorum, avoids "split-brain". (Like chubby’s Paxos Protocol?)
         •   Keeps track of membership.
         •   libcman.so library, cman_tool
         •   dlm_controld: started by cman init script to manage dlm in kernel
         •   gfs_controld : started by cman init script to manage gfs in kernel
         •   groupd: started by cman init script to interface between openais/cman and
             dlm_controld/gfs_controld/fenced , group_tool
     – Lock management (DLM)
         • To synchronize access to shared resources (shared storage, etc.).
         • Runs in each cluster node.
         • libdlm.so library.
         • GFS and CLVM use locks from DLM.
         • GFS uses locks from the lock manager to synchronize access to file system metadata
           (on shared storage).
         • CLVM uses locks from the lock manager to synchronize updates to LVM volumes and
           volume groups (also on shared storage).
         • Like chubby’s lock function?

Cluster Infrastructure (cont.)
– Cluster configuration management (CCS)
    • Runs in each cluster node. (ccsd)
     • Keeps the cluster configuration file synchronized and up to date; propagates
       modifications across the cluster.
     • Other components (e.g., CMAN) access configuration info via CCS.
    • /etc/cluster/cluster.conf
         (Cluster Name, Cluster nodes, Fence, Resources, etc.)
     • ccs_tool: to make online updates of CCS configuration files (example at the end of this slide)
    • ccs_test: to retrieve information from configuration files through ccsd.
– Fencing
    • Fencing is the disconnection of a node from the cluster's shared storage.
      Fencing cuts off I/O from shared storage, thus ensuring data integrity. The
      cluster infrastructure performs fencing through the fence daemon, fenced.
    • GNBD fencing: fence_gnbd?
    • fence_tool
    • cluster configuration file: fencing-method, fencing agent, fencing device for
      each node in the cluster.



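     • For example, after editing /etc/cluster/cluster.conf (and bumping config_version), the
       change can be propagated online. A hedged example; see ccs_tool(8):
           # ccs_tool update /etc/cluster/cluster.conf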
GNBD
• An ancillary component of GFS that exports
  block-level storage over Ethernet.
• Global Network Block Device
  – GNBD provides block-device access for Red Hat GFS
    over TCP/IP. GNBD is similar in concept to NBD;
  – GNBD is GFS-specific and tuned solely for use with
    GFS.
• Two major components
  – GNBD Server
  – GNBD Client
GNBD (server)
• Exports block devices from its local
  storage.
• gnbd_serv process
  – GNBD server need not join the cluster
    manager.
• gnbd_export
  – Export block devices.


GNBD (client)
• A GNBD client runs in a node with GFS.
• Imports block device exported by GNBD server.
• Multiple GNBD clients can access a device
  exported by a GNBD server, thus making
  GNBD suitable for use by a group of nodes
  running GFS.
• gnbd.ko
  – A kernel module
• gnbd_import
  – Import remote block devices from the GNBD server.

CLVM
• Provides volume management of cluster storage.
• A cluster-wide version of LVM2
• CLVM provides the same capabilities as LVM2 on a single node, but
  makes the logical volumes created with CLVM available to all nodes
  in a cluster.
• CLVM uses the lock-management service provided by the cluster
  infrastructure.
• Using CLVM requires a minor change to /etc/lvm/lvm.conf to enable cluster-wide
  locking (see the sketch at the end of this slide).
• clvmd:
   – A daemon that provides clustering extensions to the standard LVM2 tool
     set and allows LVM2 commands to manage shared storage.
   – Runs on each cluster node.
   – Distributes LVM metadata updates in a cluster, thereby presenting each
     cluster node with the same view of the logical volumes


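• A minimal sketch of that lvm.conf change (one line; the lvmconf --enable-cluster helper
  shipped with lvm2-cluster makes the same edit):
      # /etc/lvm/lvm.conf
      locking_type = 3    # clustered locking via clvmd; the default (1) is local file-based locking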
CLVM (cont.)
• lvm: LVM2 command line tools.
• /etc/lvm/lvm.conf
• pvcreate
  – block devices/partitions -> PV
• vgcreate
  – PV(s) -> VG
• lvcreate
  – VG -> LV(s)
• Ready to make a file system on the LV.
GFS
• To simultaneously access a block device that is shared among the
  nodes.
• Single, consistent view of the FS name space across GFS nodes in
  a cluster.
• Native FS under VFS, POSIX interface to applications.
• Distributed metadata and multiple journals.
• Uses lock manager to coordinate I/O.
• When one node changes data on a GFS file system, that change is
  immediately visible to the other cluster nodes using that file system.
• Scale the cluster seamlessly by adding servers or storage on the fly.
• We use an “Economy and Performance” deployment.




GFS (cont.)
• gfs.ko: kernel module, loaded on each GFS
  cluster node.
• gfs_mkfs: create a GFS on a storage device.
• gfs_tool: configures or tunes a GFS.
• gfs_grow: grows a mounted GFS.
• gfs_jadd: adds journals to a mounted GFS.
• gfs_quota: manages quotas on a mounted GFS.
• gfs_fsck: repairs an unmounted GFS.
• mount.gfs: mount helper called by mount.

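• For example, before a node beyond the journal count chosen at mkfs time can mount,
  another journal must be added. A hedged one-liner (see gfs_jadd(8)); /mnt/gfs is the
  mount point used later in this deck:
      # gfs_jadd -j 1 /mnt/gfs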
Fencing
• We must configure each GFS node in the
  cluster for at least one form of fencing.




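• As an illustration only (the agent and attribute names here are assumptions; check the
  documentation of the fencing agent actually used), a fencing entry in
  /etc/cluster/cluster.conf has this general shape:
      <clusternode name="test2" nodeid="1" votes="1">
              <fence>
                      <method name="1">
                              <device name="gnbd" nodename="test2"/>
                      </method>
              </fence>
      </clusternode>
      ...
      <fencedevices>
              <!-- attribute names are assumptions; see the fence_gnbd man page -->
              <fencedevice agent="fence_gnbd" name="gnbd" servers="test1 test4"/>
      </fencedevices>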
Setup a Cluster
                                 (prepare)
•   Software Installation
     –   Install default packages of “Clustering” and “Storage Clustering” in each node.
     –   The rpms in cdrom: /ClusterStorage and /Cluster
     –   cman’s rpm in cdrom: /Server
     –   Major RPM (if need dependence, install depended rpm)
           •   cman-2.0.60-1.el5.i386.rpm
           •   modcluster-0.8-27.el5.i386.rpm
           •   gnbd-1.1.5-1.el5.i386.rpm
           •   kmod-gnbd-0.1.3-4.2.6.18_8.el5.i686.rpm
           •   kmod-gfs-0.1.16-5.2.6.18_8.el5.i686.rpm
           •   lvm2-cluster-2.02.16-3.el5.i386.rpm
           •   Global_File_System-en-US-5.0.0-4.noarch.rpm
           •   gfs-utils-0.1.11-1.el5.i386.rpm
           •   gfs2-utils-0.1.25-1.el5.i386.rpm
           •   etc.
•   Network
     –   Disable firewall and SELinux
     –   Enable multicast and IGMP.
     –   Configure /etc/hosts or DNS for the hostnames (important!); see the /etc/hosts sketch below.
•   Machine hostnames
     –   192.168.1.251              test1   (gnbd server)
     –   192.168.1.252              test2   (gfs node)
     –   192.168.1.253              test3   (gfs node)
     –   192.168.1.254              test4   (gnbd server)
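•   A matching /etc/hosts sketch (assuming the same four entries are configured on every node):
      192.168.1.251   test1
      192.168.1.252   test2
      192.168.1.253   test3
      192.168.1.254   test4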
Setup a Cluster
                            (GNBD server)
•   On the GNBD server node there is no need to start cman, i.e., the GNBD server is not a member of
    the cluster.

(1) Start GNBD server process (man gnbd_serv)
     # gnbd_serv -n

(2) Export block device (man gnbd_export)
     # gnbd_export -v -d /dev/sda3 -e gnbdnode1 -c
    Note 1: caching (-c) must be enabled since cman is not running on this node.
    Note 2: the block device should be a disk partition (what about an LV? The document
    says an LV is not supported).

(3) Check the export
    # gnbd_export -l

(4) Add to /etc/rc.local
     gnbd_serv -n
     gnbd_export -v -d /dev/sda3 -e gnbdnode1 -c



Setup a Cluster
                       (cluster infrastructure)
•   Initially configure a cluster
      –   /etc/cluster/cluster.conf: generated by “system-config-cluster” or manually (only the cluster name and
          node members are configured initially)
      –   On one GFS node, use “system-config-cluster” to create a new cluster (name: cluster1) and add a node
          (name: test2)
     –   The cluster.conf
            <?xml version="1.0" ?>
            <cluster alias="cluster1" config_version="5" name="cluster1">
                    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
                    <clusternodes>
                            <clusternode name="test2" nodeid="1" votes="1">
                                    <fence/>
                            </clusternode>
                     </clusternodes>
                    <cman expected_votes="1" two_node="1"/>
                    <fencedevices/>
                    <rm>
                            <failoverdomains/>
                            <resources/>
                    </rm>
            </cluster>
•   Start infrastructure components
     –   service cman start (refer to /etc/init.d/cman)
            •   Load kernel modules (configfs, dlm, lock_dlm)
             •   Mount configfs (I think it is like Chubby’s file system space)
            •   Start ccsd daemon
            •   Start cman (no daemon, use cman_tool join to join this node to cluster)
            •   Start daemons (start groupd, fenced, dlm_controld, gfs_controld)
            •   Start fencing (start fenced daemon, and use fence_tool to join this node to fence domain)
                                                                                                                     16
Setup a Cluster
                              (GNBD client)
•   Load kernel module gnbd.ko
      # echo "modprobe gnbd" >/etc/sysconfig/modules/gnbd.modules
     # chmod 755 /etc/sysconfig/modules/gnbd.modules
     # modprobe gnbd
      Then, gnbd.ko will be loaded when the node boots up.
•   Import GNBD
     # gnbd_import -i test1
      Then, we can find a block device /dev/gnbd/gnbdnode1, which is the same as /dev/gnbd0.

      Create /etc/init.d/gnbd-client as a service script:
     # chmod 755 /etc/init.d/gnbd-client
     # chkconfig --add gnbd-client

         Since gnbd_import -i must run earlier than clvmd and gnbd_import -R must run later
         than clvmd, we assign a special start number (23 < 24) and stop number (77 > 76) in the
         script /etc/init.d/gnbd-client (a sketch follows at the end of this slide).

      Thus, the GNBD devices will be imported automatically when the node boots up.




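     A minimal sketch of /etc/init.d/gnbd-client (assumptions: runlevels 3/4/5, the GNBD server is
     test1, and gnbd_import -R removes all imported devices; adjust to the actual setup):

       #!/bin/bash
       # gnbd-client: import GNBD devices before clvmd starts, remove them after clvmd stops
       # chkconfig: 345 23 77
       # description: GNBD client import/remove (start 23 < clvmd 24, stop 77 > clvmd 76)
       case "$1" in
         start)
               /sbin/modprobe gnbd
               /sbin/gnbd_import -i test1      # import all devices exported by server test1
               ;;
         stop)
               /sbin/gnbd_import -R            # remove all imported GNBD devices
               ;;
         *)
               echo "Usage: $0 {start|stop}"
               exit 1
               ;;
       esac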
Setup a Cluster
                               (CLVM)
•   Start clvmd
     # service clvmd start
      # chkconfig --level 35 clvmd on

•   pvcreate
     # pvcreate /dev/gnbd0, or
     # pvcreate /dev/gnbd/gnbdnode1
     Use lvmdiskscan or pvdisplay or pvscan to display status.
•   vgcreate
     # vgcreate vg1 /dev/gnbd0
     Use vgdisplay or vgscan to display status.
•   lvcreate
     # lvcreate -l 100%FREE -n lv1 vg1
     Use lvdisplay and lvscan to display status.

•   Restart clvmd
      # service clvmd restart
      clvmd is responsible for syncing the LVM configuration among the cluster nodes.

         Now, we can find a new logical volume block device at /dev/vg1/lv1, and we can make a file
         system on it.
Setup a Cluster
                           (GFS)
•    Make sure that the clocks on the GFS nodes are synchronized. (Use
     NTP)
•    Make GFS file system
     # gfs_mkfs -p lock_dlm -t cluster1:testgfs -j 4 /dev/vg1/lv1
          Note: the -j value should match the number of nodes; one journal is required for each
          node that mounts a GFS file system. Make sure to account for additional journals
          needed for future expansion.

•     Mount GFS
     # mkdir /mnt/gfs
    (1) # mount -t gfs -o acl /dev/vg1/lv1 /mnt/gfs
    (2) add to /etc/fstab
        /dev/vg1/lv1 /mnt/gfs    gfs defaults,acl 0 0
        # mount -a -t gfs
         # chkconfig --level 35 gfs on
        (refer to /etc/init.d/gfs)

    Now, the GFS file system is accessible at /mnt/gfs/

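     To confirm the mount and check space/journal usage, a hedged check (see gfs_tool(8)):
       # gfs_tool df /mnt/gfs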
Setup a Cluster
                 (add a new GFS node)
•   Install packages on the new node
•   Add the new node's information to cluster.conf on an existing node.
     –   Use ccs_tool addnode (example at the end of this slide), or
     –   Use system-config-cluster
•   Copy (scp) the /etc/cluster/cluster.conf to the new node.
•   Import GNBD on the new node
•   Stop components on all running nodes (if there are already more than 2 nodes, this step is
    not needed)
     # service gfs stop
     # service clvmd stop
     # service cman stop
•   Then start components on all running nodes and the newly added node
     # service cman start
     # service clvmd start
     # service gfs start

          clvmd will sync the metadata of the logical volumes to the newly added node, so when clvmd
          is started, /dev/vg1/lv1 will be visible on the new node.




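•   For the ccs_tool route above, a hedged example (the -n nodeid and -v votes options are
    assumed from ccs_tool(8)), followed by copying the updated file as described:
     # ccs_tool addnode -n 2 -v 1 test3
     # scp /etc/cluster/cluster.conf test3:/etc/cluster/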
Setup a Cluster
                (add a new GNBD node)
•   Setup a new GNBD server (machine test4) and export a new block device.
     # gnbd_serv -n
     # gnbd_export -v -d /dev/sda3 -e gnbdnode2 -c
     # gnbd_export -l
•   Import the new GNBD (on all of the cluster nodes)
     # gnbd_import -i test4
     # gnbd_import -l
•   Make a new PV (on one of the cluster nodes)
     # pvcreate -v /dev/gnbd1
     Then, we can find the new PV on all nodes in the cluster, by pvdisplay or lvmdiskscan or
     pvscan or pvs.
•   Extend the VG (on one of the cluster nodes)
     # vgextend -v vg1 /dev/gnbd1
     Then, we can find the extended VG and changed PV on all nodes in the cluster, by vgdisplay
     or vgscan or vgs.
•   Extend the LV (on one of the cluster nodes)
     # lvextend -v -l +100%FREE /dev/vg1/lv1
     Then, we can find the extended LV on all nodes in the cluster, by lvdisplay or lvscan or lvs.
•   Grow the GFS (on one of the cluster nodes) (next page)

    Sample output of gnbd_import -l on a cluster node:
     Device name : gnbdnode1
     ----------------------
          Minor # : 0
       sysfs name : /block/gnbd0
           Server : test1
             Port : 14567
            State : Open Connected Clear
         Readonly : No
          Sectors : 3984120

     Device name : gnbdnode2
     ----------------------
          Minor # : 1
       sysfs name : /block/gnbd1
           Server : test4
             Port : 14567
            State : Close Connected Clear
         Readonly : No
          Sectors : 4273290
Setup a Cluster
            (grow the GFS)
• The gfs_grow command must be run on a
  mounted file system, and it only needs to
  be run on one node in the cluster.
• Grow
  # gfs_grow /mnt/gfs
• Sometimes after gfs_grow, df and
  lvdisplay hang, and the system needs to
  be rebooted.
Evaluation
• Size
   – GFS is based on a 64-bit architecture, which can theoretically
     accommodate an 8 EB file system. However, the current supported
     maximum size of a GFS file system is 25 TB. (But there is a note in
     Red Hat document: “If your system requires GFS file systems larger
     than 25 TB, contact your Red Hat service representative.”)
• Essential benchmarks
   – Refer to GlusterFS’s benchmarks:
     http://www.gluster.org/docs/index.php/GlusterFS#GlusterFS_Benchmarks
• Iozone benchmarks
   – Refer to http://www.iozone.org
• Lock test
   – A process always blocks when it sets a new lock that conflicts with a lock held
     by another process.



Conclusions
•   The cluster infrastructure is too complex and fragile; it sometimes fails (e.g.,
    cman, fencing).
•   The GNBD is simple and robust, but lacks flexibility.
•   The CLVM is ok, but too complex to use.
•   The GFS is very weak and sometimes fails (mount is OK, but umount often fails).
•   The “two levels” (GNBD storage + GFS cluster) deployment does not meet the
    “cloud” goal.
•   Not easy to add a new GFS cluster node or a GNBD storage node.
•   No data replicas for safety.
•   Risk when a GNBD node fails: when one GNBD node fails, the GFS data is not
    accessible.

    The Red Hat Cluster solution is not built on the assumption behind GoogleFS and
    GlusterFS (i.e., “the system is built from many computers that often fail”), so it is
    neither easy nor safe to use in a moderate-scale cluster. So, I think the Red Hat
    Storage Cluster is not a well-designed solution, and it has no good future.

