1. GlusterFS / CTDB Integration
v1.0 2013.05.14
Etsuji Nakai
Senior Solution Architect
Red Hat K.K.
2. $ who am i
Etsuji Nakai (@enakai00)
● Senior solution architect and cloud evangelist at Red Hat K.K.
● The author of the "Professional Linux Systems" series:
  - Professional Linux Systems: Technology for Next Decade
  - Professional Linux Systems: Deployment and Management
  - Professional Linux Systems: Network Management
● Available in Japanese. Translation offers from publishers are welcome ;-)
3. Contents
CTDB Overview
Why does CTDB matter?
CTDB split-brain resolution
Configuration steps for demo set-up
Summary
4. Disclaimer
This document explains how to set up a clustered Samba server using GlusterFS and CTDB with the following software components.
● Base OS, Samba, CTDB: RHEL6.4 (or any of your favorite clones)
● GlusterFS: GlusterFS 3.3.1 (Community version)
  http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/
Since this is based on the community version of GlusterFS, you cannot receive commercial support from Red Hat for this configuration. If you need commercial support, please consider using Red Hat Storage Server (RHS). In addition, the conditions for a supportable configuration with RHS are different. Please consult a Red Hat sales representative for details.
Red Hat accepts no liability for the content of this document, or for the consequences of
any actions taken on the basis of the information provided. Any views or opinions
presented in this document are solely those of the author and do not necessarily represent
those of Red Hat.
6. What's CTDB?
TDB = Trivial Database
● Simple backend DB for Samba, used to store user info, file lock info, etc.
CTDB = Clustered TDB
● Cluster extension of TDB, necessary when multiple Samba hosts provide the same filesystem contents.
All clients see the same contents through different Samba hosts.
(Diagram: multiple Samba hosts serving one shared filesystem.)
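If you want to poke at a TDB yourself, the tdb-tools package ships a small dump utility. A minimal illustration, assuming the RHEL6 default location of Samba's lock database (the path may differ on your system):

# yum install -y tdb-tools
# tdbdump /var/lib/samba/locking.tdb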
7. What's wrong without CTDB?
Windows file locks are not shared among Samba hosts.
● You would normally see an alert when someone else is opening the same file (e.g. "xxx.xls is locked").
● Without CTDB, if others are opening the same file through a different Samba host than yours, you never see that alert.
● This is because file lock info is stored in the local TDB if you don't use CTDB.
● CTDB was initially developed as a shared TDB for multiple Samba hosts to overcome this problem.
(Diagram: two clients open "xxx.xls" via different Samba hosts; each host sees only its own lock, so Windows file locks are not shared.)
8. Yet another benefit of CTDB
Floating IP's can be assigned across hosts for transparent failover.
● When one of the hosts fails, its floating IP is moved to another host.
● Mutual health checking is done through the CTDB interconnect (so-called "heartbeat") network.
● CTDB can also be used for an NFS server cluster to provide the floating IP feature. (CTDB doesn't provide shared file locking for NFS, though.)
(Diagram: floating IP #1 through #N assigned across the hosts; when a host fails, its floating IP moves to a surviving host over the CTDB interconnect (heartbeat) network.)
9. Why does CTDB matter?
10. Access path of the GlusterFS native client
The native client communicates directly with all storage nodes.
● Transparent failover is implemented on the client side. When the client detects a node failure, it accesses the replicated node.
● A floating IP is unnecessary by design for the native client.
(Diagram: a GlusterFS native client accessing file01, file02, and file03 spread across the storage nodes of a GlusterFS volume.)
The native client sees the volume as a single filesystem; the real locations of files are calculated on the client side.
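For reference, mounting a volume with the native client is a one-liner; the hostname and volume name here match the demo configuration later in this deck:

# mount -t glusterfs gluster01g:/vol01 /mnt/vol01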
11. CIFS/NFS use case for GlusterFS
The downside of the native client is that it's not available for Unix/Windows.
● You need to rely on CIFS/NFS for Unix/Windows clients.
● In that case, Windows file lock sharing and the floating IP feature are not in GlusterFS itself. They should be provided by an external tool.
CTDB is the tool for it ;-)
(Diagram: CIFS/NFS clients connecting to individual storage nodes.)
A CIFS/NFS client connects to just one specified node, and the GlusterFS storage node acts as a proxy "client". Different clients can connect to different nodes; DNS round-robin may work for it (see the sketch below).
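As a sketch of the DNS round-robin idea, a zone file can simply carry multiple A records for one name. The name "samba" is a hypothetical example, and the addresses match the demo's floating IPs:

; resolvers rotate among multiple A records for the same name
samba    IN    A    192.168.122.201
samba    IN    A    192.168.122.202
samba    IN    A    192.168.122.203
samba    IN    A    192.168.122.204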
12. Network topology overview without CTDB
If you don't need the floating IP / Windows file lock features, you can go without CTDB.
● NFS file lock sharing (NLM) is provided by GlusterFS's internal NFS server.
Although it's not mandatory, you can separate the CIFS/NFS access segment from the GlusterFS interconnect for the sake of network performance.
(Diagram: storage nodes each running Samba and glusterd, with CIFS/NFS clients on the access segment and a separate GlusterFS interconnect.)
13. Network topology overview with CTDB
If you use CTDB with GlusterFS, you need to add an independent CTDB interconnect (heartbeat) segment for a reliable cluster.
● The reason will be explained later.
(Diagram: the same topology as above, plus a dedicated CTDB interconnect (heartbeat) segment between the storage nodes.)
14. Demo - Seeing is believing!
http://www.youtube.com/watch?v=kr8ylOBCn8o
15. CTDB split-brain resolution
16. What's CTDB split-brain?
When the heartbeat is cut off for any reason (possibly a network problem) while cluster nodes are still running, there must be some mechanism to choose which "island" should survive and keep running.
● Without this mechanism, the same floating IP's are assigned on both islands. This is not specific to CTDB; every cluster system in the world needs to take care of "split-brain".
In the case of CTDB, a master node is elected through the "lock file" on the shared filesystem. The island with the master node survives. In particular, in the case of GlusterFS, the lock file is stored on a dedicated GlusterFS volume, called the "lock volume".
● The lock volume is locally mounted on each storage node. If you share the CTDB interconnect with the GlusterFS interconnect, access to the lock volume is not guaranteed when the heartbeat is cut off, resulting in an unpredictable condition.
(Diagram: storage nodes with separate GlusterFS and CTDB interconnects; the master takes an exclusive lock on the lock file in the lock volume.)
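You can check which node currently holds the recovery-master role (and hence the exclusive lock) with the ctdb CLI; a quick check on a running cluster:

# ctdb recmaster    # prints the pnn of the current recovery master
# ctdb status       # the "Recovery master" line shows the same information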
17. Typical volume config seen from a storage node
# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/vda3              2591328 1036844   1422852  43% /
tmpfs                   510288       0    510288   0% /dev/shm
/dev/vda1               495844   33450    436794   8% /boot
/dev/mapper/vg_bricks-lv_lock
                         60736    3556     57180   6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01
                       1038336   33040   1005296   4% /bricks/brick01
localhost:/lockvol      121472    7168    114304   6% /gluster/lock
localhost:/vol01       2076672   66176   2010496   4% /gluster/vol01

# ls -l /gluster/lock/
total 2
-rw-r--r--. 1 root root 294 Apr 26 15:43 ctdb
-rw-------. 1 root root   0 Apr 26 15:57 lockfile
-rw-r--r--. 1 root root  52 Apr 26 15:56 nodes
-rw-r--r--. 1 root root  96 Apr 26 15:04 public_addresses
-rw-r--r--. 1 root root 218 Apr 26 16:31 smb.conf

● /gluster/lock is the locally mounted lock volume; /gluster/vol01 is the locally mounted data volume, exported with Samba.
● "lockfile" is the lock file used to elect the master.
● Common config files (ctdb, nodes, public_addresses, smb.conf) can be placed on the lock volume.
18. What about sharing the CTDB interconnect with the access segment?
No, it doesn't work.
When the NIC for the access segment fails, the cluster detects the heartbeat failure and elects a master node through the lock file on the shared volume. However, if the node with the failed NIC holds the lock, it becomes the master even though it cannot serve clients.
● In reality, CTDB event monitoring also detects the NIC failure and the node goes into "CTDB UNHEALTHY" status.
19. CTDB event monitoring
CTDB provides a custom event monitoring mechanism which can be used to monitor application status, NIC status, etc.
● Monitoring scripts are stored in /etc/ctdb/events.d/
● They need to implement handlers for pre-defined events.
● They are called in order of file name when an event occurs.
● In particular, the "monitor" event is issued every 15 seconds. If the "monitor" handler of some script exits with a non-zero return code, the node becomes "UNHEALTHY" and is excluded from the cluster. (A minimal handler sketch follows the listing below.)
● For example, "10.interface" checks the link status of the NIC on which the floating IP is assigned.
● See the README for details - http://bit.ly/14KOjlC
# ls /etc/ctdb/events.d/
00.ctdb 11.natgw 20.multipathd 41.httpd 61.nfstickle
01.reclock 11.routing 31.clamd 50.samba 70.iscsi
10.interface 13.per_ip_routing 40.vsftpd 60.nfs 91.lvs
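For illustration, a minimal custom event script might look like the following sketch. The file name "99.example" and the smbd check are hypothetical, not part of the stock scripts:

#!/bin/sh
# /etc/ctdb/events.d/99.example -- hypothetical monitoring script (sketch)
case "$1" in
    monitor)
        # Issued every 15 seconds; a non-zero exit marks this node UNHEALTHY.
        pgrep -x smbd >/dev/null || exit 1
        ;;
    startup|shutdown|takeip|releaseip)
        # Handlers for other pre-defined events would go here.
        ;;
esac
exit 0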
20. Configuration steps for demo set-up
21. Step1 – Install RHEL6.4
Install RHEL6.4 on the storage nodes.
● The Scalable File System Add-On is required for XFS.
● The Resilient Storage Add-On is required for the CTDB packages.
Configure public key ssh authentication between the nodes (see the sketch after the host list).
● This is for administrative purposes.
Configure the network interfaces as in the configuration pages.
# NFS/CIFS access segment
192.168.122.11 gluster01
192.168.122.12 gluster02
192.168.122.13 gluster03
192.168.122.14 gluster04
# CTDB interconnect
192.168.2.11 gluster01c
192.168.2.12 gluster02c
192.168.2.13 gluster03c
192.168.2.14 gluster04c
# GlusterFS interconnect
192.168.1.11 gluster01g
192.168.1.12 gluster02g
192.168.1.13 gluster03g
192.168.1.14 gluster04g
/etc/hosts
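A minimal sketch of the public key setup, run from gluster01 (assumes root login over ssh is allowed during setup):

# ssh-keygen -t rsa    # accept the defaults
# for h in gluster01 gluster02 gluster03 gluster04; do ssh-copy-id root@$h; done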
22. Step1 – Install RHEL6.4
Configure iptables on all nodes
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24050 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38468 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4379 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
/etc/sysconfig/iptables
# vi /etc/sysconfig/iptables
# service iptables restart
Port notes: 111 = portmap, 139/445 = CIFS, 24007:24050 = GlusterFS bricks, 38465:38468 = NFS/NLM, 4379 = CTDB.
23. Step2 – Prepare bricks
Create and mount brick directories on all nodes.
# pvcreate /dev/vdb
# vgcreate vg_bricks /dev/vdb
# lvcreate -n lv_lock -L 64M vg_bricks
# lvcreate -n lv_brick01 -L 1G vg_bricks
# yum install -y xfsprogs
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01
# echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
# echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
# mkdir -p /bricks/lock
# mkdir -p /bricks/brick01
# mount /bricks/lock
# mount /bricks/brick01

(Diagram: /dev/vdb holds volume group vg_bricks with logical volumes lv_lock and lv_brick01.)
lv_lock is mounted on /bricks/lock and used for the lock volume; lv_brick01 is mounted on /bricks/brick01 and used for the data volume.
24. Step3 – Install GlusterFS and create volumes
Install GlusterFS packages on all nodes.
● Do not auto-start glusterd with chkconfig.

# wget -O /etc/yum.repos.d/glusterfs-epel.repo \
  http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/RHEL/glusterfs-epel.repo
# yum install -y rpcbind glusterfs-server
# chkconfig rpcbind on
# service rpcbind start
# service glusterd start

Configure the cluster and create volumes from gluster01.
● Need to specify the GlusterFS interconnect NICs (the "*g" hostnames).

# gluster peer probe gluster02g
# gluster peer probe gluster03g
# gluster peer probe gluster04g
# gluster vol create lockvol replica 2 \
  gluster01g:/bricks/lock gluster02g:/bricks/lock \
  gluster03g:/bricks/lock gluster04g:/bricks/lock
# gluster vol start lockvol
# gluster vol create vol01 replica 2 \
  gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 \
  gluster03g:/bricks/brick01 gluster04g:/bricks/brick01
# gluster vol start vol01
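To verify the result, the standard status commands can be used (output omitted here):

# gluster peer status    # the three probed nodes should show "Peer in Cluster"
# gluster vol info       # lockvol and vol01 should be listed as started replica-2 volumes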
25. Step4 – Install and configure Samba/CTDB
Install Samba/CTDB packages on all nodes.
● Do not auto-start smb and ctdb with chkconfig.

# yum install -y samba samba-client ctdb

If you use NFS, install the following packages, too.

# yum install -y rpcbind nfs-utils
# chkconfig rpcbind on
# service rpcbind start

Configure CTDB and Samba only on gluster01.
● Create the following config files on the shared volume.

# mkdir -p /gluster/lock
# mount -t glusterfs localhost:/lockvol /gluster/lock

CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=yes
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
/gluster/lock/ctdb
26. Step4 – Install and configure Samba/CTDB
192.168.2.11
192.168.2.12
192.168.2.13
192.168.2.14
/gluster/lock/nodes - CTDB cluster nodes. Need to specify the CTDB interconnect IPs.

192.168.122.201/24 eth0
192.168.122.202/24 eth0
192.168.122.203/24 eth0
192.168.122.204/24 eth0
/gluster/lock/public_addresses - floating IP list.

[global]
workgroup = MYGROUP
server string = Samba Server Version %v
clustering = yes
security = user
passdb backend = tdbsam
[share]
comment = Shared Directories
path = /gluster/vol01
browseable = yes
writable = yes
/gluster/lock/smb.conf - Samba config. Need to specify "clustering = yes".
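Before wiring the config into the cluster, you can sanity-check it with testparm from the samba package (-s skips the interactive prompt):

# testparm -s /gluster/lock/smb.conf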
27. Step4 – Install and configure Samba/CTDB
Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location.
● We'd better set an appropriate security context, but there's an open issue for using chcon with GlusterFS.
● https://bugzilla.redhat.com/show_bug.cgi?id=910380

# yum install -y policycoreutils-python
# semanage permissive -a smbd_t

Create symlinks to the config files on all nodes.

# mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
# mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
# ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
# ln -s /gluster/lock/nodes /etc/ctdb/nodes
# ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
# ln -s /gluster/lock/smb.conf /etc/samba/smb.conf
28. Step4 – Install and configure Samba/CTDB
Create the following script to start/stop the services.

#!/bin/sh
function runcmd {
    echo "exec on all nodes: $@"
    ssh gluster01 "$@" &
    ssh gluster02 "$@" &
    ssh gluster03 "$@" &
    ssh gluster04 "$@" &
    wait
}

case "$1" in
    start)
        runcmd service glusterd start
        sleep 1
        runcmd mkdir -p /gluster/lock
        runcmd mount -t glusterfs localhost:/lockvol /gluster/lock
        runcmd mkdir -p /gluster/vol01
        runcmd mount -t glusterfs localhost:/vol01 /gluster/vol01
        runcmd service ctdb start
        ;;
    stop)
        runcmd service ctdb stop
        runcmd umount /gluster/lock
        runcmd umount /gluster/vol01
        runcmd service glusterd stop
        runcmd pkill glusterfs
        ;;
esac
ctdb_manage.sh
29. Step5 – Start services
Now you can start/stop the services.
● After a few moments, "ctdb status" becomes "OK" for all nodes.
● And the floating IP's are configured on each node.
# ./ctdb_manage.sh start
# ctdb status
Number of nodes:4
pnn:0 192.168.2.11 OK (THIS NODE)
pnn:1 192.168.2.12 OK
pnn:2 192.168.2.13 OK
pnn:3 192.168.2.14 OK
Generation:1489978381
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:1
# ctdb ip
Public IPs on node 0
192.168.122.201 node[3] active[] available[eth0] configured[eth0]
192.168.122.202 node[2] active[] available[eth0] configured[eth0]
192.168.122.203 node[1] active[] available[eth0] configured[eth0]
192.168.122.204 node[0] active[eth0] available[eth0] configured[eth0]
30. Step5 – Start services
Set the Samba password and check the shared directories via one of the floating IP's.
# pdbedit -a -u root
new password:
retype new password:

# smbclient -L 192.168.122.201 -U root
Enter root's password:
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

        Sharename       Type      Comment
        ---------       ----      -------
        share           Disk      Shared Directories
        IPC$            IPC       IPC Service (Samba Server Version 3.6.9-151.el6)
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

        Server               Comment
        Workgroup            Master

The password DB is shared by all hosts in the cluster.
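A Linux client can reach the share through a floating IP in the same way; a minimal sketch using cifs-utils:

# yum install -y cifs-utils
# mount -t cifs //192.168.122.201/share /mnt -o user=root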
31. Configuration hints
To specify the GlusterFS interconnect segment, "gluster peer probe" should be done for the IP addresses on that segment.
To specify the CTDB interconnect segment, the IP addresses on that segment should be specified in "/gluster/lock/nodes" (symlinked from "/etc/ctdb/nodes").
To specify the NFS/CIFS access segment, the NIC names on that segment should be specified in "/gluster/lock/public_addresses" (symlinked from "/etc/ctdb/public_addresses"), associated with the floating IP's.
To restrict NFS access to a volume, you can use the "nfs.rpc-auth-allow" and "nfs.rpc-auth-reject" volume options (reject supersedes allow; see the example after this list).
The following tunables in "/gluster/lock/ctdb" (symlinked from "/etc/sysconfig/ctdb") may be useful for adjusting the CTDB failover timings. See the ctdbd man page for details.
● CTDB_SET_DeterministicIPs=1
● CTDB_SET_RecoveryBanPeriod=300
● CTDB_SET_KeepaliveInterval=5
● CTDB_SET_KeepaliveLimit=5
● CTDB_SET_MonitorInterval=15
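For instance, restricting NFS access to vol01 might look like this; the subnet and host below are example values only:

# gluster vol set vol01 nfs.rpc-auth-allow 192.168.122.*
# gluster vol set vol01 nfs.rpc-auth-reject 192.168.122.99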
33. Summary
CTDB is the tool that complements the CIFS/NFS use case for GlusterFS.
Network design is crucial for building a reliable cluster, not only for CTDB but for every cluster in the world ;-)
Enjoy!
And one important piece of fine print....
● Samba is not well tested on large-scale GlusterFS clusters. The use of CIFS as a primary access protocol on Red Hat Storage Server 2.0 is not officially supported by Red Hat. This will be improved in future versions.
34. WE CAN DO MORE
WHEN WE WORK TOGETHER
THE OPEN SOURCE WAY