DRBD+HeartBeat+NFS: Configure NFS high availability
Description:
Last week I studied the installation and configuration of DRBD; today I'm looking at its first application. DRBD+HeartBeat+NFS: configure highly available NFS as the shared storage layer at the bottom of the cluster.
NFS mainly stores the program code and some image files for the web servers.
Reference:
http://network.51cto.com/art/201010/230237_all.htm
http://www.voidcn.com/article/p-tgakobll-mb.html
Environment:
[root@scj ~]# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m
[root@scj ~]# uname -r
2.6.32-358.el6.i686
Host   | IP              | Hostname      | Role      | Services
dbm135 | 192.168.186.135 | dbm135.51.com | primary   | DRBD+HeartBeat+NFS
dbm134 | 192.168.186.134 | dbm134.51.com | secondary | DRBD+HeartBeat+NFS
VIP    | 192.168.186.150 |               |           |
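HeartBeat identifies nodes by hostname (the node directive and haresources both use it), so both machines must be able to resolve each other's names. A minimal /etc/hosts sketch for both nodes, assuming no internal DNS:

192.168.186.135 dbm135.51.com dbm135
192.168.186.134 dbm134.51.com dbm134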
Preparation and installation of DRBD:
Reference: http://www.voidcn.com/article/p-ooztnnqb-qq.html
Install and configure HeartBeat:
Install HeartBeat: (dbm135,dbm134)
Use yum to install HeartBeat here (recommended)
CentOS 6.4 does not ship with the HeartBeat package by default, so you need to install the EPEL repository first.
[root@scj ~]# cd /usr/local/src/
[root@scj src]# wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[root@scj src]# rpm -ivh epel-release-6-8.noarch.rpm
[root@scj src]# yum -y install heartbeat
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Error: Cannot retrieve metalink for repository: epel. Please verify its path and try again
Note: An error was reported during the yum installation. Solution:
[root@scj ~]# vi /etc/yum.repos.d/epel.repo
# Uncomment all the baseurl lines
# Comment out all the mirrorlist lines
[root@scj src]# yum -y install heartbeat
Note: When installing HeartBeat with yum, the first installation always seems to fail, but the second attempt succeeds (I don't know why).
When installing HeartBeat with yum, the NFS-related packages and rpcbind are installed along with it.
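To confirm that heartbeat and its NFS-related dependencies actually landed, a quick check (exact package versions will vary by mirror):

[root@scj ~]# rpm -qa | grep -E 'heartbeat|nfs-utils|rpcbind'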
Configure HeartBeat: (dbm135,dbm134)
Heartbeat configuration involves the following files:
/etc/ha.d/ha.cf                  # Main configuration file
/etc/ha.d/haresources            # Resource file
/etc/ha.d/authkeys               # Authentication
/etc/ha.d/resource.d/killnfsd    # NFS restart script, managed by HeartBeat
Edit the main configuration file ha.cf: (dbm135)
[root@dbm135 ~]# vi /etc/ha.d/ha.cf
logfile /var/log/ha-log             # Name and location of the HA log
logfacility local0
keepalive 2                         # Heartbeat send interval: 2 seconds
deadtime 10                         # If the standby sees no heartbeat from the peer for 10 seconds, declare it dead
ucast eth0 192.168.186.134          # Unicast heartbeat; the IP is the peer's address
#ucast eth1 xxx.xxx.xxx.xxx         # Optional second heartbeat line (e.g. internal plus external network) to keep the heartbeat itself highly available
auto_failback off                   # When the old primary recovers, do not take the resources back; a switch is expensive
node dbm135.51.com dbm134.51.com    # Define the nodes; use each machine's hostname
The same file on dbm134 differs only in the peer IP on the ucast line: (dbm134)
[root@dbm134 ~]# vi /etc/ha.d/ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
ucast eth0 192.168.186.135          # On dbm134 the heartbeat is sent to dbm135's IP
#ucast eth1 xxx.xxx.xxx.xxx
auto_failback off
node dbm135.51.com dbm134.51.com
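Since the two ha.cf files should differ only on the ucast line, a quick sanity check is to diff them; a sketch, assuming SSH access between the nodes:

[root@dbm135 ~]# scp dbm134.51.com:/etc/ha.d/ha.cf /tmp/ha.cf.dbm134
[root@dbm135 ~]# diff /etc/ha.d/ha.cf /tmp/ha.cf.dbm134
# The only expected difference is the ucast line pointing at the peer's IP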
Edit the two-node authentication file authkeys: (dbm135, dbm134)
[root@scj ~]# vi /etc/ha.d/authkeys
auth 1
1 crc
# /etc/ha.d/authkeys must be set to 600 permissions
[root@scj ~]# chmod 600 /etc/ha.d/authkeys
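Note that crc is only a checksum and provides no real authentication; it is fine on a dedicated heartbeat link. If the heartbeat traffic crosses a network you do not fully trust, HeartBeat also supports sha1-keyed authentication; a sketch (the shared secret below is just an example value):

auth 1
1 sha1 SomeSharedSecretKey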
Edit the cluster resource file haresources: (dbm135, dbm134)
[root@scj ~]# vi /etc/ha.d/haresources
dbm135.51.com IPaddr::192.168.186.150/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 killnfsd
## This file must be exactly the same on both dbm135 and dbm134; do NOT change dbm135.51.com to dbm134.51.com on the standby
## The hostname is that of the current primary node, i.e. dbm135.51.com
## IPaddr: binds the virtual IP 192.168.186.150 on eth0
## drbddisk: manages DRBD resource r0, switching it between Primary and Secondary
## Filesystem: mounts the DRBD device /dev/drbd0 on /data as ext4
## killnfsd: the NFS restart script, managed by HeartBeat
## During a failover, HeartBeat executes the four scripts named in this resource file, as shown below
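To make the haresources line concrete: on takeover, HeartBeat walks the resources left to right and calls each agent in /etc/ha.d/resource.d/ with start (and with stop, in reverse order, on release). The effect is roughly the following; this is a sketch of the semantics, not literal HeartBeat internals:

/etc/ha.d/resource.d/IPaddr 192.168.186.150/24/eth0 start    # bind the VIP on eth0
/etc/ha.d/resource.d/drbddisk r0 start                       # promote DRBD resource r0 to Primary
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext4 start  # mount the DRBD device on /data
/etc/ha.d/resource.d/killnfsd start                          # restart NFS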
[root@scj ~]# vi /etc/ha.d/resource.d/killnfsd
killall -9 nfsd; /etc/init.d/nfs restart; exit 0
[root@scj ~]# chmod 755 /etc/ha.d/resource.d/killnfsd
[root@scj ~]# cd /etc/ha.d/resource.d/
[root@scj resource.d]# ll drbddisk Filesystem killnfsd IPaddr
-rwxr-xr-x 1 root root 3162 Sep 27  2013 drbddisk
-rwxr-xr-x 1 root root 1903 Dec  2  2013 Filesystem
-rwxr-xr-x 1 root root  ... Dec  2  2013 IPaddr
-rwxr-xr-x 1 root root   49 Jun 30 12:02 killnfsd
## All four script files exist
Configure NFS: (dbm135, dbm134)
Note: the NFS-related packages were already installed as dependencies when HeartBeat was installed.
[root@scj ~]# vi /etc/exports
/data 192.168.186.0/255.255.255.0(rw,no_root_squash,sync)
[root@scj ~]# chkconfig rpcbind on
[root@scj ~]# chkconfig nfs off    # NFS must not start at boot; its startup is managed by HeartBeat
[root@scj ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]
## No need to start nfs here; HeartBeat will start it
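To double-check that the runlevel settings took effect, chkconfig can list them; rpcbind should be on for runlevels 2-5 and nfs off everywhere:

[root@scj ~]# chkconfig --list rpcbind
[root@scj ~]# chkconfig --list nfs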
Start HeartBeat: (dbm135, dbm134)
Note: Start on the primary node first (dbm135 is primary)
[root@scj ~]# /etc/init.d/heartbeat start
[root@scj ~]# chkconfig heartbeat on
[root@scj ~]# ps -ef | grep heartbeat
root  1854     1  0 12:33 ?      00:00:00 heartbeat: master control process
root  1858  1854  0 12:33 ?      00:00:00 heartbeat: FIFO reader
root  1859  1854  0 12:33 ?      00:00:00 heartbeat: write: ucast eth0
root  1860  1854  0 12:33 ?      00:00:00 heartbeat: read: ucast eth0
root  2057  2034  0 12:33 ?      00:00:00 /bin/sh /usr/share/heartbeat/ResourceManager takegroup IPaddr::192.168.186.150/24/eth0
root  2283     1  0 12:33 ?      00:00:00 /bin/sh /usr/lib/ocf/resource.d//heartbeat/IPaddr start
root  2286  2283  0 12:33 ?      00:00:00 /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.186.150 eth0 auto not_used not_used
root  2471  2057  0 12:33 ?      00:00:00 /bin/sh /usr/share/heartbeat/ResourceManager takegroup IPaddr::192.168.186.150/24/eth0
root  2566   ...  0 12:33 pts/0  00:00:00 grep heartbeat

[root@dbm135 ~]# ps -ef | grep nfs    # Check whether NFS has started on the primary node
root  2493     2  0 17:59 ?      00:00:00 [nfsd4]
root  2494     2  0 17:59 ?      00:00:00 [nfsd4_callbacks]
root  2495     2  0 17:59 ?      00:00:00 [nfsd]
...                                       (eight [nfsd] kernel threads in total)
root  2530  1528  0 17:59 pts/1  00:00:00 grep nfs
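At this point the VIP should be bound on the primary's eth0; a quick way to confirm:

[root@dbm135 ~]# ip addr show eth0 | grep 192.168.186.150
# The VIP should be listed as an additional address (traditionally the eth0:0 alias)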
Test:
Note: Repeated testing shows that an active/standby switch usually takes 1-3 seconds, which is acceptable to users.
After a switch, a client that has mounted the NFS share does not need to remount; it only needs to wait those 1-3 seconds before it can continue working normally.
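A simple way to see that 1-3 second window from the client side is a probe loop against the mounted share; a minimal sketch, run on 192.168.186.131 with /data mounted (with a hard NFS mount the probe may hang briefly rather than fail, but either way the gap shows the takeover window):

while true; do
    if ls /data &> /dev/null; then
        echo "$(date '+%H:%M:%S') ok"
    else
        echo "$(date '+%H:%M:%S') FAILED"
    fi
    sleep 1
done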
Test 1: Whether a remote client can mount the share successfully
NFS server, dbm135, Primary:
[root@dbm135 ~]# cd /data/
[root@dbm135 data]# ll
total 16
-rw-r--r-- 1 root root     0 Jun 30 10:15 file1
-rw-r--r-- 1 root root     0 Jun 30 10:15 file2
-rw-r--r-- 1 root root     0 Jun 30 10:15 file3
-rw-r--r-- 1 root root     0 Jun 30 10:15 file4
-rw-r--r-- 1 root root     0 Jun 30 10:15 file5
-rw-r--r-- 1 root root     0 Jun 30 18:01 file6
drwx------ 2 root root 16384 Jun 30 10:14 lost+found

Mount the shared directory from the NFS server on another host, 192.168.186.131:
[root@scj131 ~]# showmount -e 192.168.186.150    # 150 is the virtual IP
Export list for 192.168.186.150:
/data 192.168.186.0/255.255.255.0
[root@scj131 ~]# mount 192.168.186.150:/data /data
[root@scj131 ~]# cd /data/
[root@scj131 data]# ll
total 16
-rw-r--r-- 1 root root     0 Jun 30  2015 file1
-rw-r--r-- 1 root root     0 Jun 30  2015 file2
-rw-r--r-- 1 root root     0 Jun 30  2015 file3
-rw-r--r-- 1 root root     0 Jun 30  2015 file4
-rw-r--r-- 1 root root     0 Jun 30  2015 file5
-rw-r--r-- 1 root root     0 Jun 30  2015 file6
drwx------ 2 root root 16384 Jun 30  2015 lost+found
## Mounted successfully
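If the client should mount the share automatically at boot, an /etc/fstab entry against the VIP works too; a sketch (the options here are just defaults, tune timeouts to taste):

192.168.186.150:/data  /data  nfs  defaults  0 0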
Test 2: Restart the DRBD service on the primary node
Restart the DRBD service on the primary node and watch the Secondary node for changes, checking whether the mount on 192.168.186.131 still works.
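A convenient way to follow this and the following tests from the standby is to keep live views of the DRBD state and the mount; a sketch:

[root@dbm134 ~]# watch -n1 cat /proc/drbd    # connection state and roles, refreshed every second
[root@dbm134 ~]# watch -n1 df -h             # shows when /data gets mounted here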
Test 3: Stop the DRBD service on the primary node
Stop the DRBD service on the primary node and watch the Secondary node for changes, checking whether the mount on 192.168.186.131 still works.
Test 4: Stop the NFS service on the primary node
Stop the NFS service on the primary node and watch the Secondary node for changes, checking whether the mount on 192.168.186.131 still works.
## After stopping the NFS service on the primary node, the Secondary node does not change, and 192.168.186.131 can no longer use the mount normally
## Solution: run a monitoring script on the primary node
[root@dbm135 ~]# vi /opt/monitor/nfs/monitornfs.sh
#!/bin/bash
# Monitor the NFS service: if NFS dies on the DRBD primary and cannot be
# restarted, stop heartbeat so that resources fail over to the standby node
while true
do
    # Role of this node in DRBD (Primary/Secondary)
    drbdstatus=$(cat /proc/drbd 2>/dev/null | grep ro: | tail -n1 | awk -F':' '{print $4}' | awk -F'/' '{print $1}')
    # Whether nfsd is running
    nfsstatus=$(/etc/init.d/nfs status | grep -c running)
    if [ -z "$drbdstatus" ]; then
        sleep 10
        continue
    elif [ "$drbdstatus" == "Primary" ]; then
        if [ "$nfsstatus" -eq 0 ]; then                     # NFS is not running
            /etc/init.d/nfs start &> /dev/null              # Try to start it
            newnfsstatus=$(/etc/init.d/nfs status | grep -c running)   # Check again whether NFS started
            if [ "$newnfsstatus" -eq 0 ]; then              # Still not running: NFS cannot be started
                /etc/init.d/heartbeat stop &> /dev/null     # Stop heartbeat so the standby takes over automatically
            fi
        fi
    fi
    sleep 5
done
## Note: do not put this script under /data/; it would be hidden when the DRBD device is mounted there
[root@dbm135 ~]# chmod u+x /opt/monitor/nfs/monitornfs.sh
[root@dbm135 ~]# nohup bash /opt/monitor/nfs/monitornfs.sh &    # Run it in the background
## Don't forget to make it start automatically at boot, as sketched below
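For that boot-start reminder, one minimal approach on CentOS 6 is an rc.local entry; a sketch, assuming the script stays at the path used above:

[root@dbm135 ~]# echo 'nohup bash /opt/monitor/nfs/monitornfs.sh &>/dev/null &' >> /etc/rc.local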
Test 5: Stop or restart the HeartBeat service on the primary node
Stop or restart the HeartBeat service on the primary node and watch the Secondary node for changes, checking whether the mount on 192.168.186.131 still works.
## You can use the following commands to watch the Secondary node:
[root@dbm134 ~]# cat /proc/drbd       # Check whether the node switched from Secondary to Primary
[root@dbm134 ~]# df -h                # Check whether the DRBD device mounted successfully
[root@dbm134 ~]# ps -ef | grep nfs    # Check whether the NFS service started or restarted (the PIDs change)
## Testing shows the primary and standby switch normally and the client mount keeps working
Test 6: Shut down or restart the primary node server
Shut down or restart the primary node server and watch the Secondary node for changes, checking whether the mount on 192.168.186.131 still works; after the old primary comes back up, watch the Secondary node (now the new primary).
## After the primary node is shut down, the primary and standby switch normally and the client mount keeps working
## After the old primary comes back, no second switch happens; the Secondary node (the new primary) keeps serving clients
Test 7: Simulate split-brain
Bring eth0 down on the primary node (ifdown eth0), then bring it up again (ifup eth0); at this point both nodes end up in the StandAlone state.
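Both nodes report the split in their connection state; a quick check on either node:

[root@dbm135 ~]# drbdadm cstate r0    # reports StandAlone after the split
[root@dbm135 ~]# cat /proc/drbd       # the cs: field shows StandAlone as well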
## How to resolve the split-brain
## On the standby node:
[root@dbm134 ~]# drbdadm secondary r0
[root@dbm134 ~]# drbdadm disconnect all
[root@dbm134 ~]# drbdadm --discard-my-data connect r0
## On the primary node:
[root@dbm135 ~]# drbdadm disconnect all
[root@dbm135 ~]# drbdadm connect r0
[root@dbm135 ~]# drbdsetup /dev/drbd0 primary
[root@dbm135 ~]# mount /dev/drbd0 /data/

Note: When resolving a split-brain, even after completing all the steps above, the client mount sometimes comes back normally and sometimes does not; if it does not, try restarting the heartbeat service on the primary node.
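After resolving the split-brain, it is worth re-checking from the client before declaring the cluster healthy; a quick sketch:

[root@scj131 ~]# showmount -e 192.168.186.150    # the /data export should be listed again
[root@scj131 ~]# ls /data                        # confirm the share is readable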