Using Heartbeat to Make Nginx Highly Available

Introduction

Heartbeat is a well-known HA project. Starting with the 3.x series, Heartbeat and Pacemaker are independent projects. Pacemaker subsequently adopted Corosync as its messaging layer and is tightly integrated with it, while retaining Heartbeat as an optional messaging layer. Heartbeat and Corosync both act as the cluster messaging layer of a high-availability cluster: they carry cluster information and heartbeat messages but have no resource management function. Resource management is left to a CRM (Cluster Resource Manager) running on top of them, the best known of which is Pacemaker. Corosync + Pacemaker has become the preferred combination for high-availability clusters.

This article starts with the HA package Heartbeat and uses it to make nginx highly available.

Environment:

vm3 172.16.1.203
vm4 172.16.1.204
VIP 172.16.1.200

heartbeat version: 3.0.4
nginx version: 1.6.3

All of the following steps must be performed on both HA nodes.

1. Install heartbeat and nginx
 
yum install heartbeat
yum install nginx

2. Configuration

2.1 /etc/ha.d/authkeys
 
auth 2
#1 crc
2 sha1 HI!
#3 md5 Hello!
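Heartbeat refuses to start when authkeys is accessible to anyone other than its owner, so the file is normally created with mode 600. A minimal sketch, assuming a random sha1 key is preferred over the sample strings above; the function name write_authkeys and the key length are illustrative choices, not part of the original setup:

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file containing one random sha1 key.
# Usage: write_authkeys /etc/ha.d/authkeys
write_authkeys() {
    target="$1"
    # 16 random bytes rendered as 32 hex characters
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    old_umask=$(umask)
    umask 077             # create the file without group/other access
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$target"
    umask "$old_umask"
    chmod 600 "$target"   # Heartbeat rejects authkeys files with looser permissions
}
```

Both nodes must share the same key, so the file would be generated once and then copied to the other node rather than generated separately on each.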

2.2 /etc/ha.d/ha.cf

Note the ucast parameter: it must be set to the IP address of the peer node. ha.cf on vm3 uses the address of vm4 (172.16.1.204), and ha.cf on vm4 uses the address of vm3 (172.16.1.203).
 
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 172.16.1.204
auto_failback off
node vm3
node vm4
ping 172.16.1.200
respawn hacluster /usr/lib64/heartbeat/ipfail
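Because only the ucast line differs between the two nodes, the file can be generated from a single template. A sketch, assuming a hypothetical helper render_ha_cf that takes the peer's address as its only argument; every other value is copied from the configuration above:

```shell
#!/bin/sh
# Hypothetical helper: print ha.cf with the ucast line pointing at the peer node.
# Usage on vm3: render_ha_cf 172.16.1.204 > /etc/ha.d/ha.cf
# Usage on vm4: render_ha_cf 172.16.1.203 > /etc/ha.d/ha.cf
render_ha_cf() {
    peer_ip="$1"
    cat <<EOF
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 ${peer_ip}
auto_failback off
node vm3
node vm4
ping 172.16.1.200
respawn hacluster /usr/lib64/heartbeat/ipfail
EOF
}
```

Generating the file this way keeps the timing parameters identical on both nodes, so the peer address remains the only difference between them.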

2.3 /etc/ha.d/haresources

The first field is the hostname of the node where the configuration file is located. nginx is the name of a script under /etc/ha.d/resource.d, which is created in the next step.

vm3 IPaddr::172.16.1.200/24/eth0:0 nginx

2.4 /etc/ha.d/resource.d/nginx

ln -s /etc/init.d/nginx /etc/ha.d/resource.d/nginx

3. Startup

 
service heartbeat start

Heartbeat logs to /var/log/ha-log and /var/log/ha-debug; the same messages also appear in /var/log/messages.

There may be a delay of 1 to 2 minutes between starting Heartbeat and the resources becoming active (this depends on the initdead parameter in ha.cf, which is given in seconds). At first, the last few lines of /var/log/ha-log on vm3 look like this:
 
Aug 04 14:39:42 vm3 heartbeat: [113707]: info: Link vm4:eth0 up.
Aug 04 14:39:42 vm3 heartbeat: [113707]: info: Status update for node vm4: status up
harc(default)[113716]: 2015/08/04_14:39:42 info: Running /etc/ha.d//rc.d/status status


Once the resources are active, the log continues as follows, and the floating IP 172.16.1.200 can be pinged:

 
Aug 04 14:41:40 vm3 heartbeat: [113707]: WARN: node 172.16.1.200: is dead
harc(default)[113748]: 2015/08/04_14:41:40 info: Running /etc/ha.d//rc.d/status status
Aug 04 14:41:40 vm3 heartbeat: [113707]: info: Comm_now_up(): updating status to active
Aug 04 14:41:40 vm3 heartbeat: [113707]: info: Local status now set to: 'active'
Aug 04 14:41:40 vm3 heartbeat: [113707]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (491,490)
Aug 04 14:41:40 vm3 heartbeat: [113774]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 491 gid 490 (pid 113774)
Aug 04 14:41:43 vm3 heartbeat: [113707]: info: Status update for node vm4: status active
harc(default)[113777]: 2015/08/04_14:41:43 info: Running /etc/ha.d//rc.d/status status
Aug 04 14:41:45 vm3 ipfail: [113774]: info: Status update: Node vm4 now has status active
Aug 04 14:41:47 vm3 ipfail: [113774]: info: Asking other side for ping node count.
Aug 04 14:41:50 vm3 ipfail: [113774]: info: No giveup timer to abort.
Aug 04 14:41:53 vm3 heartbeat: [113707]: info: remote resource transition completed.
Aug 04 14:41:53 vm3 heartbeat: [113707]: info: remote resource transition completed.
Aug 04 14:41:53 vm3 heartbeat: [113707]: info: Initial resource acquisition complete (T_RESOURCES(us))
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.1.200)[113830]: 2015/08/04_14:41:53 INFO: Resource is stopped
Aug 04 14:41:53 vm3 heartbeat: [113794]: info: Local Resource acquisition completed.
harc(default)[113913]: 2015/08/04_14:41:53 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[113913]: 2015/08/04_14:41:53 received ip-request-resp IPaddr::172.16.1.200/24/eth0 OK yes
ResourceManager(default)[113936]: 2015/08/04_14:41:53 info: Acquiring resource group: vm3 IPaddr::172.16.1.200/24/eth0 nginx
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.1.200)[113964]: 2015/08/04_14:41:54 INFO: Resource is stopped
ResourceManager(default)[113936]: 2015/08/04_14:41:54 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.200/24/eth0 start
IPaddr(IPaddr_172.16.1.200)[114089]: 2015/08/04_14:41:54 INFO: Adding inet address 172.16.1.200/24 with broadcast address 172.16.1.255 to device eth0
IPaddr(IPaddr_172.16.1.200)[114089]: 2015/08/04_14:41:54 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.16.1.200)[114089]: 2015/08/04_14:41:54 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.1.200 eth0 172.16.1.200 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.1.200)[114063]: 2015/08/04_14:41:54 INFO: Success
ResourceManager(default)[113936]: 2015/08/04_14:41:54 info: Running /etc/ha.d/resource.d/nginx start
Aug 04 14:41:54 vm3 heartbeat: [113707]: info: Link 172.16.1.200:172.16.1.200 up.
Aug 04 14:41:54 vm3 heartbeat: [113707]: WARN: Late heartbeat: Node 172.16.1.200: interval 134490 ms
Aug 04 14:41:54 vm3 ipfail: [113774]: info: Link Status update: Link 172.16.1.200/172.16.1.200 now has status up
Aug 04 14:41:54 vm3 heartbeat: [113707]: info: Status update for node 172.16.1.200: status ping
Aug 04 14:41:54 vm3 ipfail: [113774]: info: Status update: Node 172.16.1.200 now has status ping
Aug 04 14:41:54 vm3 ipfail: [113774]: info: A ping node just came up.
Aug 04 14:41:55 vm3 ipfail: [113774]: info: Asking other side for ping node count.
Aug 04 14:41:59 vm3 ipfail: [113774]: info: Ping node count is balanced.
Aug 04 14:41:59 vm3 ipfail: [113774]: info: No giveup timer to abort.
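Instead of watching the log by hand, the startup delay can be waited out by polling the floating IP. A minimal sketch, assuming a hypothetical helper wait_until that retries an arbitrary command; the timeout and interval values in the usage comment are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or the timeout expires.
# $1 = timeout in seconds, $2 = pause between attempts in seconds, rest = command
# Usage: wait_until 120 2 ping -c 1 -W 1 172.16.1.200
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1      # gave up: the command never succeeded in time
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

With initdead set to 60, a timeout of about two minutes leaves room for the delay described above.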

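4. Verification

Use curl to access nginx through the floating IP:

curl "http://172.16.1.200"

Check the nginx log /var/log/nginx/access.log: the request above was served by nginx on vm3. Now stop heartbeat on vm3:

service heartbeat stop

Access the floating IP again and check the nginx logs: this time the request is served by vm4.

curl "http://172.16.1.200"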
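Besides the access log, another way to confirm which node currently holds the floating IP is to look at the addresses bound to eth0. A minimal sketch, assuming a hypothetical helper has_address that reads the output of ip -o -4 addr show on stdin; neither the helper nor this check is part of the original procedure:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `ip -o -4 addr show` output on stdin
# lists the given IPv4 address (exact match, prefix length ignored).
# Usage: ip -o -4 addr show dev eth0 | has_address 172.16.1.200 && echo "VIP is here"
has_address() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "inet") {
                    split($(i + 1), parts, "/")   # strip the /prefix part
                    if (parts[1] == vip) found = 1
                }
            }
        }
        END { exit (found ? 0 : 1) }'
}
```

Running the usage line on both nodes before and after stopping heartbeat on vm3 should show the address moving from vm3 to vm4.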

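5. Summary

Heartbeat on its own can indeed fail the nginx floating IP over between nodes. However, in the test scenario above, if only the nginx service on the node holding the floating IP is stopped while the heartbeat process keeps running, the floating IP does not move and nginx becomes unavailable. The solution is to combine Heartbeat with a CRM (Cluster Resource Manager), which monitors service health at the HA management layer and restarts services that have gone down.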
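To make the gap concrete, here is a sketch of a simple watchdog step that stops heartbeat when nginx is unhealthy, which forces the floating IP over to the peer. This is a stopgap shown for illustration only, not the CRM-based approach recommended above; the helper name check_and_failover and the commands in the usage comment are assumptions:

```shell
#!/bin/sh
# Hypothetical watchdog step: run a health check and, if it fails, run a failover action.
# $1 = health-check command, $2 = failover command (both are passed to sh -c)
# Usage (for example from cron on each node):
#   check_and_failover "service nginx status" "service heartbeat stop"
check_and_failover() {
    health_cmd="$1"
    failover_cmd="$2"
    if sh -c "$health_cmd" >/dev/null 2>&1; then
        return 0          # service is healthy, nothing to do
    fi
    echo "health check failed, triggering failover" >&2
    sh -c "$failover_cmd"
}
```

Unlike a CRM, this only gives up the floating IP. It does not restart nginx or track resource state, which is why a cluster resource manager remains the more complete answer.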
