Configure HeartBeat with HB_GUI - configuration, gui, HB, Heartbeat

Over the past few days I have written at length about configuring Heartbeat v1 and v2. I also mentioned that the v2-style configuration is built on the CRM (Cluster Resource Manager) and comes with a graphical (GUI) management tool that can be used to configure and monitor an HA cluster. However, there is little official documentation for this tool, and it is genuinely hard to use, mainly because it gives few prompts and does little validation of what you enter. After a few days of exploration I have managed to get some basic configurations working. The following walks through a concrete example.

One. Architecture
I have only just started exploring this area, so for now I can only show how to implement the basic functions. If anything in the description is wrong, please bear with me and let me know how to fix it. Thank you!
The architecture is based on our earlier test machines:

Quote

hatest3 192.168.228.233
hatest4 192.168.228.234

This cluster is used to achieve high availability of httpd services:

Quote

httpd 192.168.228.235

Two. Functions and limitations
Judging from actual testing, Heartbeat does provide high availability, but it differs in several ways from the Red Flag HA software.

Quote

1. The cluster simulated here is built from two servers, but Heartbeat v2 can actually support more than two nodes;
2. The cluster can run in symmetric or asymmetric mode; depending on the mode, the order and location in which applications start in the cluster differ;
3. A Heartbeat cluster does not seem to distinguish strictly between internal-network and external-network heartbeats. As long as a heartbeat link is up, the application will not fail over;
4. For this reason, there seems to be no need to monitor the network cards; a failover happens only when the DC cannot contact a node;
5. When an application has a problem, it is restarted locally instead of being switched to another machine.

Of course, Heartbeat v2 can also build more complex application clusters to meet more complex requirements, but this post does not cover that.

Three. Basic configuration
This part was covered in my earlier posts, so I won't explain it in detail here. Readers who need it can look back at them.
1. Configure the host name and IP
First, set the host name and IP address on each node, and verify with uname -n.

Quote

# uname -n
hatest3
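
Heartbeat matches the node entries in ha.cf against the output of uname -n, so once ha.cf exists (step 3 below) it can help to check this on every machine. The following is only a minimal sketch: node_is_listed and the HA_CF variable are names made up for illustration, and the path /etc/ha.d/ha.cf is the one used later in this post.

```shell
#!/bin/sh
# Sketch: confirm that this machine's `uname -n` appears as a "node" entry
# in an ha.cf-style file. node_is_listed and HA_CF are illustrative names.

# node_is_listed FILE NAME -> exit status 0 if FILE has "node ... NAME ..."
node_is_listed() {
    awk -v want="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

HA_CF="${HA_CF:-/etc/ha.d/ha.cf}"
if [ -r "$HA_CF" ]; then
    if node_is_listed "$HA_CF" "$(uname -n)"; then
        echo "OK: $(uname -n) is listed in $HA_CF"
    else
        echo "WARNING: $(uname -n) is not listed as a node in $HA_CF"
    fi
fi
```

If the file is not readable (for example, Heartbeat is not installed yet), the script prints nothing.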

Then create the basic Heartbeat configuration files.
The following operations are performed on one of the servers; hatest3 is used as the example.
2. Configure authkeys
The content is as follows:

Quote

# cd /etc/ha.d/
# cat authkeys
auth 2
1 crc
2 sha1 HI!
3 md5 Hello!
# chmod 600 authkeys
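
The keys shown above ("HI!", "Hello!") are easy to guess, so on a real cluster you may prefer a random key. Below is a minimal sketch of one way to produce such a file; make_authkeys is a hypothetical helper name, and the sketch writes into a temporary directory rather than /etc/ha.d so that it can be tried safely.

```shell
#!/bin/sh
# Sketch: write an authkeys-style file with a random sha1 key and mode 600.
# make_authkeys is an illustrative helper, not part of Heartbeat.

make_authkeys() {
    out="$1"
    # 20 random bytes rendered as 40 lowercase hex characters
    key=$(od -An -N20 -tx1 /dev/urandom | tr -d ' \n')
    # create the file with a restrictive umask, then make the mode explicit
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    chmod 600 "$out"
}

demo_dir=$(mktemp -d)
make_authkeys "$demo_dir/authkeys"
cat "$demo_dir/authkeys"
rm -rf "$demo_dir"
```

Whatever key is used, the same authkeys file has to be present on every node; the ha_propagate step below copies it.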

3. Configure ha.cf

Quote

# more /etc/ha.d/ha.cf
keepalive 2
deadtime 30
initdead 30
udpport 694
bcast eth0 # Does not distinguish between intranet and extranet IPs; if a redundant heartbeat network is needed, the bond0 device is recommended
auto_failback off
node hatest3
node hatest4
ping 192.168.228.153 # Used to test network connectivity; usually set to the gateway
use_logd yes
crm respawn
compression bz2
compression_threshold 2
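
A quick sanity check of the timing values can catch typing mistakes before the service is started. The sketch below is an assumption-laden helper, not a Heartbeat tool: get_directive and check_timings are made-up names, values are assumed to be whole seconds with all three directives present, and the "initdead at least twice deadtime" rule of thumb is the one I understand the sample ha.cf shipped with Heartbeat to recommend.

```shell
#!/bin/sh
# Sketch: read keepalive/deadtime/initdead from an ha.cf-style file and
# print warnings. Assumes integer seconds and that all three are present.

# get_directive FILE NAME -> prints the first value given for NAME
get_directive() {
    awk -v name="$2" '$1 == name { print $2; exit }' "$1"
}

# check_timings FILE -> prints warnings, returns 1 if any were printed
check_timings() {
    keepalive=$(get_directive "$1" keepalive)
    deadtime=$(get_directive "$1" deadtime)
    initdead=$(get_directive "$1" initdead)
    status=0
    if [ "$deadtime" -le "$keepalive" ]; then
        echo "WARN: deadtime ($deadtime) should be larger than keepalive ($keepalive)"
        status=1
    fi
    if [ "$initdead" -lt $((deadtime * 2)) ]; then
        echo "WARN: initdead ($initdead) is less than twice deadtime ($deadtime)"
        status=1
    fi
    return $status
}

# Demo with the values used in this post
sample=$(mktemp)
printf 'keepalive 2\ndeadtime 30\ninitdead 30\n' > "$sample"
check_timings "$sample" || true
rm -f "$sample"
```

With the values from this post the check flags initdead, which is a recommendation rather than an error; the cluster here ran with these settings.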

4. Distribute the configuration files

Quote

# /usr/lib/heartbeat/ha_propagate
Propagating HA configuration files to node hatest4.
ha.cf 100% 11KB 10.7KB/s 00:00
authkeys 100% 682 0.7KB/s 00:00
Setting HA startup configuration on node hatest4.

※ Unlike in the earlier posts, there is no need to copy the cib.xml file here (in fact, the file has not been created yet), because when we configure the CIB later, the CRM will synchronize the file automatically.

5. Modify Heartbeat startup script
This step is still indispensable. For details, please see: here.

6. Create a password for the hacluster user
Before using hb_gui to access the CRM management interface, you also need to set a password for the hacluster user:

# passwd hacluster

※ This password only applies on the current node. To view the HA cluster information from another node, set it on that node separately.

7. Start the heartbeat service
Once the above is done, start the cluster service on each node:

# service heartbeat start

Before actually using the configuration tool, please make sure that your application runs correctly on its own on each node.
Manually bring the floating IP up and down, and start and stop the applications that will be managed. I assume here that you have already done this kind of testing. If you are unsure about it, I suggest reading the earlier post: [Original] Install and configure the Red Flag High Availability Server HA 5.0 [2]-Configure the application environment

Four. Use the graphical configuration tool
1. Start and log in
When the heartbeat v2 service starts for the first time, it opens port 5560 and automatically creates a default cib.xml file.
Next, on the machine where the hacluster password has been set, run hb_gui to begin configuring:

# hb_gui

After entering the correct user name and password, log in:

This is the default configuration:

2. Configure resources
Heartbeat supports standalone resources, and resources can also be organized into resource groups.
Starting a particular application usually takes several actions in combination, and a resource group can be used to do this.
By default, the resources in a resource group are executed from top to bottom, so it is best to create them in that order. (Of course, you can change the order later.)
So, first right-click “Resources” and select “Add new element”:

Create a resource group:

Enter group_httpd as the group name and choose the necessary parameters:

3. Configure the IP resource
Next, configure the resources contained in the group.
Following the order mentioned above, we need to add the IP resource first:

(You need to give the resource a name, the resource group it belongs to, and its parameters.)
Clone resources and master/slave resources will be described in detail at a later opportunity.
Here is the result after adding:

4. Configure the httpd service resource
Add the httpd service resource in the same way.
(If your web content lives on a disk array, you may need to add a Filesystem resource first to mount the partition; the method is the same.)
Here, too, choose to add an “ordinary resource”:

As mentioned in earlier posts, resource scripts can be written in several formats; the most commonly used are LSB and OCF, and the format of each resource is shown in the figure. In addition, double-clicking a resource shows a description of what it does:

If your resource script accepts parameters and you need to set some, you can provide them here.
Otherwise, just add the resource directly:
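
For readers who prefer to see the configuration as text, the group built in the GUI so far corresponds roughly to a CIB fragment like the one written out below. This is a sketch under assumptions: the names group_httpd, resource_httpd_ip and resource_httpd_app come from this post, but the nvpair and instance_attributes IDs are made up, and I am assuming the IP resource uses the OCF IPaddr agent from the heartbeat provider. write_group_xml is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a Heartbeat v2 style <group> definition to a file.
# Assumed: ocf/heartbeat class and provider for IPaddr; made-up nvpair IDs.

write_group_xml() {
    cat > "$1" <<'EOF'
<group id="group_httpd">
  <primitive id="resource_httpd_ip" class="ocf" provider="heartbeat" type="IPaddr">
    <instance_attributes id="resource_httpd_ip_instance_attrs">
      <attributes>
        <nvpair id="resource_httpd_ip_attr_ip" name="ip" value="192.168.228.235"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="resource_httpd_app" class="lsb" type="httpd"/>
</group>
EOF
}

out=$(mktemp)
write_group_xml "$out"
cat "$out"
rm -f "$out"
```

The IP primitive is listed before the httpd primitive, which mirrors the top-to-bottom start order of a group. On a Heartbeat v2 system a fragment like this is normally loaded with cibadmin (something like cibadmin -C -o resources -x FILE); please check the options against your installed version before relying on that.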

5. Add monitoring operations to resources
According to the official documentation, each resource script must return a result code after it runs an action such as start, stop, restart, or status. This is the same as when we run the following command ourselves:

Quote

service httpd status

Heartbeat acts on the returned result by default; for example, 0 means success and 5 means failure, and so on. The monitor operation relies on the result returned by the script to make its judgment.
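
To make the return codes less abstract, here is a small sketch that runs a command and prints its exit status together with the name that the OCF resource agent convention gives to that number (codes 0 to 7). ocf_rc_name and run_and_report are illustrative helper names of my own. Note that plain LSB status codes use different numbers (for example, 3 for "not running"), while the logs later in this post show the failed monitor reported as rc=7.

```shell
#!/bin/sh
# Sketch: map an exit status to its OCF return-code name.
# ocf_rc_name and run_and_report are made-up helper names.

ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# run_and_report COMMAND [ARGS...] -> prints "<status> <name>"
run_and_report() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    echo "$rc $(ocf_rc_name "$rc")"
}

# Demo with two commands that exist everywhere
run_and_report true
run_and_report false
```

On a cluster node you could call it as run_and_report service httpd status, keeping in mind that an LSB script's numbers do not map one-to-one onto the OCF names.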
We can select “Add Operation” on the “Operations” page of the corresponding resource:

Here the operation type to enter is “monitor”, which means the operation is executed periodically, so the monitoring interval and related values must also be entered.
Pay special attention to the “Start Delay” time: some resources are slow to start, so monitoring cannot begin immediately after the start and a delay is required. This parameter defines how long to wait.
In addition, “Disabled” controls whether the operation is in effect; the default is false, meaning the operation is active.
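
As a rough illustration of how the start delay and the interval interact, the sketch below prints the times (in seconds after the operation is scheduled) at which a monitor would run. It is a simplified timeline model, not how the LRM is implemented; monitor_times is a made-up helper, the 15-second interval matches the monitor_15000 operation seen in the logs of this post, and the 30-second start delay is only an assumed value.

```shell
#!/bin/sh
# Sketch: simplified monitor timeline. First run after START_DELAY seconds,
# then one run every INTERVAL seconds.

# monitor_times START_DELAY INTERVAL COUNT -> space-separated run times
monitor_times() {
    t=$1
    n=0
    out=""
    while [ "$n" -lt "$3" ]; do
        out="$out${out:+ }$t"
        t=$((t + $2))
        n=$((n + 1))
    done
    echo "$out"
}

monitor_times 30 15 4   # prints: 30 45 60 75
monitor_times 0 15 4    # prints: 0 15 30 45
```

The second call shows why a slow-starting application needs a delay: with no delay, the first check in this model would run before the application has had time to come up.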

6. Modify the operation for a failed stop
Under normal circumstances, Heartbeat's handling of return values works well. But in some cases, for example when the application on a machine suddenly fails, the monitor operation gets a failure result from the status action. The default response is to stop the resource and then restart it.
But in that situation the stop cannot return 0 (success), because the resource has already exited abnormally, so the stop operation run by Heartbeat reports an error and exits, and the resource fails to restart.
Therefore, the default behavior when stopping a resource fails needs to be changed so that the failure is ignored instead.
Add another operation:

(Please pay attention to the operation's name and its On Fail action.)
This is the result after selecting “Apply” to submit:

(Be careful not to mix up which resource the monitor and the operation belong to.)
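
In CIB terms, the two operations added in steps 5 and 6 look roughly like the fragment written out below. This is a sketch with assumptions: the 15s interval matches the monitor_15000 operation seen in this post's logs, but the op IDs, the timeout values and the 30s start delay are invented for illustration. write_ops_xml and stop_on_fail are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: the httpd primitive with a monitor op and a stop op whose
# failure is ignored. IDs, timeouts and start_delay are assumed values.

write_ops_xml() {
    cat > "$1" <<'EOF'
<primitive id="resource_httpd_app" class="lsb" type="httpd">
  <operations>
    <op id="op_httpd_monitor" name="monitor" interval="15s" timeout="30s" start_delay="30s" disabled="false"/>
    <op id="op_httpd_stop" name="stop" timeout="30s" on_fail="ignore"/>
  </operations>
</primitive>
EOF
}

# stop_on_fail FILE -> prints the on_fail value of the stop operation
stop_on_fail() {
    sed -n 's/.*name="stop".*on_fail="\([a-z]*\)".*/\1/p' "$1"
}

out=$(mktemp)
write_ops_xml "$out"
echo "stop on_fail: $(stop_on_fail "$out")"
rm -f "$out"
```

Ignoring a failed stop means the cluster carries on as if the stop had succeeded, so it is worth using only for cases like this one, where the init script reports failure simply because the service is already down.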

7. Add restrictions (constraints)
Now that the resources have been added, what remains is to tell the CRM which node the resource (group) should run on by default.
This is done by adding a “location” entry under “restrictions”.
First, to make the IP run on hatest3, add a new element under “location” and select “location” as its type:

Then give it a name and select the resource group to be restricted:

The result after adding is as follows:

After selecting the new location element, you can add an expression in the panel on the right, for example to make the restricted element run on the node whose uname -n result is hatest3:

Here, both the weight (score) and the added expression are important.
Weight:

Quote

It is used to determine the priority of different resource groups;
In a symmetric_cluster (symmetric cluster; this can be set in the “node” - “configuration” section), a resource (group) can run on any node, and setting the weight for a node to -INFINITY means that the resource (group) can never run on that node;
If the cluster is not a symmetric_cluster, a resource (group) cannot run on any node unless you give the resource (group) a weight of INFINITY, or a positive weight.
For positive weights, the size of the value determines the priority, that is, the order of preference.
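
When more than one rule applies to the same node, the weights are combined. The sketch below shows the arithmetic as I understand the usual CRM convention: anything combined with -INFINITY gives -INFINITY, otherwise anything combined with INFINITY gives INFINITY, and finite weights are simply added. add_scores is a made-up helper for illustration, and the convention itself is an assumption worth checking against your version.

```shell
#!/bin/sh
# Sketch: combine two location weights (scores).
# Assumed rules: -INFINITY always wins, then INFINITY, else plain addition.

add_scores() {
    case "$1 $2" in
        *-INFINITY*) echo "-INFINITY" ;;
        *INFINITY*)  echo "INFINITY" ;;
        *)           echo $(( $1 + $2 )) ;;
    esac
}

add_scores 100 50              # prints: 150
add_scores 100 -INFINITY       # prints: -INFINITY
add_scores INFINITY 5          # prints: INFINITY
add_scores INFINITY -INFINITY  # prints: -INFINITY
```

This is why a single -INFINITY rule is enough to keep a resource (group) off a node, no matter how large the other weights for that node are.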

Added expression:

Quote

The supported attributes include #uname, corresponding to the value of uname -n on the node;
#id, corresponding to the unique ID of each node;
#is_dc, that is, whether the node is the cluster's single DC node;
The input value is then matched using an operator such as eq (equal to) or ne (not equal to).

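More conditions can be added, and Boolean operations in the interface combine them into the final result.
※ The order and colocation restrictions are used to control the execution order of different resource groups and to combine a series of actions across several resource groups; they will be explained in detail later.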
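
Written as text, a location restriction like the one built above corresponds roughly to the CIB fragment generated below. It is a sketch under assumptions: the element IDs and the weight of 100 are made up for illustration, and write_location_xml is a hypothetical helper; only group_httpd, hatest3, #uname and eq come from this post.

```shell
#!/bin/sh
# Sketch: generate a Heartbeat v2 style rsc_location element.
# IDs and the score passed in the demo are illustrative values.

# write_location_xml FILE RESOURCE NODE SCORE
write_location_xml() {
    cat > "$1" <<EOF
<rsc_location id="location_$2" rsc="$2">
  <rule id="rule_location_$2" score="$4">
    <expression id="expr_location_$2" attribute="#uname" operation="eq" value="$3"/>
  </rule>
</rsc_location>
EOF
}

out=$(mktemp)
write_location_xml "$out" group_httpd hatest3 100
cat "$out"
rm -f "$out"
```

Swapping the score for -INFINITY in the call would express the opposite: that the group must never run on the named node.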

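8. Start the resource group
Once the configuration is complete, you can start a specific resource group:

This is the result after starting:

(Note that it runs on hatest3 by default.)
To adjust the execution order within the resource group, right-click the corresponding resource and select “Move Up” or “Move Down”: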

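9. Testing
We do two simple tests here:
a. Stop the httpd application
Run on hatest3:

# service httpd stop

Before long, the GUI shows that the resource has failed:

Meanwhile, the log in the background shows:

Quote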

crmd[6987]:info: process_lrm_event: LRM operation resource_httpd_app_monitor_15000 (call=6, rc=7) complete
crmd[6987]:info: do_lrm_rsc_op: Performing op=resource_httpd_app_stop_0 key=2:14:0018a200-ca97-4816-98c8-09b0444bdd40)
crmd[6987]:info: process_lrm_event: LRM operation resource_httpd_app_monitor_15000 (call=6, rc=-2) Cancelled
lrmd[6984]:info: rsc:resource_httpd_app: stop
lrmd[7329]:WARN: For LSB init script, no additional parameters are needed.
lrmd[6984]:info: RA output: (resource_httpd_app:stop:stdout) Stopping httpd:
lrmd[6984]:info: RA output: (resource_httpd_app:stop:stdout) [
lrmd[6984]:info: RA output: (resource_httpd_app:stop:stdout) FAILED
lrmd[6984]:info: RA output: (resource_httpd_app:stop:stdout) ]
lrmd[6984]:info: RA output: (resource_httpd_app:stop:stdout)

crmd[6987]:ERROR: process_lrm_event: LRM operation resource_httpd_app_stop_0 (call=7, rc=1) Error unknown error
crmd[6987]:info: do_lrm_rsc_op: Performing op=resource_httpd_ip_start_0 key=6:15:0018a200-ca97-4816-98c8-09b0444bdd40)
lrmd[6984]:info: rsc:resource_httpd_ip: start

That is, once monitoring detects the error, the application is automatically restarted on the current node:

Quote

lrmd[6984]: info: rsc:resource_httpd_ip: start
IPaddr[7337][7367]: INFO: Using calculated nic for 192.168.228.235: eth0
IPaddr[7337][7372]: INFO: Using calculated netmask for 192.168.228.235: 255.255.255.0
crmd[6987]: info: process_lrm_event: LRM operation resource_httpd_ip_start_0 (call=8, rc=0) complete
crmd[6987]: info: do_lrm_rsc_op: Performing op=resource_httpd_app_start_0 key=7:15:0018a200-ca97-4816-98c8-09b0444bdd40)
lrmd[6984]: info: rsc:resource_httpd_app: start
lrmd[7388]: WARN: For LSB init script, no additional parameters are needed.
lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout) Starting httpd:
lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout) httpd: Could not determine the server’s fully qualified domain name, using 192.168.228.233 for ServerName

lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout) [
lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout) OK
lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout) ]
lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout)
lrmd[6984]: info: RA output: (resource_httpd_app:start:stdout)

crmd[6987]: info: process_lrm_event: LRM operation resource_httpd_app_start_0 (call=9, rc=0) complete

As you can see, the restart is performed on the whole resource group, and it includes the monitor operation.
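
When reading logs like the ones above, it can help to pull out just the operation names and their return codes. The following sketch does that with sed; lrm_results is a made-up helper name, and the pattern is based only on the line format shown in this post, so other Heartbeat versions may log differently.

```shell
#!/bin/sh
# Sketch: extract "<operation> <rc>" from crmd "LRM operation" log lines.
# The pattern follows the log format quoted in this post.

lrm_results() {
    sed -n 's/.*LRM operation \([^ ]*\) (call=[0-9]*, rc=\(-\{0,1\}[0-9]*\)).*/\1 \2/p'
}

# Demo using lines copied from the log above
lrm_results <<'EOF'
crmd[6987]:info: process_lrm_event: LRM operation resource_httpd_app_monitor_15000 (call=6, rc=7) complete
lrmd[6984]:info: rsc:resource_httpd_app: stop
crmd[6987]:ERROR: process_lrm_event: LRM operation resource_httpd_app_stop_0 (call=7, rc=1) Error unknown error
crmd[6987]: info: process_lrm_event: LRM operation resource_httpd_app_start_0 (call=9, rc=0) complete
EOF
```

In this sample the monitor reports rc=7, the stop reports rc=1 (the failed stop discussed in step 6), and the final start reports rc=0. On a live system the function could read the Heartbeat log file instead of the inline sample.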
The final result is as follows:

b. Shut down hatest3
We can try shutting down hatest3 to see whether the resources switch to the other node, hatest4:

Quote

# uname -n
hatest3
# shutdown -h 0

The log on hatest4 shows the following:

Quote

tengine[6591]: info: te_crm_command: Executing crm-event (16): do_shutdown on hatest3
tengine[6591]: info: te_pseudo_action: Pseudo action 11 fired and confirmed
lrmd[6582]: info: rsc:resource_httpd_ip: start
……
crmd[6585]: info: ccm_event_detail: CURRENT: hatest4 [nodeid=1, born=3]
crmd[6585]: info: ccm_event_detail: LOST: hatest3 [nodeid=0, born=2]
heartbeat[6568]: WARN: node hatest3: is dead
heartbeat[6568]: info: Link hatest3:eth0 dead.
crmd[6585]: notice: crmd_ha_status_callback: Status update: Node hatest3 now has status [dead]
heartbeat[6568]: info: Link hatest3:eth1 dead.

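The GUI looks like this:

This shows that the failover works. When hatest3 is brought back, the resources will return to it or stay where they are, depending on the auto_failback setting in ha.cf.
At this point, the configuration is complete.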
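
Besides the GUI, a quick way to confirm which node currently holds the floating IP is to look at the addresses configured on each machine. The sketch below checks `ip -o -4 addr show` style output for an exact address; has_address is a hypothetical helper, and the sample lines in the demo are illustrative text in the shape of that command's output, not captured from the test machines.

```shell
#!/bin/sh
# Sketch: succeed if an IPv4 address appears in `ip -o -4 addr show` output
# read from standard input. has_address is an illustrative helper.

has_address() {
    awk -v want="$1" '
        {
            for (i = 2; i <= NF; i++) {
                if ($(i - 1) == "inet") {
                    split($i, parts, "/")      # strip the /prefix length
                    if (parts[1] == want) found = 1
                }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Demo with illustrative sample lines
if has_address 192.168.228.235 <<'EOF'
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 192.168.228.234/24 brd 192.168.228.255 scope global eth0
2: eth0    inet 192.168.228.235/24 brd 192.168.228.255 scope global secondary eth0:0
EOF
then
    echo "floating IP is present on this node"
else
    echo "floating IP is not on this node"
fi
```

On a real node the call would be ip -o -4 addr show | has_address 192.168.228.235, run on hatest3 and hatest4 in turn after the failover test.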

The final configuration file:

Click here to download the file
