A while ago I learned about the general high-availability scheme for OpenStack, which uses HAProxy + Keepalived. It reminded me of an earlier LVS configuration that used Heartbeat. I had never configured Keepalived, and I wanted to know why Keepalived is used instead of Heartbeat, so I searched around for things like “Heartbeat vs Keepalived” and “comparison of Heartbeat and Keepalived”. I finally found an answer that I think is expressed very well and very clearly, and it happens to come from Willy, the author of HAProxy, so I decided to repost it. To read the original text, click here.
Well, I’m not sure whether you’ll find a response here as this is purely a heartbeat question.
Anyway, I’d like to say that I’m amazed by the number of people who use heartbeat to get a redundant haproxy setup. It is not the best tool for *this* job: it was designed to build clusters, which is very different from having two redundant, stateless pieces of network equipment. Network-oriented tools such as keepalived or ucarp are best suited for that task.
The difference between those two families is simple:
- a cluster-oriented product such as heartbeat will ensure that a shared resource will be present at *at most* one place. This is very important for shared filesystems, disks, etc… It is designed to take a service down on one node and up on another one during a switchover. That way, the shared resource may never be concurrently accessed. This is a very hard task to accomplish and it does it well.
- a network-oriented product such as keepalived will ensure that a shared IP address will be present at *at least* one place. Please note that I’m not talking about a service or resource any more, it just plays with IP addresses. It will not try to down or up any service, it will just consider a certain number of criteria to decide which node is the most suited to offer the service. But the service must already be up on both nodes. As such, it is very well suited for redundant routers, firewalls and proxies, but not at all for disk arrays nor filesystems.
The difference is very visible in case of a dirty failure such as a split brain. A cluster-based product may very well end up with none of the nodes offering the service, to ensure that the shared resource is never corrupted by concurrent accesses. A network-oriented product may end up with the IP present on both nodes, resulting in the service being available on both of them. This is the reason why you don’t want to serve file-systems from shared arrays with ucarp or keepalived.
The nature of the controls and changes also has an impact on the switch time and the ability to test the service offline. For instance, with keepalived, you can switch the IP from one node to another in just one second in case of a dirty failure, or with zero delay in case of a voluntary switch, because there is no need to start or stop anything. That also means that even if you’re experiencing flapping, it’s not a problem: even if the IP constantly moves, it moves between places where the service is offered. And since the service is permanently available on the backup nodes, you can test your configs there without impacting the master node.
So in short, I would not like to have my router/firewall/load balancer running on heartbeat, just as I would not like to have my fileserver/disk storage/database running on keepalived.
With keepalived, your setup above is trivial. Just configure two interfaces with their shared IP addresses, enumerate the interfaces you want to track, declare scripts to check the services if you want, and that’s all. If any interface fails or if haproxy dies, the IP immediately switches to the other node. If both nodes lose the same interface (eg: shared switch failure), you still have part of the service running on one of the nodes on the other interface.
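To make this concrete, here is a minimal `keepalived.conf` sketch for one node of the setup Willy describes: a shared IP, a tracked interface, and a check script that watches haproxy. The interface names, IP address, priorities, and `virtual_router_id` are illustrative assumptions, not values from the original post; the backup node would use `state BACKUP` and a lower `priority`.

```
vrrp_script chk_haproxy {
    script "pidof haproxy"   # succeeds only while haproxy is running
    interval 2               # run the check every 2 seconds
    weight -20               # drop priority on failure so the peer takes over
}

vrrp_instance VI_1 {
    state MASTER             # the peer node is configured with state BACKUP
    interface eth0           # interface carrying VRRP advertisements
    virtual_router_id 51     # must match on both nodes
    priority 101             # the BACKUP node uses a lower value, e.g. 100
    advert_int 1
    track_interface {
        eth1                 # also fail over if this interface goes down
    }
    track_script {
        chk_haproxy
    }
    virtual_ipaddress {
        192.168.0.10/24      # the shared service IP that moves between nodes
    }
}
```

Note that, exactly as described above, nothing is started or stopped on failover: haproxy stays running on both nodes the whole time, and keepalived only moves the virtual IP.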
Hoping this helps understanding the different types of architectures one might encounter,
Willy