One Method of Heartbeat Debugging

This article introduces one method of debugging Heartbeat. It covers the following key points:

Basic ideas for diagnosing a network problem within a cluster
How to handle layer 2 (data link) network problems
Frequently used commands and tools: tcpdump, arp, etc.

Prerequisites

First, assume that Heartbeat is installed on two nodes. If you are not familiar with Heartbeat, these resources are useful:

Heartbeat Official User Guide
Rakuten DBA HeartBeat Setting manual

Environment

As mentioned above, we have two nodes with Heartbeat already installed. Assume their IP addresses are:

192.168.40.64
192.168.41.63

Before starting Heartbeat, we need to set up its configuration files. Generally these are the following three (in /etc/ha.d/ on a typical installation):

ha.cf
haresources
authkeys

The ha.cf configuration is shown below:

Configuration of ha.cf
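As a minimal sketch of such a file, the POSIX shell function below writes an ha.cf into a directory you pass in. The ‘crm’, ‘auto_failback’ and ‘node’ directives and the two log paths are the ones this article discusses; the second node name ‘node-1’, the ‘ucast’ line (send heartbeats by unicast over eth0 to the peer at 192.168.41.63) and the helper name ‘write_ha_cf’ are assumptions.

```shell
# Hypothetical helper: write a minimal ha.cf into directory $1.
# Assumed values: node name node-1, interface eth0, peer 192.168.41.63.
write_ha_cf() {
    conf_dir=$1
    cat > "$conf_dir/ha.cf" <<'EOF'
# Log locations (see the Logs section)
debugfile /var/log/ha-debug
logfile /var/log/ha-log
# Assumed: send heartbeats by unicast to the peer node
ucast eth0 192.168.41.63
# Keep resources on the standby even after the master recovers
auto_failback off
# Every cluster node, named exactly as `uname -n` reports it
node node-0
node node-1
# Cluster Resource Manager off; resources come from haresources
crm off
EOF
}
```

On the other node, the ‘ucast’ line would point at 192.168.40.64 instead.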

Most fields in ha.cf are easy to understand, so only a few are described here. ‘crm off’ turns off the Cluster Resource Manager (CRM). ‘auto_failback off’ means that once the standby node has taken over the resources, it keeps them and does not hand them back when the master recovers. Each ‘node’ line, such as ‘node node-0’, declares a node that takes part in the heartbeat; ‘node-0’ is the server name, which must match the output of ‘uname -n’ on that node.

The haresources file configures a virtual IP address so that the Heartbeat nodes act as one server. Below is an example configuration:

Example configuration of virtual ip
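A sketch of the haresources line for this setup, assuming node-0 is the preferred node and eth0 is the interface: ‘IPaddr::192.168.40.1/24/eth0’ asks the IPaddr resource script to bring up 192.168.40.1 with a /24 netmask on eth0. The helper name ‘write_haresources’ is hypothetical.

```shell
# Hypothetical helper: write haresources into directory $1.
# Format: <preferred node> <resource>...; IPaddr::<ip>/<prefix>/<nic>
# sets up the virtual IP. eth0 is an assumed interface name.
write_haresources() {
    conf_dir=$1
    printf '%s\n' 'node-0 IPaddr::192.168.40.1/24/eth0' > "$conf_dir/haresources"
}
```

The haresources file should be identical on both nodes.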

‘192.168.40.1’ is the virtual IP. The most important thing to note: when the nodes’ real IP addresses and the virtual IP are not in the same subnet, specify the netmask explicitly; in this configuration file it is ‘24’.

The authkeys file defines how Heartbeat nodes authenticate each other. There are three authentication types: ‘crc’, ‘md5’ and ‘sha1’. ‘crc’ only detects corrupted packets and offers no real security, so it is enough only if your network is secure; otherwise use ‘sha1’. Here is an example:

Example configuration of authentication methods
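A sketch of writing authkeys with the ‘sha1’ method: ‘auth 1’ selects key number 1, which the next line defines. The helper name ‘write_authkeys’ and the secret are hypothetical. Heartbeat refuses to use an authkeys file that other users can read, so the function creates it with mode 600.

```shell
# Hypothetical helper: write authkeys into directory $1 with secret $2.
write_authkeys() {
    conf_dir=$1
    secret=$2
    # Create the file with a restrictive umask so it is never world-readable
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$secret" > "$conf_dir/authkeys" )
    # Make sure an already existing file also ends up with mode 600
    chmod 600 "$conf_dir/authkeys"
}
```

Use the same authkeys file (same method and secret) on both nodes.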

Useful commands for starting and stopping Heartbeat as a service:

sudo service heartbeat start
sudo service heartbeat stop

Keep in mind that these commands assume Ubuntu with Heartbeat installed as a service.

Debug

Frankly speaking, if you set up your configuration files as in the examples above, Heartbeat may work without any debugging. If it does not, you need to know how to debug it.

Logs

Every mature piece of software keeps logs, and Heartbeat is no exception. The log file locations are set in ha.cf (the ‘debugfile’ and ‘logfile’ directives).

tail -f /var/log/ha-debug
vim /var/log/ha-log

Whether you follow the log live with ‘tail -f’ or browse it with ‘vim’ is up to you.
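When the log is long, filtering for warnings and errors helps. A sketch, assuming Heartbeat tags problem messages with levels such as ‘ERROR:’, ‘WARN:’ and ‘CRIT:’ (check your own log, since the format can vary between versions); the function name ‘ha_log_problems’ is hypothetical.

```shell
# Print only warning/error lines from the given log files,
# or from stdin when no file is given (so it also works after tail -f).
ha_log_problems() {
    grep -E '(ERROR|WARN|CRIT):' "$@"
}
```

For example, ‘ha_log_problems /var/log/ha-log’ or ‘tail -f /var/log/ha-log | ha_log_problems’.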

Virtual IP Interface

When Heartbeat starts, it tries to set up the virtual IP address specified in the haresources file, which you can easily observe with the ‘ifconfig’ command. If this step reports an error, the following command is useful for debugging:

sudo ifconfig eth0:100 192.168.40.1 netmask 255.255.255.0 broadcast 192.168.40.255

This command manually sets up a virtual IP interface, eth0:100, with the address 192.168.40.1 and the given netmask and broadcast address. Remove it again with ‘sudo ifconfig eth0:100 down’ before letting Heartbeat manage the address.
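The netmask and broadcast address in the ifconfig command above follow from the /24 prefix used in haresources. As a sketch (the helper names are hypothetical), these POSIX shell functions derive both from an address in CIDR form and print the matching ifconfig command instead of running it, so you can review it first.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print the ifconfig command for alias $1 and address $2 (e.g. 192.168.40.1/24).
vip_ifconfig_cmd() {
    alias_if=$1
    addr=${2%/*}
    prefix=${2#*/}
    ip=$(ip_to_int "$addr")
    # Netmask: the top $prefix bits set, limited to 32 bits
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    # Broadcast: network part of the address with all host bits set
    bcast=$(( (ip & mask) | (~mask & 0xFFFFFFFF) ))
    echo "sudo ifconfig $alias_if $addr netmask $(int_to_ip "$mask") broadcast $(int_to_ip "$bcast")"
}
```

Running ‘vip_ifconfig_cmd eth0:100 192.168.40.1/24’ prints exactly the command shown above.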

ARP

With arp we can check whether the Heartbeat nodes can reach each other at layer 2. The command is:

arp -a

Each host keeps an ARP table for its interfaces that maps IP addresses to MAC addresses. If the nodes are connected, you will see an entry like this:

node-0 (192.168.40.64) at 00:50:56:cc:ee:ff [ether] on eth0
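To script this check, here is a sketch that reads ‘arp -a’ output and succeeds only when the given IP has a resolved MAC address. It assumes the Linux net-tools output format shown above, where an unresolved entry reads ‘<incomplete>’; the function name ‘peer_in_arp_table’ is hypothetical.

```shell
# Succeed if IP $1 has a resolved MAC in `arp -a` output read from stdin.
peer_in_arp_table() {
    awk -v want="($1)" '
        # Field 2 is "(IP)", field 4 is the MAC address or "<incomplete>"
        $2 == want && $4 != "<incomplete>" { found = 1 }
        END { exit !found }
    '
}
```

For example, on the other node: ‘arp -a | peer_in_arp_table 192.168.40.64 && echo connected’. ARP entries only appear after the hosts have exchanged traffic, so a ‘ping -c 1 192.168.40.64’ beforehand makes the check more meaningful.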

SEND_ARP & TCPDUMP

Use send_arp to announce (mock) a virtual IP address, and use tcpdump to check whether the ARP packets actually appear on the network. For example:

send_arp eth0 192.168.40.4 00:11:22:aa:bb:cc 192.168.40.4 ffffffffffff

Use ‘tcpdump’ to capture every ARP request and reply, for example:

sudo tcpdump -i eth0 arp

Example output:

07:46:07.651105 ARP, Request who-has 127.0.1.1 tell 127.0.1.1, length 46
07:46:08.151360 ARP, Reply 127.0.1.1 is-at 00:00:00:56:00:05 (oui Ethernet), length 46
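To pick the interesting lines out of a long capture, here is a sketch that reads tcpdump ARP output and prints gratuitous requests (where the who-has and tell addresses are the same, as in the Request line above) and replies together with their MAC address. It relies on tcpdump's default text format, which is not a stable interface; the function name ‘summarize_arp’ is hypothetical.

```shell
# Summarize `tcpdump -i eth0 arp` output read from stdin.
summarize_arp() {
    awk '
        $3 == "Request" && $4 == "who-has" {
            target = $5
            sender = ""
            # Scan for "tell" instead of assuming a fixed field position
            for (i = 6; i < NF; i++) if ($i == "tell") { sender = $(i + 1); break }
            sub(/,$/, "", sender)              # drop the trailing comma
            if (target == sender) print "gratuitous", target
        }
        $3 == "Reply" && $5 == "is-at" {
            mac = $6
            sub(/,$/, "", mac)                 # comma appears when no "(oui ...)" follows
            print "reply", $4, mac
        }
    '
}
```

For a live capture, ‘sudo tcpdump -l -i eth0 arp | summarize_arp’ works; ‘-l’ makes tcpdump line-buffer its output so lines reach awk immediately.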

NATS Server Working with Heartbeat

In our team, Heartbeat keeps the NATS server highly available, so that nats-server is always alive. To check whether the NATS server is working, a useful command is ‘nats-top’:

cd nats-installation-directory
./bin/nats-top

Conclusion

Debugging Heartbeat is not only about Heartbeat itself; it also requires understanding the environment and the ARP protocol. In that sense, what we are doing builds entirely on what we learned before, such as ARP and IP interfaces. But we need to apply what we learn in real work, try things out, and fix problems as quickly as possible, to benefit our team and even our company.