HA-Cluster with loadbalancing for Zope (and Plone)
Contents
- Introduction
- Assumptions/prerequisites
- Setup
- Configuration
- Use cases
- Alternatives
- Resources
Introduction
An HA cluster aims for (almost) continuous availability, even in case of hardware failure. Note however that to attain real high availability, or in marketing terms an uptime of 99.999% (see for example Wikipedia), your system can be down to users for only about 5 minutes per year. It should be obvious that this leaves very little time for fixing problems... Also, to be truly highly available, you not only need more than one machine to cope with hardware failure, but also geographic redundancy, to cope with failure of the data center or the data backbone. Geographic redundancy and covering for marketing managers are not discussed here though. We leave this to the reader, as an exercise... ;)
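To make the "five minutes per year" figure concrete, here is a small worked calculation. The function name and the choice of an average year of 365.25 days are our own assumptions for illustration.

```python
# Average year length in minutes, counting leap days (an assumption for this sketch).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100.0)
```

For example, allowed_downtime_minutes(99.999) comes out at a little over 5 minutes, while allowed_downtime_minutes(99.9) already allows almost 9 hours.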
To cope with software and hardware failure, you'll need a setup with at least two machines, where each machine can provide the same services to the end user, without the user noticing the failure. For the end user, the services are thus highly available. Please note that some end users may still notice an interruption: if the web server stops or a CPU breaks down in the middle of a request, that request may fail, but the user can resume operations straight away.
We assume a simple setup with two machines in one physical location. This is not a limitation of the setup; we chose it only for clarity. For a thorough account of clustering and load-balancing techniques involving other setups, see http://www.ultramonkey.org/.
Assumptions/prerequisites
We have used a setup including the following components:
- Apache2
- Heartbeat
- NFS
- mod_proxy_balancer (optional, but more efficient in terms of hardware use)
- Squid (optional. Squid configuration for Zope is not further described in this document.)
- ZEO
- DRBD
On most Linux distributions, these are readily available as packages.
Setup
The setup consists of two machines, each having two network interfaces. The machines are linked to each other by a serial cable and an ethernet cross cable, and linked to the internet through the other ethernet interface. If you'd like to have the two machines in separate locations, the serial cable and cross cable will of course be impossible, but you can use a second ethernet connection to communicate between the machines. This is a typical setup for HA clustering with heartbeat. Both machines have Apache2 installed, as well as a Zope instance using ZEO. The machines use heartbeat to determine whether the cluster is still in normal operation mode (master is alive). The machines are available over the internet both with their own IP-addresses, and with a floating IP-address that points to the master, or when the master is down, to the slave. More details on this configuration can be found in the man pages for heartbeat.
The setup can be graphically depicted as follows:

The general idea is that the HA Cluster as a whole is addressed by the floating IP-address; your domain name entry in the DNS would have an A record pointing to this address. This is configured to be on the master for normal conditions. The master server handles all requests on port 80, and uses mod_proxy_balancer to distribute requests over two machines, either to Squid, or directly to Zope clients. Note that load balancing is not an essential part of the setup, but makes more efficient use of your hardware. Instead of mod_proxy_balancer you may want to use other load balancing software, like Pound or Pen.
The Zope clients use a ZEO server for their data back-end. This server runs on the master, and is contacted over the floating IP-address by the clients. Data from the master (the Data.fs) is synchronized to the slave server by DRBD, a virtual block device that actually writes to disk on two machines at the same time. On the slave, the ZEO server is not running.
In case of failover, heartbeat assigns the floating IP-address to the slave, mounts the DRBD device on the slave, and starts the ZEO server there. This server will now be the one contacted by the Zope clients, which detect the change automatically. If possible, the ZEO server on the master will be stopped.
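When testing a failover, it can help to check from a client machine whether something answers on the ZEO port of the floating IP-address. The following is a minimal sketch of such a probe. The function name is our own, and it only tests whether a TCP connection can be opened; it does not speak the ZEO protocol.

```python
import socket


def zeo_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call could look like zeo_reachable("192.0.2.10", 8100), where both the address and the port are hypothetical placeholders for your floating IP-address and the port your ZEO server listens on.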
Recovery from a failover is not automatic, due to the high risk of errors in this procedure. Recovery includes:
- check on integrity of data on the slave;
- start heartbeat on master.
If you wish, you can automate recovery as well by having heartbeat start when the master boots, but we have chosen manual intervention, to make sure that the cause of the failure on the master is thoroughly investigated before recovery.
Configuration
Apache
Apache is configured to load the modules for proxy, proxy_balancer and proxy_http at least. Roughly, balancing is achieved by the following statements, for example within a virtual host declaration:
    <Proxy balancer://lb>
        BalancerMember http://192.168.1.10:8080
        BalancerMember http://192.168.1.11:8080
    </Proxy>
    ...
    ProxyPass / balancer://lb/VirtualHostBase/http/somesite.foo.bar:80/ploneinstance/VirtualHostRoot/

This assumes you have two nodes running on port 8080, with IP-addresses 192.168.1.10 and 192.168.1.11, that your Plone instance is called 'ploneinstance', and that you use the Virtual Host Monster to map somesite.foo.bar to the proper Plone instance. Note that this configuration does not add 'sticky sessions' to the balancing. For that, we refer to another howto on this site.
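The ProxyPass target packs several pieces of information into one path: the public scheme, host and port, and the path of the Plone instance inside Zope. The sketch below assembles such a target from its parts, which may help when adapting the rule to your own site. The function and parameter names are hypothetical and only serve this illustration.

```python
def vhm_proxy_target(balancer: str, scheme: str, public_host: str,
                     public_port: int, zope_path: str) -> str:
    """Build a ProxyPass target in the form the Virtual Host Monster expects."""
    # Strip slashes so that "/ploneinstance/" and "ploneinstance" give the same result.
    path = zope_path.strip("/")
    return (
        f"balancer://{balancer}/VirtualHostBase/{scheme}/"
        f"{public_host}:{public_port}/{path}/VirtualHostRoot/"
    )
```

Calling vhm_proxy_target("lb", "http", "somesite.foo.bar", 80, "ploneinstance") reproduces the target used in the ProxyPass statement above.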
Heartbeat
The heartbeat process on the slave continually checks if the master
server is still up. The heartbeat on the slave can start automatically, so you can add links to the start-stop scripts in all runlevel init directories.
The heartbeat process on the master is not automatically (re)started
(so the floating IP address won't switch back to the master
automatically) due to our need for manual failover recovery. Remove
start/stop links to heartbeat from all runlevels in /etc/rc.<x>
and start manually using /etc/init.d/heartbeat.
Configure heartbeat according to your hardware setup, preferably using at least two communication channels for checking cluster status. A serial and an ethernet link between both machines are a common setup. Preferably your machines have two ethernet interfaces, one for external communication, and one for heartbeat. The heartbeat configuration on the slave needs to contain the directives for starting the ZEO cluster and for stopping synchronization in case of failover. The configuration on the master doesn't need to do that, but should stop the ZEO cluster.
Add the following directive to the /etc/heartbeat/haresources file:
<master> drbddisk::<drbd filesystem> Filesystem::/dev/drbd0::<drbd fs mount point>::ext3 <floating IP address>/24/<ethernet interface> zeo
where <master> is to be replaced, on both machines, by the name of your master node as listed in the /etc/hosts file. Check DRBD documentation on how to enable DRBD on your machine.
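The last entry on the haresources line, zeo, refers to a resource script that heartbeat calls with an action argument. What follows is a minimal sketch of such a script, not the attached example. It assumes that the ZEO instance in /opt/zope/instance0 provides a bin/zeoctl control script that accepts start, stop and status; the function names and the run parameter are our own.

```python
import subprocess
import sys

# Assumption: the ZEO instance lives here and ships a bin/zeoctl control script.
ZEOCTL = "/opt/zope/instance0/bin/zeoctl"


def command_for(action: str) -> list:
    """Map the action passed by heartbeat to the zeoctl command line to run."""
    if action not in ("start", "stop", "status"):
        raise ValueError(f"unsupported action: {action}")
    return [ZEOCTL, action]


def main(argv: list, run=subprocess.call) -> int:
    """Run zeoctl for the requested action and return its exit status."""
    if len(argv) != 2:
        print("usage: zeo {start|stop|status}", file=sys.stderr)
        return 1
    try:
        cmd = command_for(argv[1])
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 1
    # Hand the exit status of zeoctl back to the caller (heartbeat).
    return run(cmd)
```

Installed as an executable file in /etc/heartbeat/resource.d, the script would end with the line sys.exit(main(sys.argv)). Heartbeat may have its own expectations about what the status action prints, so check the heartbeat documentation before relying on it.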
The identifier 'zeo' is arbitrary, but should be the name of a script available in the directory /etc/heartbeat/resource.d, that takes care of stopping and starting your ZEO cluster. Check the attached zeo files for an example setup on master and slave. Both scripts assume an instance location of /opt/zope/instance0, but obviously this can be whatever you like.
Use cases
NORMAL OPERATION (master + slave are up)
Only ZEO on the master is running, and connected to Data.fs located
on the master. Apache dispatches requests to clients on both master and
slave, that use the ZEO cluster on the master, using the floating IP
address.
The ZEO instance writes to the virtual block device provided by DRBD; this device writes to both machines.
MASTER IS GOING DOWN
Heartbeat on the slave detects that the master server has gone down. The following actions are executed:
- floating IP address is taken over from master
- ZEO on the slave is started
- DRBD device is mounted on slave.
(ZOPE2 automatically reconnects to ZEO2)
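The actions listed above are not necessarily in execution order. As far as we know, heartbeat acquires the resources on a haresources line from left to right and releases them from right to left, so the DRBD device is made available and mounted before ZEO is started. The sketch below derives both orders from one line. The helper names and the values in the example line (node1, r0, /data, 192.0.2.10, eth0) are hypothetical.

```python
def parse_haresources(line: str):
    """Split one haresources line into (preferred_node, [resources])."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a node name followed by at least one resource")
    return fields[0], fields[1:]


def takeover_order(resources):
    """Order in which resources are acquired: left to right."""
    return list(resources)


def release_order(resources):
    """Order in which resources are released: right to left."""
    return list(reversed(resources))


# Hypothetical example line, following the pattern of the directive shown earlier.
EXAMPLE = "node1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 192.0.2.10/24/eth0 zeo"
```

With this example line, takeover_order starts with the drbddisk resource and ends with zeo, while release_order stops zeo first and releases the DRBD disk last.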
SLAVE IS GOING DOWN
Nothing happens. If master is also down, this is Bad News (tm).
MASTER UP AFTER BEING DOWN
No automatic actions are executed (ZEO1 remains down, and the slave keeps the floating IP address!).
Check integrity of Data.fs if you wish, and start heartbeat on the master. This will take care of the rest.
Alternatives
As usual, there's more than one way to achieve similar results. We'll not exhaust ourselves with a comparison here, but we do at least point to some alternatives:
- use the commercial ZRS solution of Zope Corporation;
- use ZEORAID.
