title | layout | summary |
---|---|---|
Aeolus High Availability | page | Aeolus High Availability |
{::options parse_block_html="true" /}
Since March the Pacemaker Cloud project has been working to provide high availability functionality for Aeolus.
High availability here means a high degree of availability (link) applied to end-user applications.
For the underlying mathematical constructs, see this presentation (pdf).
The high availability methodology has four major steps:
- Monitor failure of components
- Isolate and terminate failed components
- Recover components by restart and escalation
- Report errors so a physical repair can be made
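
The four steps can be read as one pass of a recovery cycle. The following is a minimal Python sketch of that cycle, not pacemaker-cloud code: the `Component` class and its `is_healthy`, `stop`, and `start` callables are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Component:
    """Hypothetical stand-in for anything the HA layer manages."""
    name: str
    is_healthy: Callable[[], bool]  # step 1: monitor
    stop: Callable[[], None]        # step 2: isolate and terminate
    start: Callable[[], bool]       # step 3: recover; returns True on success
    events: List[str] = field(default_factory=list)  # step 4: report


def run_cycle(component: Component) -> bool:
    """Run one monitor/isolate/recover/report pass.

    Returns True if the component is healthy at the end of the pass and
    False if the restart failed, in which case the caller should escalate.
    """
    if component.is_healthy():
        return True
    component.events.append(f"{component.name}: failure detected")
    component.stop()
    recovered = component.start()
    outcome = "restarted" if recovered else "restart failed"
    component.events.append(f"{component.name}: {outcome}")
    return recovered
```

The `events` list stands in for the reporting step; a real system would send these records somewhere an operator can act on them.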
Aeolus has several components to which we want to apply the HA methodology:
applications, assemblies, and deployables.
To execute the high availability methodology, pacemaker-cloud needs to know how an application is started, stopped, and monitored. This is usually achieved via init scripts, but custom mechanisms could also be used.
If step 2 or step 3 (isolation or recovery) fails, it indicates that the higher-level object may itself have failed and, at minimum, can no longer be trusted.
For example, if an application fails to restart, the assembly may be bad.
To resolve this problem, we escalate application failures into assembly failures.
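
Escalation can be sketched as trying recovery at each level in turn, lowest first. This is an illustrative Python sketch only: the function name, the `(name, restart)` pairs, and the log format are assumptions, and extending the chain from assemblies up to deployables is an assumption based on the component list above rather than something stated explicitly here.

```python
from typing import Callable, List, Optional, Tuple

# (level name, restart action returning True on success), ordered from the
# lowest level to the highest.
Level = Tuple[str, Callable[[], bool]]


def recover_with_escalation(levels: List[Level], log: List[str]) -> Optional[str]:
    """Try to recover at each level in turn, lowest first.

    Returns the name of the level at which recovery succeeded, or None if
    every level failed and the problem has to be reported for repair.
    """
    for name, restart in levels:
        if restart():
            log.append(f"recovered at {name} level")
            return name
        log.append(f"{name} restart failed, escalating")
    return None


if __name__ == "__main__":
    log: List[str] = []
    levels = [
        ("application", lambda: False),  # the application will not restart
        ("assembly", lambda: True),      # restarting the assembly works
        ("deployable", lambda: True),    # never reached in this run
    ]
    recover_with_escalation(levels, log)
    print("\n".join(log))
```

Higher levels are only touched after a lower level has failed, which keeps the disruption as small as the failure allows.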
Monitoring applications allows the complete High Availability (HA) methodology to be applied to enterprise and open source applications,
which in turn improves their availability.
Typically cloud providers can tell if a VM crashes or disappears, but can't tell whether the VM's operating system and applications are still functioning.
This can lead to situations where a VM has failed in a way the cloud provider isn't aware of, because the provider relies on passive monitoring.
A better approach is to use active monitoring, where the software running in the virtual machine is checked periodically to ensure it is functioning correctly.
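
As a rough illustration of active monitoring, the Python sketch below periodically runs a status command and treats exit code 0 as healthy, which matches the usual convention for the `status` action of LSB-style init scripts. The function names, the timeout, and the polling interval are assumptions made for this example; it runs the command locally and does not show how the check would be carried out inside an assembly.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def check_once(status_cmd: Sequence[str], timeout: float = 10.0) -> bool:
    """Run a status command and treat exit code 0 as healthy.

    A command that hangs past the timeout or cannot be started at all
    counts as a failed check.
    """
    try:
        result = subprocess.run(
            status_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def monitor(
    status_cmd: Sequence[str],
    on_failure: Callable[[], None],
    interval: float = 5.0,
    max_checks: Optional[int] = None,
) -> None:
    """Check periodically and call on_failure after every failed check.

    max_checks bounds the loop for testing; None means run until stopped.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if checks:
            time.sleep(interval)  # wait between checks, not before the first
        if not check_once(status_cmd):
            on_failure()
        checks += 1
```

A call such as `monitor(["/etc/init.d/httpd", "status"], restart_httpd)` would poll a service this way, where both the script path and `restart_httpd` are placeholders.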
- We need Matahari installed into the assemblies, whether at instance build or launch time.
- An XML schema to describe the user application's start, stop, and monitoring mechanism.
  This is typically achieved through init scripts or OCF compliant scripts on the assembly.
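
To make the second requirement more concrete, here is a purely hypothetical example of what such a description could look like and how it might be read. The element and attribute names (`application`, `start`, `stop`, `monitor`, `interval`) are invented for illustration and are not the actual schema; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical document: these element and attribute names are invented for
# this example and are not the schema used by Aeolus or pacemaker-cloud.
EXAMPLE = """\
<application name="httpd">
  <start>/etc/init.d/httpd start</start>
  <stop>/etc/init.d/httpd stop</stop>
  <monitor interval="30">/etc/init.d/httpd status</monitor>
</application>
"""


@dataclass
class AppControl:
    name: str
    start: str
    stop: str
    monitor: str
    monitor_interval: int  # seconds between monitor runs


def _required_text(root: ET.Element, tag: str) -> str:
    """Return the stripped text of a direct child, or raise if it is absent."""
    text = root.findtext(tag)
    if text is None or not text.strip():
        raise ValueError(f"missing <{tag}> element")
    return text.strip()


def parse_app_control(xml_text: str) -> AppControl:
    """Parse the hypothetical description into an AppControl record."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    if not name:
        raise ValueError("missing name attribute")
    start = _required_text(root, "start")
    stop = _required_text(root, "stop")
    monitor = _required_text(root, "monitor")
    # <monitor> is known to exist here because _required_text succeeded.
    interval = int(root.find("monitor").get("interval", "30"))
    return AppControl(name, start, stop, monitor, interval)
```

Requiring all three commands up front means an incomplete description is rejected when it is loaded rather than when a recovery is attempted.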