High Availability
Revision as of 10:12, 27 November 2009
High availability refers both to the design of a system and to its ability to continue operating following one or more component faults. Concepts associated with high availability include:
- fault tolerance (the ability of a given component to recover from a fault)
- redundancy (the presence of one or more back-up instances of a hardware or software component)
- hot-swappability (the ability to add, remove or change a system component without taking the overall system down and without compromising functionality)
- scalability (the ability of a system to grow over time as well as to respond to unexpected spikes in usage without negatively impacting performance)
TelcoBridges and High Availability
While TelcoBridges hardware could always be purchased and configured to support high-availability (HA) requirements, this new release of Toolpack provides complete HA support in software as well, enabling end-to-end high availability. With Toolpack version 2.3, existing application-level redundancy is now complemented by redundancy of the core configuration database. With this release, the primary database can go down while the master versions of applications on the main server keep running; following a fault, they simply refer to the secondary (backup) database. If the master host machine encounters a fault, the primary configuration database and all other master application instances also go down. In this case, all standby application instances on the standby server, as well as the configuration database on that server, become the new primary (master) instances. Finally, it is important to note that all applications become highly available by default once HA support has been turned on.
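The database fallback described above can be sketched as a small decision function. This is only an illustration of the behaviour described in this section, not Toolpack code; the function name and the string labels are assumptions made for the example.

```python
from typing import Optional


def active_database(primary_up: bool, secondary_up: bool) -> Optional[str]:
    """Return which configuration database applications would refer to.

    Hypothetical helper: the labels 'primary' and 'secondary' are
    illustrative and do not correspond to Toolpack identifiers.
    """
    if primary_up:
        return "primary"
    if secondary_up:
        # Primary database fault: applications refer to the backup.
        return "secondary"
    # Double fault: neither database is available (see Known limitations).
    return None
```

A `None` result corresponds to the double-fault case listed under Known limitations, where applications that depend on the configuration database can no longer function correctly.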
Prerequisites
The first prerequisite for enabling HA database functionality is a secondary server running the MySQL database software. To configure that secondary server, follow the same prerequisites as for a normal installation of Toolpack on that machine. Note that the IP addresses of both the primary and the secondary server must now be given explicitly (i.e., using ‘localhost’ or ‘127.0.0.1’ is no longer permitted). You will also need to enable database replication by following MySQL’s own procedures for doing so.
- Figure 2: Server scenarios
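As a rough pre-installation check, the explicit-address requirement can be validated with Python's standard `ipaddress` module. The helper below is a sketch: its name is hypothetical, and it assumes that anything which is not a literal IP address (including the hostname ‘localhost’) should be rejected.

```python
import ipaddress


def is_explicit_address(host: str) -> bool:
    """Return True if host is a literal, non-loopback IP address.

    Hypothetical helper, not part of Toolpack. Hostnames such as
    'localhost' are rejected because they are not literal IP addresses.
    """
    try:
        addr = ipaddress.ip_address(host.strip())
    except ValueError:
        return False
    # Reject 127.0.0.0/8, ::1 and the unspecified addresses 0.0.0.0 / ::
    return not (addr.is_loopback or addr.is_unspecified)
```

Running this against the addresses planned for the primary and secondary servers before installation would catch a loopback address early.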
Steps
Configuration of the enhanced HA functionality is performed via the Toolpack web portal.
- If you are not already using Toolpack, just follow the normal installation guide. You will need to install it on two separate hosts. During installation, you will indicate whether a given server is primary (master) or secondary (slave). Once again, avoid ‘localhost’ or ‘127.0.0.1’ for server addressing purposes.
- If you are already using Toolpack and would like to enable high availability, please contact us directly. For the purposes of the beta test, because of the number of operating systems supported by Toolpack, we will provide you with the procedures to follow for the operating system you are using. For the final release of Toolpack, currently projected for June 2009, we will provide instructions for all supported operating systems. These will be part of the normal version migration guide (i.e., from version 2.2 to version 2.3).
Testing this feature
While we expect enhanced HA functionality to have a major positive impact on your operations, HA is like an insurance policy in that you hope you never have to use it and it is not something you necessarily want to test on production servers. Consequently, it is the area where we expect to spend significant testing efforts. We have identified the following items as being worthy of testing and we recommend that they be part of your testing efforts as well. Testing for HA functionality can be performed using VMware and other server virtualization solutions.
You should expect different effects on the system depending on the test operations performed. These can be divided into two broad categories. The first is where the system will NOT drop any calls and will still accept new incoming calls and calls that are currently being processed (transient calls). The second is where established calls will NOT be dropped but the system will not be able to accept transient calls. Please note that this ‘interruption’ is temporary, lasting only as long as the switchover; once the switchover is complete, incoming calls will be accepted again.
MySQL operations (no effect on established calls, still able to accept transient calls)
- Shut down the MySQL service on the master
- Kill the MySQL service on the master
- Shut down the MySQL service on the slave
- Kill the MySQL service on the slave
Toolpack operations
- Quit (graceful) master Toolpack service: (no effect on established calls, will drop transient calls)
- Quit (graceful) slave Toolpack service: (no effect on established calls, still able to accept transient calls)
Toolpack applications operations
- Master
  - tboamapp
  - toolpack_sys_manager (no effect on established calls, still able to accept transient calls)
  - toolpack_engine (no effect on established calls, will drop transient calls)
  - gateway (no effect on established calls, will drop transient calls)
- Slave
  - tboamapp
  - toolpack_sys_manager (no effect on established calls, still able to accept transient calls)
  - toolpack_engine (no effect on established calls, still able to accept transient calls)
  - gateway (no effect on established calls, still able to accept transient calls)
Host operations
- Master (no effect on established calls, will drop transient calls)
  - Disconnect the network(s)
  - Shut down the server (or disconnect power)
- Slave (no effect on established calls, still able to accept transient calls)
  - Disconnect the network(s)
  - Shut down the server
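For anyone scripting these tests, the expected effects listed above can be captured in a lookup table. The sketch below only transcribes the annotations from the lists in this section; the key names and the `expected_effect` helper are hypothetical, and `tboamapp` is deliberately absent because no effect is stated for it.

```python
from enum import Enum
from typing import Optional


class Effect(Enum):
    ACCEPTS_TRANSIENT = "no effect on established calls, still able to accept transient calls"
    DROPS_TRANSIENT = "no effect on established calls, will drop transient calls"


A, D = Effect.ACCEPTS_TRANSIENT, Effect.DROPS_TRANSIENT

# (target, role) -> expected effect, transcribed from the lists above.
# The key strings are illustrative labels, not Toolpack identifiers
# (except for the application names, which come from this page).
EXPECTED = {
    ("mysql", "master"): A,
    ("mysql", "slave"): A,
    ("toolpack_service", "master"): D,
    ("toolpack_service", "slave"): A,
    ("toolpack_sys_manager", "master"): A,
    ("toolpack_engine", "master"): D,
    ("gateway", "master"): D,
    ("toolpack_sys_manager", "slave"): A,
    ("toolpack_engine", "slave"): A,
    ("gateway", "slave"): A,
    ("host", "master"): D,
    ("host", "slave"): A,
}


def expected_effect(target: str, role: str) -> Optional[Effect]:
    """Return the documented effect, or None if this page states none."""
    return EXPECTED.get((target, role))
```

Keeping the expectations in one table makes it easy to compare observed behaviour against the documented behaviour after each test operation.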
Following a switchover, you will want to test the following items and verify that there are no errors:
- Established calls are still open
- New incoming calls are accepted
- The system closes established calls
- You are able to switch configurations (this requires a configuration reload)
- You are able to change the state (Run / Don't Run) of an application
- You are able to enable / disable trunks (this requires a configuration reload)
- You are able to enable / disable stacks (SS7, SIP, ISDN) (this requires a configuration reload)
- You are able to add / remove / modify gateway routes (this requires a configuration reload)
- You are able to add / remove / modify gateway accounts (this requires a configuration reload)
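The post-switchover checklist above lends itself to a simple runner that executes each check and reports the ones that fail. This is a generic sketch: the check callables are placeholders that you would have to implement against your own system, and nothing here calls a real Toolpack API.

```python
from typing import Callable, Dict, List


def run_checklist(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each named check and return the names of those that failed.

    A check counts as failed if it returns a falsy value or raises.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # An error while checking is treated as a failure.
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Placeholder checks; replace the lambdas with real verifications.
post_switchover_checks = {
    "established calls are still open": lambda: True,
    "new incoming calls are accepted": lambda: True,
    "application state can be changed": lambda: True,
}
```

An empty list returned by `run_checklist(post_switchover_checks)` means every check passed.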
Known limitations
We have identified the following limitations with the enhanced high availability support available in Toolpack v2.3.
- It is not always possible to recover from a double fault. For example, should both configuration databases go offline, any telephony applications that use these databases will cease to function correctly.