High Availability

From TBwiki

Revision as of 15:34, 12 May 2009

While TelcoBridges hardware could always be purchased and configured to support high-availability (HA) requirements, this new release of Toolpack provides complete HA support in software as well, enabling end-to-end high availability. With Toolpack version 2.3, existing application-level redundancy is now complemented by redundancy of the core configuration database. With this release, the primary database can go down while the master instances of applications on the main server keep running; following such a fault, they simply refer to the secondary (backup) database. If the master host machine encounters a fault, the primary configuration database and all master application instances go down with it. In this case, the standby application instances and the configuration database on the standby server become the new primary (master) instances. Finally, note that all applications become highly available by default once HA support has been turned on.

Prerequisites

The first prerequisite for enabling HA database functionality is a secondary server running the MySQL database software. Configure that secondary server by following the same prerequisites as for a normal installation of Toolpack on that machine. Note that with this change, the IP addresses of both the primary and the secondary server must be stated explicitly (i.e., using ‘localhost’ or ‘127.0.0.1’ is no longer permitted). You will also want to activate database replication, following MySQL’s own procedures for doing so.
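
The addressing rule above can be checked before installation with a few lines of Python. This is only an illustrative sketch, not part of Toolpack: the function name `is_explicit_address` is hypothetical, and rejecting every loopback address (rather than only ‘localhost’ and ‘127.0.0.1’) is an assumption that goes slightly beyond what this page states.

```python
import ipaddress


def is_explicit_address(host: str) -> bool:
    """Return True if `host` looks usable as an explicit HA server address.

    Hypothetical helper: rejects 'localhost' and loopback IP addresses
    such as 127.0.0.1, which this page says are no longer permitted.
    """
    name = host.strip().lower()
    if name in ("", "localhost"):
        return False
    try:
        # Assumption: any loopback address is treated like 127.0.0.1.
        return not ipaddress.ip_address(name).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a hostname other than localhost.
        return True


if __name__ == "__main__":
    for candidate in ("localhost", "127.0.0.1", "192.168.1.10"):
        print(candidate, is_explicit_address(candidate))
```

Run it against the addresses you intend to enter for the primary and secondary servers; both should be reported as explicit.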

Figure 2: Server scenarios
Enhanced-HA-overview.jpg

Steps

Configuration of the enhanced HA functionality is performed via the Toolpack web portal.

  • If you are not already using Toolpack, just follow the normal installation guide. You will need to install it on two separate hosts. During installation, you will indicate whether a given server is primary (master) or secondary (slave). Once again, avoid ‘localhost’ or ‘127.0.0.1’ for server addressing purposes.
  • If you are already using Toolpack and would like to enable high availability, please contact us directly. For the purposes of the beta test, due to the number of operating systems supported by Toolpack, we will provide you with the procedures to follow for the operating system you are using. For the final release of Toolpack, currently projected for June 2009, we will provide instructions for all supported operating systems. These will be part of the normal version migration guide (i.e., from version 2.2 to version 2.3).

Testing this feature

While we expect enhanced HA functionality to have a major positive impact on your operations, HA is like an insurance policy in that you hope you never have to use it and it is not something you necessarily want to test on production servers. Consequently, it is the area where we expect to spend significant testing efforts. We have identified the following items as being worthy of testing and we recommend that they be part of your testing efforts as well. Testing for HA functionality can be performed using VMware and other server virtualization solutions.

You should expect different effects on the system depending on the test operations performed. These can be divided into two broad categories. In the first, the system will NOT drop any calls and will still accept new incoming calls as well as calls that are currently being processed (transient calls). In the second, established calls will NOT be dropped, but the system will not be able to accept transient calls. Please note that this ‘interruption’ is temporary, lasting only as long as the switchover; once the switchover is complete, incoming calls will be accepted again.

MySQL operations (no effect on established calls, still able to accept transient calls)

  • Shut down the MySQL service on the master
  • Kill the MySQL service on the master
  • Shut down the MySQL service on the slave
  • Kill the MySQL service on the slave

Toolpack operations

  • Quit (graceful) master Toolpack service: (no effect on established calls, will drop transient calls)
  • Quit (graceful) slave Toolpack service: (no effect on established calls, still able to accept transient calls)

Toolpack application operations

  • Master
    • tboamapp
    • toolpack_sys_manager (no effect on established calls, still able to accept transient calls)
    • toolpack_engine (no effect on established calls, will drop transient calls)
    • gateway (no effect on established calls, will drop transient calls)
  • Slave
    • tboamapp
    • toolpack_sys_manager (no effect on established calls, still able to accept transient calls)
    • toolpack_engine (no effect on established calls, still able to accept transient calls)
    • gateway (no effect on established calls, still able to accept transient calls)

Host operations

  • Master (no effect on established calls, will drop transient calls)
    • Disconnect the network(s)
    • Shut down the server (or disconnect power)
  • Slave (no effect on established calls, still able to accept transient calls)
    • Disconnect the network(s)
    • Shut down the server
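
For use in a test plan, the expected effects listed in the sections above can be collected into a single lookup table. The sketch below is illustrative only: the names `Effect`, `EXPECTED` and `expected_effect` are hypothetical, and the entries simply restate what this page says. No effect is stated for tboamapp, so it is recorded as `None` rather than guessed.

```python
from enum import Enum
from typing import Optional


class Effect(Enum):
    ACCEPTS_TRANSIENT = "no effect on established calls, still able to accept transient calls"
    DROPS_TRANSIENT = "no effect on established calls, will drop transient calls"


A, D = Effect.ACCEPTS_TRANSIENT, Effect.DROPS_TRANSIENT

# (component, role) -> expected effect while the switchover is in progress.
EXPECTED = {
    ("mysql", "master"): A,                 # shut down or kill
    ("mysql", "slave"): A,
    ("toolpack service", "master"): D,      # graceful quit
    ("toolpack service", "slave"): A,
    ("tboamapp", "master"): None,           # no effect stated on this page
    ("tboamapp", "slave"): None,
    ("toolpack_sys_manager", "master"): A,
    ("toolpack_sys_manager", "slave"): A,
    ("toolpack_engine", "master"): D,
    ("toolpack_engine", "slave"): A,
    ("gateway", "master"): D,
    ("gateway", "slave"): A,
    ("host", "master"): D,                  # network disconnect or shutdown
    ("host", "slave"): A,
}


def expected_effect(component: str, role: str) -> Optional[Effect]:
    """Look up the documented effect; raises KeyError for unknown pairs."""
    return EXPECTED[(component, role)]


if __name__ == "__main__":
    for (component, role), effect in EXPECTED.items():
        print(f"{component:22} {role:7} {effect.value if effect else 'not stated'}")
```

Read this way, every operation on the slave side leaves the system able to accept transient calls, while most operations on the master side (other than MySQL and toolpack_sys_manager) drop transient calls for the duration of the switchover.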

Following a switchover, you will want to test the following items and verify that there are no errors:

  • Established calls are still open
  • New incoming calls are accepted
  • Established calls can still be closed normally
  • You are able to switch configurations (this requires a configuration reload)
  • You are able to change the state (Run / Don't Run) of an application
  • You are able to enable / disable trunks (this requires a configuration reload)
  • You are able to enable / disable stacks (SS7, SIP, ISDN) (this requires a configuration reload)
  • You are able to add / remove / modify gateway routes (this requires a configuration reload)
  • You are able to add / remove / modify gateway accounts (this requires a configuration reload)
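
The post-switchover checklist above can also be kept as data, so that a test run records which checks depend on a configuration reload. This is a minimal sketch under that assumption; `Check`, `POST_SWITCHOVER_CHECKS` and `reload_dependent` are hypothetical names, and the descriptions paraphrase the list above.

```python
from typing import List, NamedTuple


class Check(NamedTuple):
    description: str
    needs_reload: bool  # True where the page notes a configuration reload


POST_SWITCHOVER_CHECKS = [
    Check("Established calls are still open", False),
    Check("New incoming calls are accepted", False),
    Check("Established calls can be closed", False),
    Check("Switch configurations", True),
    Check("Change the state (Run / Don't Run) of an application", False),
    Check("Enable / disable trunks", True),
    Check("Enable / disable stacks (SS7, SIP, ISDN)", True),
    Check("Add / remove / modify gateway routes", True),
    Check("Add / remove / modify gateway accounts", True),
]


def reload_dependent(checks: List[Check]) -> List[str]:
    """Return the descriptions of checks that involve a configuration reload."""
    return [c.description for c in checks if c.needs_reload]


if __name__ == "__main__":
    for check in POST_SWITCHOVER_CHECKS:
        marker = "reload" if check.needs_reload else "      "
        print(f"[ ] {marker}  {check.description}")
```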

Known limitations

We have identified the following limitations with the enhanced high availability support available in Toolpack v2.3.

  • It is not always possible to recover from a double fault. For example, should both configuration databases go offline, any telephony applications that use these databases will cease to function correctly.



Return to Toolpack v2.3 Release Notes
