
ASE136: How to build highly available applications using OpenSwitch?

Ganesan Gopal, Senior Manager, ganesan.gopal@sybase.com. August 15-19, 2004.


Presentation Transcript


  1. ASE136: How to build highly available applications using OpenSwitch? Ganesan Gopal Senior Manager ganesan.gopal@sybase.com August 15-19, 2004

  2. The Enterprise. Unwired.

  3. The Enterprise. Unwired. Industry and Cross Platform Solutions Manage Information Unwire Information Unwire People • Adaptive Server Enterprise • Adaptive Server Anywhere • Sybase IQ • Dynamic Archive • Dynamic ODS • Replication Server • OpenSwitch • Mirror Activator • PowerDesigner • Connectivity Options • EAServer • Industry Warehouse Studio • Unwired Accelerator • Unwired Orchestrator • Unwired Toolkit • Enterprise Portal • Real Time Data Services • SQL Anywhere Studio • M-Business Anywhere • Pylon Family (Mobile Email) • Mobile Sales • XcelleNet Frontline Solutions • PocketBuilder • PowerBuilder Family • AvantGo Sybase Workspace

  4. Agenda • What is OpenSwitch? • Key OpenSwitch functionalities • OpenSwitch deployment options • New features in OpenSwitch 12.5 and 12.5.1 • What is Replication Coordination Module (RCM)? • Q & A

  5. What is OpenSwitch? Open Server Gateway • Passthru gateway built on top of Sybase Open Server libraries. • Provides functionality to manually or automatically manage client connections. • Can manage any number of remote servers • Routing and load balancing capabilities • Includes error detection and recovery features • Integrates into any HA environment • Works with existing HA solution (such as Replication Server, HA-CMP, etc.). • Makes HA transparent

  6. OpenSwitch Features Connection Failover • Transparent Connection Management • Failure detection and recovery • HA coordination Load Balancing & Routing • Server pooling and routing • Server Chaining and balancing • Connection caching • Connection suspend and resume • Resource governing Management • Central management of all client connections • Notification Events • Dynamic configuration

  7. Transparent Connection Management [Diagram: Open Client applications on any platform (Jaguar CTS, ISQL, PowerBuilder) connect through OpenSwitch to ASE Server A and ASE Server B.]

  8. Connection Management (cont.) [Diagram: Open Client applications on any platform (Jaguar CTS, ISQL, PowerBuilder) connect through OpenSwitch to ASE Server A and ASE Server B; an administrator (isql) sends an RPC switch request to OpenSwitch.]

  9. Transparent Connection Management Benefits • Allows maintenance periods • Index rebuilds • DBCC checks • Non-intrusive data loads • Server reboot

  10. Failure Detection and Recovery • The status of outgoing OpenSwitch connections is monitored. • As each connection to the primary server is lost, OpenSwitch moves it to the next available server. • The switch decision is made on a per-connection basis. • OpenSwitch attempts to recover: • Database context • Client language • Client charset • Client-side cursors [Diagram: Open Client applications (Jaguar CTS, ISQL, PowerBuilder) connect through OpenSwitch to ASE Server A and ASE Server B.]

  11. Transaction Management • OpenSwitch tracks: • Client activity • Database context • Client-side cursors • A “deadlock” is issued to clients with: • Open transactions • Active communications • Open client-side cursors [Diagram: clients (Jaguar CTS, ISQL, PowerBuilder, ISQL) in the states Idle, Idle, Receiving Results, and In Transaction, connected to ASE Server A and ASE Server B; two of them receive “DEADLOCK!”.]

  12. Transaction Management (cont.) • Failover is transparent to clients that are: • Not in a transaction • Idle • Administrative switch requests: • Behavior is configurable • Can queue the switch until client transactions are committed. • Can specify a maximum number of seconds for a client to commit its transaction. • Can force transactions to be broken (generating a “deadlock” message).
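
The configurable behavior for administrative switch requests can be pictured as a small per-connection decision. The following is a minimal sketch, not OpenSwitch code: the names `ClientState`, `Action`, and `decide` are hypothetical, and it assumes that a client still busy after the maximum wait has its transaction broken with a deadlock message.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SWITCH = "switch transparently"
    QUEUE = "queue the switch until the client commits"
    DEADLOCK = "break the transaction with a deadlock message"


@dataclass
class ClientState:
    # The three conditions the slides list as making a client non-switchable.
    in_transaction: bool = False
    receiving_results: bool = False
    open_cursor: bool = False

    @property
    def busy(self) -> bool:
        return self.in_transaction or self.receiving_results or self.open_cursor


def decide(state, force=False, waited=0.0, max_wait=None):
    """Pick an action for one connection when a switch is requested."""
    if not state.busy:
        return Action.SWITCH      # idle clients never notice the switch
    if force:
        return Action.DEADLOCK    # administrator chose to break transactions
    if max_wait is not None and waited >= max_wait:
        return Action.DEADLOCK    # assumed: grace period for committing expired
    return Action.QUEUE           # otherwise wait for the commit
```

For example, `decide(ClientState(in_transaction=True), max_wait=30, waited=5)` queues the switch, while the same client with `force=True` is deadlocked immediately.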

  13. Pooling and Routing • What is a Pool? • Defines a named group of servers • Servers within a pool are a self-contained fail-over group • Connections go to the first server in the pool • They fail over to the next available server in the pool • They are disconnected from OpenSwitch if no servers remain • Connections can be routed to pools based upon • Username • Application Name • Client hostname • Can use “wildcard” names • Pools can be configured in chained or balanced mode [Diagram: Pool A, Pool B, Pool C.]
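
Routing by username, application name, or client hostname with wildcard names can be sketched as an ordered rule table. This is an illustration only: the rule table, the attribute keys, and the function `route` are hypothetical, and the glob-style patterns of Python's `fnmatch` are an assumption, since the slides do not show the wildcard syntax OpenSwitch actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical routing rules: (pool name, connection attribute, pattern).
# Rules are checked in order and the first match wins.
ROUTES = [
    ("POOL_A", "appname", "AppA*"),
    ("POOL_C", "username", "report_*"),
    ("POOL_B", "hostname", "*"),
]


def route(conn, routes=ROUTES):
    """Return the name of the first pool whose rule matches the connection."""
    for pool, attr, pattern in routes:
        if fnmatchcase(conn.get(attr, ""), pattern):
            return pool
    return None  # no rule matched, so no pool is assigned


print(route({"username": "bob", "appname": "AppA_ui", "hostname": "pc17"}))
print(route({"username": "report_eve", "appname": "isql", "hostname": "pc18"}))
```

Ordering the rules from most to least specific keeps the catch-all pattern from shadowing the narrower ones.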

  14. Pooling and Routing Example • POOL A: who = App A; primary = Server A; failover = Server C • POOL B: who = App B; primary = Server B; failover = Server C • POOL C: who = Reporting; primary = Server C; failover = none • Databases: Server A holds DB A, Server B holds DB B, Server C holds DB A and DB B

  15. Chaining and Balancing [Diagram: Pool A in chained mode; Pool B in balanced mode.]
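
The difference between the two pool modes can be sketched in a few lines. This is not OpenSwitch code: the `Pool` class is hypothetical, and it assumes that balanced mode spreads new connections round-robin over the servers that are up, while chained mode always uses the first available server.

```python
class Pool:
    """A named group of servers that acts as one fail-over group."""

    def __init__(self, name, servers, mode="CHAINED"):
        self.name = name
        self.servers = list(servers)
        self.mode = mode
        self.up = {server: True for server in self.servers}
        self._next = 0  # round-robin position, used only in BALANCED mode

    def pick(self):
        """Choose a server for a new connection, or None if all are down."""
        available = [s for s in self.servers if self.up[s]]
        if not available:
            return None
        if self.mode == "CHAINED":
            return available[0]  # always the first server that is still up
        server = available[self._next % len(available)]
        self._next += 1
        return server


chained = Pool("A", ["ServerA", "ServerC"])
balanced = Pool("B", ["ServerB", "ServerC"], mode="BALANCED")
print([chained.pick() for _ in range(3)])
print([balanced.pick() for _ in range(3)])
```

Marking a server down (`chained.up["ServerA"] = False`) makes the chained pool move on to the next server in the list.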

  16. Connection Caching • Caching Per Pool • Each pool may specify a caching period • Out-going connections are retained by OpenSwitch for the specified period. • Connection handed back to user next time (s)he connects (with same password) • Connection closed if caching period expires. • Performance Boost! • Web Servers (CGI’s) without persistent connections. • Site Handlers • Any application that performs a rapid connect/query/release. [Diagram: a disconnected client's connection is kept in the connection cache and restored when the client reconnects.]
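
The caching rules on this slide (retain an outgoing connection for a period, hand it back to the same user with the same password, drop it once the period expires) can be sketched as follows. The `ConnectionCache` class and its method names are hypothetical, and connections are represented by plain strings; a real cache would also close the expired connections.

```python
import time


class ConnectionCache:
    """Keeps released server connections for a fixed caching period."""

    def __init__(self, period, clock=time.monotonic):
        self.period = period
        self.clock = clock
        self._idle = {}  # (user, password) -> list of (expires_at, connection)

    def release(self, user, password, conn):
        """Client disconnected: retain its server connection for a while."""
        entry = (self.clock() + self.period, conn)
        self._idle.setdefault((user, password), []).append(entry)

    def acquire(self, user, password):
        """Client connected: reuse a cached connection if one is still valid."""
        entries = self._idle.get((user, password), [])
        now = self.clock()
        while entries:
            expires_at, conn = entries.pop()
            if expires_at > now:
                return conn
            # Expired entry: a real implementation would close it here.
        return None
```

Injecting the clock keeps the sketch testable without waiting for real time to pass.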

  17. HA Coordination • Without coordination: • OpenSwitch makes all decisions about routing and fail-over on its own. • Decisions are made per connection. • Coordination Module (CM): • Either a custom-developed OpenSwitch client that uses the Coordination Module API for complex administration, or the Sybase-provided RCM, which controls a high availability, warm standby replication environment. • When a CM is present, OpenSwitch defers major decisions to it. • Events raised by clients (such as logins and failures) are forwarded to the CM. • Events include an indication of what OpenSwitch would like to do. • The client hangs until the CM responds. [Diagram: OpenSwitch asks the CM “What do I do?” on behalf of an application connected to ASE Server A; the CM responds with an action.]

  18. HA Coordination (cont.) • Custom Coordination Module: • Developed by/for each customer to address questions: • What constitutes a failure? • What should be the response to a failure? • Used to coordinate with the HA solution (Rep Server, HP Service Guard, EMC SRDF, etc.) • Hook-point for client-specific logic, such as application security. • Can be used to override OpenSwitch decisions on routing and pooling. • Can perform any action an OpenSwitch administrator may perform. [Diagram: an application's connection to ASE Server A is lost; the CM checks with the HA solution (Rep Server) whether the failure is real and whether the servers are in sync, then switches the connection to ASE Server B.]

  19. Connection Suspend/Resume • One or more connections may be temporarily suspended. • Can wait for transactions to complete or “deadlock” them. • Allows quick maintenance periods. • Can safely bounce SQL Server. • Can safely establish and break hardware or OS mirroring. [Diagram labels: Suspend Activity, Enable Mirror (TimeFinder), Resume Activity, Break Mirror, Reporting.]

  20. Dynamic Configuration • Configuration in external .cfg file. • Server can change dynamically from altered .cfg file, or via built-in registered procedures. • No need to restart OpenSwitch. • Coordination Module can alter configuration.

  21. OpenSwitch Deployment: Optimal Configuration? • It depends on the HA requirement • Run multiple copies of OpenSwitch so that OpenSwitch itself is not a single point of failure. • Need to determine: • What are the recovery requirements? • How much of the environment can be affected by a failure? • How much hardware is available? • What is most likely to fail?

  22. Reducing Points of Failure • Increasing the number of OpenSwitch instances (reducing the number of clients per switch) reduces the impact of a failure. • Client fail-over utilizes existing Open Client technology. [Diagram: Server A and Server B.]

  23. OpenSwitch with ASE • ASE requires data replication to keep the Primary and Replicate databases in sync. • OpenSwitch switches between the two ASE Servers, Server A and Server B. • If either ASE Server becomes unavailable, OpenSwitch shields the client from the disconnect. • The CM requires a Ping User connection to each ASE Server to ping the server and determine its availability. • The CM requires a connection and code to communicate with the Rep Server. [Diagram: the CM, ASE Server A, ASE Server B, and the HA solution (Rep Server).]

  24. What is NEW in OSW 12.5? • Replication Server Co-ordination Module for Warm-Standby Configuration (out of the box) • Installation & install time Configuration • Dynamic SQL Support • OCS 12.5 integration - Optimal Resource Usage • HA-Aware Support • Quality

  25. What is RCM? • RCM is an OpenSwitch Coordination Module (CM) • Registers callback routines for events within the OpenSwitch • Sets timer events and registers timer callbacks • Connects to the OpenSwitch • Executes a “run” command. At this point nothing happens until a defined event occurs • OpenSwitch contacts the coordination module when: • A user requests a connection • A login attempt fails • An existing connection fails • The OpenSwitch connection fails • Coordination module determines which server the user should connect to • Coordination module is an Open-Client C executable • An OpenSwitch can have more than one coordination module • A coordination module can connect to more than one OpenSwitch [Diagram: OpenSwitch and Coordination Module.]

  26. What is RCM? • RCM provides all the functionality needed to control a high availability, warm standby replication environment • Prior to RCM, customers and/or professional services had to write custom coordination modules • RCM is part of the OpenSwitch product

  27. RCM Warm-Standby Replication Environment Topology [Diagram: application end users and decision support users each connect through an OpenSwitch; the active ASE and the standby ASE are linked by a warm-standby Replication Server.]

  28. RCM Overview • The RCM provides the following features: • Coordinates user access to the active and standby ASE servers • Switches the warm standby connections in the Replication Server • Switches users from the active to the standby ASE if the active server is unavailable • Supports failover of multiple databases in the active ASE • Coordinates two OpenSwitch servers to provide redundancy • Note: redundant OpenSwitch servers are optional

  29. Configuring the OpenSwitch and RCM • RCM is configured using a separate configuration file • RCM settings must be coordinated with OpenSwitch settings • OpenSwitch configuration parameters • The following parameters appear in both the OpenSwitch and the RCM configuration files: • SERVER_NAME • COORD_USER • COORD_PASSWORD • The parameter COORD_MODE must be set to ‘ALWAYS’ • Example:
      [CONFIG]
      SERVER_NAME = ws_os
      COORD_USER = os_coord
      COORD_PASSWORD = os_coord_pwd
      COORD_MODE = ALWAYS

  30. Configuring the OpenSwitch and RCM • Application end users must connect to the active server unless it is unavailable • OpenSwitch must be configured with exactly one POOL for application end users • MODE = ‘CHAINED’ • STATUS = ‘UP’ • List the active server followed by the standby server • Make sure all application end users connect using this POOL • Example:
      [POOL=Application:MODE=CHAINED, STATUS=UP]
      servers:
          BookServer
          StandbyBook
      connections:
          username:bob
          username:fred

  31. Configuring the OpenSwitch and RCM • Optionally OpenSwitch can be configured with pools for decision support users • Zero or more pools • Mode is either ‘CHAINED’ or ‘BALANCED’ • STATUS = ‘UP’ • Example:
      [POOL=DSS:MODE=CHAINED, STATUS=UP]
      servers:
          StandbyBook
          BookServer
      connections:
          username:alice

  32. Configuring the OpenSwitch and RCM • RCM configuration parameters • Three replication failover modes: SWITCH, QUIESCE, NONE • Multiple database support • Timing parameters • Required database list

  33. Configuring the OpenSwitch and RCM • Standard RCM configuration parameters
      # Open Switch Server; These parameters match OpenSwitch parameters
      OPENSWITCH = ws_os
      COORD_USER = os_coord
      COORD_PASSWORD = os_coord_pwd
      APP_POOL = Application
      # Active and Standby ASE parameters
      ACTIVE_ASE = BookServer
      STANDBY_ASE = StandbyBook
      ASE_USER = sa
      #ASE_PASSWORD - ASE password is blank
      # Wait 5 minutes before starting the failover
      FAILOVER_WAIT = 300
      # Wait 2 minutes for the Rep Server to perform the switch over
      MONITOR_WAIT = 120
      # Wait 5 seconds between ping/monitor commands
      TIMER_INTERVAL = 5
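
Because the coordination settings must agree between the two files, a small cross-check can catch mismatches before the RCM is started. The sketch below is illustrative only: `parse_settings` and `check` are hypothetical helpers, the parser handles just the simple `KEY = VALUE` and `#` comment lines shown on the slides, and pairing the OpenSwitch `SERVER_NAME` with the RCM `OPENSWITCH` parameter is an assumption drawn from the example values.

```python
def parse_settings(text):
    """Collect KEY = VALUE pairs, skipping comments, section headers, and list lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# (OpenSwitch key, RCM key). SERVER_NAME <-> OPENSWITCH is an assumed pairing.
SHARED = [
    ("SERVER_NAME", "OPENSWITCH"),
    ("COORD_USER", "COORD_USER"),
    ("COORD_PASSWORD", "COORD_PASSWORD"),
]


def check(osw, rcm):
    """Return a list of problems; an empty list means the files agree."""
    problems = []
    for osw_key, rcm_key in SHARED:
        if osw_key not in osw or osw[osw_key] != rcm.get(rcm_key):
            problems.append(f"{osw_key} / {rcm_key} differ")
    if osw.get("COORD_MODE") != "ALWAYS":
        problems.append("COORD_MODE must be ALWAYS")
    return problems


OSW_CFG = """
[CONFIG]
SERVER_NAME = ws_os
COORD_USER = os_coord
COORD_PASSWORD = os_coord_pwd
COORD_MODE = ALWAYS
"""

RCM_CFG = """
# Open Switch Server
OPENSWITCH = ws_os
COORD_USER = os_coord
COORD_PASSWORD = os_coord_pwd
APP_POOL = Application
"""

print(check(parse_settings(OSW_CFG), parse_settings(RCM_CFG)))
```

With the example values from the slides the list of problems is empty.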

  34. Configuring the OpenSwitch and RCM • RCM configuration parameters - Failover Mode = ‘SWITCH’
      # On failover, switch the flow of replication
      RS_FAILOVER_MODE = SWITCH
      # Replication Server
      REP_SERVER = ws_rs
      RS_USER = sa
      #RS_PASSWORD - Replication Server password is blank
      # Identify the database connection in the warm-standby environment
      LOGICAL_CONN = LDS.LDB
      DATABASES = pubs3

  35. Configuring the OpenSwitch and RCM • RCM configuration parameters - Failover Mode = ‘QUIESCE’
      # On failover, quiesce the Replication Server
      # No database information is needed
      RS_FAILOVER_MODE = QUIESCE
      # Replication Server
      REP_SERVER = ws_rs
      RS_USER = sa
      #RS_PASSWORD - Replication Server password is blank

  36. Configuring the OpenSwitch and RCM • RCM configuration parameters - Multiple database support
      # On failover, switch the flow of replication
      RS_FAILOVER_MODE = SWITCH
      # Replication Server
      REP_SERVER = ws_rs
      RS_USER = sa
      #RS_PASSWORD - Replication Server password is blank
      # Identify the databases in the warm-standby environment
      LOGICAL_CONN = LDS.pubs3, LDS.sales, LDS.signings
      #DATABASES - Omitted, so RCM will use pubs3, sales, signings
      # The loss of the signings database will not trigger a failover
      REQUIRED_DBS = pubs3, sales
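
The comments in the multiple database example imply two rules: when DATABASES is omitted the database names are taken from the LOGICAL_CONN entries, and only databases listed in REQUIRED_DBS trigger a failover. A minimal sketch of those rules follows; the function names are hypothetical, and treating every database as required when REQUIRED_DBS is absent is an assumption, not something the slides state.

```python
def split_list(value):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def monitored_databases(settings):
    """Use DATABASES if given, else the name after the dot in each LOGICAL_CONN entry."""
    explicit = split_list(settings.get("DATABASES", ""))
    if explicit:
        return explicit
    return [conn.split(".", 1)[-1] for conn in split_list(settings["LOGICAL_CONN"])]


def triggers_failover(failed_db, settings):
    """True if losing failed_db should start the failover process."""
    # Assumption: with no REQUIRED_DBS, every monitored database is required.
    required = split_list(settings.get("REQUIRED_DBS", "")) or monitored_databases(settings)
    return failed_db in required


settings = {
    "LOGICAL_CONN": "LDS.pubs3, LDS.sales, LDS.signings",
    "REQUIRED_DBS": "pubs3, sales",
}
print(monitored_databases(settings))
print(triggers_failover("signings", settings))
```

With these settings the loss of `signings` does not trigger a failover, matching the comment in the example configuration.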

  37. Starting the RCM • The executable and startup script are in the OpenSwitch bin directory • Start the OpenSwitch before starting the RCM • Starting RCM as a Windows service is not supported • RCM command line syntax:
      rcm [-v] [-h] [-a] [-R] [-c config_file] [-e system_log] [-i interfaces_file_directory] [-T trace_flags]
  • Display the RCM version string:
      rcm -v
  • Start the RCM:
      rcm -c ../config/rcm.cfg -I ../../interfaces -T EF

  38. RCM Failover Handling • Identifying the ASE failure • The OpenSwitch notifies the RCM when: • A user fails to connect to an ASE • An existing connection to an ASE fails • A switch over to an ASE fails • Active ASE: • If an application end user fails, the RCM starts the failover process • If a decision support user fails, the RCM switches them to the next available server • Standby ASE: • If any user fails, the RCM switches them to the next available server • Application users cannot log into the standby ASE unless the failover process has already occurred

  39. RCM Failover Handling • RCM Failover Processing • Starts only when an application end user fails on the active ASE • Ping the active ASE • Attempt to log into the ASE • If successful, the ASE is not down, abort the failover process • Suspend connections to the active ASE • Stop new users from logging into the ASE (rp_stop) • Suspend existing connections (rp_server_status, LOCKED) • Wait for Recovery • Wait for the ASE to automatically recover, or for the network to stabilize • Wait a configurable amount of time (FAILOVER_WAIT) • Ping the ASE at a configurable interval (TIMER_INTERVAL) • If successful, abort the failover process
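
The "Wait for Recovery" step can be sketched as a polling loop driven by the FAILOVER_WAIT and TIMER_INTERVAL settings. This is an illustration, not RCM code: `should_fail_over` is a hypothetical name, `ping` stands in for whatever login attempt the RCM makes against the active ASE, and the exact timing of the final check is an assumption.

```python
import time


def should_fail_over(ping, failover_wait=300, timer_interval=5, sleep=time.sleep):
    """Return True if the active ASE stays unreachable for the whole wait period.

    ping: callable returning True when a login to the active ASE succeeds.
    """
    waited = 0
    while True:
        if ping():
            return False          # server answered: abort the failover process
        if waited >= failover_wait:
            return True           # still down after FAILOVER_WAIT seconds
        sleep(timer_interval)     # wait TIMER_INTERVAL seconds between pings
        waited += timer_interval


# Example with a fake server that recovers on the third ping.
answers = iter([False, False, True])
print(should_fail_over(lambda: next(answers), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter lets the loop be exercised without real delays.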

  40. RCM Failover Handling • Issue the Replication Server failover commands • SWITCH – (switch active) • QUIESCE – (suspend log transfer from all, admin quiesce_force_rsi) • NONE - do not failover the connections in the Replication Server • Monitor the Replication Server failover • Wait for the Replication Server to finish the failover commands • Wait a configurable amount of time (MONITOR_WAIT) • Monitor the Replication Server at a configurable interval (TIMER_INTERVAL) • SWITCH – (admin logical_status) • QUIESCE – (admin health)

  41. RCM Failover Handling • Start the Replication Agent on the Standby ASE • Only if failover mode is ‘SWITCH’ (sp_start_rep_agent) • Switch the users to the Standby ASE • Set the server to ‘DOWN’ in the OpenSwitch (rp_server_status, ‘DOWN’) • Switch the connections from the active to standby (rp_switch) • Restart the existing connections (rp_start)

  42. RCM Failover Handling • User permissions for the RCM • ASE User • Must be a valid login for both the active and the standby ASE • Permission to connect to all databases that participate in replication • Permission to start the replication agent on all databases (sp_start_rep_agent) • Replication Server • User must have ‘sa’ privileges

  43. Redundant OpenSwitch using RCM • RCM supports dual OpenSwitch environments • Two OpenSwitch servers and two RCM’s • Primary OpenSwitch coordinates application end user connections to the active ASE • Primary OpenSwitch coordinates the failover of the Replication Server and the application end users to the standby ASE • All application end users must connect to the active ASE through the primary OpenSwitch • Secondary OpenSwitch is on standby and takes control of the failover processing if the primary OpenSwitch fails • Secondary OpenSwitch provides decision support users access to the standby ASE • Secondary RCM does not allow application end users to connect unless the primary OpenSwitch has failed • Connectivity provides multiple servers for end users. When the primary OpenSwitch fails, users connect to the secondary OpenSwitch (multiple entries in the interfaces file)

  44. Redundant OpenSwitch using RCM • RCM establishes a connection to both the primary and the secondary OpenSwitch servers • RCM is notified when either OpenSwitch fails • Primary OpenSwitch fails: • Primary RCM terminates • Application end users reconnect to the secondary OpenSwitch • Secondary RCM coordinates the connections to the active and standby ASE servers and controls failover processing • Secondary OpenSwitch fails: • Secondary RCM terminates • Users reconnect to the primary OpenSwitch [Diagram: primary OpenSwitch with the primary RCM and secondary OpenSwitch with the secondary RCM.]

  45. RCM Notification Process • The RCM provides a simple notification feature that automatically performs a user-defined process when certain events occur • Executes a process (e.g. a script or a program) defined by the NOTIFICATION_PROCESS configuration parameter • The process is executed from the RCM’s current working directory • The process is executed with the same set of permissions that the RCM was executed with • Output for the process is redirected to a temporary file. The full path name of this file is written to the RCM log and starts with the prefix “rcm.” • The RCM does not delete the temporary output file

  46. RCM Notification Process • Events that trigger the notification process: • The RCM detected a possible failover situation where the active ASE is not responding • The failover process has started • The failover has been aborted because the active ASE has recovered • The RCM cannot connect to the Replication Server • The RCM was unable to start the Replication Agents • Executing the failover process in the Replication Server failed • Switching the users in the OpenSwitch from the active ASE to the standby ASE failed • The RCM has exited. Under normal conditions the RCM should not exit • One of the OpenSwitch servers failed • Test notification. Executed when the user starts the RCM with the -a (analyze) option • A notification ID and a text message are passed to the notification process. The process can interpret the ID to determine the appropriate response
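
A notification process could be as small as the script sketched below. It assumes the notification ID and the text message arrive as command-line arguments, which the slides do not confirm, and the function name `handle` and the `urgent_ids` set are hypothetical; the actual ID values are not listed here, so none are hard-coded.

```python
import sys


def handle(argv, urgent_ids=frozenset()):
    """Format one notification; argv is [notification_id, message words...]."""
    if not argv:
        return "usage: notify <id> <message>"
    note_id, message = argv[0], " ".join(argv[1:])
    # The site decides which IDs deserve escalation; none are assumed here.
    level = "URGENT" if note_id in urgent_ids else "INFO"
    return f"[{level}] RCM notification {note_id}: {message}"


if __name__ == "__main__":
    # Whatever is printed ends up in the temporary output file kept by the RCM.
    print(handle(sys.argv[1:]))
```

A production script would typically go further, for example by sending mail or paging an operator for the IDs the site treats as urgent.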

  47. Dynamic SQL Support • OpenSwitch 12.5 supports the following Dynamic SQL Statements: • CS_PREPARE • CS_EXECUTE_IMMEDIATE • CS_EXECUTE • CS_DESCRIBE_INPUT • CS_DESCRIBE_OUTPUT • CS_DEALLOC

  48. OpenSwitch with ASE 12.x with pre-12 client • An ASE 12.x cluster shares a single copy of data, eliminating the need for data replication between ASE Server A and ASE Server B. • OpenSwitch switches between the two ASE Servers, Server A and Server B. • If either ASE Server becomes unavailable, OpenSwitch shields the client from the disconnect. • The CM still requires a Ping User connection to each ASE Server to ping the server and determine its availability. [Diagram: the CM, ASE Server A, and ASE Server B.]

  49. OpenSwitch with ASE 12.x w/12.x client (HA Aware) • OpenSwitch still fits into a true ASE 12.x clustered environment where the client has been written or re-written to take advantage of the companion server. • OpenSwitch can switch between two or more ASE 12.x clusters. • In ASE 12.x a cluster can consist of only 2 servers, but OpenSwitch can switch between N servers; in this case N=4. [Diagram: the CM and two sites, SITE ONE and SITE TWO, each with an ASE Server A and an ASE Server B.]

  50. What is new in OSW 12.5.1? • Connection caching • Encrypted Username/Password in configuration file • Multi-threaded CM • Widen platform appeal by addition of Linux platform • Cleaner error messages and improved user documentation • Moved OpenSwitch to use robust Installshield based installer • Extensive stress and functional testing to exercise OpenSwitch
