
Oracle RAC and Hardware Failover

To detect a node failure, the Cluster Manager uses a background process, the Global Enqueue Service Monitor (LMON), to monitor the health of the cluster. When a node fails, the Cluster Manager reports the change in the cluster's membership to Global Cache Services (GCS) and Global Enqueue Service (GES). These services are then re-mastered based on the current membership of the cluster. To successfully re-master the cluster services, Oracle RAC keeps track of all resources and resource states on each node and then uses this information to restart these resources on a backup node. These processes also manage the state of in-flight transactions and work with TAF to either restart or resume the transactions on the new node.

Now let's see how Oracle RAC and TAF work together to ensure that a server failure does not cause an unplanned service interruption.

Using Transparent Application Failover

After an Oracle RAC node crashes, usually from a hardware failure, all new application transactions are automatically rerouted to a specified backup node. The challenge in rerouting is to not lose transactions that were "in flight" at the exact moment of the crash. One of the requirements of continuous availability is the ability to restart in-flight application transactions, so that work from a failed node can resume on another server without interruption.

Oracle's answer to application failover is a new Oracle Net mechanism dubbed Transparent Application Failover (TAF). TAF allows the DBA to configure the type and method of failover for each Oracle Net client. For an application to use TAF, it must use failover-aware API calls from the Oracle Call Interface (OCI). Inside OCI are TAF callback routines that can be used to make any application failover-aware.

While the concept of failover is simple, providing an apparent instant failover can be extremely complex, because there are many ways to restart in-flight transactions. The TAF architecture offers the ability to restart transactions at either the transaction (SELECT) or session level:

SELECT failover. With SELECT failover, Oracle Net keeps track of all SELECT statements issued during the transaction, tracking how many rows have been fetched back to the client for each cursor associated with a SELECT statement. If the connection to the instance is lost, Oracle Net establishes a connection to another Oracle RAC node and re-executes the SELECT statements, repositioning the cursors so the client can continue fetching rows as if nothing had happened. The SELECT failover approach is best for data warehouse systems that perform complex and time-consuming transactions.

SESSION failover. When the connection to an instance is lost, SESSION failover results only in the establishment of a new connection to another Oracle RAC node; any work in progress is lost. SESSION failover is ideal for online transaction processing (OLTP) systems, where transactions are small.

Oracle TAF also offers choices on how to restart a failed transaction. The Oracle DBA may choose one of the following failover methods:

BASIC failover. In this approach, the application connects to a backup node only after the primary connection fails. This approach has low overhead, but the end user experiences a delay while the new connection is created.

PRECONNECT failover. In this approach, the application simultaneously connects to both a primary and a backup node. This offers faster failover, because a pre-spawned connection is ready to use, but the duplicate connection adds overhead during normal operation.

Currently, TAF will fail over standard SQL SELECT statements that were in flight when a node crashed. In the current release, however, some types of transactions must be restarted from the beginning. The following do not fail over automatically and must be restarted after failover:

Transactional statements. Transactions involving INSERT, UPDATE, or DELETE statements are not supported by TAF.

ALTER SESSION statements. ALTER SESSION and SQL*Plus SET statements do not fail over.

The following do not fail over and cannot be restarted:

Temporary objects. Transactions using temporary segments in the TEMP tablespace and global temporary tables do not fail over.

PL/SQL package states. PL/SQL package states are lost during failover.

Using Oracle RAC and TAF Together

The continuous availability features of Oracle RAC and TAF come together when these products cooperate in restarting failed transactions. Let's take a closer look at how this works.

Within each connected Oracle Net client, tnsnames.ora file parameters define the failover types and methods for that client. The parameters direct Oracle RAC and TAF on how to restart any transactions that may be in flight during a hardware failure on the node. It is important to note that TAF failover control is external to the Oracle RAC cluster, and each Oracle Net client may have unique failover types and methods, depending on processing requirements.
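The SELECT failover behavior described earlier (re-execute the query on a surviving node, then skip the rows the client has already received) can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions, not how Oracle Net is implemented: the names run_select, InstanceDown, and fetch_with_select_failover are hypothetical, and the model assumes the backup node returns the same rows in the same order as the primary.

```python
from typing import Iterator, List, Optional, Sequence


class InstanceDown(Exception):
    """Hypothetical error raised when the toy 'instance' goes away."""


def run_select(rows: Sequence[int], fail_after: Optional[int] = None) -> Iterator[int]:
    """Toy stand-in for executing a SELECT on one instance.

    If fail_after is set, the instance 'crashes' just before returning
    the row at that index.
    """
    for index, row in enumerate(rows):
        if fail_after is not None and index == fail_after:
            raise InstanceDown("connection to instance lost")
        yield row


def fetch_with_select_failover(rows: Sequence[int], fail_after: Optional[int]) -> List[int]:
    """Fetch all rows, surviving one instance failure in SELECT style."""
    fetched: List[int] = []
    try:
        # Normal path: fetch from the primary and count what the client has seen.
        for row in run_select(rows, fail_after):
            fetched.append(row)
    except InstanceDown:
        # Failover path: re-execute the same query on the backup (assumed to
        # return identical rows), then reposition past the rows already fetched.
        backup_cursor = run_select(rows)
        for _ in range(len(fetched)):
            next(backup_cursor)
        for row in backup_cursor:
            fetched.append(row)
    return fetched
```

From the caller's point of view the result is the same whether or not the primary failed partway through, which is the sense in which this style of failover is transparent. With SESSION failover, by contrast, only the new connection would be established and the fetch would have to be reissued by the application.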

The following is a client tnsnames.ora file entry for a node, including its current TAF failover parameters:

    bubba.world =
      (DESCRIPTION_LIST =
        (FAILOVER = true)
        (LOAD_BALANCE = true)
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = redneck)(PORT = 1521))
          (CONNECT_DATA =
            (SERVICE_NAME = bubba)
            (SERVER = dedicated)
            (FAILOVER_MODE =
              (BACKUP=cletus)
              (TYPE=select)
              (METHOD=preconnect)
              (RETRIES=20)
              (DELAY=3)
            )
          )
        )
      )

The failover_mode section of the tnsnames.ora file lists the parameters and their values:

BACKUP=cletus. This names the backup node that will take over failed connections when a node crashes. In this example, the primary database is bubba, and TAF will reconnect failed transactions to the cletus instance in case of server failure.

TYPE=select. This tells TAF to re-execute in-flight SELECT statements on the backup node and reposition their cursors, so the client can continue fetching rows. (With TYPE=session, TAF would only establish a new connection, and any work in progress would be lost.)

METHOD=preconnect. This directs TAF to create two connections at connection time: one to the primary bubba database and a backup connection to the cletus database. In case of instance failure, the cletus database will be ready to resume the failed transaction.

RETRIES=20. This directs TAF to retry a failover connection up to 20 times.

DELAY=3. This tells TAF to wait three seconds between connection retries.

Remember, you must set these TAF parameters in every tnsnames.ora file on every Oracle Net client that needs transparent failover.

Putting It All Together

An Oracle Net client can be a single PC or a huge application server. In the architectures of giant Oracle RAC systems, each application server has a customized tnsnames.ora file that governs the failover method for all connections that are routed to that application server.

Watching TAF in Action

The transparency of TAF operation is a tremendous advantage to application users, but DBAs need to quickly see what has happened and where failover traffic is going, and they need to be able to get the status of failover transactions. To provide this capability, the Oracle data dictionary has several new columns in the V$SESSION view that give the current status of failover transactions.

The following query selects the new FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns of the V$SESSION view. Note that the query is restricted to nonsystem sessions, because Oracle data definition language (DDL) and data manipulation language (DML) are not recoverable with TAF.

    select username, sid, serial#,
           failover_type, failover_method, failed_over
    from   v$session
    where  username not in ('SYS','SYSTEM','PERFSTAT')
    and    failed_over = 'YES';

You can run this script against the backup node after an instance failure to see those transactions that have been reconnected with TAF. Remember, TAF will quickly redirect transactions, so you'll only see entries for a short period of time immediately after the failover.

A backup node can have a variety of concurrent failover transactions, because the tnsnames.ora file on each Oracle Net client specifies the backup node, the failover type, and the failover method.
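The RETRIES and DELAY parameters from the tnsnames.ora walkthrough amount to a bounded retry loop around the reconnect attempt. The Python sketch below illustrates that idea only; it is not Oracle Net code. The function reconnect_with_retries and its connect and sleep arguments are hypothetical, and whether Oracle counts the first attempt as one of the retries is an assumption made here for simplicity.

```python
import time
from typing import Callable, Tuple


def reconnect_with_retries(
    connect: Callable[[], object],
    retries: int = 20,      # mirrors RETRIES=20 in the example entry
    delay: float = 3.0,     # mirrors DELAY=3 (seconds between attempts)
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[object, int]:
    """Call connect() up to `retries` times, pausing `delay` seconds between tries.

    Returns the connection object and the number of attempts used.
    Raises ConnectionError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return connect(), attempt
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < retries:
                sleep(delay)
    raise ConnectionError(f"failover gave up after {retries} attempts") from last_error
```

Passing the sleep function as an argument keeps the sketch easy to exercise without real waiting. With the example values, a client would spend at most about a minute (19 pauses of three seconds) trying to reach the backup before giving up.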
