Oracle Cache Fusion – In Operation


Agenda
• Cache Fusion
• What is it?
• Cache Coherency vs. Cache Fusion
• Key Components and Terminology
• Cache Fusion in Operation
• Lock Mastering & Resource Affinity
• Types of Contention
• Cache Fusion – I
• Cache Fusion – II
• Examples
• Instance Crash Recovery in RAC
• Key Components in an Instance Crash
• I Pass Recovery
• II Pass Recovery

Cache Fusion – What is it?

What is it? Oracle introduced a framework for sharing data over the private interconnect between the nodes, which in previous versions was used only for messaging. This protocol is Cache Fusion. Data blocks are shipped across the interconnect much like messages, which removes disk I/O, the most expensive component of data transfer, from data sharing.
According to the manual:
• Global Cache Service (GCS): the process that implements Cache Fusion. It maintains the block mode for blocks in the global role and is responsible for block transfers between instances. The Global Cache Service employs various background processes, such as the Global Cache Service Processes (LMSn) and the Global Enqueue Service Daemon (LMD).
• Cache Fusion: a diskless cache coherency mechanism in Oracle Real Application Clusters that provides copies of blocks directly from a holding instance's memory cache to a requesting instance's memory cache.
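To make the contrast with disk-based sharing concrete, here is a minimal Python sketch (not Oracle code) of a cluster in which a requesting instance first checks whether another instance already caches the block. If one does, the requester receives a copy over the interconnect instead of reading the block from disk. The names `Cluster` and `get_block` are hypothetical, and the model deliberately ignores lock modes, dirty state, and the master node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: each instance has a buffer cache keyed by block number."""
    disk: dict[int, str]
    caches: dict[str, dict[int, str]] = field(default_factory=dict)
    disk_reads: int = 0
    interconnect_transfers: int = 0

    def get_block(self, instance: str, block: int) -> str:
        cache = self.caches.setdefault(instance, {})
        if block in cache:                       # local buffer cache hit
            return cache[block]
        for holder, other in self.caches.items():
            if holder != instance and block in other:
                # Cache Fusion style: ship the copy from the holder's memory.
                self.interconnect_transfers += 1
                cache[block] = other[block]
                return cache[block]
        # No instance caches the block, so fall back to a disk read.
        self.disk_reads += 1
        cache[block] = self.disk[block]
        return cache[block]


cluster = Cluster(disk={42: "ENG 199"})
cluster.get_block("inst1", 42)   # first access anywhere: one disk read
cluster.get_block("inst2", 42)   # second instance: shipped over the interconnect
print(cluster.disk_reads, cluster.interconnect_transfers)
```

The second instance's request never touches the disk. That I/O saving is the point of Cache Fusion.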

Cache Coherency
• According to the manual: the synchronization of data in multiple caches so that reading a memory location through any cache will return the most recent data written to that location through any other cache. Sometimes called cache consistency.
• Can we say it is a mechanism for maintaining the status of a resource (block)? If so, the following two services together provide it, under the name of the Global Resource Directory (GRD):
• GCS (Global Cache Services)
• GES (Global Enqueue Services)

Now both together…
• The GCS manages all types of data blocks. Cache coherency is maintained through the GCS by requiring that instances acquire a resource (a lock or enqueue on a block) cluster-wide before modifying or reading a database block. The GCS synchronizes global cache access and allows only one instance to modify a block at any given point in time. Through the RAC-wide Global Resource Directory, the GCS ensures that the status of data blocks cached in any mode anywhere in the cluster is globally visible and maintained.
• Oracle RAC has a multi-versioning architecture, which distinguishes between current data blocks and one or more consistent read (CR) versions of a block. A current block contains changes for all committed and yet-to-be-committed transactions. A CR version of a block represents a consistent snapshot of the data at a previous point in time. A data block can reside in many buffer caches under the auspices of shared resources.
• In Oracle9i RAC, consistent read versions of a block are produced by applying rollback segment information to current blocks. Both the current and consistent read blocks are managed by the GCS.
• To transfer data blocks between database caches, buffers are shipped over the high-speed IPC interconnect. Disk writes are required only for cache replacement. If a block is dirty (modified), a past image (PI) of it is kept in memory before it is sent. In the event of a failure, Oracle reconstructs the current version of the block by reading the PI blocks.
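The single-writer rule described above can be sketched as a tiny directory that tracks, per block, which instance holds which mode. This is a simplified model. It uses plain N/S/X modes, the class and method names are hypothetical, and it leaves out master-node messaging, roles, and CR serving. An exclusive request downgrades every other holder to null, and a previous exclusive holder keeps a past image. A shared request downgrades an exclusive holder to shared, and that branch is a simplification of what the real GCS does.

```python
from __future__ import annotations


class GlobalCacheDirectory:
    """Hypothetical sketch of the GCS rule: many readers, at most one writer per block."""

    def __init__(self) -> None:
        self.holders: dict[int, dict[str, str]] = {}   # block -> {instance: mode}
        self.past_images: dict[int, set[str]] = {}     # block -> instances keeping a PI

    def request(self, instance: str, block: int, mode: str) -> None:
        held = self.holders.setdefault(block, {})
        for other, other_mode in list(held.items()):
            if other == instance:
                continue
            if mode == "X":
                if other_mode == "X":
                    # The dirty block leaves this instance, which keeps a PI.
                    self.past_images.setdefault(block, set()).add(other)
                held[other] = "N"        # everyone else drops to null
            elif other_mode == "X":
                held[other] = "S"        # writer downgrades so readers can share
        held[instance] = mode

    def writers(self, block: int) -> list[str]:
        return [inst for inst, m in self.holders.get(block, {}).items() if m == "X"]


gcs = GlobalCacheDirectory()
gcs.request("inst1", 42, "X")
gcs.request("inst2", 42, "X")
print(gcs.writers(42), gcs.past_images[42])
```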

Background Processes and Their Roles
• LMSn – Global Cache Service Process (GCS)
  • Primarily responsible for shipping blocks between buffer caches.
  • Provides/creates a CR image whenever there is a cross-instance request for a dirty block.
  • LMS must also check constantly with the LMD background process (our GES process) to pick up the lock requests placed by LMD.
  • Parameter: GCS_SERVER_PROCESSES, up to 36 as of 10.2 (min. cpu_count/2).
• LMON – Lock Monitor Process (GES)
  • LMON manages the global locks and resources.
  • LMON handles the reconfiguration of locks and resources when an instance joins or leaves the cluster (during reconfiguration, LMON generates trace files).
  • LMON also provides cluster group services.
• LMD – Lock Manager Daemon (GES)
  • Performs deadlock detection for global locks, both local and remote.
  • Also monitors for lock conversion timeouts.
  • Maintains the lock queues and traverses the GES structures.
• LCK – Lock Process
  • Manages instance resource requests and cross-instance calls for shared resources.
  • During instance recovery, it builds a list of invalid lock elements and validates lock elements.
• DIAG – Diagnostic Daemon
  • New background process in Oracle 10g (part of the new, enhanced diagnosability framework).
  • Regularly monitors the health of the instance.
  • Also checks for instance hangs and deadlocks.
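LMD's deadlock detection can be pictured with the textbook technique of finding a cycle in a wait-for graph. An edge A → B means "A waits for a lock held by B". The following is a generic depth-first-search sketch of that technique, not Oracle's actual algorithm. The node naming (`"inst1:sess15"`) is hypothetical.

```python
from __future__ import annotations


def find_deadlock(waits_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle in a wait-for graph (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in waits_for}
    stack: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GREY
        stack.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:             # back edge: nxt is on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Session 15 on instance 1 and session 27 on instance 2 wait for each other.
print(find_deadlock({"inst1:sess15": {"inst2:sess27"}, "inst2:sess27": {"inst1:sess15"}}))
```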

History of Cache Fusion

Key Components in Cache Fusion
• Ping: The transfer of a data block from one instance's buffer cache to another instance's buffer cache is known as a ping. Whenever an instance needs a block, it sends a request to the lock master to obtain a lock in the desired mode. If another lock is held on the same block, the master asks the current holder to downgrade or release that lock; this request is known as a blocking asynchronous trap (BAST). When an instance receives a BAST, it downgrades the lock as soon as possible. However, before downgrading the lock, it might have to write the corresponding block to disk. This operation sequence is known as a disk ping or hard ping.
• CR Fabrication: Whenever another instance makes a consistent read request, the holding instance (LMS) has to create a consistent read image by applying undo information to the current block. CR fabrication is expensive because it requires reading undo into the buffer cache and applying the undo records.
• Past Image (PI) Blocks: PI blocks are copies of blocks in the local buffer cache. Whenever an instance has to send a block it has recently modified to another instance, it preserves a copy of that block and marks it as a PI. An instance is obliged to keep its PIs until the current owner of the block writes the block to disk. PIs are discarded after the latest version of the block is written to disk. When a block is written to disk and is known to have the global role, which indicates that PIs exist in other instances' buffer caches, the Global Cache Services (GCS) informs the instances holding the PIs to discard them. With Cache Fusion, a block is written to disk to satisfy checkpoint requests and similar needs, not to transfer the block from one instance to another via disk.
• Lock Mastering: The memory structure in which the GCS keeps information about the usage of a data block (and other sharable resources) is known as the lock resource. The responsibility for tracking locks is distributed among all the instances, and the required memory also comes from the participating instances' System Global Area (SGA). Because of this distributed ownership, a master node exists for each lock resource. The master node maintains complete information about the current users and requestors of the lock resource. It also holds information about the PIs of the block.
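The PI rules above (the sender keeps a PI, and all PIs are discarded once the current owner writes the block) can be modeled as a small piece of master-side bookkeeping for a single block. This is a minimal sketch under those rules only. `PastImageTracker` and its methods are hypothetical names, and the model omits BASTs, lock modes, and checkpoint triggers.

```python
from __future__ import annotations


class PastImageTracker:
    """Hypothetical master-side bookkeeping for the PIs of a single block."""

    def __init__(self) -> None:
        self.current_owner: str | None = None
        self.pi_holders: set[str] = set()

    def grant_exclusive(self, instance: str) -> None:
        """Move the current (dirty) block to `instance`; the sender keeps a PI."""
        if self.current_owner is not None and self.current_owner != instance:
            self.pi_holders.add(self.current_owner)
        self.current_owner = instance

    def write_to_disk(self) -> set[str]:
        """Current owner writes the block; the master tells PI holders to discard."""
        discarded, self.pi_holders = self.pi_holders, set()
        return discarded


tracker = PastImageTracker()
for inst in ("inst1", "inst2", "inst3"):   # block modified on each instance in turn
    tracker.grant_exclusive(inst)
print(sorted(tracker.pi_holders))            # instances that must keep their PIs
print(sorted(tracker.write_to_disk()), tracker.pi_holders)
```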

Resource Affinity and Dynamic Remastering
• Each block is mastered by exactly one instance at any given point in time.
• The resource master can change depending on how frequently other instances request the block.
• If an instance requests a particular resource 50 times within a 10-minute period, the requesting instance becomes the master. This is called resource affinity (block mastering).
• In Oracle 9.2
  • documentation describes dynamic remastering
  • not implemented in code
• In Oracle 10.1
  • works at data file level
  • very high threshold, so difficult to test
  • does occur on some customer sites
  • may cause the LMON process to crash in 10.1.0.4
  • bug 3659289: patch available
  • fixed in 10.1.0.5/10.2.0.1
• In Oracle 10.2
  • works at object level
  • thresholds are relatively low
  • object remastering is recorded in V$GCSPFMASTER_INFO
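The affinity rule on this slide (50 requests from one instance within 10 minutes moves the mastership) can be sketched with a sliding-window counter per (resource, requester) pair. The thresholds come from the slide. Everything else is an assumption made for illustration, including the class name, the resource key `"obj:SCORE"`, and resetting the counter after remastering. Oracle's real remastering logic and its parameters are not shown here.

```python
from __future__ import annotations

from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60   # "a period of 10 mins" from the slide
THRESHOLD = 50             # "50 times" from the slide


class AffinityTracker:
    """Hypothetical sketch: remaster a resource to a frequent remote requester."""

    def __init__(self, master: dict[str, str]) -> None:
        self.master = dict(master)                      # resource -> master instance
        self.requests: dict[tuple[str, str], deque[float]] = defaultdict(deque)

    def record_request(self, resource: str, instance: str, now: float) -> None:
        if self.master[resource] == instance:
            return                                      # local master: nothing to count
        times = self.requests[(resource, instance)]
        times.append(now)
        while times and now - times[0] > WINDOW_SECONDS:
            times.popleft()                             # drop requests outside the window
        if len(times) >= THRESHOLD:
            self.master[resource] = instance            # requester becomes the master
            times.clear()


tracker = AffinityTracker({"obj:SCORE": "inst1"})
for second in range(50):                                # 50 requests within one minute
    tracker.record_request("obj:SCORE", "inst2", now=float(second))
print(tracker.master["obj:SCORE"])
```

Requests that are spread too thinly (for example, one every 20 seconds) never accumulate 50 entries inside the window, so mastership stays put.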

Cache Fusion – Possible Types of Contention
• Contention for a resource occurs when two or more instances want the same resource. If a resource such as a data block is being used by one instance and is needed by another instance at the same time, contention occurs. There are three types of contention for data blocks:
• Read/Read contention: never a problem because of the shared disk system. A block read by one instance can be read by other instances without the intervention of the GCS.
• Write/Read contention: addressed in Oracle 8i by the consistent read server. The holding instance constructs the CR block and ships it to the requesting instance over the interconnect.
• Write/Write contention: addressed by Cache Fusion technology. Since Oracle 9i, the cluster interconnect is used in some cases to ship data blocks among the instances that need to modify the same data block simultaneously.

Prior to Cache Fusion (before 8.1.5) Write/read contention before Cache Fusion

Cache Fusion – I, aka Consistent Read Server
Write/Read contention: CR block transfer in Cache Fusion. Oracle introduced a background process called BSP (Block Server Process), which performs CR fabrication in the holder's cache and ships the CR version of the block across the interconnect.

Still need to address Write/Write contention
Write/Write contention before Cache Fusion – II (before 9i)

So now: Cache Fusion – II, or Write/Write Cache Fusion
Cache Fusion current block transfer (from 9i R2)
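The difference between the pre-9i disk ping and the Cache Fusion current block transfer shows up when you count I/O for a block that several instances modify in turn. This toy model is an assumption for illustration and does not measure Oracle. It charges each hand-off one disk write plus one disk read without Cache Fusion, and one interconnect transfer with it. The first access is charged as a single disk read in both cases.

```python
from __future__ import annotations


def handoff_cost(writers: list[str], cache_fusion: bool) -> dict[str, int]:
    """Count I/O for a block modified by `writers` in turn (toy model)."""
    cost = {"disk_writes": 0, "disk_reads": 1, "interconnect": 0}  # initial disk read
    for prev, nxt in zip(writers, writers[1:]):
        if prev == nxt:
            continue                        # same instance keeps its exclusive lock
        if cache_fusion:
            cost["interconnect"] += 1       # ship current block; sender keeps a PI
        else:
            cost["disk_writes"] += 1        # holder writes the block to disk (ping)
            cost["disk_reads"] += 1         # requester re-reads it from disk
    return cost


sequence = ["inst1", "inst2", "inst1", "inst2"]
print(handoff_cost(sequence, cache_fusion=False))
print(handoff_cost(sequence, cache_fusion=True))
```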

Buffer States in Cache Fusion
• SL: When an instance has a resource in SL form, it can serve a copy of the block to other instances and it can read the block from disk. Since the block is not modified, there is no need to write it to disk.
• XL: When an instance has a resource in XL form, it has sole ownership of and interest in that resource. It also has the exclusive right to modify the block. All changes to the block are in its local buffer cache, and it can write the block to disk. If another instance wants the block, it will contact this instance via the GCS.
• NL: An NL form is used to protect consistent read blocks. If a block is held in SL mode and another instance wants it in X mode, the current instance sends the block to the requesting instance and downgrades its role to NL.
• SG: In SG form, a block is present in one or more instances. An instance can read the block from disk and serve it to other instances.
• XG: In XG form, a block can have one or more PIs, indicating multiple copies of the block in several instances' buffer caches. The instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to disk. The GCS can ask the instance with the XG role to write the block to disk or to serve it to another instance.
• NG: After discarding its PIs when instructed by the GCS, the instance keeps the block in its buffer cache with the NG role. This copy serves only as a CR copy of the block.
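The six state names combine a mode (N, S, X) with a role (Local or Global). The sketch below encodes that naming together with the usual null/shared/exclusive compatibility rule: S is compatible with S, N is compatible with anything, and X is compatible only with N. The enum and function names are hypothetical. The sketch also simplifies the role to "global when PIs exist elsewhere", following the XG description above.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    N = "null"
    S = "shared"
    X = "exclusive"


class Role(Enum):
    L = "local"
    G = "global"


# Can two different instances hold these modes on the same block at once?
COMPATIBLE = {
    (Mode.N, Mode.N): True, (Mode.N, Mode.S): True, (Mode.N, Mode.X): True,
    (Mode.S, Mode.S): True, (Mode.S, Mode.X): False,
    (Mode.X, Mode.X): False,
}


def compatible(a: Mode, b: Mode) -> bool:
    """Symmetric lookup in the compatibility table."""
    return COMPATIBLE.get((a, b), COMPATIBLE.get((b, a), False))


def buffer_state(mode: Mode, pis_exist: bool) -> str:
    """Combine mode and role into the slide's names (SL, XL, NL, SG, XG, NG)."""
    role = Role.G if pis_exist else Role.L
    return f"{mode.name}{role.name}"


print(buffer_state(Mode.X, pis_exist=True), compatible(Mode.S, Mode.X))
```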

Example 1: Reading a Block from Disk

Example 2: Reading a Block from the Cache

Example 3: Getting a (Cached) Clean Block for Update

Example 4: Getting a (Cached) Modified Block for Update and Commit

Example 5: Commit the Previously Modified Block and Select the Data

Example 6: Write the Dirty Buffers to Disk Due to Checkpoint

Example 7: Master Instance Crash

Example 7: What the alert log says about reconfiguration…
  List of nodes:
  0 1 2
  Global Resource Directory frozen
  * dead instance detected - domain 0 invalid = TRUE
  Communication channels reestablished
  * domain 0 valid = 0 according to instance 0
  Wed Jun 21 23:22:22 2006
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  Wed Jun 21 23:22:22 2006
  LMS 0: 0 GCS shadows cancelled, 0 closed
  Wed Jun 21 23:22:22 2006
  LMS 2: 0 GCS shadows cancelled, 0 closed
  Wed Jun 21 23:22:22 2006
  LMS 3: 0 GCS shadows cancelled, 0 closed
  Wed Jun 21 23:22:22 2006
  LMS 1: 0 GCS shadows cancelled, 0 closed
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Wed Jun 21 23:22:22 2006
  LMS 0: 2189 GCS shadows traversed, 332 replayed
  Wed Jun 21 23:22:22 2006
  LMS 2: 2027 GCS shadows traversed, 364 replayed
  Wed Jun 21 23:22:22 2006
  LMS 3: 2098 GCS shadows traversed, 364 replayed
  Wed Jun 21 23:22:22 2006
  LMS 1: 2189 GCS shadows traversed, 343 replayed
  Wed Jun 21 23:22:22 2006
  Submitted all GCS remote-cache requests
  Fix write in gcs resources
  Reconfiguration complete
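When you read reconfiguration messages like these, it can help to total the per-LMS shadow counts. The following small Python helper parses the "GCS shadows traversed, … replayed" lines from the alert log excerpt above. The regular expression is written to match only those lines as they appear in this excerpt, and it is an assumption that other releases format them the same way.

```python
import re

# The four "traversed/replayed" lines from the alert log excerpt above.
ALERT_LOG = """\
LMS 0: 2189 GCS shadows traversed, 332 replayed
LMS 2: 2027 GCS shadows traversed, 364 replayed
LMS 3: 2098 GCS shadows traversed, 364 replayed
LMS 1: 2189 GCS shadows traversed, 343 replayed
"""

PATTERN = re.compile(r"LMS (\d+): (\d+) GCS shadows traversed, (\d+) replayed")


def replay_summary(log: str) -> dict[int, tuple[int, int]]:
    """Map each LMS process number to (shadows traversed, shadows replayed)."""
    return {int(m[1]): (int(m[2]), int(m[3])) for m in PATTERN.finditer(log)}


summary = replay_summary(ALERT_LOG)
total_traversed = sum(t for t, _ in summary.values())
total_replayed = sum(r for _, r in summary.values())
print(total_traversed, total_replayed)  # 8503 1403
```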

Crash Recovery – Key Components
• Redo Threads and Streams
• Redo Records and Change Vectors
• Checkpoints
• Thread Checkpoint or Local Checkpoint
• Database Checkpoint or Global Checkpoint
• Incremental Checkpoint
• Bounded Recovery
• Block Written Record (BWR)
• Past Image (PI)
• Checkpoints and PI
• I Pass Recovery
• II Pass Recovery
• Merge Threads
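"Merge Threads" refers to combining the redo generated by several instances, each with its own redo thread, into a single stream ordered by SCN so that changes are applied in the order they happened. The sketch below illustrates that ordering with `heapq.merge`. The record layout and the change descriptions are hypothetical, and real redo records carry change vectors rather than strings.

```python
from __future__ import annotations

import heapq
from typing import NamedTuple


class RedoRecord(NamedTuple):
    scn: int
    thread: int
    change: str   # hypothetical stand-in for the record's change vectors


def merge_threads(*threads: list[RedoRecord]) -> list[RedoRecord]:
    """Merge per-instance redo threads (each already in SCN order) into one stream."""
    return list(heapq.merge(*threads, key=lambda rec: rec.scn))


thread1 = [RedoRecord(100, 1, "block 42: ENG=200"), RedoRecord(130, 1, "block 42: ENG=205")]
thread2 = [RedoRecord(120, 2, "block 42: ENG=204")]
for rec in merge_threads(thread1, thread2):
    print(rec.scn, rec.thread, rec.change)
```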

Cache Fusion – Crash Instance Recovery
The steps for GRD reconfiguration are as follows:
• Instance death is detected by the cluster manager.
• Requests for PCM locks are frozen.
• Enqueues are reconfigured and made available.
• DLM recovery.
• GCS (PCM lock) is remastered.
• Pending writes and notifications are processed.
The steps for I Pass recovery are as follows:
• The instance recovery (IR) lock is acquired by SMON.
• The recovery set is prepared and built. Memory space is allocated in the SMON Program Global Area (PGA).
• SMON acquires locks on buffers that need recovery.
II Pass recovery steps are as follows:
• II Pass is initiated. The database is partially available.
• Blocks are made available as they are recovered.
• The IR lock is released by SMON. Recovery is complete.
• The system is available.
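The two passes can be illustrated with a simplified model. The first pass scans the failed thread's redo and builds the recovery set. A Block Written Record (BWR) shows that a block has already reached disk, so earlier changes to that block are dropped from the set. The second pass reapplies redo only to blocks in the set, starting from each block's first needed SCN. This is an illustrative sketch under those assumptions and not SMON's actual implementation. The record format and the values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Redo:
    scn: int
    block: int
    kind: str          # "change" or "bwr" (block written record)
    value: str = ""    # hypothetical new block content for a "change"


def first_pass(redo: list[Redo]) -> dict[int, int]:
    """Build the recovery set: block -> lowest SCN whose change must be reapplied."""
    recovery_set: dict[int, int] = {}
    for rec in redo:
        if rec.kind == "bwr":
            recovery_set.pop(rec.block, None)   # block already on disk: drop it
        elif rec.block not in recovery_set:
            recovery_set[rec.block] = rec.scn
    return recovery_set


def second_pass(redo: list[Redo], recovery_set: dict[int, int],
                disk: dict[int, str]) -> dict[int, str]:
    """Apply redo only to blocks in the recovery set, from their start SCN on."""
    recovered = dict(disk)
    for rec in redo:
        start = recovery_set.get(rec.block)
        if rec.kind == "change" and start is not None and rec.scn >= start:
            recovered[rec.block] = rec.value
    return recovered


redo = [Redo(10, 42, "change", "ENG=200"), Redo(11, 7, "change", "AUS=99"),
        Redo(12, 42, "bwr"), Redo(13, 42, "change", "ENG=204")]
rset = first_pass(redo)
print(rset, second_pass(redo, rset, {42: "ENG=200", 7: "AUS=98"}))
```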

Example 8: Select the Rows from Instance A

Just for a clear understanding…
• It's time to play…

Cross Instance Consistent Read
[Figure: Instance 1 and Instance 2 (Session 15, Session 27 and LMS0) share data block 42 of table score, with rows ENG (slot 0) and AUS (slot 1), updated by transaction xid 0005.018.4E7, and undo block 800777 (undo record addresses 800777.530.12 to 800777.530.14). LMS0 builds a read-consistent version of block 42 from a copy of the block by applying undo. ENG runs values shown: 340, 344, 350, 352. Statements shown:
UPDATE score SET runs = runs + 2 WHERE team = 'ENG';
SELECT runs, wickets FROM score WHERE team = 'ENG';
UPDATE score SET runs = runs + 4 WHERE team = 'ENG';
UPDATE score SET runs = runs + 6 WHERE team = 'ENG';]
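The consistent read build in this slide can be sketched as follows. Start from a copy of the current block, then follow the chain of undo records (the uba links), restoring old values until every remaining change is no newer than the reader's snapshot SCN. The SCNs below are hypothetical. The runs values 340, 344, and 350 are taken from the figure, but the order of the changes is an assumption. The sketch covers only this technique and not how LMS locates undo or handles ITL entries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UndoRecord:
    scn: int                   # SCN of the change this record undoes (hypothetical)
    column: str
    old_value: int
    prev: UndoRecord | None    # next-older undo record for this block (uba chain)


def build_cr_copy(current: dict[str, int], undo: UndoRecord | None,
                  snapshot_scn: int) -> dict[str, int]:
    """Roll a copy of the current block back to `snapshot_scn`."""
    copy = dict(current)               # the current block itself is never modified
    rec = undo
    while rec is not None and rec.scn > snapshot_scn:
        copy[rec.column] = rec.old_value
        rec = rec.prev
    return copy


u1 = UndoRecord(scn=101, column="runs", old_value=340, prev=None)   # 340 -> 344
u2 = UndoRecord(scn=102, column="runs", old_value=344, prev=u1)     # 344 -> 350
current = {"runs": 350, "wickets": 1}
print(build_cr_copy(current, u2, snapshot_scn=100))
```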

Committed Block – Block on Disk
[Figure: Instance 1 and Instance 2 (Session 27, Session 15 and LMS0) with block 42 and an undo block. Sequence: UPDATE score SET runs = 200 WHERE team = 'ENG'; UPDATE score SET runs = 204 WHERE team = 'ENG'; UPDATE score SET runs = 205 WHERE team = 'ENG'; COMMIT; then SELECT runs FROM score WHERE team = 'ENG';. Values shown: ENG 199, 200, 204, 205; AUS 99.]

Committed Block – Block in Buffer Cache
[Figure: Instance 1 and Instance 2 (Session 27, Session 15 and LMS0) with block 42 and an undo block. Sequence: UPDATE score SET runs = 200 WHERE team = 'ENG'; UPDATE score SET runs = 204 WHERE team = 'ENG'; UPDATE score SET runs = 205 WHERE team = 'ENG'; COMMIT; then SELECT runs FROM score WHERE team = 'ENG';. Values shown: ENG 199, 200, 204, 205; AUS 99.]

Uncommitted Block – Block in Buffer Cache
[Figure: Instance 1 and Instance 2 (Session 27, Session 15 and LMS0) with block 42, a copy of block 42, and an undo block. Sequence (no COMMIT): UPDATE score SET runs = 200 WHERE team = 'ENG'; UPDATE score SET runs = 204 WHERE team = 'ENG'; UPDATE score SET runs = 205 WHERE team = 'ENG'; then SELECT runs FROM score WHERE team = 'ENG';. Values shown: ENG 199, 200, 204, 205; AUS 99.]

Uncommitted Block – Block on Disk
[Figure: Instance 1 and Instance 2 (Session 27, Session 15 and LMS0) with block 42 and an undo block. Sequence (no COMMIT): UPDATE score SET runs = 200 WHERE team = 'ENG'; UPDATE score SET runs = 204 WHERE team = 'ENG'; UPDATE score SET runs = 205 WHERE team = 'ENG'; then SELECT runs FROM score WHERE team = 'ENG';. Values shown: ENG 199, 200, 204, 205; AUS 99.]
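Across the four committed/uncommitted slides, the ENG row starts at 199 and is updated to 200, 204, and then 205. Under Oracle read consistency, a SELECT from another session returns the last committed value whether block 42 sits on disk or in a buffer cache. Where the block sits changes only how that version is obtained. The toy model below captures just the value a reader sees. The `Row` class is hypothetical, and the model ignores the updating session's own view of its changes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Row:
    """Toy model of the ENG row in block 42: a committed value plus pending changes."""
    committed: int
    pending: list[int] = field(default_factory=list)

    def update(self, runs: int) -> None:
        self.pending.append(runs)        # uncommitted; undo still holds older values

    def commit(self) -> None:
        if self.pending:
            self.committed = self.pending[-1]
            self.pending.clear()

    def select_from_other_session(self) -> int:
        """Read-consistent value seen by a different session (committed data only)."""
        return self.committed


eng = Row(committed=199)
for runs in (200, 204, 205):
    eng.update(runs)
print(eng.select_from_other_session())   # uncommitted slides
eng.commit()
print(eng.select_from_other_session())   # committed slides
```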

Q & A

References:
• Oracle 10g Real Application Clusters Handbook – K. Gopalakrishnan
• Julian Dyke – RAC presentation
• Oracle 10g RAC Administrator's Guide
