How to Identify a Buffer Cache Chain Latch Problem

Internally, Oracle uses various types of structures to control access to elements of the databases including latches and various types of enqueueing (locking) mechanisms. Latches are an elemental locking mechanism that are the primary means used to control access to Oracle internal structures, such as internal memory structures in the SGA. The database buffer cache is a memory area in the Shared Global Area (SGA) where Oracle keeps copies of database blocks. Latching issues can be one major source of performance problems within an Oracle database.

When a process tries to use the latch, it is said to acquire it. Once a process acquires the latch, no other processes can acquire that latch. Latches are like a race in a way. They do not have a queuing mechanism and thus, getting a latch is a first serve first come proposition. As a result, all processes have to contend with each other as they attempt to acquire the same latch. Since there is generally only one latch per structure (with some exceptions) this competition for latches can lead to significant contention issues. The good news is that once the latch is acquired, it is generally released quickly. Exceptions to this rule can be the cause of additional performance problems.

Latch requests come in two flavors, willing to wait and no-wait modes. In willing to wait mode, when a latch cannot be acquired, the acquiring session will go into a spin mode, attempting to acquire the latch over and over a specified number of times. After the number of spins has reached a specific threshold, the session will sleep for a specified period of time, wake up and try the latch again. Spin mode is bad, but sleep mode is worse!

A latch request may also be in no wait mode. This is a latching mode that if the request for the latch is not successful then the latch request will simply fail. The process is then free to simply try to reacquire the latch again, or perhaps try to acquire another latch.

To summarize this section, latch contention can clearly be a major source of performance problems in Oracle, and can be caused by many different problems. There are a number of thing you can do to try to reduce the overall level of latch waits. This topic is about the cache buffer chain latch, so let’s talk about that some more.

The Cache Buffer Chain Latch

In this topic our primary interest is the cache buffer chain latch. It is one of two main latches used to protect data blocks in the buffer cache. The other main latch is called the cache buffer LRU chain latch. The cache buffer LRU chain latch is needed when it’s time to scan the LRU chain for dirty blocks, and is also used when Oracle needs a free block in the SGA.

The cache buffer chains latch is acquired when a process needs to search the buffer cache. Its purpose then is to prevent changes from occurring to the underlying buffer cache structure during the search. The search of the buffer cache is via a linked list (cache buffer chains). The linked lists point to structures in associated hash tables. Different classes of blocks will end up on different hash chains within the hash table, and a single child latch will protect each hash chain.

How to Identify a Buffer Cache Chain Latch Problem

This latch is acquired to search the data blocks in the buffer cache. Contention on this latch is caused by very heavy access to a single block. If there is heavy contention on data blocks, you will see session wait event "buffer busy wait" in v$session_wait. The values of P1, P2, and P3 can be used to drill down to find the object that is experiencing contention. P1 is the file#, P2 the block#, and P3 is the reason. Also, in Oracle9i Release 2 and higher, the v$segment_statistics view can be an invaluable resource for finding out which segments are having contention problems. You can find there where the heaviest number of buffer busy waits are and which object those waits are on. Often, this is much easier than querying the v$session_wait table because:

  1. The v$session_wait table is very dynamic in nature. It’s hard to get a real feeling for what is going on unless you query the view often, which is time consuming and potentially a performance issue in it’s own right.
  2. You don’t have to convert the P1, P2 and P3 values, which can be a pain. v$segment_statistics does this all for you.

If you have a cache buffer chain latching problem, it will be identified by the following query:

   SELECT name, gets, misses, sleeps, 
      immediate_gets, immediate_misses
     FROM v$latch
   WHERE name = ‘cache buffers chains’
/

If the ratio of misses to gets is high (misses/gets*100 is > 10, then you need to determine why you are having a cache buffer chain latching issue and do something about it.

Using The V$Session_Wait Table To Identify Session Waits

The following query can be used to identify session waits. Events that are considered idle wait events (process waiting for something to do) can be filtered out:

column event format a30 
column username format a20 
column state format a10 trunc 
column p1 format 999999999999 heading “P1” 
column p2 format 999999999999 heading “P2” 
column p3 format 99999 heading “P3” 
set lin 132 

SELECT username, V$SESSION_WAIT.sid, V$SESSION_WAIT.event, V$SESSION_WAIT.seq#,
     V$SESSION_WAIT.seconds_in_wait, V$SESSION_WAIT.wait_time, V$SESSION_WAIT.p1 , V$SESSION_WAIT.p2, V$SESSION_WAIT.p3,
     V$SESSION_WAIT.state, V$SESSION_WAIT.p1raw, V$SESSION_WAIT.p2raw, V$SESSION_WAIT.p3raw
   FROM V$SESSION_WAIT, V$SESSION 
WHERE V$SESSION_WAIT.SID = V$SESSION.SID 
  AND NOT(event like ‘SQL%’) 
  AND NOT(event like ‘%message%’) 
  AND NOT(event like ‘%timer%’) 
  AND NOT(event like ‘%pipe get%’) 
  AND NOT(event like ‘%jobq slave wait%’) 
  AND NOT(event like ‘%null event%’) 
  AND NOT(event like ‘%wakeup time%’) 
  ORDER BY wait_time desc, event 
/

The next step would be to identify the object that is the source of the hot blocks. The following query can be used to drill down to the object that is experiencing block contention, using the P1 and P2 values from the query above:

SELECT owner,segment_name,segment_type,tablespace_name
  FROM dba_extents
 WHERE file_id = &p1
 AND &p2 between block_id and block_id+blocks-1
/

This query will also identify the objects that are experiencing block contention, based on block usage latch sleeps:

SELECT /*+ ordered */
       e.owner ||’.’|| e.segment_name segment_name,
       e.extent_id extent#,
       x.dbablk – e.block_id + 1 block#,
       x.tch,
       l.child#
  FROM sys.v$latch_children l,
       sys.x$bh x,
       sys.dba_extents e
 WHERE l.name = ‘cache buffers chains’ 
   AND l.sleeps > &sleep_count 
   AND x.hladdr = l.addr 
   AND e.file_id = x.file# 
   AND x.dbablk between e.block_id and e.block_id + e.blocks – 1
/

Since Oracle8i, there are far fewer cache buffers chains latches, so many more buffers are covered by a single hash latch. There is a touch-count column in X$BH in Oracle8i (X$BH.TCH) which can be used to see the HOT buffers. Hot buffers will typically have a high touch count:

SELECT e.owner ||'.'|| e.segment_name segment_name,
       e.extent_id extent#,
       x.dbablk – e.block_id + 1 block#,
       x.tch
  FROM sys.x$bh x,
       sys.dba_extents e
 WHERE x.tch > 50 
   AND e.file_id = x.file# 
   AND x.dbablk between e.block_id and e.block_id + e.blocks – 1
/

Using The V$segment_statistics Table To Identify Session Waits

As we mentioned earlier, the v$segment_statistics table can be used to identify troublesome sessions as well. Here is a query that you would use to identify the troublesome objects

Select owner, object_name, tablespace_name, statistic_name, value
From v$segment_statistics
Where statistic_name=’buffer busy waits’;

 

Resolving the Buffer Cache Chain Latch Problem

Once you have discovered buffer cache chain latch contention what do you do? First, if you can determine that a specific object is causing problems, then you need to determine why that is the case. Often, this is the result of concurrency issues (multiple sessions going after the same block). If this is the case, then there are several approaches:

First, review your application and determine why it needs to access the same block from multiple sessions. If possible, re-write the application in such a way that concurrency of block requests isn’t an issue. If this is not possible, then a hash partition scheme on the tables impacted might be an alternative. This will allow the concurrent sessions to modify data in the same table, but the hashing of the keys in the table will make it less likely that the applications will try to access the same block. If you have not purchased partitioning (as it is a separate Oracle product, and requires a separate license) then consider increasing pctfree and pctused to reduce the total number of rows per block as seen in this example:

ALTER TABLE emp PCTFREE 80 PCTUSED 15;

Remember that when setting pctfree and pctused that the two numbers added up together must not exceed 100. In this case we set pctfree to 80, which means that Oracle will reserve 80% of the block for updates. This will result in fewer rows per block which will result in less contention for each block by concurrent processes. Do be aware that your space requirements will increase if you choose this solution. For example if pctfree was set to 20, your space requirements will increase by some 60 percent.

If you do have a partition license, you can hash partition your existing table. To do this, you will need to first create the hash-partitioned table from the existing table, including any constraints that may need to be created. Here is an example of partitioning a table called emp:

CREATE TABLE P_emp
     STORAGE (INITIAL 10K)
     PARTITION BY HASH(empno)
       (PARTITION p1 TABLESPACE ts1, PARTITION p2 TABLESPACE ts2,
        PARTITION p3 TABLESPACE ts1, PARTITION p4 TABLESPACE ts3)
AS SELECT * FROM emp;

After executing this statement, you would want to recreate any foreign keys, constraints (such as primary keys) and indexes that existed on the previous EMP table. You would then drop the EMP table and rename P_EMP to EMP with the rename table command as seen here:

RENAME p_emp to emp;

Also, If you are in Oracle9i or later, you might also try to move the table to a tablespace that is a smaller block size and see if that solves the problem. This can be accomplished through the alter table move command as seen in this example:

Create tablespace new_tbs datafile ‘c:\oracle\data\mydb\mydb_newtbs_01.dbf’ size 100m;

Alter table emp move tablespace new_tbs;

Note that if you are using partitioned tables that you will need to move each partition individually such as in this example:

Alter table emp move partition p1 tablespace users;

If you can not seem to narrow down the problem to a specific object, then it might well be that your database buffer cache is just to small. You might consider enabling the buffer cache advice feature in Oracle9i, and review the results of that feature to determine if increasing the buffer cache might be helpful.

Also, if your indexes appear to be experiencing contention problems, consider using reverse key indexes. This will help balance the data within the index and will lessen the likelihood of concurrent processes trying to access the same block.

 

Start the discussion at forums.toadworld.com