Identify Contention

The following query determines the miss ratio and the "immediate" miss ratio for redo log latches.

Col MISS_RATIO format 999.99
Col IMM_MISS_RATIO format 999.99

SELECT substr(ln.name, 1, 20), gets, misses, immediate_gets,
       immediate_misses, 
       (misses / decode(gets,0,gets+1,gets)) * 100 MISS_RATIO,
       (immediate_misses)/
       decode(immediate_gets+immediate_misses,0,
       immediate_gets+immediate_misses+1,
       immediate_gets+immediate_misses) * 100
       IMM_MISS_RATIO
  FROM v$latch l, v$latchname ln 
 WHERE ln.name = ‘redo allocation’ 
   AND ln.latch# = l.latch#
/

If the ratio of MISSES to GETS exceeds 1%, or the ratio of IMMEDIATE_MISSES to (IMMEDIATE_GETS + IMMEDIATE_MISSES) exceeds 1%, there is latch contention. Additionally the WAIT_TIME statistic in V$LATCH can be used as a guide as to the overall impact of a given latch on the system. For example, there may be a large number of latch misses, but if the overall wait time for those latch misses only accounts to a few seconds, then the latching issue may be insignificant when compared to other tuning issues.

Additionally, the following statistics (which are available in V$SYSSTAT) indicate contention in the redo log buffer at some level:

Table 1: Contention Statistics in the Redo Log Buffer

Statistic Name Meaning
Redo buffer allocation retries Indicates that a user process has had to wait for space in the redo log buffer. Increase the size of the redo log buffer if you see this latch causing wait issues. This might also imply an IO problem with the online redo logs (since the process of writing to the online redo logs can cause delays in allocating space to the redo log buffer).
Redo writer latching time High values over a short period of time can indicate latching issues associated with the LGWR process. This might be a result of disk contention causing LGWR writes to the online redo logs to be delayed.
Redo log space wait time Indicates sessions have waited for space to be allocated from the redo log buffer. High values can indicate latch contention for the redo logs. This could indicate an insufficient number of redo logs, or redo logs that are not sized properly. Additionally this can indicate IO problems with the disks that the redo logs are located on.

However most important are the wait times reflected in V$SYSTEM_EVENT. The following tables list the wait events you might see in V$SYSTEM_EVENT that are associated with redo contention issues:

Table 2: Wait Events seen in V$SYSTEM_EVENT

Wait Event Meaning
Log buffer space Could indicate that there is some form of contention occurring with the online redo logs. This might be disk contention, or perhaps system resource issues. This wait event might also imply that the redo log buffer is to small.
Log file switch (archiving needed) Indicates that a log file that Oracle tried to switch to was in need of archiving. This indicates that the ARCH process is having problems. To solve this problem ensure that there is enough space in the archive log destinations. Also make sure there is no IO contention for the online redo logs. Finally, consider increasing the number of ARCH processes.
Log file switch (checkpoint incomplete) This implies that there is some contention with regards to the DBWR process. Determine if there is an IO problem, and correct it. In some cases, if you don’t have enough online redo logs available, this event may appear.
Log file sync Occurs after a commit or rollback when the redo log buffer is flushed to the online redo logs. If significant waits are seen for this event, consider reducing the size of the redo log buffer. Also consider batching your commits, and check for IO problems writing to the online redo log files.

 

How to Address Contention Caused by the Redo Allocation Latch

If you have identified contention caused by the redo allocation latch, you can increase their number via the LOG_PARALLELISM parameter in Oracle 9i. This parameter allows the parallel redo generation and can increase the throughput of certain update-intensive workloads. This parameter is usually only used on high end servers that have more then 16 processors. Oracle Corporation recommends setting LOG_PARALLELISM to a value between 2 and 8 when running on systems with 16 to 64 processors. Setting LOG_PARALLELISM to values greater than 8 is not currently recommended.

Oracle8i offers the LOG_SIMULTANEOUS_COPIES parameter instead of LOG_PARALLELISM. This parameter allows you to increase the number of redo copy latches.

Finally, look to your SQL and ensure that it is not performing any unnecessary DML activity. A reduction in IO’s (logical or physical) can have a significant impact on overall performance, including reduction of wait times on the redo allocation latch.

Start the discussion at forums.toadworld.com