Block Corruption in an Oracle Database

    Dec 1, 2017 9:58:33 AM by Deiby Gomez

    Introduction

    Block corruption is a common topic when we are dealing with any software that stores data. In Oracle Database there are several types of logical structures that are mapped to a physical file named “datafile” that is divided into filesystem blocks.

     

    A block can have logical or physical corruption. A corrupt block is a block that has been changed so that it differs from what Oracle expects to find. A logical corruption is a block that has a valid checksum but its content is corrupt; for example, a row locked by a non-existent transaction, the amount of space used is not equal to block size, avsp bad, etc. Logical corruption can cause ORA-600 depending on which content inside the block is corrupted. A physical corruption is also called a media corruption; the database does not recognize the block at all, it’s when the problem is not related to the content but to the physical location or structure itself; for example, a bad header, a fractured/incomplete block, the block checksum is invalid, the block is misplaced, zeroed-out blocks, the header and footer of the block do not match, one of the key data block data structures is incorrect, such as the data block address (DBA), etc.

    Detecting, monitoring and fixing corrupt blocks is an important task that we have take care of regularly and frequently. A corrupt block not only means a problem with the block but also means that there is data that may be lost, and this is very important for the business.

     

    The Problem

    The problem with corrupted blocks is that we don’t know they are corrupted until we try to use them. Of course, this applies to a scenario where we are not executing a proactive activity to detect corrupt blocks. For example, a table block can be corrupted and there is no way to know it until someone performs a SELECT or any other DML that reads that block. Once the block is read, Oracle will know the block is corrupted and then an ORA-0600, ORA-27047 or ORA-01578 will be returned to the user.

    A long time ago a customer called me saying that they were trying to execute a SELECT from the application and whenever the SELECT was executed the application got an ORA-01578. I detected the block # and the datafile # and I fixed it. At that time, the user was able to continue working the rest of the day. However, the next day again, the same customer called me saying that they were receiving more ORA-01578’s. This time I confirmed that the corrupted block was in a different datafile than the block I’d fixed a day before. This made me think that there could be more corrupted blocks.  I executed dbverify against the full database and I saw that the database had several corrupted blocks.  However last rman backup didn’t report any corrupted blocks. We engaged a sysadmin and he detected that the storage was having issues that day. Fortunately we detected the storage problem quickly and no data was lost. But if these kinds of issues are not detected properly the data can be compromised.  In this example we have been talking about a physical problem, but there are some other cases where it is more difficult to detect the problem, especially when it is a logical corruption.

     

    How to avoid it

    Using Oracle ASM: Oracle recommends using ASM as the storage for the database. ASM has three types of redundancy: External, Normal and High.  If we are using Normal or High Oracle keeps a copy (Normal) or two copies (High) of every block. This block is called “Mirror Block” and whenever it finds a corrupt block, it automatically restores the corrupt block from one of its mirror copies. I have written an article where I explain with a lot of details how Oracle recover a block from its mirror copy, in case you want you read it: Data block recovering process using Normal Redundancy

    Using parameter db_block_checking: This parameter is used to control whether block checking is done for transaction managed blocks. As early detection of corruptions is useful, and has only a small performance impact. However, there are some types of applications where having the parameter DB_BLOCK_CHECKING = TRUE can have a considerable overhead, all depends on the application,  to test the change in a test environment is recommended. The immediate overhead is a CPU overhead of checking a block contents after each change but a secondary effect is than that this means blocks are held for longer periods of time so other sessions needing the current block image may have to wait longer. The actual overhead on any system depends heavily on the application profile and data layout.

    Using parameter db_block_checksum: determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read – only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum. In FULL mode, Oracle also verifies the checksum before a change like  update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log. Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk. Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In FULL mode it causes 4% to 5% overhead.

    Dbfsize: Can be used to check the consistency of Block 0.

    Dbverify: Can be used to check Oracle datafiles for signs of corruption and give some degree of confidence that a datafile is free from corruption. It opens files in a read-only mode and so cannot change the contents of the file being checked. It checks that datafile has a valid header. Each data block in the file has a special "wrapper" which identifies the block – this "wrapper" is checked for correctness. Dbverify also checks that DATA (TABLE) and INDEX blocks are internally consistent. And, from 8.1.6 onwards, it checks that various other block types are internally consistent (such as rollback segment blocks).

    RMAN VALIDATE command:  You can use the VALIDATE command to manually check for physical and logical corruptions in database files. This command performs the same types of checks as BACKUP VALIDATE. By default, RMAN does not check for logical corruption. If you specify CHECK LOGICAL on the BACKUP command, however, then RMAN tests data and index blocks for logical corruption, such as corruption of a row piece or index entry.

    RMAN> validate check logical database;

    RMAN > validate database;

    RMAN > validate backupset 11;

    RMAN > validate datafile 2 block 11;

    I have written some other articles related to RMAN and corrupt blocks,  in case you want to read more about the issue

     

    Conclusion 

    Perform proactive tasks to detect or avoid having physical and logical corruption, if the corruption is detected on time, the solution can be easily executed. Oracle offers several tools that we can use to detect, monitor, and fix corruption in the block. It is important to be aware of these type of problems so that our data is not compromised. 

    Tags: Oracle

    Deiby Gomez

    Written by Deiby Gomez

    Deiby Gómez is the youngest Oracle ACE and Oracle ACE Director in the world and the first Guatemalan with these awards. Deiby is the youngest Latin American with the highest certifications “Oracle Certified Master 11g” and “Oracle Certified Master 12c”. He received In Vegas, United States the "SELECT Journal Editor’s Choice Award 2016", he became the first Guatemalan with that award. He is a frequent speaker in several Oracle Events around the world like “Technology Network Latin American Tour 2013, 2014, 2015 and 2016 in several countries like Guatemala, Costa Rica, Nicaragua, El Salvador, Uruguay, Argentina, Mexico, Brazil, Ecuador, Colombia, Peru; Collaborate in Vegas, USA; Latin American Oracle Open World in Brazil (2015 and 2016) and Oracle Open World in San Francisco, USA (2015 and 2016). He is the first Guatemalan who was accepted by Oracle Corporation as “Beta Tester” for the version “12cR2” in 2015. He is the official Technical Reviewer of the Book “Oracle Database 12c Release 2 Multitenant (1st Edition, McGraw-Hill)" and Co-Author of the book "Oracle Database 12c Release 2 Testing Tools and Techniques for Performance and Scalability", both can be found in Amazon. He is the first Guatemala who appeared as an outstanding expert in the official magazine of Oracle Corporation called “Oracle Magazine” in the Edition November/December in 2014, this magazine is delivered around the world in several countries. He has published several articles in the Official Website of Oracle (www.oracle.com) in Portuguese, Spanish and English. Currently he is President of Guatemalan Oracle Users Group (GOUG). Director of Support Quality en Latin American Oracle Users Group Community (LAOUC) for 2016-2017, founder of a very well known group in Oracle Community called “Oraworld Team” that has members from India, Guatemala, Brazil, France and Switzerland having in total 4 Oracle ACE Directors, 3 Oracle ACE, 5 Oracle Certified Masters (OCM) and 1 PhD. Deiby was part of Oracle ACE Hackaton in Amsterdam, Netherlands in April 2016 where he was building several solutions using Oracle Cloud Products. Currently Deiby Gómez is CEO in Nuvola Consulting Group, a company that provides excellence on Support and Consulting services with Oracle Technology like Databases, Middleware, Cloud and Engineered systems. Deiby is well known in the community because of his resilience, entrepreneurship and his availability to help and share his knowledge. Deiby loves to travel, to play chess and to enjoy a good cup of coffee with friends. Oracle ACE (at the age of 23) Oracle Certified Master 11g (at the age of 24) Oracle ACE Director (at the age of 25) Oracle Certified Master 12c (at the age 26) SELECT Journal Editor’s Choice Award 2016 Speaker in several Universities, OTN Tour 2013,2014,2015. Collaborate15, LA Oracle Open World. Technical Reviewer of the Book "Oracle Database 12c Release 2 Multitenant (Oracle Press) 1st Edition" Blogger. Oracle Certifications: Oracle Linux Certified Implementation Specialist. Oracle Database 11g Administrator Certified Professional. Oracle Database 11g Administrator Certified Master (OCM 11g) Oracle Database 12c Administrator Certified Master (OCM 12c) Oracle Database 12c Maximum Availability Architecture Certified Master (MAA OCM 12c) Oracle Database 12c Maximum Availability Architecture Certified Expert (MAA OCE 12c) Oracle Database 12c Administrator Certified Professional. Oracle Service Oriented Architecture Infrastructure Implementation Certified Expert. Oracle Exadata Database Machine Administrator. Oracle RAC 11g and Grid Infraestructure Administrator Oracle RAC 12c and Grid Infraestructure Administrator Oracle Real Application Clusters 12c Certified Implementation Specialist Oracle Database 12c: Data Guard Administrator