Recently, we hit an issue in our SAP system where the SAP standard job SAP_PERIODIC_ORACLE_SNAPSHOT was failing every hour with the short dump DBIF_RSQL_SQL_ERROR.
The dump contained the following database error text:
Database error text........: "ORA-01115: IO error reading block from file
(block # )#ORA-01110: data file 30:
'/oracle/<SID>/sapdata1/sr3_23/sr3.data23'#ORA-27072: File I/O
error#Solaris-AMD64 Error: 5: I/O error#Additional information: 4#Additional
We suspended the SAP job SAP_PERIODIC_ORACLE_SNAPSHOT following SAP Notes 1369716 and 1080813, which resolved the dumps and the system log errors. However, these errors pointed towards corruption at the data block level, which required further investigation.
How to verify if the database is corrupted?
The block corruption is not detected by the checkdb run scheduled in DB13, so you need to run one of the checks below to verify whether the database is corrupted.
1. Execute a complete RMAN consistency check
brbackup -u / -m all -w only_rmv -e <level of parallel processing> -c
If this does not work for whatever reason, use DBVERIFY instead (either via brbackup with the option "-w only_dbv" or manually).
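A manual DBVERIFY run against a single data file could look like the sketch below. The block size of 8192 and the log file name are assumptions; use your database's DB_BLOCK_SIZE and a path of your choice.

```shell
# Hedged sketch: run DBVERIFY directly against one data file as the ora<sid> user.
# BLOCKSIZE=8192 and the log file name are assumptions; check DB_BLOCK_SIZE first.
dbv file=/oracle/<SID>/sapdata1/sr3_23/sr3.data23 blocksize=8192 logfile=dbv_sr3data23.log
```

The log file then lists the total pages examined and any pages marked corrupt.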
2. Check a single data file or block
We ran a validate on a single data file, as we didn't have enough time for the complete consistency check and we already knew the name of the data file containing the corrupted block.
Determine the file number of the corrupted data file. You can find the file ID in the backup logs or checkdb logs of your SAP system, or query V$DATAFILE:
select file# from v$datafile where name = '<Filename>';
OS> rman nocatalog
RMAN> connect target /
RMAN> validate datafile <ID>;
This verified that our database had corruption in a specific block.
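After the RMAN VALIDATE run completes, Oracle records any corrupt blocks it found in the view V$DATABASE_BLOCK_CORRUPTION, which you can query to get the exact file and block numbers:

```sql
-- Blocks flagged by the last RMAN VALIDATE / BACKUP VALIDATE run
select file#, block#, blocks, corruption_type
from v$database_block_corruption;
```

The FILE# and BLOCK# values from this view are the inputs for the object lookup in the next step.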
You can find further information on database consistency checks and RMAN options in the SAP Notes and articles below:
SAP Note 22345 : Consistency check of ORACLE database
SAP Note 540463: FAQ: Consistency Checks + Block Corruptions
SAP Wiki : Consistency check with RMAN
How to identify the object to which the corrupted block belongs?
Use the SQL command below to determine the segment type and name:
select segment_name, partition_name, segment_type, block_id, blocks
from dba_extents
where <corrupted block#> between block_id and (block_id + blocks - 1)
and file_id = <file# determined from V$DATAFILE>
and rownum < 2;
In our case, the object belonging to the corrupt block was the table GVD_LATCH_MISSES.
Follow SAP note 365481 to find the best possible resolution for your object type.
In our case, it was a table. Hence, we had to analyze the table GVD_LATCH_MISSES to validate its structure using the SQL command below:
analyze table <table name> validate structure cascade;
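As a concrete sketch, the validation for our table could look like this. The schema owner SAPSR3 is an assumption (it may be SAPR3 or SAP<SID> on your system); on a corrupt block, ANALYZE ... VALIDATE STRUCTURE typically aborts with ORA-01498 (block check failure).

```sql
-- Hedged sketch: validate the table structure; the SAPSR3 owner is an assumption.
-- On a corrupt block this typically fails with ORA-01498 (block check failure).
analyze table SAPSR3.GVD_LATCH_MISSES validate structure cascade;
```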
In case of errors in both dbverify and the table analyze, which was our situation, SAP recommends restoring the data file from a database backup. However, we didn't have any good backup without corruption: all database backups had been failing with RC 5 ever since the system was installed, and backup monitoring had not been performed.
Truncation of impacted table
Although we knew that the GVD* tables are not relevant according to SAP Note 1080813, and we had already disabled the related job SAP_PERIODIC_ORACLE_SNAPSHOT, we raised an OSS incident with SAP before truncating this table.
SAP also confirmed that truncating the GVD_LATCH_MISSES table would resolve the issue.
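The truncation and subsequent re-validation could be sketched as follows. The schema owner SAPSR3 is again an assumption; only truncate such a table after SAP has confirmed it is safe to do so.

```sql
-- Hedged sketch: truncate the non-critical GVD snapshot table (only after
-- SAP confirmation); the SAPSR3 schema owner is an assumption.
truncate table SAPSR3.GVD_LATCH_MISSES;

-- Re-run the structure validation afterwards
analyze table SAPSR3.GVD_LATCH_MISSES validate structure cascade;
```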
The table analyze was successful after truncation.
However, the DBverify still failed.
Deallocation of unused space from table
After the truncation, our scenario changed to "dbverify with errors / analyze without errors", for which SAP Note 365481 recommends deallocating the unused space from the table.
Excerpt from SAP Note:
The corruption is probably located in a block of the table that has never contained data. Execute the following command:
alter table <tablename> deallocate unused;
The corruption should then appear in the free space and should therefore be deleted as described there.
The dbverify was also successful after deallocating the unused space.
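Rather than re-verifying the whole file immediately, you can first re-check just the affected block range with DBVERIFY. The START/END values and the block size below are assumptions; take the block numbers from your earlier dbverify or RMAN output.

```shell
# Hedged sketch: re-verify only the previously corrupt block range of the
# data file; START/END and BLOCKSIZE values are assumptions for illustration.
dbv file=/oracle/<SID>/sapdata1/sr3_23/sr3.data23 blocksize=8192 \
    start=<first corrupt block> end=<last corrupt block>
```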
Finally, run a full consistency check at least once for your database to verify that there are no other data corruption issues.