Friday, December 31, 2021

RMAN, INCARNATIONS, and ME part II: Recovery ignoring backups / ORA-19563: datafile copy header validation failed for file

ImageIntro

In an ideal world, I believe, every backup script should be matched by 2 recovery scenarios. As this swiss OUC figure reminds us, having a backup that works means nothing without a tested restore and recovery. Today I’ll try to depict the don’t do’s after an recovery of a controlfile with open resetlogs using RMAN. Although some scenarios were a bit extreme, I still found out some weird behaviour from RMAN as I had to figure out a workaround on my own to complete the database recovery after a new incarnation was created. 

The Backup

  • Before heading to the recovery scenarios, I’ll describe how & what kind of backups were running before this exercise.  
  • Configuration:
    Oracle 19c CDB with 2 user PDBs and archive logging enabled . OS: Windows server
  • The backup Script:
    It’s a 19c CDB backed up using Image copies and incremental  backups  that will be merged with the copies every 4 days (recover copy Until time sysdate-3 days).
  • Below is a sample of the script with dummy paths & shorter retention (sysdate-1) but it’s very similar.

    Image

2. The recovery scenarios

    • Remember to always record the DBID of the database 
    • Ensure VALIDATE DATABASE/RESTORE, RESTORE ARCHIVELOG ALL VALIDATE are successful
    • In order to fully assess my backups, I first choose to try the easy recovery scenarios
  • Easy recoveries (all successful)
    • Data File Recovery PDB level ✔
      • User datafile Drop and recover 
      • PDB System tablespace drop and recover  (CDB needs to be in mount state).
    • Data File Recovery CDB level ✔
      • ROOT container system tablespace drop and recover
    • Loss of spfile in CDB ✔
      • Startup NoMount> Recover Spfile from Autobackup> shutdown immediate> startup
      • Note: PDBs don’t have spfiles, it’s all stored in PDB_SPFILE$.

2.2 Control file recovery issues

  • Restoring/recovering the Controlfile will requires an Open resetlogs no matter if the redo logs or datafiles are still valid. Open Resetlogs means a new Incarnation which brings me to the following questions

      Incarnation Questions

  • Will the backup cycle be broken after a recovery of a controlfile and a new incarnation ?
  • What happens when the subsequent incremental backup is merged with the db copies in 4 days ?
  • The copy pointing to the old incarnation number will be merged with an incremental pointing to the new one, does that make the backup invalid ?
  • Could we switch back to the previous incarnation after recovery and trick RMAN by still runing daily incremental backups and old image copies merged with new incremental backups ?


3. First Controlfile recovery

  • I found out that the recovery of the control file was already complaining. A valid system datafile had to be restored and DB recovered so it can be opened in resetlogs mode, which was odd.

    D\**\CONTROLFILE>:> del \f O1_MF_HSN45YNX_.CTL
    RMAN> STARTUP FORCE NOMOUNT; RMAN>SET DBID 670048894;
    RUN
    { SET CONTROLFILE AUTOBACKUP FORMAT
    FOR DEVICE TYPE DISK TO '\\backup$\D_controlfile_%F';
    RESTORE CONTROLFILE FROM AUTOBACKUP;
    }
    RMAN> ALTER DATABASE MOUNT; RMAN> ALTER DATABASE OPEN RESETLOGS; ----------------------- bug ??--------------------------------
    ORA-01194: file 1 needs more recovery to be consistent
    ORA-01110: data file 1: 'D:\ORACLE\CDB\CDBDEV02\ORADATA\SYSTEM01.DBF'

    RMAN> restore datafile 1;
    Finished restore
    RMAN> recover datafile 1;
    RMAN-03002: failure of recover command at 09/20/2021 14:32:55
    RMAN-06067: RECOVER DATABASE required with a backup or created control file
    I did a recover database :
    RMAN> RECOVER DATABASE;
    Finished recover

    RMAN> alter database open resetlogs;
    Statement processed
    ------------------------------------------------------------------------------
    RMAN> list incarnation ;

    DB Key  Inc Key DB Name  DB ID        STATUS  Reset SCN  Reset Time
    ------- ------- -------- ------------ ------- ---------- ---------------------
    1       1       CDBDEV2 670048894    PARENT  1           28-oct-2020 20:45:50
    2       2       CDBDEV2 670048894    CURRENT 17463165117 20-sep-2021 14:51:59

Now we have a new incarnation after resetlogs startup.I know anytime a resetlogs happens, a new backup is required but I still can’t answer my questions about the impact on the backup cycle & the next database copies merge.
So I came up with a bold theory where reverting to incarnation 1 will allow to keep merging copies with the backups. It was stupid but I wanted to try since it was a clone environment. “can’t learn if you don’t break” 

My crazy scenario (don’t do this at home)

1.  Incarnation was reverted to 1 right before the the evening backup .

2. Next morning I dropped the tablespace system datafile and wanted to recover it =“Epic fail”.

RMAN> run { restore tablespace system; recover tablespace system;}
Finished recover at 21-sep-2021 14:52:17

RMAN> alter database open;
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

RMAN> alter database open noresetlogs;
ORA-01588: must use RESETLOGS option for database open

RMAN> alter database open resetlogs;
ORA-01190: control file or data file 3 is from before the last RESETLOGS
ORA-01110: data file 3: 'D:\ORACLE\CDB\**\ORADATA\SYSAUX01.DBF'

RMAN> validate database root;
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1 OK 0 6788 68480 17463164895 File Name: D:\ORACLE\CDB\**\SYSTEM01.DBF --
Same with all CDB files

RMAN> restore datafile 3;
RMAN> recover datafile 3;
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
RMAN-06067: RECOVER DATABASE required with a backup or created control file

-- Try full Database Recovery

RMAN> recover database;
ORA-00283: recovery session canceled due to errors
ORA-19912: cannot recover to target incarnation 2

RMAN> reset database to incarnation 1;
RMAN> run {
set until SCN 17463165116; # last RESETLOG SCN -1
restore database;
recover database;
}

RMAN-03002: failure of recover command at 09/22/2021 10:50:50
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement:
alter database recover if needed start until cancel
ORA-16433: The database or pluggable database must be opened in read/write mode.
Error from the alert.log:
…ORA-01190: control file or data file 1745 is from before the last RESETLOGS
ORA-01110: data file 1745: 'K:\ORACLE\DATA\PTRPTS_DF_3.DBF'

        I checked the header value of the data files and confirmed some files weren’t completely recovered

SELECT FHTHR Thread, FHRBA_SEQ Sequence, count(1) --,fhsta STATUS
FROM X$KCVFH group by FHTHR, FHRBA_SEQ –, fhsta STATUS
order by FHTHR, FHRBA_SEQ;
THREAD SEQUENCE   COUNT(1) ------- ---------- --------        
1    26          4       
1    5724        803

SQL> SELECT 'ROOT' con_name,status,checkpoint_change#,checkpoint_time,
resetlogs_change#, resetlogs_time, count(*), fuzzy,h.con_id
from v$datafile_header h
  where h.con_id=1
  Group by status,checkpoint_change#,checkpoint_time,
resetlogs_change#,resetlogs_time, fuzzy,h.con_id
UNION
SELECT p.name,status,checkpoint_change#,checkpoint_time,
resetlogs_change#, resetlogs_time, count(*), fuzzy,h.con_id
from v$datafile_header h, v$pdbs p
   where h.con_id=p.con_id
   Group by p.name, status,checkpoint_change#,checkpoint_time,
resetlogs_change# ,resetlogs_time, fuzzy,h.con_id
   order by con_id;

Con_name STATUS CHECKPOINT_CHANGE# CHECKPOINT_TIME RESET_CHANGE# RESETLOGS_TIME     COUNT(*) FUZ  CON_ID  ERROR
---------- ------- ------------------- -------------------- ----------------- -------------------- -------- ---- ------- --------------
ROOT       ONLINE       17463165116    20-SEP-2021 14:09:03  1                28-OCT-2020 20:45:50  4        NO   1
PDB$SEED   ONLINE           1630255    29-OCT-2020 19:06:11  1                28-OCT-2020 20:45:50  4        NO   2
FSDEMO     ONLINE       17463165116    20-SEP-2021 14:09:03  1                28-OCT-2020 20:45:50  194      NO   3
FS_PDB     OFFLINE      17463164868    20-SEP-2021 14:07:51  1                28-OCT-2020 20:45:50  605      No   4

Down the rabbit hole

Things went from bad to worse the more I tried to recover the DB an RPO prior to last resetlogs.
I want to just remind few points before I end this long end endless scourge.

  • Controlfile recovery from Sep 19th autobackup corresponding to last resetlogs has   
    • New Incarnation key: 2
    • SCN 17463165117   -     Seq 5727
    • Time 20-sep-2021 14:51:59
  • Previous Incarnation  key: 1
    • Time 28-oct-2020 20:45:50
  • I reverted the incarnation to key 1 which was absurd on a hindsight
    • Incremental backup kept running 


Solution

Case 1: Recover from a new incarnation via manual Controlfile copy

luckily a copy of the initial controlfile was kept aside so I decided to reuse it and try recovering

  • Restore the control file using a manual copy 

RMAN> startup nomount force;
RMAN> restore controlfile from 'D:\Backup\19c\O1_MF_HSN45YNX_.CTL';
Starting restore at 23-sep-2021 17:11:09
output file name=D:\ORACLE\CDB\CDBDEV2\REDO\CONTROLFILE\O1_MF_HSN45YNX_.CTL
Finished restore at 23-sep-2021 17:11:11

RECOVER the database
RMAN> alter database mount;

1. Using sequence before the LAST RESETLOGS
SQL> recover database until logseq 5728; -- Resetlogs seq +1
Starting recover at 23-sep-2021 17:26:35
Finished recover at 23-sep-2021 17:29:25
OR
2. Using SCN before the LAST RESETLOGS
run {
  set until scn 17463165116; -- Resetlogs SCN -1
  recover database;
  }

--- Check the new incarnation

RMAN> list incarnation;                                              
List of Database Incarnations                                        
DB Key  Inc Key DB Name  DB ID     STATUS  Reset SCN Reset Time      
------- ------- -------- -------- -------- --------- -----------------
1       1       CDBDEV2 670048894 CURRENT     1     28-oct-2020 20:45

RMAN> alter database open resetlogs;


Case 2: Recover from a new incarnation via autobackup

  • I was’nt happy with the first/above recovery because in real life you will never have a controlfile handy during a disaster. So I wiped out all the datafiles of the CDB and decided to recover all from scratch 
  • That’s where RMAN started to ignore image copies which can't even be cataloged (ghosted)

--- We need to recover the last controlfile: From September 21st
Piece Name: \\BACKUP$\.._C-670048894-20210920-01  21-sep-2021 19:38

RMAN> Startup nomount
CDBDEV2_CONTROLFILE_C-670048894-20210920-01

RMAN> Restore controlfile from '\\BACKUP$\CDBDEV2_CONTROLFILE_C-670048894-20210920-01';
Finished restore at 23-sep-2021 17:11:11

ORA-19563:  Recovery ERROR (ghosted backups)

Then a weird error “ORA-19563: datafile copy header validation failed” appeared upon recovery for several backup pieces as RMAN went blind ignoring image copies.

--- Recover database execution
channel ORA_DISK_1: restoring datafile 00053                     
Input datafile copy RECID=110503 STAMP=1083697667
file name=\\BACKUP$\CDB_DATA_D-CDBDEV2_I-670048894_TS-DPWORK_FNO-53_0VVU1DED 
destination for restore of datafile 00053:
D:\ORACLE\CDB\CDBDEV2\ORADATA\PDB\FSDEMO\DPWORK.DBF                                                    
ORA-19563: datafile copy header validation failed for file
\\BACKUP$\CDB_DATA_D-CDBDEV2_I-670048894_TS-DPWORK_FNO-53_0VVU1DED   
…                                                  

-->> Image copies are not visible to RMAN and can't even be cataloged.

RMAN> catalog start with '\\BACKUP$\CDB_DATA_D-CDBDEV2_I-670048894_TS-DPWORK_FNO-53_0VVU1DED';                        
searching for all files that match the pattern ..**FNO-53_0VVU1DED     
No files found to be unknown to the database                                   

Workaround

  • Created a subfolder 21_backups in the backup location and copy all backups 
  • Uncatalogued the old backup location and delete the files
  • Cataloged the new location (subfolder) containing the copy of all backups.  
  • Launched a  recovery of the database (target RPO Sep 21 14:51:59 )

move backup files to \\BACKUP$\21_backups 
-- Catalog old backup location
RMAN> CHANGE BACKUP OF ARCHIVELOG ALL UNCATALOG;
RMAN> CHANGE BACKUP OF DATABASE UNCATALOG;

-
- catalog new backup location
RMAN> CATALOG START WITH '\\BACKUP$\21_backups'
915 files found

=== RECOVERY ====
RMAN> alter database mount;           
RMAN> run                      
   {      set until logseq 5728;   -- last resetlogs seq +1
    restore database ;       
    recover database ;      
    }                       

... applying incremental backups restore
starting media recovery 
media recovery failed  

===> incarnation 1 recovery is complete oracle wants more
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================     

RMAN-03002: failure of recover command at 09/24/2021 17:51:34  
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement:
alter database recover if needed start until cancel using backup controlfile
ORA-00283:
recovery session canceled due to errors
ORA-19912: cannot recover to target incarnation 1                             
RMAN> list incarnation;                                                       
DB Key Inc Key DB Name  DB ID    STATUS  Reset SCN    Reset Time         
------- ------- ------ --------- ------- ------------ -------------------  
1     1     CDBDEV2 670048894 CURRENT  1           28-oct-2020 20:45:50  
3 3     CDBDEV2 670048894 ORPHAN   17463165117 23-sep-2021 17:50:38 2     2     CDBDEV2 670048894 ORPHAN   17463165117 20-sep-2021 14:51:59
–> target RPO = Inc key 2 

  • ORA-19912 just confirms all is recovered up to incarnation #1, 
    Solution
    Now we can just roll forward to Target RPO From incarnation key # 2 

RMAN> reset database to incarnation 2;   

RMAN> recover database ; 
Finished recover at 24-sep-2021 18:18:34
RMAN> alter database open resetlogs;
Statement processed



Conclusion

The Rman lesson of the day can be taken off of a quote from THE WIRE “Omar :
"You come at the King ? you best not miss”
That is,  don’t mess with the Incarnation after a new resetlogs. Reverting to previous Incarnation will just make things worse and new incarnation rhymes with new backup cycle .

Thanks for reading

1 comment:

  1. What are your answers to the incarnation questions listed?

    ReplyDelete