Thursday, September 1, 2011

Recover archive gaps in standby database from primary using RMAN incremental backups

A Physical Standby database synchs with Primary by continuous apply of archive logs from a Primary Database. In case of an archive log gone missing or corrupt, We have to bring back the standby to sync with the primary.

When the logs missing or corrupt is less in number (say below 15), we can ship the logs which were missing in the standby site from the primary site (scp/sftp/ftp) and then we can register the log file in the standby so that the gap can be resolved.

Find the archives which are missing by issueing the following command.
SQL> select * from v$archive_gap;

This would give the gap sequences. Or you can use the v$managed_standby view to find where the log apply stuck.
SQL> select sequence#,process,status from v$managed_standby;

Here u can see status as wait for log for say sequence# 100 but your primary would've proceeded to sequence# 110
At primary
SQL> select max(sequence#) from v$archived_log;      ---> This would show you 110

Copy the logs to the standby site from the primary site

$ scp log_file_name_n.arc oracle@standby:/log/file/location/log_file_name_n.arc

At standby site
SQL> alter database register logfile '/log/file/location/log_file_name_n.arc';
logfile registered

Do the log file registration at the standby site until all the missing log files are registered. Now apply would take place and your standby will become sync with the primary.

This is easy process if you have missing or corrupt logs in lesser number. But when the difference is huge (say around 500 logs) this method is very time consuming and not a proper approach. Else you have to rebuild the standby database from scratch.
As an enhancement from 10g, an incremental backup created with BACKUP INCREMENTAL... FROM SCN can be used to refresh the standby database with changes at the primary database since the last SCN at Standby and then managed recovery can resume i.e. Compensate for the missing archive logs.

Let us see the steps involved.

Step 1: On the primary:
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
144710998

On the standby:
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
130158742

Clearly there is a difference. But this by itself does not indicate a problem; since the standby is expected to lag behind the primary (this is an asynchronous non-real time apply setup). The real question is how much it is lagging in the terms of wall clock.
To know that use the scn_to_timestamp function to translate the SCN to a timestamp:

SQL> select scn_to_timestamp(144710998) from dual;
SCN_TO_TIMESTAMP(1447102)
-------------------------------
18-AUG-11 08.54.28.000000000 AM

Run the same query to know the timestamp associated with the SCN of the standby database as well
SQL> select scn_to_timestamp(130158742) from dual;
SCN_TO_TIMESTAMP(1301571)
-------------------------------
13-AUG-11 07.19.27.000000000 PM

Note: Run it on the primary database, since it will fail in the standby in a mounted mode
This shows that the standby is four and half days lagging!

Step 2: [Standby] Stop the managed standby apply process:
SQL> alter database recover managed standby database cancel;
Database altered.

Step 3: [Standby] Shutdown the standby database
SQL> shut immediate

Step 4: [Primary] On the primary, take an incremental backup from the SCN number where the standby has been stuck:
RMAN> run {
2> allocate channel c1 type disk format '/u01/backup/%U.bkp';
3> backup incremental from scn 130158740 database;
4> }

Step 5: [Primary] On the primary, create a new standby controlfile:
SQL> alter database create standby controlfile as '/u01/backup/for_standby.ctl';
Database altered.

or

RMAN> backup current controlfile for standby format '/backups/tempfol/_%U';

Step 6: [Primary] Copy these files to standby host:
oracle@dba1 /u01/backup]$ scp * oracle@dba2:/u01/backup

Step 7: [Standby] Bring up the instance in nomount mode:
SQL> startup nomount

Step 8: [Standby] Check the location of the controlfile:
SQL> show parameter control_files
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_files string /u01/oradata/standby_cntfile.ctl

Step 9: [Standby] Replace the controlfile with the one you just created in primary.
 $ cp /u01/backup/for_standby.ctl /u01/oradata/standby_cntfile.ctl

or

RMAN> RESTORE STANDBY CONTROLFILE FROM '/u01/backup/_v8o069ee_1_1';

Step 10: [Standby] Mount the standby database:
SQL> alter database mount standby database;

Step 11: [Standby] Connect to RMAN. RMAN does not know about these files yet; so you must let it know – by a process called cataloging. Catalog these files:
$ rman target=/
RMAN> catalog start with '/u01/backup';

Step 12: Recover these files:
RMAN> recover database;

Step 13: After some time, the recovery fails with the message:

archive log filename=/u01/oradata/1_18108_697108460.dbf thread=1 sequence=18109
ORA-00310: archived log contains sequence 18108; sequence 18109 required

This happens because we have come to the last of the archived logs. The expected archived log with sequence# 18108 has not been generated yet.

Step 14: At this point exit RMAN and start managed recovery process:
SQL> alter database recover managed standby database disconnect from session;
Database altered.

Step 15: Check the SCN’s in primary and standby:
[Standby] SQL> select current_scn from v$database;
CURRENT_SCN
-----------
144747125
[Primary] SQL> select current_scn from v$database;
CURRENT_SCN
-----------
144747111

Now they are very close to each other. The standby has now caught up.

3 comments:

  1. Wow - if this works (haven't had the chance to try yet), you have saved me many hours of work! Thanks for taking the time to hellp explain to those of us less talented!

    ReplyDelete

I don't know what you think about my creation, but every words of you help me grow better and stronger!!