Monday, October 21, 2013

RMAN errors out with ORA-01008: not all variables bound

One of our databases has a problem with RMAN backups, failing with the error below.

DBGSQL:     TARGET> select count(*) into :dbstate from v$parameter where lower(name) = '_dummy_instance' and upper(value) = 'TRUE'
DBGSQL:        sqlcode = 1008
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-01008: not all variables bound

This is frustrating for any RMAN task; you cannot even log in to the RMAN prompt, since it reports that initialization of the internal recovery manager package failed. The database version here is 11.2.0.2.0. Oracle Support says this is related to bug 9877980, which occurs when CURSOR_SHARING is set to FORCE or SIMILAR.

We have cursor_sharing set to FORCE, so we have clearly hit the bug.
The workaround is simply to flush the shared pool. Yes, just flush the shared pool.

SQL> alter system flush shared_pool;
System altered.

Now you will be able to connect to the database through RMAN without any issues.

The detailed notes can be obtained from Oracle Support Doc ID 1472116.1.
By the way, the issue is fixed in 11.2.0.3 and above.

Happy troubleshooting...

Wednesday, July 3, 2013

IO ERROR: THE NETWORK ADAPTOR COULD NOT ESTABLISH THE CONNECTION

Today we handed over a newly created database on a newly built server to the application team for their testing and operations. Before the handover we completed all the usual procedures, including registering the database with the listener. The application team then tried to log in using SQL Developer and got the following error.



The application team called us to help resolve the issue. At first glance the generic IO Error message did not suggest anything specific, so I took the case and started to figure out what was wrong.

I pinged the server remotely from my system, which was successful.


The next step was to add a TNS entry for the connection to my tnsnames.ora file and try tnsping. After editing tnsnames.ora, tnsping from my system returned an error.
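The entry I added to tnsnames.ora looked like the following; the host, port, and service name here are placeholders, not the real values from that server:

```
DEMODB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost01.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = demodb)
    )
  )
```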


The error says TNS-12543: destination host unreachable.
Time to check the host server and the listener status.

The server responds to ping, the listener is running fine, and we can connect to the database using the connect string from the server itself, as below.


A quick search of the Oracle Support documents turned up the exact problem, cause, and solution.
MOS note id “TNSPING fails with TNS-12543: TNS:destination host unreachable [ID 1556918.1]” suggested the cause as
This might be a network issue with either the underlying transport not being able to contact the host, or a firewall is blocking this client or connection.
Make sure the network is functioning correctly.

So the fix was to involve the OS/network admin. I explained the symptoms and asked them to look into the networking issue.

The OS team then confirmed that iptables, the Linux firewall service, was running on the server. After stopping iptables, we were able to connect to the database from remote machines.
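Rather than stopping the firewall entirely, a safer alternative on this kind of RHEL-style setup is to open just the listener port in the iptables rules (port 1521 is assumed here; use whatever port your listener actually runs on), for example in /etc/sysconfig/iptables:

```
# Allow inbound connections to the Oracle listener (assumed port 1521)
-A INPUT -m state --state NEW -m tcp -p tcp --dport 1521 -j ACCEPT
```

Then restart the iptables service so the rule takes effect.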

The problem is resolved. :-) Happy troubleshooting!!

Thursday, June 20, 2013

RESIZING ONLINE REDOLOG FILES

The following demonstration shows how to resize the online redo log files in an Oracle database.
The query below lists all the online redo log files, currently 250MB each.

SQL> SELECT a.group#, a.member, b.bytes FROM v$logfile a, v$log b WHERE a.group# = b.group#;

    GROUP# MEMBER                                                  BYTES
---------- -------------------------------------------------- ----------
         1 /fs02/oradata/demo/demo_redo1a.log            262144000
         1 /fs03/oradata/demo/demo_redo1b.log            262144000
         1 /fs04/oradata/demo/demo_redo1c.log            262144000
         2 /fs02/oradata/demo/demo_redo2a.log            262144000
         2 /fs03/oradata/demo/demo_redo2b.log            262144000
         2 /fs04/oradata/demo/demo_redo2c.log            262144000
         3 /fs02/oradata/demo/demo_redo3a.log            262144000
         3 /fs03/oradata/demo/demo_redo3b.log            262144000
         3 /fs04/oradata/demo/demo_redo3c.log            262144000

9 rows selected.

Check the status of the log group.

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 CURRENT
         2 INACTIVE
         3 INACTIVE

Now force a log switch until the last redo log is marked "CURRENT" by issuing the following command:

SQL> alter system switch logfile;

SQL> alter system switch logfile;

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 INACTIVE
         2 INACTIVE
         3 CURRENT


Drop the first logfile group and re-create it with the desired size, in this case 500MB.

SQL> alter database drop logfile group 1;

Database altered.

If the group is still needed for crash recovery, the drop may instead fail:

SQL> alter database drop logfile group 1;
ALTER DATABASE DROP LOGFILE GROUP 1
*
ERROR at line 1:
ORA-01624: log 1 needed for crash recovery of instance ORA920 (thread 1)
ORA-00312: online log 1 thread 1: '<file_name>'

Nothing to panic about; this is easy to resolve. Simply perform a checkpoint and try the drop again.

SQL> alter system checkpoint global;

System altered.

SQL> alter database drop logfile group 1;

Database altered.
---------------------------------------------------------------------------------------------------

Re-create the dropped redo log group with the new size:

alter database add logfile group 1 (
   '/fs02/oradata/demo/demo_redo1a.log', 
   '/fs03/oradata/demo/demo_redo1b.log',
   '/fs04/oradata/demo/demo_redo1c.log') size 500m reuse;

After re-creating the online redo log group, force a log switch. The online redo log group just created should become the "CURRENT" one.

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 UNUSED
         2 INACTIVE
         3 CURRENT

SQL> alter system switch logfile;

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 CURRENT
         2 INACTIVE
         3 ACTIVE


Repeat the same drop and re-create steps for the remaining log groups, making sure a group is INACTIVE before dropping it.

alter database add logfile group 2 (
   '/fs02/oradata/demo/demo_redo2a.log', 
   '/fs03/oradata/demo/demo_redo2b.log',
   '/fs04/oradata/demo/demo_redo2c.log') size 500m reuse;


alter database add logfile group 3 (
   '/fs02/oradata/demo/demo_redo3a.log', 
   '/fs03/oradata/demo/demo_redo3b.log',
   '/fs04/oradata/demo/demo_redo3c.log') size 500m reuse;

Now, if you check the sizes of the logfiles you can see the size increased.

SQL> SELECT a.group#, a.member, b.bytes FROM v$logfile a, v$log b WHERE a.group# = b.group#;

    GROUP# MEMBER                                                  BYTES
---------- -------------------------------------------------- ----------
         1 /fs02/oradata/demo/demo_redo1a.log            524288000
         1 /fs03/oradata/demo/demo_redo1b.log            524288000
         1 /fs04/oradata/demo/demo_redo1c.log            524288000
         2 /fs02/oradata/demo/demo_redo2a.log            524288000
         2 /fs03/oradata/demo/demo_redo2b.log            524288000
         2 /fs04/oradata/demo/demo_redo2c.log            524288000
         3 /fs02/oradata/demo/demo_redo3a.log            524288000
         3 /fs03/oradata/demo/demo_redo3b.log            524288000
         3 /fs04/oradata/demo/demo_redo3c.log            524288000


9 rows selected.
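As a quick sanity check, the byte counts line up with the sizes used; this is plain arithmetic, nothing Oracle-specific:

```python
# Verify the redo log sizes reported by the queries above.
MB = 1024 * 1024
old_size = 262144000   # bytes per member before the resize
new_size = 500 * MB    # "size 500m" in the ALTER DATABASE ADD LOGFILE command

print(old_size // MB)  # -> 250
print(new_size)        # -> 524288000, matching the final query output
```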

Mission accomplished. :-)

Wednesday, January 16, 2013

APPLIED column on v$archived_log when Standby database is setup


We use a simple check for standby gaps or log-apply problems on all the high-availability (Data Guard) servers in our environment. The query below runs in a script from cron periodically, so we do not have to check everything manually.

spool /home/oracle/dba_scripts/logs/standby_db_sync_err.txt

select PRIM-SEC from
(select MAX(SEQUENCE#) PRIM FROM V$ARCHIVED_LOG WHERE DEST_ID=1 AND ARCHIVED='YES'),
(select MAX(SEQUENCE#) SEC FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES');

spool off

Today I received an alert mail stating a log-applied gap of 24 logs. The script runs every hour and sends the alert only when the gap crosses 10 logs, since a delay parameter is set on the standby.
Note: I use the term "log applied gap" for the difference between the last log archived on the primary and the last log applied on the standby database.
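The alerting logic wrapped around that query can be sketched as follows. The function names are illustrative (the real script is shell plus SQL*Plus), the threshold of 10 matches the description above, and the sequence numbers are the ones from this incident:

```python
# Sketch of the cron script's alerting logic; names are illustrative.
def applied_gap(prim_max_seq, sec_max_seq):
    """Last log archived on the primary minus last log applied on the standby."""
    return prim_max_seq - sec_max_seq

def should_alert(gap, threshold=10):
    # A delay is configured on the standby, so a small gap is expected;
    # alert only when the gap crosses the threshold.
    return gap > threshold

print(should_alert(applied_gap(66010, 65975)))  # gap of 35 -> True
```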

I first thought there might be heavy activity on the database and that the gap would catch up soon, as it usually does, but an hour later the gap had grown to 35. Something was wrong and needed checking.

I ran the following query on the standby database, which returned:

SQL> select sequence#, process, status from v$managed_standby;

 SEQUENCE# PROCESS   STATUS
---------- --------- ------------
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
     65999 MRP0      WAIT_FOR_LOG
     66007 RFS       IDLE
     66008 RFS       IDLE
     66009 RFS       IDLE

This looks normal: RFS receives the logs as they are generated on the primary, and MRP waits for the configured delay before applying them. The real gap is only about 10 logs. So why was the script reporting something else?
I ran the script manually and still received the same alert of a 35-log gap. So I ran the queries separately on the primary database to troubleshoot, and got the following:

SQL> select MAX(SEQUENCE#) PRIM FROM V$ARCHIVED_LOG WHERE DEST_ID=1 AND ARCHIVED='YES';

      PRIM
----------
     66010

SQL> select MAX(SEQUENCE#) SEC FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES';

       SEC
----------
     65975

I waited for some time and checked again, only to see the same result. (Quite often in a standby environment the results just show up delayed. :-))

Now it was time to dig into Oracle Support documents to find out what was really going on. I suspected some sort of bug, as I had not faced such an issue before, and as predicted it turned out to be one.
The MOS article "APPLIED - Column not updated if Heartbeat-ARCH hangs [ID 1369630.1]" states that the issue is due to "Bug 6113783 - Arch processes can hang indefinitely on network [ID 6113783.8]".

The bug is fixed in the 11.2.0.1 base release, and I am currently running 10.2.0.4. Two options are provided:
  1. Upgrade the database to 11.2.0.1 or higher.
  2. Work around the issue by killing the hung ARCH process on the primary database.


Option 2 looked feasible at this time, as upgrading the database requires a lot of approvals, application compatibility testing, and so on.

So how to find the hung ARCH process?

MOS says to look for messages like
"ARCn: Becoming the heartbeat ARCH"
in the alert.log of the primary database to determine the current heartbeat ARCn process.

So I searched my alert log file and found the following.

Fri Aug 17 05:04:08 2012
ARC2: Becoming the heartbeat ARCH
ARC3 started with pid=103, OS id=28106
Fri Aug 17 05:04:08 2012
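Picking the current heartbeat ARCH out of a large alert log is easy to script: take the most recent "Becoming the heartbeat ARCH" line. A minimal sketch, using illustrative alert-log lines rather than reading a real file:

```python
# Sketch: find the most recent heartbeat ARCH named in the alert log.
# These lines are illustrative; a real script would read the alert.log file.
alert_lines = [
    "ARC0: Becoming the heartbeat ARCH",
    "ARC3 started with pid=103, OS id=28106",
    "ARC2: Becoming the heartbeat ARCH",
]

heartbeat = None
for line in alert_lines:
    if "Becoming the heartbeat ARCH" in line:
        heartbeat = line.split(":", 1)[0]  # process name before the colon

print(heartbeat)  # -> ARC2, the last process to claim the heartbeat role
```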

Check for the OS process id for ARC2
[oracle@slv00112 bdump]$ ps -ef|grep ora_arc
oracle   32224     1  0  2012 ?        02:43:53 ora_arc0_edisipd
oracle   32226     1  0  2012 ?        02:21:49 ora_arc1_edisipd
oracle   32232     1  0  2012 ?        02:41:07 ora_arc2_edisipd
oracle   32234     1  0  2012 ?        02:26:41 ora_arc3_edisipd
oracle   23429 16642  0 04:51 pts/1    00:00:00 grep ora_arc

Kill the Hung ARCH process

[oracle@slv00112 bdump]$ kill -9 32232

Now check the alert log file again to find the below.

ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC2: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC2 started with pid=67, OS id=23425
ARC2: Becoming the heartbeat ARCH
Wed Jan 16 04:53:47 2013

Now a new heartbeat ARCH process has started. Check on the primary database whether the APPLIED column is being updated.

SQL> select MAX(SEQUENCE#) SEC FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES';

       SEC
----------
     66003

Now it is being updated successfully. :-)