Thursday, June 12, 2014

SETTING NLS_ENV FOR DBMS_JOBS

Today, 10 days after migrating the database from Oracle 9i (9.2.0.7) to Oracle 11gR2 (11.2.0.4), we experienced a serious issue (when one of the colleague found  the issue when analysing the data) that the current production data were missing from the database. Immediately we got into business to find out the cause of the data loss and to fix it.

Steps took to find the  root cause.


  • Does someone login to the database and delete the records accidentally? Do we have audit enabled?
  • Does the data were properly loaded in the database during migration as we used normal export/import as our migration tool due to the size of data involved?
  • Consider usage of log miner to find the statements that were deleting the data.
  • Is there any job scheduled via cron or DBMS_JOBS that deletes data?
We don't have auditing enabled in the database, so it becomes difficult to check the login details. We went through all the export and import logs to verify whether all the required objects and their data were loaded correctly and yes they were perfect. We also checked for cron entries and jobs scheduled in the database. Before we get into digging the logminer, we found that there was a job scheduled to delete the old data which were not required and older than 38 days.

Now the question is if the job deletes the data which were set a flag as not required and older than 38 days, do we have to really care for the missing data? Is the data being deleted to be restored again? Yes, the data being deleted is not 38 days older and they are not set a flag as not required. Data older than 2 days which were important data has been deleted.

Again, then what is causing the issue here? We were confused on what is going in the database.
As the database is migrated just 10 days back, we didn't shut down the legacy database completely and it was kept open for reference purpose. So I took a look at the job that is calling a procedure to delete data older than 38 days and compared it with the legacy database. Everything looks fine. They were calling the same procedure in both the environment and the procedures were identical in every respect.
Taking a deep look in the jobs with querying the dba_jobs, I got the clue.

SQL> select job, LOG_USER,PRIV_USER,SCHEMA_USER,LAST_DATE,LAST_SEC,FAILURES,WHAT,NLS_ENV from  dba_jobs;
       JOB LOG_USER        PRIV_USER       SCHEMA_USER                    LAST_DATE LAST_SEC   FAILURES WHAT
---------- --------------- --------------- ------------------------------ --------- -------- ---------- ----------------------------------------
NLS_ENV
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         3 USER1    USER1    USER1                   12-JUN-14 04:00:00          0 Proc_Cleandata_Close38days( );
NLS_LANGUAGE='AMERICAN' NLS_TERRITORY='AMERICA' NLS_CURRENCY='$' NLS_ISO_CURRENCY='AMERICA' NLS_NUMERIC_CHARACTERS='.,' NLS_DATE_FORMAT='DD-MON-RR' NLS_DATE_LANGUAGE='AMERICAN' NLS_SORT='BINARY'

The same job when queried from the legacy database gave the following output.

SQL> select job, LOG_USER,PRIV_USER,SCHEMA_USER,LAST_DATE,LAST_SEC,FAILURES,WHAT,NLS_ENV from  dba_jobs;
       JOB LOG_USER        PRIV_USER       SCHEMA_USER                    LAST_DATE LAST_SEC   FAILURES WHAT
---------- --------------- --------------- ------------------------------ --------- -------- ---------- ----------------------------------------
NLS_ENV
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         3 USER1    USER1    USER1                   12-JUN-14 04:00:00          0 Proc_Cleandata_Close38days( );
NLS_LANGUAGE='ENGLISH' NLS_TERRITORY='CANADA' NLS_CURRENCY='$' NLS_ISO_CURRENCY='CANADA' NLS_NUMERIC_CHARACTERS='.,' NLS_DATE_FORMAT='RR-MM-DD' NLS_DATE_LANGUAGE='ENGLISH' NLS_SORT='BINARY'

Here, everything seems similar but the NLS_ENV has a variation and the main clue I got here is the NLS_DATE_FORMAT that the job contains.

The procedure selects the data to delete with the following query involving date conversion.

SELECT * FROM TABLE1 WHERE COLUMN1 = 4 AND TO_DATE(LASTUPDATETIME, 'YYYY-MM-DD') <=  TO_DATE(SYSDATE-38, 'YYYY-MM-DD') ORDER BY LASTUPDATETIME, REPORTID;

With the default NLS_DATE_FORMAT of the database,

SQL> select TO_DATE(SYSDATE-38, 'YYYY-MM-DD') from dual;
TO_DATE(S
---------
14-MAY-05

With the  NLS_DATE_FORMAT set to 'RR-MM-DD', 
SQL> select TO_DATE(SYSDATE-38, 'YYYY-MM-DD') from dual;
TO_DATE(
--------
14-05-05

This is where the issue lies and this is the reason the database picks up the wrong data to be deleted.
Now how does the NLS_ENV set to a different setting other then the source? We used different character set for source and target for the migration (WE8ISO8859P1 to WE8MSWIN1252) but when exporting and importing we have set the correct character set in the client session. So how does the NLS_ENV differs again?
After coming across the Oracle Support note "The Priority of NLS Parameters Explained (Where To Define NLS Parameters) (Doc ID 241047.1)", it is mentioned as below.
"DBMS_SCHEDULER or DBMS_JOB store the SESSION values of the SUBMITTING session for each job. This is visible in the NLS_ENV column of DBA_SCHEDULER_JOBS or DBA_JOBS"
So the NLS_ENV values are dependent on the session submitting the job. While importing data, the session was set to use NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252 and hence the default values of the mentioned NLS_LANG were set to the job.

How do we change the NLS_ENV of the job?
Now I have dropped the job and started a new session to set the desired NLS_ENV.

$ export ENGLISH_CANADA.WE8MSWIN1252

Now when login to database and check the NLS parameters, this looked like this.

PARAMETER                                                        VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
NLS_LANGUAGE                                                     ENGLISH
NLS_TERRITORY                                                    CANADA
NLS_CURRENCY                                                     $
NLS_ISO_CURRENCY                                                 CANADA
NLS_DATE_FORMAT                                                  RR-MM-DD
NLS_DATE_LANGUAGE                                                ENGLISH
NLS_CHARACTERSET                                                 WE8MSWIN1252
NLS_SORT                                                         BINARY

Now I have submit the job using the above settings and check the NLS_ENV of the job which looked like below.

JOB LOG_USER        PRIV_USER       SCHEMA_USER                    LAST_DATE LAST_SEC   FAILURES WHAT
---------- --------------- --------------- ------------------------------ --------- -------- ---------- ----------------------------------------
NLS_ENV
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         2 USER1       USER1       USER1                      12-JUN-14 00:00:01          0 Proc_Cleandata_Close38days( );
NLS_LANGUAGE='ENGLISH' NLS_TERRITORY='CANADA' NLS_CURRENCY='$' NLS_ISO_CURRENCY='CANADA' NLS_NUMERIC_CHARACTERS='.,' NLS_DATE_FORMAT='RR-MM-DD' NLS_DATE_LANGUAGE='ENGLISH' NLS_SORT='BINARY'

Now the procedure picks the right data and the issue is resolved.
The blog post is to explain how to set the NLS_ENV and hence the solution is provided as above. 
There is also another solution to this issue.
The select query which picks the data to delete is using date conversion.

So using  
SELECT * FROM TABLE1 WHERE COLUMN1 = 4 AND LASTUPDATETIME <=  SYSDATE-38 ORDER BY LASTUPDATETIME, REPORTID;
instead of
SELECT * FROM TABLE1 WHERE COLUMN1 = 4 AND TO_DATE(LASTUPDATETIME, 'YYYY-MM-DD') <=  TO_DATE(SYSDATE-38, 'YYYY-MM-DD') ORDER BY LASTUPDATETIME, REPORTID;
will resolve the issue as well. This is an application provided query and business will not accept to do the desired change in the query level. :)

Happy Troubleshooting!!

Monday, October 21, 2013

RMAN errors out with ORA-01008: not all variables bound

One of our DB has a problem with RMAN backups with the error as below.

DBGSQL:     TARGET> select count(*) into :dbstate from v$parameter where lower(name) = '_dummy_instance' and upper(value) = 'TRUE'
DBGSQL:        sqlcode = 1008
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-01008: not all variables bound

This is frustrating when doing any task with RMAN, even you can't log in to RMAN prompt as it says initialization of internal recovery manager package failed. The database version here is 11.2.0.2.0. Oracle support says this is related to bug 9877980 which happens when cursor sharing is set to force or similar.

We have cursor_sharing set to force. So obviously we have hit a bug.
The possible workaround is to just flush the shared pool. Yes, just flush the shared pool.

SQL> alter system flush shared_pool;
System altered.

Now you will be able to connect to the database through RMAN without any issues.

The detailed notes can be obtained from Oracle Support Doc ID 1472116.1.
By the way, the issue is fixed in 11.2.0.3 and above.

Happy troubleshooting...