Sunday, May 20, 2012

Datapump - some tips


Data Pump is a utility for unloading/loading data and metadata into a set of operating system files called a dump file set. The dump file set can be imported only by the Data Pump Import utility. The dump file set can be imported on the same system or it can be moved to another system and loaded there.
In this post, let us see some tips and tricks that can done with Datapump. 

Tip #1 : Using PARALLEL parameter
PARALLEL parameter is used to improve the speed of the export. But this will be more effective when you split the dumpfiles with DUMPFILE parameter across the filesystem.
Create 2 or 3 directories in different filesystems and use the commands effectively.

expdp / dumpfile=dir1:test_1.dmp, dir1:test_2.dmp, dir2:test_3.dmp, dir3:test_4.dmp logfile=dir1:test.log full=y parallel=4
where dir1, dir2 and dir3 are directory names created in the database.

Tip #2 : Using FILESIZE parameter
FILESIZE parameter is used to limit the dumpfile size. For eg., if you want to limit your dumpfiles to 5gb, you can issue command as below

expdp / directory=dir1 dumpfile=test1.dmp,test2.dmp,test3.dmp logfile=test.log filesize=5120m
or 
expdp / directory=dir1 dumpfile=test_%U.dmp logfile=test.log filesize=5120m full=y
where %U will assign numbers automatically from 1 to 99. 

Note: If you use %U, dumpfile number 100 can't be created and export fails with "dumpfile exhausted" error.

Update (23 July 2013): If you want to create more than 99 files, you can use this work around. 
expdp / directory=dir1 dumpfile=test_%U.dmp dumpfile=test_1%U.dmp logfile=test.log filesize=5120m full=y

This will create files in a round robin method like below.
test_01.dmp
test_101.dmp
test_02.dmp
test_102.dmp

Tip #3 : Usage of VERSION parameter
VERSION parameter is used while taking export if you want to create a dumpfile which should be imported into a DB which is lower than the source DB. 
For eg., if your source DB is 11g and target DB is 10g, you can't use the dumpfile taken from 11g expdp utility to import into 10g DB. 
This throws the below error.
ORA-39142: incompatible version number 3.1 in dump file "/u02/dpump/test.dmp"
To overcome this we can use the VERSION parameter.
VERSION={COMPATIBLE | LATEST | version_string}
Eg.: expdp / directory=dir1 dumpfile=test_1.dmp logfile=test.log VERSION=10.2.0

Tip #4 : PARALLEL with single DUMPFILE
When you use PARALLEL parameter and use only one dumpfile to unload datas from the DB, you may get the below error.
expdp / directory=dir1 dumpfile=test_1.dmp logfile=test.log parallel=4
ORA-39095: Dump file space has been exhausted: Unable to allocate 8192 bytes 
Job "USER"."TABLE_UNLOAD" stopped due to fatal error at 00:37:29
Now a simple work around is to remove the PARALLEL parameter or add dumpfiles. This will over come the error.

expdp / directory=dir1 dumpfile=test_1.dmp logfile=test.log 
or
expdp / directory=dir1 dumpfile=test_1.dmp,test_2.dmp,test_3.dmp, test_4.dmp logfile=test.log parallel=4
or
expdp / directory=dir1 dumpfile=test_%U.dmp logfile=test.log parallel=4

Tip #5 : Drop dba_datapump_job rows
Sometimes before the export completes or when the export encounters a resumable wait or you would have stopped the export job in between. Now you start the DataPump job that stopped. Then the dump file has been removed from the directory location. You are not able to attach to the job. 
You will get an error like this.

ORA-39000: bad dump file specification
ORA-31640: unable to open dump file "/oracle/product/10.2.0/db_2/rdbms/log/test.dmp" for read
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory

But you will see the row updated in view dba_datapump_jobs

SQL> select * from dba_datapump_jobs;
OWNER JOB_NAME                       OPERATI JOB_M STATE                              DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
----- ------------------------------ ------- ----- ------------------------------ ---------- ----------------- -----------------
SYS   SYS_EXPORT_FULL_01             EXPORT  FULL  NOT RUNNING                             0        ##########                 0
You are not able to remove the row from dba_datapump_jobs as you are not able to attach to the export job with expdp client to kill the job.
In this case you can remove the row by dropping the master table created by the datapump export.

SQL> drop table SYS_EXPORT_FULL_01 purge;
Table dropped.
SQL> select * from dba_datapump_jobs;
no rows selected

Now you can see the row is deleted from the dba_datapump_jobs view.

Tip #6 : FLASHBACK_SCN and FLASHBACK_TIME 
Do not use FLASHBACK_SCN and FLASHBACK_TIME as these parameters slow down the performace of export.

Tip #7 : Effective EXCLUDE
Import of full database should be split as tables first and indexes next. Use the parameter exclude effectively to improve the speed of import.
EXCLUDE = INDEX,STATISTICS 
This will not import the indexes and statistics which in turn only import the tables, hence improving the performance.

Tip #8 : INDEXFILE=<filename> usage
After the import of tables has been completed, you can create the indexes and collect statistics of the tables. To get the indexes creation ddl, you can use the INDEXFILE = <filename> parameter to get all the indexes creation statements which were involved in the import operation.

Example of effective import 
impdp / directory=dir1,dir2,dir3 dumpfile=test_%U.dmp logfile=test.log EXCLUDE=STATISTICS Full=Y INDEXFILE=index_ddl.sql

The above will turn on the legacy mode import of datapump as the  parameter indexfile is present instead of SQLFILE parameter. 
Indexfile parameter is available in imp and sqlfile parameter with impdp. However you can use indexfile parameter in impdp which will turn on legacy mode import which is as below. 
;;; Legacy Mode Active due to the following parameters:
;;; Legacy Mode Parameter: "indexfile=testindex.sql" Location: Command Line, Replaced with: "sqlfile=index_ddl.sql include=index"

Hence to extract only the indexes the statement should be as below.
impdp / directory=dir1,dir2,dir3 dumpfile=test_%U.dmp logfile=test.log EXCLUDE=STATISTICS Full=Y SQLFILE=index_ddl.sql INCLUDE=INDEX

Note: Tip #8 edited as per comment from Eric below.

Tip #9 : Contents of Dump file
If you are not sure about the schemas that were present in the dumpfile or tablespaces present inside the dumpfile, etc., you can easily check the dumpfile for those information using the below command

grep -a "CREATE USER" test_1.dmp
grep -a "CREATE TABLESPACE" test_1.dmp
-a is not a recognised flag in some OS and hence command works without the flag. Mind, the dumpfile created is a binary file.

The above command gives all the CREATE USER statements and CREATE TABLESPACE statements which will be useful in many cases. You can also get the INDEXES and TABLES creation ddl from the dumpfile as well.

Tip #10 : init.ora parameter cursor_sharing
Always set init.ora parameter cursor_sharing to exact which has a good effect on import's performance.

Tip #11 : STATUS parameter usage
You can check the on going datapump export/import operation with the use of STATUS parameter and track the progress by yourself. You can attach to a export/import session and check the status. 

For example:
[oracle@ini8115l3aa2ba-136018207027 ~]$ expdp attach=SYS_EXPORT_FULL_01
Export: Release 11.2.0.1.0 - Production on Mon May 21 10:56:28 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Username: sys as sysdba
Password:
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Job: SYS_EXPORT_FULL_01
  Owner: SYS
  Operation: EXPORT
  Creator Privs: TRUE
  GUID: C08622D4FB5571E4E04012881BCF4C92
  Start Time: Monday, 21 May, 2012 10:55:55
  Mode: FULL
  Instance: newdb
  Max Parallelism: 1
  EXPORT Job Parameters:
  Parameter Name      Parameter Value:
     CLIENT_COMMAND        sys/******** AS SYSDBA directory=dmpdir full=y
  State: EXECUTING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: /u02/dpump/expdat.dmp
    bytes written: 4,096
Worker 1 Status:
  Process Name: DW00
  State: EXECUTING
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/COMMENT
  Completed Objects: 400
  Worker Parallelism: 1
Export> status
Job: SYS_EXPORT_FULL_01
  Operation: EXPORT
  Mode: FULL
  State: COMPLETING
  Bytes Processed: 37,121
  Percent Done: 100
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: /u02/dpump/expdat.dmp
    bytes written: 561,152
Worker 1 Status:
  Process Name: DW00
  State: WORK WAITING

Here you can see the bytes written which will be progressing and you can track the export/import job easily.
Note: The parameter ATTACH when used, it cannot be combined with any other parameter other than the USERID parameter.

$ expdp ATTACH= JOB_NAME

I’ll be updating the post whenever I come across things that can help improving the performance of datapump. J