Sunday, May 20, 2012

Datapump - some tips


Data Pump is a utility for unloading/loading data and metadata into a set of operating system files called a dump file set. The dump file set can be imported only by the Data Pump Import utility. The dump file set can be imported on the same system or it can be moved to another system and loaded there.
In this post, let us see some tips and tricks that can done with Datapump. 

Tip #1 : Using PARALLEL parameter
PARALLEL parameter is used to improve the speed of the export. But this will be more effective when you split the dumpfiles with DUMPFILE parameter across the filesystem.
Create 2 or 3 directories in different filesystems and use the commands effectively.

expdp / dumpfile=dir1:test_1.dmp, dir1:test_2.dmp, dir2:test_3.dmp, dir3:test_4.dmp logfile=dir1:test.log full=y parallel=4
where dir1, dir2 and dir3 are directory names created in the database.

Tip #2 : Using FILESIZE parameter
FILESIZE parameter is used to limit the dumpfile size. For eg., if you want to limit your dumpfiles to 5gb, you can issue command as below

expdp / directory=dir1 dumpfile=test1.dmp,test2.dmp,test3.dmp logfile=test.log filesize=5120m
or 
expdp / directory=dir1 dumpfile=test_%U.dmp logfile=test.log filesize=5120m full=y
where %U will assign numbers automatically from 1 to 99. 

Note: If you use %U, dumpfile number 100 can't be created and export fails with "dumpfile exhausted" error.

Update (23 July 2013): If you want to create more than 99 files, you can use this work around. 
expdp / directory=dir1 dumpfile=test_%U.dmp dumpfile=test_1%U.dmp logfile=test.log filesize=5120m full=y

This will create files in a round robin method like below.
test_01.dmp
test_101.dmp
test_02.dmp
test_102.dmp

Tip #3 : Usage of VERSION parameter
VERSION parameter is used while taking export if you want to create a dumpfile which should be imported into a DB which is lower than the source DB. 
For eg., if your source DB is 11g and target DB is 10g, you can't use the dumpfile taken from 11g expdp utility to import into 10g DB. 
This throws the below error.
ORA-39142: incompatible version number 3.1 in dump file "/u02/dpump/test.dmp"
To overcome this we can use the VERSION parameter.
VERSION={COMPATIBLE | LATEST | version_string}
Eg.: expdp / directory=dir1 dumpfile=test_1.dmp logfile=test.log VERSION=10.2.0

Tip #4 : PARALLEL with single DUMPFILE
When you use PARALLEL parameter and use only one dumpfile to unload datas from the DB, you may get the below error.
expdp / directory=dir1 dumpfile=test_1.dmp logfile=test.log parallel=4
ORA-39095: Dump file space has been exhausted: Unable to allocate 8192 bytes 
Job "USER"."TABLE_UNLOAD" stopped due to fatal error at 00:37:29
Now a simple work around is to remove the PARALLEL parameter or add dumpfiles. This will over come the error.

expdp / directory=dir1 dumpfile=test_1.dmp logfile=test.log 
or
expdp / directory=dir1 dumpfile=test_1.dmp,test_2.dmp,test_3.dmp, test_4.dmp logfile=test.log parallel=4
or
expdp / directory=dir1 dumpfile=test_%U.dmp logfile=test.log parallel=4

Tip #5 : Drop dba_datapump_job rows
Sometimes before the export completes or when the export encounters a resumable wait or you would have stopped the export job in between. Now you start the DataPump job that stopped. Then the dump file has been removed from the directory location. You are not able to attach to the job. 
You will get an error like this.

ORA-39000: bad dump file specification
ORA-31640: unable to open dump file "/oracle/product/10.2.0/db_2/rdbms/log/test.dmp" for read
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory

But you will see the row updated in view dba_datapump_jobs

SQL> select * from dba_datapump_jobs;
OWNER JOB_NAME                       OPERATI JOB_M STATE                              DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
----- ------------------------------ ------- ----- ------------------------------ ---------- ----------------- -----------------
SYS   SYS_EXPORT_FULL_01             EXPORT  FULL  NOT RUNNING                             0        ##########                 0
You are not able to remove the row from dba_datapump_jobs as you are not able to attach to the export job with expdp client to kill the job.
In this case you can remove the row by dropping the master table created by the datapump export.

SQL> drop table SYS_EXPORT_FULL_01 purge;
Table dropped.
SQL> select * from dba_datapump_jobs;
no rows selected

Now you can see the row is deleted from the dba_datapump_jobs view.

Tip #6 : FLASHBACK_SCN and FLASHBACK_TIME 
Do not use FLASHBACK_SCN and FLASHBACK_TIME as these parameters slow down the performace of export.

Tip #7 : Effective EXCLUDE
Import of full database should be split as tables first and indexes next. Use the parameter exclude effectively to improve the speed of import.
EXCLUDE = INDEX,STATISTICS 
This will not import the indexes and statistics which in turn only import the tables, hence improving the performance.

Tip #8 : INDEXFILE=<filename> usage
After the import of tables has been completed, you can create the indexes and collect statistics of the tables. To get the indexes creation ddl, you can use the INDEXFILE = <filename> parameter to get all the indexes creation statements which were involved in the import operation.

Example of effective import 
impdp / directory=dir1,dir2,dir3 dumpfile=test_%U.dmp logfile=test.log EXCLUDE=STATISTICS Full=Y INDEXFILE=index_ddl.sql

The above will turn on the legacy mode import of datapump as the  parameter indexfile is present instead of SQLFILE parameter. 
Indexfile parameter is available in imp and sqlfile parameter with impdp. However you can use indexfile parameter in impdp which will turn on legacy mode import which is as below. 
;;; Legacy Mode Active due to the following parameters:
;;; Legacy Mode Parameter: "indexfile=testindex.sql" Location: Command Line, Replaced with: "sqlfile=index_ddl.sql include=index"

Hence to extract only the indexes the statement should be as below.
impdp / directory=dir1,dir2,dir3 dumpfile=test_%U.dmp logfile=test.log EXCLUDE=STATISTICS Full=Y SQLFILE=index_ddl.sql INCLUDE=INDEX

Note: Tip #8 edited as per comment from Eric below.

Tip #9 : Contents of Dump file
If you are not sure about the schemas that were present in the dumpfile or tablespaces present inside the dumpfile, etc., you can easily check the dumpfile for those information using the below command

grep -a "CREATE USER" test_1.dmp
grep -a "CREATE TABLESPACE" test_1.dmp
-a is not a recognised flag in some OS and hence command works without the flag. Mind, the dumpfile created is a binary file.

The above command gives all the CREATE USER statements and CREATE TABLESPACE statements which will be useful in many cases. You can also get the INDEXES and TABLES creation ddl from the dumpfile as well.

Tip #10 : init.ora parameter cursor_sharing
Always set init.ora parameter cursor_sharing to exact which has a good effect on import's performance.

Tip #11 : STATUS parameter usage
You can check the on going datapump export/import operation with the use of STATUS parameter and track the progress by yourself. You can attach to a export/import session and check the status. 

For example:
[oracle@ini8115l3aa2ba-136018207027 ~]$ expdp attach=SYS_EXPORT_FULL_01
Export: Release 11.2.0.1.0 - Production on Mon May 21 10:56:28 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Username: sys as sysdba
Password:
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Job: SYS_EXPORT_FULL_01
  Owner: SYS
  Operation: EXPORT
  Creator Privs: TRUE
  GUID: C08622D4FB5571E4E04012881BCF4C92
  Start Time: Monday, 21 May, 2012 10:55:55
  Mode: FULL
  Instance: newdb
  Max Parallelism: 1
  EXPORT Job Parameters:
  Parameter Name      Parameter Value:
     CLIENT_COMMAND        sys/******** AS SYSDBA directory=dmpdir full=y
  State: EXECUTING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: /u02/dpump/expdat.dmp
    bytes written: 4,096
Worker 1 Status:
  Process Name: DW00
  State: EXECUTING
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/COMMENT
  Completed Objects: 400
  Worker Parallelism: 1
Export> status
Job: SYS_EXPORT_FULL_01
  Operation: EXPORT
  Mode: FULL
  State: COMPLETING
  Bytes Processed: 37,121
  Percent Done: 100
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: /u02/dpump/expdat.dmp
    bytes written: 561,152
Worker 1 Status:
  Process Name: DW00
  State: WORK WAITING

Here you can see the bytes written which will be progressing and you can track the export/import job easily.
Note: The parameter ATTACH when used, it cannot be combined with any other parameter other than the USERID parameter.

$ expdp ATTACH= JOB_NAME

I’ll be updating the post whenever I come across things that can help improving the performance of datapump. J

18 comments:

  1. Nice article, but I would say, that dump file can be imported/exported not only by DataPump utility, but also with the help of DBMS_DATAPUMP package.

    ReplyDelete
  2. Thanks Mamukutti for sharing these tips ... for tip#8 should that be INCLUDE instead of EXCLUDE=.

    --Vinay

    ReplyDelete
  3. @Nikolayivankin Thank you for the comment. :)

    ReplyDelete
  4. @Vinay It should be EXCLUDE. If you want to speed up the import, you should not import indexes and statistics as they take significant amount of time. So you have to exclude from the dump while loading the datas.

    ReplyDelete
  5. Really very good points. I really appreciate your post. Thanks

    ReplyDelete
  6. Thank u mamu kutty, its really helpfull... Expecting some more articles from u...

    ReplyDelete
    Replies
    1. Hi Nandhu, you are welcome! Sure, I'll add articles as and when I get into interesting works.

      Delete
  7. A very good place you have given for Datapump users. Nice article

    ReplyDelete
  8. hi
    very nice tips mamu kutti.it is very very useful for dba.good helpful for dba presentation

    ReplyDelete
  9. Thanks MK for sharing tip#11 to monitor the percentage of datapump jobs, is there any query to monitor the percentage from db side?

    ReplyDelete
  10. Hi Shanmugam,

    Apologies for delayed reply.
    You can check with the below query.

    alter session set nls_date_format='dd/mm/yy hh24:mi:ss'
    select sid, USERNAME, OPNAME, start_time, totalwork, sofar, (sofar/totalwork) * 100 pct_done from v$session_longops where totalwork > sofar and opname like '%EXPORT%';

    Thank you!!

    ReplyDelete
  11. Hi ,
    Just clarify about the indexfile in impdp - Is it available imp or impdp
    What is the meaning of SQLFILE in impdp?
    Your example seems to be confusing and misleading

    Example of effective import
    impdp / directory=dir1,dir2,dir3 dumpfile=test_%U.dmp logfile=test.log EXCLUDE=INDEX,STATISTICS Full=Y INDEXFILE=index_ddl.sql

    Again why you need to use session long_ops instead of attach= and query the status through impdp utility itself

    Manoharan
    manoharansk@yahoo.com

    ReplyDelete
    Replies
    1. Hi,

      Indexfile parameter is available in imp and sqlfile parameter with impdp. However you can use indexfile parameter in impdp which will turn on legacy mode import which is as below.
      ;;; Legacy Mode Active due to the following parameters:
      ;;; Legacy Mode Parameter: "indexfile=index_ddl.sql" Location: Command Line, Replaced with: "sqlfile=index_ddl.sql include=index"

      Sqlfile parameter is used to extract all the create statements that the dump file consists. Remember, import first create objects using create statements unless present already and then inserts records into the objects. So if you would like to extract all the create statements that the dump consists you can utilize "sqlfile" parameter for the same. I have made the correction to my post regarding the same.

      I have given the statement (using v$session_longops) to check the status of export from within DB as requested by Shanmugam in the above comment. You can very well check the status from within datapump utility by attaching a session to the datapump session.

      Thank you for your comment!

      Delete
  12. Hello sir pls clear me this doubts
    why we use directory and where the dumpfile will be stored ?
    where the log file will be stored?
    suppose if we export a table from one database to another database where the dumpfile will be stored?
    should we name the dumpfile to export a table?
    and also pls tell me whats the use of remap?
    pls sir iam going to attend an interview surely they will ask this export import pls help out
    thanks in advance.

    ReplyDelete
    Replies
    1. Datapump is a server-based technology unlike conventional export/import which requires a directory object specifying a physical directory structure on the database server.
      We should mention DIRECTORY parameter to say datapump utility to unload data to the specified directory object path. The name of the dump will be given with DUMPFILE parameter.
      If you don't specify the above, the default values will be taken for all the mandatory parameters.
      DIRECTORY will be defaulted to DATA_PUMP_DIR which defaults to physical location of $ORACLE_HOME/rdbms/log and the DUMPFILE parameter will defaults to expdat.dmp. By default, DATA_PUMP_DIR is available only to privileged users.
      Remap is used for remapping of objects, for eg. from USER1 to USER2. It is similar to the FROMUSER and TOUSER parameter in conventional export/import.

      Thanks!

      Delete

I don't know what you think about my creation, but every words of you help me grow better and stronger!!