Oracle Observations

April 26, 2007

Installing and supporting Oracle RMAN and Tivoli DPO

Filed under: rman,TDPO — bigdaveroberts @ 3:12 pm

Another document that I have found regarding one of the projects I was due to do, was an RMAN/Tivoli backup.

Note that before the completion of the project, I was moved onto another project so there is the possibility that this checklist contains untested assertions.

It is also certainly somewhat out of date!

This document should be read in conjunction with the standard documentation and the documents listed at the end of this document. It is based largely on information gathered from Metalink forum articles, Oracle usenet groups and the adsm.org archives (search.adsm.org).

1. Installation

2. Configuring Oracle

Verify that the DB_DOMAIN parameter is set in the init<Database SID>.ora configuration file.

Verify that there is an oracle user on the system named oracle!

3. Testing without Tivoli

It is possible to perform some testing even before Tivoli DPO has been linked to RMAN. This is achieved through the stub interface library (typically libdsbtsh8.a) which implements a ‘tape backup’ to disk.

The BACKUP_DIR environment variable is used by the stub interface library to determine the disk location for the simulation of a tape backup.

E.g.:

# export BACKUP_DIR=/tmp

# sbttest command

This method should not be used for live backups to disk, as it is incredibly inefficient, its purpose is solely for testing purposes.

4. Version checking

Check the version of the library installed on the system:

/usr/tivoli/tsm/client/oracle/bin what libobk.a

libobk.a:

(C) Copyright IBM Corporation 1990, 2000

build date: Mon Mar 26 15:54:21 PST 2001

Tivoli Data Protection for Oracle: version 2.2.0.0

The version of the library should be 2.1.7D or later!

5. Relink

After the relink various platform specific methods exist for verifying the executables linkage to the library.

HP/UX:

chatr $ORACLE_HOME/bin/oracle

dynamic /ora01/app/oracle/product/8.1.6/lib//libobk.sl

AIX:

dump -X32 -Hv oracle | pg

…..

8                                    libobk.a            shr.o

Solaris:

ldd ?

6. Configuration

Creating a password file

Creating the password file is version specific, and changed with TDPO v2.2. The instructions in the Current IBM red book are for version v2.1. Check that you are using the correct utility.

PASSWORD_ACCESS parameter in the DSM.SYS file and make sure it is NOT set to “generated”. The parameter PASSWORD_ACCESS needs to be set to “permanent”

Run the tdpoconf utility to generate a password file

Tdpoconf password -TDPO_OPTFILE=<>

The file will be of the form TDPO.<node name> where the node name is defined in the tdpo.opt file.

Change the owner:group of the file to be oracle:dba

7. Testing

The backup process can be tested non destructively by performing the backup validate command.

The restore process can be tested non-destructively by performing a restore validate command on a valid backup.

It is possible for the backup to fail silently if the setsize is too large, it is recommended that the setsize be explicitly set and that a restore validate should be performed to verify the validity of the backup. If the setsize is too large, the restore validate will return the following errors:

RMAN-10035: exception raised in RPC: ORA-19660: some files in the backup set could not be verified

ORA-19661: datafile 3 could not be verified

RMAN-10031: ORA-19660 occurred during call to DBMS_BACKUP_RESTORE.RESTOREBACKUPPIECE

E.g.:

Backup database setsize = 2024800

8. Tracing and debugging

AIX

A list of system errors can be generated with the following command:

errpt -a

sbttest

The sbttest (and sbttest.exe) utility can be used to check the configuration of the TDPO library on most platforms.

Interpretation of error codes

Additional information 7xxx – indicates operating system probs.

Error numbers are defined in skgfqsbt.h and 149068.1 and are also listed towards the end of this document.

Additional information/return codes 2xxx – explanations can be found in: dsmrc.h

Likely error messages and their interpretation

ORA-19506 and ORA-27006

The media management software is linked correctly, but is not operating properly.

ORA-19511 and a 4110 media manager return code

The media management layer is not correctly linked.

BACKUP_DIR environment variable is not set

The oracle executable is still linked to the dummy test library libdsbtsh8.a or libdsbtsh8.so.

Log files to examine

RMAN log

sbtio.log file (located in user_dump_dest)

tdpo.out

tsmapi.out

DSIERROR.LOG (API error log file for TPDO)

See also tracefile parameter in dsm.opt

Generating a trace file

A parameter can be used to force the generation of a trace file.

E.g.:

run {

allocate channel t1 type ‘sbt_tape’ trace 1;

debug on;

restore tablespace PSAPBTABD ;

recover tablespace PSAPBTABD ;

release channel t1 ;

debug off;

}

The ‘TRACE 1’ will create an Oracle trace file in udump for each channel allocated. We need to see what calls were made last in those files. The trace will capture the conversation between the MML and the Oracle server process.

9. Problems

HP/UX

The oracle executable errors and is unable to find the TDPO supplied tape library.

Use chatr to verify that the oracle executable uses the LD_LIBRARY_PATH/SHLIB_PATH environment variables:

% cd $ORACLE_HOME/bin

% chatr oracle

1 oracle:

2              shared executable

3              shared library dynamic path search:

4                              SHLIB_PATH enabled second

5                              embedded path disabled first Not Defined

Etc.

The command chatr is used to enable or disable SHLIB_PATH and LD_LIBRARY_PATH:

% chatr +s enable <executable>

or

% chatr +s disable <executable>

On HP-UX 10.20 and 11 (32 bit), this enables/disables SHLIB_PATH only.

On HP-UX 11 (64 bit), both SHLIB_PATH and LD_LIBRARY_PATH are enabled/disabled.

Out of memory errors

Out of memory errors can be caused by setting the FILESPERSET parameter too high, try setting FILESPERSET to 4.

Timeouts

Timeouts can be caused by setting the FILESPERSET parameter too high, try setting FILESPERSET to 1.

Oracle 8.1.6

There are known issues linking Oracle 8.1.6 to TDPO. Instructions for linking this release can be found in metalink article147862.1.

ORA-19506: failed to create sequential file, name=”al_t468634250_s3617_p1″, parms=””

Verify that the format parameter guarantees a unique backup piece name! This can most easily be achieved by adding the %u parameter to the file name mask.

Symbol putmsg in ksh is undefined or Symbol getmsg in ksh is undefined

As root, run /usr/sbin/strload

Parameter length problems

In releases prior to 2.2 parameters were defined in the RMAN script rather than the tdpo.opt file. This potentially led to issues with limits on the length of the parameter string. As a work around, Oracle recommended producing short paths using symbolic links that can be substituted for long path names to reduce the length of the parameter string.

Problems connecting to the TSM server

The following error can be caused by setting the IDLETIMEOUT parameter to a value that is too high:

“Enter your user id: admin

ANS1017E Session rejected: TCP/IP connection failure

ANS8023E Unable to establish session with server.

ANS8002I Highest return code was -50.”

The above error was encountered when the IDLETIMEOUT parameter was set to the value: “999999999”.

Sbtio.log grows very large very quickly

Tivoli misunderstood how tracing worked and released the Tivoli Data Protection Agent 2.2 with tracing turned on to the maximum. Tivoli provided a patch that defaulted the tracing to level 0. The patch may be found on:

ftp://index.storsys.ibm.com/tivoli-storage-management/patches/tivoli-data-protection/oracle/platform

10. Hints, tips and notes

HP/UX

The 805 64bit on HP is different than other 64bit versions. There is no lib64 directory. Also the libobk is not a shared library (.sl) but an archive library (.a).

11. Q&A

Why is my incremental backup not faster than my full backup?

Incremental backups increase the speed when the network is the bottleneck, when the tape devices are local the backup will take a similar amount of time.

12. Cloning

In a disaster recovery scenario, it is assumed that you are essentially recreating a system that will have all of the same logical characteristics as the original node. However to clone a database to a different server that will in Tivoli terms constitute a new node requires a slightly more complex procedure than that used for a disaster recovery.

To restore a database to a different node using Tivoli requires that you configure the system to make RMAN believe that the media manager node is the same as the one that was made to make the original backup.

Set the parameter “TDPO_NODE” in the “tdpo.opt” to be the TSM node that performed the backup.

Set the TDPO_OWNER to be the database owner on the new machine if the owner is different to that on the original machine.

Set the environment variable DSMO_NODE to be the name of the original node and then generate the TDPO password file.

Restore the database.

Change the TDPO_NODE parameter in the tdpo.opt file back to be the node name for that machine.

Set the environment variable DSMO_NODE to be the name of the new node and then regenerate the TDPO password file.

See metalink notes: 183025.1, 232107.996, 1020252.102

13. Uninstalling

To upgrade Oracle you will need to unlink the TDPO library from the current release of oracle, upgrade oracle and then relink.

You must recreate the symbolic link between the libobk.a dynamic library and the $ORACLE_HOME/lib/libdsbtsh8.a which is the standard Oracle I/O library.

Unlinking e.g.:

$ cd $ORACLE_HOME/lib

$ rm libobk.a

$ ln -s libdsbtsh8.a libobk.a

14. Performance

If DSMO_AVG_SIZE is defined too large for the defined disk storage pool, ADSM will skip the disk storage pool and use the secondary tape storage pool instead, without first writing to the disk storage pool.

(Metalink 1050834.6)

There is a bug in Oracle 8.1.7.4 that can lead to deteriorating performance of backups; a work around for this bug is to analyze all the tables in the RMAN recovery catalog. (Metalink 279968.995)

Your first step is probably to identify (or narrow down the list of suspects for) the bottleneck.

How fast is your network connection, and how busy? On a really good day, a *dedicated* 10Mbit Ethernet will give you about 1MB/sec throughput. That’s about 3.5 GB/hour and about 30 hours to backup 100 GB. (I’m assuming that you meant “100GB” rather than “100MB” as your database size).

A single DLT 7000 has a native transfer rate of about 5MB/sec. Under ideal conditions, and without compression, this will give you a maximum of 18 GB/hour; if your backups are not buffering well, the tape drive may be stopping and starting a lot (shoe-shining), and your effective transfer rate will plummet. (Turning compression on could actually hurt sometimes; this will double — or more — the rate that the tape drive can accept data, increasing the possibility of start/stop problems.)

What is your database server doing during the backup? Is the CPU busy? Does it page a lot; are you getting large amounts of I/O wait? Setting filesperset=100 may be a mistake. If you’re seeing a high %WIO, try setting this to the number of disks on your system (or less). Unless your disks are more than 10 years old, serial reads on one or two disks should be able to easily exceed the bandwidth of most networks.

It is recommended that 2 channels per device are allocated, but test to verify the performance improvement.

See also

http://otn.oracle.com/deploy/availability/pdf/rman_performance_wp.pdf

Update (18-Oct-2013) the original document appears to have been updated and renamed:

http://www.oracle.com/technetwork/jp/content/rman-performance-wp-133899.pdf

An additional document is now also available:

http://www.oracle.com/technetwork/jp/content/rman-perfromance-scbc-wp-133854.pdf

SAP includes its own parameter file for RMAN:

$ORACLE_HOME/dbs/sap/init.sap

This file contains parameters that override the default RMAN parameters.

E.g. of init.sap parameter:

RMAN_FILESPERSET = 10

16. Sources of additional Tivoli DPO information

Supported platforms:

http://www.tivoli.com/products/index/storage-mgr-db/platforms.html

(Update 18-Dec-2013) Supported platforms are now found via this landing page:

http://www-01.ibm.com/software/sysmgmt/products/support/Tivoli_Supported_Platforms.html

This will will eventually redirect you to IBMs Software Product Compatibility Reports page:

http://publib.boulder.ibm.com/infocenter/prodguid/v1r0/clarity/index.html

Certification matrix:

http://www.tivoli.com/support/storage_mgr/adoracjp.htm

Product information

http://www.tivoli.com/products/index/data_protect_oracle/

http://www.oracle.com/ip/deploy/database/features/recovery/index.html?recovery.html

Support information / hardware requirements

http://www.tivoli.com/support/storage_mgr/adagent.htm

Other useful Documentation

Oracle 9i RMAN Backup and Recovery.

IBM Redbook:

Backing Up Oracle Using Tivoli Storage Management

http://www.redbooks.ibm.com/abstracts/sg246249.html

Note that this book describes TDPO v2.1, Not v2.2

Documentation for tdposync:

http://publib.boulder.ibm.com/tividd/td/DPON/SH26-4112-02/en_US/HTML/anou0010.htm

RMAN performance tuning document:

http://otn.oracle.com/deploy/availability/pdf/rman_performance_wp.pdf

17. Glossary

TDP – Tivoli Data Protection

TDPO – Tivoli Data protection for Oracle

18. TDPO error codes reported back to RMAN

‘*’ Are internal and should never be seen during normal operation:

/* Error codes for sbtopen */

7000 /* file not found (only for read) */

7001 /* file exists (only for write) */

* 7002 /* bad mode specified */

* 7003 /* bad block size specified */ Bad tpblksiz, should be multiple of 512

7004 /* no tape device found */

7005 /* device is busy; try again later */

7006 /* tape volume not found */

7007 /* tape volume is in-use */

7008 /* I/O Error */

7009 /* cant connect with Media Manager */

7010 /* permission denied */

7011 /* OS error */ system error – e.g. malloc, fork errors

7012 /* invalid argument(s) */ invalid argument(s) to sbtopen

7013 Backup file exists

/* Error codes for sbtclose */

7020 /* bad th (no sbtopen done) */

7021 /* bad flags */

7022 /* I/O error */

7023 /* OS error */

7024 /* invalid argument(s) */

7025 /* cant connect with Media Manager */

/* Error codes for sbtwrite */

* 7040 /* bad th */ bad th to sbtwrite

7042 /* I/O error */

7043 /* OS error */

* 7044 /* invalid argument(s) */ invalid argument(s) to sbtwrite

/* Error codes for sbtread */

* 7060 /* bad th */ bad th to sbtread

7061 /* EOF encountered */

7063 /* I/O error */

7064 /* OS error */

* 7065 /* invalid argument(s) */ invalid argument(s) to sbtread

/* Error codes for sbtremove */

7080 /* backup file not found */

7081 /* backup file being used */

7082 /* I/O Error */

7083 /* cant connect with Media Manager */

7084 /* permission denied */

7085 /* OS error */

* 7086 /* invalid argument(s) */ invalid argument(s) to sbtremove

/* Error codes for sbtinfo */

7090 /* backup file not found */

7091 /* I/O Error */

7092 /* cant connect with Media Manager */

7093 /* permission denied */

7094 /* OS error */

* 7095 /* invalid argument(s) */ invalid argument(s) to sbtinfo

/* Error codes for sbtinit */

* 7110 /* invalid argument(s) */ invalid argument(s) to sbtinit

7111 /* OS error */ system error – e.g. malloc, fork errors

* 7120 th was not of a previously sbtpopened one (sbtclose)

* 7121 Bad Flags to sbtclose

7122 I/O ERROR

7123 System Error

7124 Can’t connect with Media Manager

* 7125 Invalid Argument to sbtclose

Common error codes.

Can be issued by any SBT V2 function. **********************************************************

7500 /* Operator intervention requested */

7501 /* Catch-all media manager error */

7502 /* File Not Found */

7503 /* File already exists */

7504 /* End of File */

7505 /* cant proxy copy the specified file */

7506 /* no proxy work in progress */

Blog at WordPress.com.