Oracle Observations

July 17, 2007

ORA-600 [15015] revisited.

Filed under: ORA-600 — bigdaveroberts @ 1:50 pm

Well, Oracle has produced an analysis of the problem, based on one of the hundreds of trace files produced.

With hindsight, one of the symptoms I should have mentioned in my original post was that, when logging into sql*plus, you received an error indicating that the set_application_info procedure was invalid.
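(If you want to check for the same symptom, a quick way to see whether anything owned by SYS has gone invalid is a query along the lines of the sketch below. DBA_OBJECTS and its STATUS column are standard; the connect string is just a convenient local connection and any DBA account would do.)

#!/bin/sh
# Sketch: list any invalid SYS-owned objects.  DBA_OBJECTS is a standard
# dictionary view; the connection details are site-specific.
sqlplus -s "/ as sysdba" <<EOF
select object_name, object_type
  from dba_objects
 where owner = 'SYS'
   and status = 'INVALID';
exit
EOF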

Oracle’s conclusion was that it is due to a bug in Oracle (1867501): if a process connects to Oracle as SYSDBA and issues commands while the database is starting up, the SGA can sometimes be corrupted.

From the point at which this happens, all of the subsequent errors (including the ORA-600) are secondary.

I do like the response, in that it fits my favored scenario of the problem being caused by an unforeseen side effect of a change. I am, however, suspicious, because the change that involved scheduling a script to connect to the database regularly as SYSDBA was implemented more than 12 months ago. So I am still concerned that one of the more recent changes may also be implicated as a secondary cause of the problem.

OTOH the information Oracle has given us allows us to make a change that will avoid the problem in future!
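(The sort of guard involved is simple enough. A minimal sketch, assuming the scheduled job is a shell script: check that the instance reports OPEN before doing the SYSDBA work, so the job never issues commands while the database is still starting up. The v$instance view and the OPEN status are standard; the connect string and the downstream script name are placeholders for however your job is actually arranged.)

#!/bin/sh
# Sketch: only run the real work if the instance reports OPEN, so the
# scheduled job never issues commands during startup.  The connect string
# and the script name below are placeholders.

STATUS=`echo 'set heading off feedback off pagesize 0
select status from v$instance;' | sqlplus -s "/ as sysdba"`

case "$STATUS" in
  *OPEN*) /path/to/scheduled_sysdba_job.sh ;;   # placeholder for the real job
  *)      echo "instance status is '$STATUS' - skipping this run" ;;
esac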

If anyone else encounters the same error, I would be interested in any information you have with regards to what you may have recently done to your system!

July 11, 2007

ORA-00600 [15015] and all that.

Filed under: ORA-600 — bigdaveroberts @ 11:35 am

Well, last Wednesday was the first fun day (at work) for a long while.

The application, database and OS were all struggling, and I suspect that the network was also experiencing problems.

After consideration, the apparent cause (based on the earliest errors we could find evidence of) was repeated ORA-600 errors (predominantly 15015), starting within seconds of the database being restarted after the backup (which also failed).

The errors appeared to be related to the snapshot process, which was dying every 5 minutes and then being automatically restarted by the database.
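(The pattern was easy enough to see once you knew to look at the alert log; something along these lines is all it takes. The path is a placeholder for wherever background_dump_dest points on your system.)

#!/bin/sh
# Sketch: count and list the ORA-00600 lines in the alert log.  The path
# below is a placeholder for this installation's background_dump_dest.
ALERT=/u01/app/oracle/admin/$ORACLE_SID/bdump/alert_$ORACLE_SID.log
grep -c 'ORA-00600' $ALERT
grep -n 'ORA-00600' $ALERT | tail -20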

I looked up ORA-600 [15015] on Google and got no hits. I also looked up 15015 in the Metalink ORA-600 argument lookup tool and received the unhelpful response:

A description for this ORA-600 error is not yet published.

I also searched the knowledge base, including the archived articles and the bug database, and received no hits.

So we have a stable system on a terminal release (8.1.7.4) that suddenly and for no apparent reason starts kicking out super obscure errors.

And it isn’t as if there have been any significant changes implemented.

There was one change to patch an Oracle bug that reared its head when we started running the client under Citrix, and one to increase the size of the SGA. Both changes were implemented more than a month ago.

Before you get excited, I have to say that I don’t know what the problem is. (Let’s be frank: the only reason you are reading this is because you are experiencing the same error.)

So where do my theories lie?

There is a general tendency for software people to blame hardware when a new problem appears in a stable system, but I wouldn’t initially blame hardware. (/var/adm/messages didn’t have anything novel in it until a disk partition filled up with all the core dumps, and that problem was resolved by a reboot.)

I also don’t tend to side with the conspiracy theorists who assume that all problems start with an uncontrolled change made by some well-meaning techie. Certainly pkginfo didn’t indicate that the system had been patched or had any new packages installed within the last 12 months.
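(For what it is worth, the check was no more sophisticated than eyeballing the install dates that pkginfo reports, roughly as below. pkginfo -l and its PKGINST and INSTDATE fields are standard Solaris; there is no clever filtering here.)

#!/bin/sh
# Sketch: list package names alongside their install dates and scan by eye
# for anything recent.
pkginfo -l | egrep 'PKGINST|INSTDATE'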

Generally I find that unexpected problems are most often explained by the unforeseen outcomes of poorly understood changes, and while the two changes appear superficially innocuous, it is there that my suspicion starts.

The Oracle patch will probably have been installed on multiple systems for multiple customers, and while it may be possible for the interaction between multiple patches to produce unusual results, the simple fact is that the system is rarely patched and is as close to a vanilla install as possible. Thus I think that the patch is unlikely to be the cause.

So finally we are left with the SGA increase. This does worry me slightly, in that the size of the SGA is now close to the SHMMAX setting for the maximum shared memory segment size, to the extent that on some mornings we receive a warning:

WARNING: Not enough physical memory for SHM_SHARE_MMU segment of size 0xnnnnnnnn

in the alert log, which Metalink unhelpfully indicates may be serious on some versions and innocuous on others.
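(Comparing the two figures is straightforward enough. A sketch of the sort of check involved: the /etc/system shmsys:shminfo_shmmax entry and the sysdef SHMMAX line are the standard places to read the kernel limit on Solaris, and the SGA size comes from SQL*Plus. The connect string is again just a convenient local connection.)

#!/bin/sh
# Sketch: put the kernel's shared memory segment limit and the instance's
# SGA size side by side.

# Kernel limit, read two ways:
grep shminfo_shmmax /etc/system
sysdef | grep SHMMAX

# SGA size as the instance reports it:
echo 'show sga' | sqlplus -s "/ as sysdba"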

So, with my suspicion that there is an issue with shared memory, I have scheduled a cron job to record the output of ipcs -Am before and after each backup window, and will keep a watching brief on the problem.
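(The cron entries themselves are nothing special; the times and the log path below are placeholders for our actual backup window, and ipcs -Am is the real command being recorded.)

# Sketch crontab entries: snapshot the shared memory segments just before
# and just after the backup window.
55 22 * * * (date; ipcs -Am) >> /var/tmp/ipcs_before_backup.log 2>&1
30 02 * * * (date; ipcs -Am) >> /var/tmp/ipcs_after_backup.log 2>&1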

Obviously when Oracle comes up with a response I will post an update.
