Oracle Observations

July 12, 2007

EAGAIN (again)

Filed under: AIX — bigdaveroberts @ 3:00 pm

One of the more interesting aspects of running a blog is being able to see the search terms that brought visitors here from a search engine.

From these I know that almost every day someone searches for EAGAIN and lands on my post about performance problems using async I/O on AIX.

As that post covers a number of issues, I think it is worthwhile to revisit the subject and dedicate a single post to EAGAIN warnings under AIX.

The history of the EAGAIN problem under AIX as I understand it.
(Based largely on supposition rather than hard fact!)

When IBM originally produced the asynchronous I/O subsystem for buffered file systems on AIX 3, the solution implemented was sub-optimal: it would sometimes lock the inode unnecessarily, and it was not always truly asynchronous.

Oracle then used IBM's asynchronous I/O API to implement async I/O on AIX.

There are then two possibilities as to what happened.

Oracle gave insufficient instructions in the setup guide concerning async I/O configuration on AIX, and when IBM rewrote the async I/O subsystem, Oracle began to report EAGAIN errors, exposing a poor configuration that the inefficient initial implementation had hidden.

or

When IBM rewrote the async I/O subsystem, they added a configuration parameter which, without an appropriate setting, resulted in numerous EAGAIN warnings.

What certainly did happen was that IBM introduced new bugs into the system, which required several iterations of patches to resolve.

Whatever the cause, many people running Oracle on AIX encountered an increasing number of “Warning lio_listio returned EAGAIN” messages.

Oracle's response was to blame AIX, as the warnings had not occurred before the upgrade. IBM blamed Oracle, as all they had done was improve the efficiency of their async I/O system.

What should you do if you encounter EAGAIN warnings under AIX?

Firstly, you should ensure that the appropriate AIX operating system patches have been applied.

Check the bos.rte.aio level with:

# lslpp -l bos.rte.aio
bos.rte.aio 5.1.0.25 COMMITTED Asynchronous I/O Extension
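
If you want to script that check, something along these lines might do. It is only a sketch: `aio_level` and `level_at_least` are names invented for this example, the minimum level in the usage comment is a placeholder rather than a recommendation (IBM's fix list for your release is the authority), and it assumes the usual four-part level as printed above.

```shell
#!/bin/sh
# Sketch only: helper names are invented for this example.

# Read `lslpp -l bos.rte.aio` output on stdin and print the fileset level.
aio_level() {
    awk '$1 == "bos.rte.aio" { print $2; exit }'
}

# Succeed if four-part level $1 is at or above four-part level $2.
level_at_least() {
    i=1
    while [ "$i" -le 4 ]; do
        have=$(printf '%s\n' "$1" | cut -d. -f"$i")
        want=$(printf '%s\n' "$2" | cut -d. -f"$i")
        if [ "${have:-0}" -gt "${want:-0}" ]; then return 0; fi
        if [ "${have:-0}" -lt "${want:-0}" ]; then return 1; fi
        i=$((i + 1))
    done
    return 0
}

# On the AIX host (MIN is a placeholder, not a recommended level):
#   MIN=5.1.0.25
#   level_at_least "$(lslpp -l bos.rte.aio | aio_level)" "$MIN" ||
#       echo "bos.rte.aio is below $MIN"
```

The comparison is done field by field as integers, so 5.1.0.25 correctly ranks above 5.1.0.9, which a plain string comparison would get wrong.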

Secondly, you should accept that eradicating EAGAIN warnings is no guarantee that you have resolved the underlying problem, and that the occasional EAGAIN warning does not necessarily indicate a problem.

As the basic explanation of the message indicates, the warning is a sign that the I/O system is not running optimally.

In AIX, the async I/O implementation consists of a single buffer that holds all pending disk writes, with multiple write processes executing the write instructions.

When an EAGAIN warning occurs, it simply indicates that the async I/O write buffer is full, and Oracle will have to absorb the overhead of attempting to write the data to the buffer again.

If you increase the size of the buffer, you will reduce the number of warnings and slightly reduce the workload on Oracle. However, this should not be your first consideration or goal.

The most effective way to increase the efficiency of the system is to increase the rate at which disk writes are completed, and thus removed from the queue, by increasing I/O bandwidth (replacing RAID 5 with mirroring, using faster disks, reducing disk contention, etc.). Next, you should look at reducing the number of disk writes added to the async I/O buffer, through redo reduction and by dropping indexes before data loads and recreating them afterwards.

It is only after applying these general methods of increasing I/O efficiency that you should turn to tuning the async I/O subsystem itself.
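
The difference between a bigger buffer and a faster drain shows up even in a toy model. The sketch below is not how AIX works internally: it assumes a fixed number of requests offered and completed per tick, and it simply counts (rather than retries) the requests that find the buffer full. `simulate` and all the numbers are invented for illustration.

```shell
#!/bin/sh
# Toy model of a bounded request buffer. Arguments:
#   $1 buffer slots, $2 requests offered per tick,
#   $3 requests completed per tick, $4 number of ticks.
# Prints how many requests found the buffer full (a stand-in for EAGAIN).
simulate() {
    slots=$1
    arrive=$2
    drain=$3
    ticks=$4
    queued=0
    rejected=0
    t=0
    while [ "$t" -lt "$ticks" ]; do
        # Offer this tick's requests; whatever does not fit is rejected.
        free=$((slots - queued))
        if [ "$arrive" -gt "$free" ]; then
            rejected=$((rejected + arrive - free))
            queued=$slots
        else
            queued=$((queued + arrive))
        fi
        # Complete up to $drain of the queued requests.
        if [ "$queued" -gt "$drain" ]; then
            queued=$((queued - drain))
        else
            queued=0
        fi
        t=$((t + 1))
    done
    echo "$rejected"
}

simulate 100 12 10 100   # baseline: prints 110
simulate 200 12 10 100   # buffer doubled: prints 10
simulate 100 12 12 100   # drain rate raised to match arrivals: prints 0
```

With these made-up numbers, doubling the buffer only postpones the first rejection, because the queue still grows by two requests per tick. Matching the drain rate to the arrival rate removes the rejections altogether.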

Bear in mind that while async I/O lets you configure multiple processes to write to a disk simultaneously, a hard disk can only physically write to one place at a time. It is only through the combination of NCQ and disk buffers that multiple write processes per disk will actually increase I/O throughput. So even if you reduce the number of EAGAIN warnings by increasing the number of write processes, that is no guarantee that the system is any faster!

Likewise, increasing the size of the async I/O buffer may well reduce the number of EAGAIN warnings, but if that memory could have been better used to enlarge the SGA, overall performance may fall even though the warnings have become rarer.
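
For reference, on AIX 5.x the two knobs discussed here correspond, to the best of my knowledge, to the `maxservers` and `maxreqs` attributes of the `aio0` pseudo-device, which `lsattr -El aio0` lists. Treat that mapping as an assumption and check it against the documentation for your AIX level. The small helper below, with the invented name `aio_attr`, just picks one value out of that listing so the current settings can be recorded before anything is changed.

```shell
#!/bin/sh
# Print the value of attribute $1 from `lsattr -El aio0`-style output on stdin.
# Assumes the attribute name is the first column and its value the second.
aio_attr() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# On the AIX host (not run here):
#   lsattr -El aio0 | aio_attr maxreqs
#   lsattr -El aio0 | aio_attr maxservers
```

Recording the values before and after each change makes it easier to relate any change in warning frequency, or in real throughput, to a specific setting.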

Obviously, if you haven’t changed the configuration of the system and the number of EAGAIN warnings is on the increase, then that is an indication of a problem. The solution, however, may well not lie in the realm of the DBA: it may be that the developers are introducing new, inefficient routines.

In short, the EAGAIN warning should not be considered the problem itself, but rather another symptom which, if it cannot be eradicated, probably needs to be monitored and managed.
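
One way to watch the trend is to count the warnings in the Oracle trace files at regular intervals. This is a minimal sketch, assuming the warnings end up in `*.trc` files in a single directory and that matching on the text `returned EAGAIN` is enough. `count_eagain` and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print the total number of lines containing "returned EAGAIN"
# across the *.trc files in directory $1 (0 if there are none).
count_eagain() {
    # Adding /dev/null makes grep print "file:count" even for a single file.
    { grep -c 'returned EAGAIN' "$1"/*.trc /dev/null 2>/dev/null || true; } |
        awk -F: '{ total += $NF } END { print total + 0 }'
}

# Possible daily cron usage (both paths are assumptions; substitute your own):
#   echo "$(date +%Y-%m-%d) $(count_eagain /u01/app/oracle/admin/ORCL/bdump)" \
#       >> /tmp/eagain_trend.log
```

Because trace files accumulate, the figure is a running total. It is the day-to-day difference that shows whether the warnings are becoming more frequent.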

Further information:

IBM documents on tuning Oracle on AIX:

http://www-941.ibm.com/collaboration/wiki/download/attachments/5570/Oracle_on_AIX_WebinarFeb2007.pdf

http://www.sioug.si/sioug2005/datoteka.jsp?filename=Tomaz%20Vincek%20-%20Oracle%2010g%20Performance%20Tuning%20v%20UNIX%20okolju.pdf

For the enthusiastic, AIX documentation about the lio_listio function:

https://www-rz.uni-hohenheim.de/betriebssysteme/unix/aix/aix_4.3.3_doc/ext_doc/usr/share/man/info/en_US/a_doc_lib/aixprggd/kernextc/async_io_subsys.htm

And an interesting MetaLink article:

34924.996

IBM's response to a query about aiostat (a tool that IBM supplied to analyse the volume of asynchronous I/O calls):

http://www-1.ibm.com/support/docview.wss?uid=std3295a41c3f8c73bda49256f66000ecf3d

1 Comment »

  1. Hi.
    I read your article about AIX EAGAIN. Thank you.
    I use Oracle (10.2.3.0) on AIX 5.3 with async I/O on the data file system. I don’t know whether async I/O or CIO is better.
    What do you think about that?
    If I want to use CIO, what should I do?

    Please send an email. Thank you.

    Comment by Hun Kim — September 28, 2008 @ 8:54 am | Reply

