Oracle Observations

February 28, 2007

The first tip.

Filed under: Unix — bigdaveroberts @ 3:14 pm

During a project in '92 at my second job, we would periodically build a test machine with the latest version of the software we were developing and package it up for the customer to test at their offices. In the course of this we encountered an unusual performance problem.

One of the peculiarities of the project was that the customer was sold a solution based around Unix V4 on an Intel platform, whereas at the development stage the only machines available were Motorola-based boxes running Unix V3.

The initial releases based on the old hardware worked fine (apart from the bugs) and formed the basis of the initial pilots, which were successful.

However, after moving development over to the new platform, the system we produced and passed to the customer for testing ran like a dog. The kit was returned to us, and our own testing confirmed the presence of a severe performance problem.

After initial investigation, the problem was put down to hardware (this was, after all, a new hardware platform), and a new machine was built, tested and dispatched to the customer, where again the system performed more like a three-legged lap dog than a greased whippet!

A simple illustration of the scale of the problem was the ps command. On our development box it ran in half a second; on the similarly specced box returned by the customer, it took a full six seconds!

After some analysis with truss, it was found that there was a divergence after the xstat system call was executed with a parameter of ‘/etc/passwd’.

As the contents of the file were similar and permissions on the file were the same, the problem seemed to be caused by the timestamp on the file.
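
For anyone wanting to check for the same condition on a modern box, here is a minimal sketch in Python. It is not what we ran in '92 (we only had truss and a hunch); the function name seconds_in_future is made up for illustration, and it simply compares a file's last modified time against the clock.

```python
import os
import time


def seconds_in_future(path, now=None):
    """Return how many seconds a file's mtime lies ahead of `now`.

    Returns 0.0 when the timestamp is not ahead of the clock.
    `now` defaults to the current machine time; it is a parameter
    only so the check can be exercised with a fixed clock.
    """
    if now is None:
        now = time.time()
    mtime = os.stat(path).st_mtime  # last modified time, seconds since the epoch
    return max(0.0, mtime - now)
```

Calling seconds_in_future("/etc/passwd") on an affected machine would return a positive number; on a healthy one it returns 0.0.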

How had this happened?

Well, the process of preparing the system for the customer included a final step where the passwords were changed from the private passwords we used to ones that the customer knew.

The first thing that the customer did for their testing was to reset the system clock to a date from which they had ‘live’ data for the regression testing.

From that point on, the timestamp on the /etc/passwd file was, as far as the operating system was concerned, in the future, and all the cached data was considered invalid. The server would produce its results without using the cached data and regenerate the cache at the same time. The regenerated cache was timestamped with the current machine time, which was still earlier than the timestamp on /etc/passwd, so it was considered invalid again on the next use.
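
The loop described above can be shown with a toy model. This is only a sketch of the behaviour as I understood it, not the actual Unix code, which I never saw; the names cache_is_valid and count_rebuilds are hypothetical, and the rule "a cache is valid only if it is at least as new as its source file" is my assumption about how the check worked.

```python
def cache_is_valid(cache_stamp, source_mtime):
    """Assumed rule: a cache is usable only if it is at least as new as its source."""
    return cache_stamp >= source_mtime


def count_rebuilds(source_mtime, clock_times):
    """Count how often the cache is rebuilt over a series of lookups.

    `clock_times` is the machine time at each lookup. A rebuilt cache
    is stamped with the machine time of the lookup that rebuilt it.
    """
    cache_stamp = None
    rebuilds = 0
    for now in clock_times:
        if cache_stamp is None or not cache_is_valid(cache_stamp, source_mtime):
            rebuilds += 1
            cache_stamp = now  # stamped with the current (wound back) clock
    return rebuilds


# Source file stamped before the lookups: built once, then reused.
print(count_rebuilds(source_mtime=50, clock_times=[100, 101, 102]))    # 1

# Source file stamped in the "future": rebuilt on every single lookup.
print(count_rebuilds(source_mtime=1000, clock_times=[100, 101, 102]))  # 3
```

In the second case the cache can never catch up with the source file until the clock passes the file's timestamp or the file is touched.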


As to why Unix V4 behaved this way when Unix V3 had not, my guess would be a combination of two reasons.

1) Unix V4 was the first release with a standard ABI (application binary interface) as well as a standard API, based on the Intel platform, which in theory would allow any program compiled on one Intel-based server to run on any other Intel-based server. However, at the time the Intel architecture was falling behind the new RISC-based systems, and to make Intel a viable platform, the amount of OS data that was cached was increased to improve performance.

2) I believe that Unix V4 was the first release to gain C2 security accreditation, meaning that UNIX had to rigorously verify the validity of the cached data on which it relied before each use.

At the end of the day the problem was resolved by touching the /etc/passwd file each time the system date was put back.

On another occasion, the issue was also caused by the last modified date on the /dev file system being in the future, so the issue isn’t isolated to just the passwd file.
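
The fix itself was just the touch command, but since more than one path can be affected, the same idea can be sketched as a small Python helper. The name touch_if_in_future is invented for this example, and the list of paths to check is an assumption; on a real system it would need to run with enough privilege to modify the files concerned.

```python
import os
import time


def touch_if_in_future(paths, now=None):
    """Reset the timestamps of any path whose mtime is ahead of `now`.

    Returns the list of paths that were changed. `now` defaults to the
    current machine time, i.e. the time after the clock was put back.
    """
    if now is None:
        now = time.time()
    fixed = []
    for path in paths:
        if os.stat(path).st_mtime > now:
            os.utime(path, (now, now))  # set access and modified times to `now`
            fixed.append(path)
    return fixed
```

Something like touch_if_in_future(["/etc/passwd", "/dev"]) after each clock change would cover the two cases we hit, though other files may well be checked in the same way.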

