Adventures in Enormous Lucene Indexes on AIX

I have been working hard over the last several weeks to port our system at work from our x86 Linux development environment to the PowerPC AIX production environment. Fortunately for us, most of the platform differences are well hidden because our code is generally platform independent: Java, XSLT, and JavaScript. There are a few cases where we make calls to a JNI library, but the libraries exist and are supported for the varying platforms, and we have had no trouble with those.

What we unexpectedly had trouble with, though, was our fulltext Lucene index. Weighing in at a massive 55 GB, and only expected to get bigger, we were duly impressed at our development environment’s ability to process the index with no hiccups, in addition to consistently speedy search times. When I moved it to AIX, however, something went amiss. We started receiving this exception, which the stack trace revealed was coming from Lucene’s index reading code:

java.io.IOException: Unknown format version:-16056063

We confirmed with MD5 hashes that the files were identical in both environments, and we confirmed that the Lucene libraries were all correct. That left us with some obscure platform difference we had to track down.

Using a smaller test index, we were able to confirm that Lucene was able to successfully open an index on AIX, confirming Lucene’s own touted endian agnosticism. We also lifted file write size ulimits on certain users to confirm that that limit didn’t unintentionally affect the ability to read files as well.

Finally, we discovered through some documentation (of all places!) that 32-bit IBM programs are limited to file reads of no more than ~2 GB - that magic 2^31 - 1 limit - and our Java virtual machine was only 32-bit! Simply upgrading to the 64-bit JVM solved the problem.

We hadn’t thought of this because we were using a 32-bit JVM in development, with no problems, but the crucial difference is that it was the Sun JVM. We later installed the 32-bit IBM JVM onto a development environment and confirmed that it cannot open our index file there, either. Notably, however, it provided a much more useful error message:

java.io.IOException: Value too large for defined data type at java.io.RandomAccessFile.length(Native Method) at org.apache.lucene.store.FSIndexInput.(FSDirectory.java:440)

Rather than throwing an IOException from the java.io code, the IBM JVM on AIX simply returned bogus data. This caused Lucene’s index reader to throw an exception because, coincidentally, the number it was trying to read at that magic signed integer limit was expected to be a file version number. It was expecting to see -1, but instead got -16056063.

And so everything seems to running swimmingly now. The moral of the story is: Beware of big files on 32-bit machines.