Adventures in Enormous Lucene Indexes on AIX
What we unexpectedly had trouble with, though, was our fulltext Lucene index. The index weighs in at a massive 55 GB and is only expected to get bigger, so we were duly impressed that our development environment processed it with no hiccups and with consistently speedy search times. When I moved it to AIX, however, something went amiss. We started receiving this exception, which the stack trace revealed was coming from Lucene’s index reading code:
java.io.IOException: Unknown format version:-16056063
We confirmed with MD5 hashes that the files were identical in both environments, and we confirmed that the Lucene libraries were all correct. That left us with some obscure platform difference we had to track down.
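For anyone who wants to repeat that comparison from Java rather than with a command-line tool, here is a minimal sketch. The post does not say which tool produced the hashes, so the class and method names (IndexChecksum, md5Hex) are purely illustrative. It streams the file in chunks so a multi-gigabyte index file never has to fit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Lucene or of the original setup.
public class IndexChecksum {

    /** Returns the MD5 digest of the file as lowercase hex, reading it in 64 KB chunks. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print one "hash  path" line per file given on the command line.
        for (String arg : args) {
            System.out.println(md5Hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

Running it against the same index files in both environments and comparing the output lines is enough to rule out a corrupted copy.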
Using a smaller test index, we verified that Lucene could successfully open an index on AIX, which bore out Lucene’s own touted endian agnosticism. We also lifted the file write size ulimits on certain users, in case that limit unintentionally affected the ability to read files as well.
Finally, we discovered through some documentation (of all places!) that 32-bit IBM programs are limited to reading files of no more than ~2 GB (that magic 2^31 - 1 byte limit of a signed 32-bit integer), and our Java virtual machine was only 32-bit! Simply upgrading to the 64-bit JVM solved the problem.
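To make that limit concrete, here is a small sketch that flags files too large for a signed 32-bit offset. Everything in it (LargeFileCheck, exceeds32BitLimit, oversizedFiles) is a hypothetical helper, not something we actually ran. It should also be run on a JVM known to handle large files, since File.length() on an affected JVM may not report the true size. The sun.arch.data.model property is vendor-specific and may be absent on non-Sun JVMs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for spotting files beyond the 2^31 - 1 byte limit.
public class LargeFileCheck {

    /** Largest size a signed 32-bit file offset can represent: 2^31 - 1 bytes. */
    static final long LIMIT_32_BIT = Integer.MAX_VALUE;

    static boolean exceeds32BitLimit(long sizeInBytes) {
        return sizeInBytes > LIMIT_32_BIT;
    }

    /** Lists regular files directly inside dir whose size exceeds the limit. */
    static List<File> oversizedFiles(File dir) {
        List<File> result = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return result; // not a directory, or not readable
        }
        for (File f : files) {
            if (f.isFile() && exceeds32BitLimit(f.length())) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // os.arch is a standard property; sun.arch.data.model is an assumption
        // that only holds on Sun-derived JVMs and may print "null" elsewhere.
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : oversizedFiles(dir)) {
            System.out.println(f + " is " + f.length() + " bytes, over the 32-bit limit");
        }
    }
}
```

Pointing it at the index directory shows which files would be at risk under a 32-bit JVM with this limitation.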
We hadn’t thought of this because we were using a 32-bit JVM in development with no problems, but the crucial difference is that it was the Sun JVM. We later installed the 32-bit IBM JVM on a development environment and confirmed that it could not open our index file there, either. Notably, however, it provided a much more useful error message:
java.io.IOException: Value too large for defined data type
    at java.io.RandomAccessFile.length(Native Method)
    at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:440)
On AIX, by contrast, the IBM JVM simply returned bogus data rather than throwing an IOException from the java.io code. This caused Lucene’s index reader to throw its own exception because, coincidentally, the number it was trying to read at that magic signed integer limit was expected to be a file version number. It was expecting to see -1, but instead got -16056063.
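As a rough illustration of how a few wrong bytes turn into that odd number: Lucene stores integers as four bytes, high-order byte first, and -16056063 is what the bytes FF 0B 01 01 decode to, where a valid -1 would be FF FF FF FF. The byte pattern below is derived from the reported number only. We did not capture the actual bytes the JVM returned, and FormatVersionDemo is a made-up name.

```java
import java.nio.ByteBuffer;

// Hypothetical demo of decoding four bytes as a big-endian signed int.
public class FormatVersionDemo {

    /** Decodes the first four bytes as a big-endian int (ByteBuffer's default byte order). */
    static int readBigEndianInt(byte[] four) {
        return ByteBuffer.wrap(four).getInt();
    }

    public static void main(String[] args) {
        byte[] expected = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        byte[] bogus = {(byte) 0xFF, 0x0B, 0x01, 0x01};
        System.out.println(readBigEndianInt(expected)); // prints -1
        System.out.println(readBigEndianInt(bogus));    // prints -16056063
    }
}
```

In other words, the reader was handed four bytes that were not the version marker, which is consistent with reading from the wrong position or getting garbage back.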
And so everything seems to be running swimmingly now. The moral of the story is: Beware of big files on 32-bit machines.