Adventures in Enormous Lucene Indexes on AIX

Java Digital Libraries

I have been working hard over the last several weeks to port our system at work from our x86 Linux development environment to the PowerPC AIX production environment. Fortunately for us, most of the platform differences are well hidden because our code is generally platform independent: Java, XSLT, and JavaScript. There are a few cases where we make calls to a JNI library, but those libraries exist and are supported on both platforms, and we have had no trouble with them.

What we unexpectedly had trouble with, though, was our fulltext Lucene index. It weighs in at a massive 55 GB and is only expected to get bigger, so we were duly impressed by our development environment's ability to process it without a hiccup and with consistently speedy search times. When I moved it to AIX, however, something went amiss. We started receiving this exception, which the stack trace revealed was coming from Lucene's index reading code:

java.io.IOException: Unknown format version:-16056063

We confirmed with MD5 hashes that the files were identical in both environments, and we confirmed that the Lucene libraries were all correct. That left us with some obscure platform difference we had to track down.

Using a smaller test index, we verified that Lucene could successfully open an index on AIX, which bore out Lucene's own touted endian agnosticism. We also lifted the file write size ulimits for certain users, to rule out that limit unintentionally affecting the ability to read files as well.

Finally, we discovered through some documentation (of all places!) that 32-bit IBM programs are limited to file reads of no more than ~2 GB - that magic 2^31 - 1 limit - and our Java virtual machine was only 32-bit! Simply upgrading to the 64-bit JVM solved the problem.

We hadn't thought of this because we were using a 32-bit JVM in development, with no problems, but the crucial difference is that it was the Sun JVM. We later installed the 32-bit IBM JVM in a development environment and confirmed that it could not open our index file there, either. Notably, however, it provided a much more useful error message:

java.io.IOException: Value too large for defined data type
    at java.io.RandomAccessFile.length(Native Method)
    at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:440)

Rather than throwing an IOException from the java.io code, the IBM JVM on AIX simply returned bogus data. This caused Lucene's index reader to throw an exception because, coincidentally, the number it was trying to read at that magic signed integer limit was expected to be a file version number. It was expecting to see -1, but instead got -16056063.
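To make the two numbers concrete, here is a minimal sketch of how four bytes decode into a signed 32-bit integer when read high-order byte first, which is how Lucene's file format documentation describes its integers. The byte values below are not taken from our index; they are simply the two's-complement encodings of -1 and of the value in the exception message. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

class FormatVersionDecode {
    // Decodes four bytes as a signed 32-bit integer, high-order byte first.
    // ByteBuffer defaults to big-endian order, so no explicit order() call is needed.
    static int readBigEndianInt(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    public static void main(String[] args) {
        // Encoding of the format marker the reader expected.
        byte[] expected = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        // Encoding of the value reported in the exception (illustrative, not read from disk).
        byte[] bogus = {(byte) 0xFF, 0x0B, 0x01, 0x01};

        System.out.println(readBigEndianInt(expected)); // -1
        System.out.println(readBigEndianInt(bogus));    // -16056063
    }
}
```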

And so everything seems to be running swimmingly now. The moral of the story is: Beware of big files on 32-bit machines.
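A rough sketch of the sanity check I wish we had run up front: list the files in an index directory and flag any that are larger than 2^31 - 1 bytes. The class and method names are made up for illustration, and it assumes you run it on a JVM that reports large file sizes correctly (as we saw, a 32-bit JVM may itself choke on the length call).

```java
import java.io.File;

class LargeFileCheck {
    // Largest size a signed 32-bit file API can represent: 2^31 - 1 bytes.
    static final long MAX_32BIT_SIZE = Integer.MAX_VALUE;

    // True if a file of this size cannot be addressed with signed 32-bit offsets.
    static boolean exceeds32BitLimit(long sizeInBytes) {
        return sizeInBytes > MAX_32BIT_SIZE;
    }

    public static void main(String[] args) {
        // Directory to scan; defaults to the current directory.
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        File[] files = indexDir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + indexDir);
            return;
        }
        for (File f : files) {
            if (f.isFile() && exceeds32BitLimit(f.length())) {
                System.out.println(f.getName() + ": " + f.length()
                        + " bytes, past the 2^31 - 1 limit");
            }
        }
    }
}
```

Run it as `java LargeFileCheck /path/to/index`; no output means nothing in that directory crosses the limit.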


Published vs. Public

Java

I really like the idea of separating published interfaces from public interfaces. Apparently, a JSR has been started to add an idea like this to Java. In this case, the idea is to provide superpackages, and the initially proposed syntax (taken from that site) is something like:

super package com.sun.myModule {
    export com.sun.myModule.myStuff.*;
    export com.sun.myModule.yourStuff.Interface;
    com.sun.myModule.myStuff;
    com.sun.myModule.yourStuff;
    com.sun.SomeOtherModule.theirStuff;
    org.someOpenSource.someCoolStuff;
}

Despite the author's numerous warnings that this syntax is arbitrary and only meant to illustrate the general idea, I have to say this syntax sucks. How about we just reuse the recently introduced support for annotations and add a Published attribute to Java? No need to change the language, no need to add another arbitrary source file. No mess, no fuss.
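Here is a minimal sketch of what I have in mind. Nothing in it is part of Java or of the JSR: the Published annotation, the two example classes, and the isPublished helper are all hypothetical names, and a real design would need compiler or tool support to enforce anything. It only shows that the marker itself needs no new syntax.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class PublishedDemo {
    // Hypothetical marker: "this public type or method is part of the published API".
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Published {}

    // Public and published: outside callers may depend on it.
    @Published
    public static class StableApi {}

    // Public only so other packages of the same module can reach it; not published.
    public static class InternalHelper {}

    // A tool (or a test) can tell the two apart via reflection.
    static boolean isPublished(Class<?> type) {
        return type.isAnnotationPresent(Published.class);
    }

    public static void main(String[] args) {
        System.out.println(isPublished(StableApi.class));      // true
        System.out.println(isPublished(InternalHelper.class)); // false
    }
}
```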

(via LtU)


Suppressing Deprecation Warnings in Import Statements

Java

I've got an import statement that references a deprecated class. However, the code is old and tested, and I don't want to change it. For a long time, I ignored the nagging warnings about such problems because there wasn't anything I could do about them.

Along comes Java 1.5 with support for annotations (that's attributes for you .NET folk). In particular, the SuppressWarnings attribute provides the ability to selectively disable warnings - precisely what I want.

So I've got something like this:

package foo;

import old.package.OldClass;

@SuppressWarnings("deprecation")
public class Bar {
    private OldClass iAmDeprecated;
}

And everything works great, except for that pesky import statement. It still tosses a warning at me, and I can't get it to go away! Putting the attribute into the package-info.java file doesn't work, either, because the SuppressWarnings attribute isn't declared as a package annotation target.
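The claim about annotation targets is easy to check with reflection, since the @Target meta-annotation on SuppressWarnings is retained at runtime. A small sketch (the class and method names are mine; the exact list printed depends on the JDK version):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

class SuppressWarningsTargets {
    // Returns the element kinds that @SuppressWarnings may legally be applied to.
    static List<ElementType> targets() {
        Target target = SuppressWarnings.class.getAnnotation(Target.class);
        return Arrays.asList(target.value());
    }

    public static void main(String[] args) {
        List<ElementType> allowed = targets();
        System.out.println("Allowed targets: " + allowed);
        System.out.println("Usable on a package? " + allowed.contains(ElementType.PACKAGE));
    }
}
```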

For now, I've worked around the problem by removing the import and fully qualifying the class name in my code, but that's lame. If anybody knows the "right" way to do this, please contact me.
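For reference, the workaround looks roughly like this. Since OldClass is only a placeholder, this sketch substitutes java.io.StringBufferInputStream, a JDK class that has been deprecated since 1.1, so that it compiles on its own; the firstByte method is just there to show the field being used.

```java
// No import of the deprecated class anywhere in this file.
@SuppressWarnings("deprecation")
class Bar {
    // The fully qualified name sits inside the annotated class, so the
    // deprecation warning for it is covered by @SuppressWarnings.
    private final java.io.StringBufferInputStream iAmDeprecated =
            new java.io.StringBufferInputStream("hello");

    // Reads the first byte of the wrapped string.
    int firstByte() {
        return iAmDeprecated.read();
    }

    public static void main(String[] args) {
        System.out.println((char) new Bar().firstByte()); // h
    }
}
```

As far as I know, JDK 9 and later (JEP 211) stopped issuing deprecation warnings for import statements altogether, so on a newer compiler the original import version should be quiet as well.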
