And who cares?
  • Chronicles of Windows 7 Part 1: Qualcomm Gobi 3G Modem and VMware NAT

    Posted on May 28th, 2009 Brian 1 comment

    So I went ahead and installed Windows 7 RC 1.  The process is remarkably smooth, and the OS is nicely polished.  The new task bar is a long-overdue change, formerly difficult or esoteric system tasks are now simple and obvious, and the Libraries paradigm in Explorer has pleasantly surprised me.

    But that’s not to say there aren’t some niggling issues.  This is a new release – nay, a pre-release – of the most popular operating system in the world.  There are bound to be some compatibility problems.  What is truly amazing is how well things work right out of the box.

    As I use the OS day-to-day, I’ll post some updates about real-life surprises and tribulations.  Here are my first two.

    Qualcomm Gobi 3G Modem

    Windows 7 recognized almost every single piece of hardware on my HP EliteBook 8530w, including the silly fingerprint reader and the webcam I never use.  The one thing it didn’t already have drivers for was the built-in Qualcomm Gobi un2400 3G modem.  What’s worse, the Vista drivers from HP’s support site don’t install, either.

    Fortunately, some amazingly enterprising soul figured out the problems, and was not only able to divine how to install the drivers, but then even wrote a schnasty little program to force-feed the Gobi modem its appropriate firmware.  Major kudos!  Unfortunately for me, it still doesn’t work.  There’s some magic incantation that isn’t being done quite right for my AT&T setup, so I’ll have to wait until the drivers get updated.  Hopefully that’ll be soon – paying for a data plan I’m not using is rather annoying.

    But, really, given how esoteric and fragile these 3G modems are, it’s not that surprising something bjorked their spaghetti-like functioning.  (Did you read the “More About The Firmware” section at that link?!)

    VMware NAT Failure

    The only other true problem I’ve had is with VMware Workstation 6.5.  It works like a charm, except that NAT routing fails to work correctly.  Interestingly, the guests can ping out, but other connections fail.  It’s a known issue, though, and will certainly be fixed soon.  And the work-around is simple enough: just use bridging instead.

  • Confession of a Credit Card Deadbeat

    Posted on May 19th, 2009 Brian 6 comments

    The roiling economy has uncomfortably squeezed the profits of various huge financial mega-corporations.  As the bottom-of-the-barrel customers are no longer able to honor their obligations, the fatty underbelly of fees and interest which the poorest consumers have struggled to pay is suddenly looking a little lean.  To make back their profits, the credit card companies are looking to eliminate the cash-back and frequent flier miles, reduce or rescind interest grace periods, and reinstate annual fees.

    [Image: Stack of Credit Cards]

    There’s no way this could backfire.

    An amusingly twisted word in the credit card parlance is “deadbeats”.  Counter-intuitively, these companies use that term to describe their very best customers.  Do you pay your card off in full every month?  Do you rarely, if ever, accrue interest or late fees?  Do you regularly cash in your frequent flier miles and cash-back bonus?  Yes?  Well, since you don’t make much money for them, they don’t like you.  They tolerate you, but you’re gaming the system, getting a free ride.  You’re a deadbeat.

    It’s nice to know what they really think of you.

    Don’t weep, though.  They still make plenty of money from customers like me.  Every time we use our cards, the merchant is charged hefty fees for the privilege of accepting our card of choice.  We don’t pay that directly, but we pay it indirectly through higher prices in restaurants, shops, and online.

    But it’s not enough, so they’re coming after the good customers.  Ironically, they’re coming after the customers who need them least.  I admit it: I am a credit card deadbeat.  And if I suddenly have to pay an annual fee or lose my grace period, what incentive do I have not to just carry cash?  I will happily abandon the credit card companies, tossing their aggressive advertising, obnoxious phone calls, and invasive behavior tracking right along with their annual fees and interest rates.  And good riddance!

    And, assuming I’m not the only one happy to return to legal tender, the credit card companies are sowing the seeds of their own doom.  As their best customers jump ship, their balance sheets will be left a ghetto of poor-credit customers.  As losses mount from those who can no longer pay, the ratio of good assets to bad will finally topple the once-mighty giants of consumer finance.

    Stack of Credit Cards modified from the original Too Much Credit by Andres Rueda under a CC Attribution 2.0 Generic license.

  • Multi-Threading with VFS

    Posted on May 14th, 2009 Brian No comments

    One of the new features in the BagIt Library (BIL) will be multi-threading CPU-intensive bag processing operations, such as bag creation and verification.  Modern processors are all multi-core, but because the current version of the library does not use those cores, bag operations take longer than they should.  The new version of BIL should create and verify bags significantly faster than the old version.  Of course, as we add CPUs, we shift the bottleneck to the hard disk and IO bus, but it’s an improvement nonetheless.
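
    To make the idea concrete, here is a minimal sketch of spreading checksum work across cores with a fixed thread pool.  It is not the BagIt Library's code: the class name ParallelChecksumDemo, the method names, and the in-memory payloads standing in for files on disk are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; not part of the BagIt Library.
class ParallelChecksumDemo {

    // Computes the MD5 of one payload as lowercase hex.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Checksums every payload on a pool sized to the number of cores.
    static Map<String, String> checksumAll(Map<String, byte[]> payloads) {
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // Submit one task per payload, then collect results in submission order.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, byte[]> entry : payloads.entrySet()) {
                final byte[] data = entry.getValue();
                Callable<String> task = () -> md5Hex(data);
                pending.put(entry.getKey(), pool.submit(task));
            }
            Map<String, String> sums = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                sums.put(entry.getKey(), entry.getValue().get());
            }
            return sums;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> payloads = new LinkedHashMap<>();
        payloads.put("data/a.txt", "abc".getBytes(StandardCharsets.UTF_8));
        payloads.put("data/empty.txt", new byte[0]);
        System.out.println(checksumAll(payloads));
    }
}
```

    With real files the digest work would be interleaved with disk reads, which is where the IO bottleneck mentioned above starts to show.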

    Writing proper multi-threaded code is a tricky proposition, though.  Threading is a notorious minefield of subtle errors and difficult-to-reproduce bugs.  When we turned on multi-threading in our tests, we ran into some interesting issues with the Apache Commons VFS library we use to keep track of file locations.  It turns out that VFS is not really designed to be thread-safe.  Some recent list traffic seems to indicate that this might be fixed sometime in the future, but it’s certainly not the case now.

    Now, we don’t want to lose VFS – it’s a huge boon.  Its support for various serialization formats and virtual files makes modeling serialized and holey bags a lot easier.  So we had to figure out how to make VFS work cleanly across multiple threads.

    The FileSystemManager is the root of one’s access to the VFS API.  It does a lot of caching internally, and the child objects coming from its methods often hold links back to each other via the FileSystemManager.  If you can isolate a FileSystemManager object per-thread, then you should be good to go.

    The usual way of obtaining a manager is through the VFS.getManager() method, which returns a singleton FileSystemManager object.  Our solution was to replace the singleton call with a ThreadLocal variable, with the initialValue() method overridden to create and initialize a new StandardFileSystemManager.  The code for that looks like this:

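    // One FileSystemManager per thread, created lazily on that thread's first get().
    private static final ThreadLocal<FileSystemManager> fileSystemManager =
        new ThreadLocal<FileSystemManager>() {
            @Override
            protected FileSystemManager initialValue() {
                StandardFileSystemManager mgr = new StandardFileSystemManager();
                mgr.setLogger(LogFactory.getLog(VFS.class));
                try {
                    mgr.init();
                } catch (FileSystemException e) {
                    // The manager is still returned, but it is not usable if init() failed.
                    log.fatal("Could not initialize thread-local FileSystemManager.", e);
                }
                return mgr;
            }
        };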

    The downside is that we lose the internal VFS caching that the manager does (although it still caches inside of a thread).  But that’s a small price to pay for it working.
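
    The per-thread isolation can be seen in a small, self-contained sketch.  To keep it runnable without VFS on the classpath, it swaps in a hypothetical StandInManager class for StandardFileSystemManager; the class and method names are assumptions for illustration only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo; StandInManager stands in for StandardFileSystemManager.
class PerThreadManagerDemo {

    static final class StandInManager {}

    // Same shape as the ThreadLocal in the post: initialValue() runs once per thread.
    private static final ThreadLocal<StandInManager> MANAGER =
        new ThreadLocal<StandInManager>() {
            @Override
            protected StandInManager initialValue() {
                return new StandInManager();
            }
        };

    // Starts threadCount threads and counts how many distinct managers they obtained.
    static int distinctManagers(int threadCount) {
        Set<StandInManager> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                StandInManager first = MANAGER.get();
                StandInManager second = MANAGER.get();
                // Within one thread, repeated get() calls return the same instance.
                if (first == second) {
                    seen.add(first);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctManagers(4)); // prints 4: one manager per thread
    }
}
```

    Each thread pays the initialization cost once, which is the caching trade-off described above.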

  • Human Beings Are Big-Endian

    Posted on May 11th, 2009 Brian No comments

    I always have trouble remembering the difference between big-endian and little-endian.  The names don’t make any sense, so it ends up being a mere definition – and I have trouble with arbitrary definitions.  In the past, after figuring it out, I have noted to myself that human beings are big-endian as a memory-aid.  That is, we put our most-significant digits on the left.  And that’s great to remember, except then I forgot which endianness we were.

    I guess I need a memory-aid for my memory-aid.

    So this is a note to my future self: Human beings are big-endian.  Well, at least English-speaking, Arabic-numeral-using, base-ten-counting human beings who assume that linear memory addresses increase as you go from left-to-right.  Those assumptions seem good enough for me, though.
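
    For a future self who wants proof rather than a mnemonic, here is a small sketch that lays out the same 32-bit value in both byte orders using java.nio.ByteBuffer.  The class name EndianDemo and the sample value are arbitrary choices for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class showing byte layout from lowest address to highest.
class EndianDemo {

    // Writes value into a 4-byte buffer in the given order and renders the bytes as hex.
    static String layout(int value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(4).order(order).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        // Big-endian: most-significant byte first, the way we write numbers.
        System.out.println(layout(value, ByteOrder.BIG_ENDIAN));    // 0A 0B 0C 0D
        // Little-endian: least-significant byte first.
        System.out.println(layout(value, ByteOrder.LITTLE_ENDIAN)); // 0D 0C 0B 0A
    }
}
```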

  • Funny Smelling Code – Endlessly Propagating Parameters

    Posted on May 8th, 2009 Brian No comments

    We’re currently working on a new version of the BagIt Library: adding some new functionality, making some bug fixes, and refactoring the interfaces pretty heavily.  If you happen to be one of the people currently using the programmatic interface, the next version will likely break your code.  Sorry about that.

    The BagIt spec is pretty clear about what makes a bag valid or complete, and it might seem a no-brainer to strictly implement validation based on the spec.  Unfortunately, the real world is not so simple.  For example, the spec is unambiguous about the required existence of the bagit.txt, but we have real bags on-disk (from before the spec existed) that lack the bag declaration and yet need to be processed.  As another example, hidden files are not mentioned at all by the spec, and the current version of the code treats them in an unspecified manner.  On Windows, when the bag being validated has been checked out from Subversion, the hidden .svn folders cause unit tests to fail all over the place.

    Adding some flags to make the bag processing a bit more lenient seems an easy enough feature.  In fact, the checkValid() method already had an overloaded version which took a boolean indicating whether or not to tolerate a missing bagit.txt.  I began by creating an enum which contained two flags (TOLERATE_MISSING_DECLARATION and IGNORE_HIDDEN_FILES), and began retrofitting the enum in place of the boolean.

    And then I got a whiff.

    I found that, internally, the various validation methods call one another, passing the same parameters over and over.  Additionally, the validation methods weren’t using any privileged internal information during processing – only public methods were being called.

    I called Justin this morning to discuss refactoring the validation operations using a Strategy pattern.  This would allow us to:

    1. Encapsulate the parameters to the algorithm, making the code easier to read and maintain.  No more long lists of parameters passed from function call to function call.
    2. Vary the algorithm used for processing based on the needs of the caller.
    3. Re-use standard algorithm components (either through aggregation or inheritance), simplifying one-off cases.

    He had also come to the same conclusion, although driven by a different parameter set.  It’s a good sign you’re headed in the right direction when two developers independently hacking on the code come up with the same solution to the same problem.
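
    A rough sketch of what such a Strategy might look like follows.  Only the two flag names come from the work described above; the Verifier interface, the OptionVerifier class, the list-of-paths stand-in for a bag, and the rule that hidden files count as problems are assumptions for illustration, not the library's actual design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

// Hypothetical demo; a bag is modeled as a plain list of relative paths.
class ValidationStrategyDemo {

    enum Option { TOLERATE_MISSING_DECLARATION, IGNORE_HIDDEN_FILES }

    // The Strategy: callers pick an implementation instead of passing flags around.
    interface Verifier {
        // Returns the problems found; an empty list means the bag passed.
        List<String> verify(List<String> paths);
    }

    // One strategy that holds its options as state rather than as method parameters.
    static final class OptionVerifier implements Verifier {
        private final EnumSet<Option> options;

        OptionVerifier(EnumSet<Option> options) {
            this.options = EnumSet.copyOf(options);
        }

        @Override
        public List<String> verify(List<String> paths) {
            List<String> problems = new ArrayList<>();
            if (!paths.contains("bagit.txt")
                    && !options.contains(Option.TOLERATE_MISSING_DECLARATION)) {
                problems.add("missing bagit.txt");
            }
            for (String path : paths) {
                // Assumed rule: hidden files are reported unless explicitly ignored.
                if (isHidden(path) && !options.contains(Option.IGNORE_HIDDEN_FILES)) {
                    problems.add("unexpected hidden file: " + path);
                }
            }
            return problems;
        }

        // A path is hidden if any of its segments starts with a dot.
        private static boolean isHidden(String path) {
            for (String segment : path.split("/")) {
                if (segment.startsWith(".")) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("data/page1.tif", ".svn/entries");
        Verifier strict = new OptionVerifier(EnumSet.noneOf(Option.class));
        Verifier lenient = new OptionVerifier(EnumSet.allOf(Option.class));
        System.out.println(strict.verify(paths));  // [missing bagit.txt, unexpected hidden file: .svn/entries]
        System.out.println(lenient.verify(paths)); // []
    }
}
```

    Because the options live inside the strategy object, the internal validation steps no longer need to pass the same parameters to one another.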

  • Useful PDF ImageMagick Recipes

    Posted on May 6th, 2009 Brian No comments

    It turns out that ImageMagick is really quite good at reading, writing, re-arranging, and otherwise mucking with PDFs.  Unfortunately, you need to know the proper incantation, which can take much trial and error to figure out.  So, for my own future reference:

    Split A PDF Into Parts

    $ convert -quality 100 -density 300x300 multipage.pdf single%d.jpg

    The quality parameter is the quality of the written JPEGs, and the density is the DPI (in this case, 300 DPI in both X and Y).

    Join JPEG Parts Into A PDF

    $ convert -adjoin file*.jpg doc.pdf

    Rotate a PDF

    $ convert -rotate 270 -density 300x300 -compress lzw in.pdf out.pdf

    This assumes a TIFF-backed PDF.  The density parameter is important because otherwise ImageMagick rasterizes the pages at its default density of 72 DPI, which down-samples the image.  Adding in the compression option helps keep the overall size of the PDF smaller, with no loss in quality.

    Now, if I can just figure out how to make future me remember to look here…

  • What You Learn

    Posted on May 5th, 2009 Brian 1 comment

    Version 2.0 of Chronicling America went online yesterday.  Congratulations are in order to David, Ed, Dan, Curt, and everybody else on the team!

    The new version looks almost the same as the old version, but is entirely different on the back-end.  The 1.x series attempted to build an end-to-end digital repository using XML-centric technologies — FEDORA and Cocoon — with a more traditional Django and MySQL web application.  The original version was complicated, slow, scaled poorly, and suffered stability problems from day one.  We had a very restrictive robots.txt in place because search crawlers would regularly crash the application.

    The new version finally has navigable permalinks, makes some vast improvements in the ingest workflow, and sports some RDF data linked in from the HTML pages.  It scales more predictably, is a lot more stable, and has a vastly smaller codebase.  I had very little to do with the development of the new version, mostly providing advice and historical perspective.

    The retirement of the old code is a little bittersweet and definitely humbling.  After all, my primary contribution to the 2.0 release was lessons in how not to build a repository system.  Fortunately, much of the knowledge gained from the first release has made its way into other projects — spawning some, improving others.  Useful tools and concepts like BagIt, workflow, transfer, and bit storage have all been informed by anecdotes, scenarios, situations, and problems from Chronicling America 1.0.

    It’s true that you learn more from failure than from success.  But it sure ain’t pretty on the ego.