What You Learn

Version 2.0 of Chronicling America went online yesterday.  Congratulations are in order to David, Ed, Dan, Curt, and everybody else on the team!

The new version looks almost the same as the old version, but is entirely different on the back-end.  The 1.x series attempted to build an end-to-end digital repository using XML-centric technologies (FEDORA and Cocoon); the new version replaces them with a more traditional Django and MySQL web application.  The original version was complicated and slow, scaled poorly, and suffered stability problems from day one.  We had a very restrictive robots.txt in place because search crawlers would regularly crash the application.

The new version finally has navigable permalinks, makes some vast improvements in the ingest workflow, and sports some RDF data linked in from the HTML pages.  It scales more predictably, is a lot more stable, and has a vastly smaller codebase.  I had very little to do with the development of the new version, mostly providing advice and historical perspective.

The retirement of the old code is a little bittersweet and definitely humbling.  After all, my primary contribution to the 2.0 release was lessons in how not to build a repository system.  Fortunately, much of the knowledge gained from the first release has made its way into other projects — spawning some, improving others.  Useful tools and concepts like BagIt, workflow, transfer, and bit storage have all been informed by anecdotes, scenarios, situations, and problems from Chronicling America 1.0.

It’s true that you learn more from failure than from success.  But it sure ain’t pretty on the ego.