The Future of Digital Archives

Ever since the announcement that portions of the Library’s new Library of Congress Experience initiative would have some user-interface components implemented using Microsoft’s Silverlight, in exchange for some $3 million worth of kiosk hardware from Microsoft, the net has been ablaze with fury over the organization’s perceived “selling out.” For some reason, people think that www.loc.gov will shortly be 301ing to loc.microsoft.com, or something ridiculous.

(Disclaimer: These are my opinions, based on my experience and work as a digital librarian. They are not necessarily those of the Library of Congress, or anybody else. Also, I am deliberately avoiding the topic of born-digital content in proprietary formats. It’s an especially painful area for digital archivists, since the content is still important despite the difficulty of reading it both now and in the future, and it really doesn’t bear on the current complaints anyway.)

It’s a tricky problem, to be sure. The costs of digital conversion, preservation, and access are extremely high. The National Endowment for the Humanities’ outlay for digitizing the half-million (and growing!) newspaper pages in Chronicling America runs into the millions of dollars. And that doesn’t even account for the cost of keeping that data around for any length of time! Funding has to come from somewhere, and so, as Ars says, there will likely be more and more of these quid pro quo arrangements in the future.

But I’ll let you in on a little secret: It doesn’t matter. The future of digital libraries, and of organizations like the Library of Congress, isn’t in the web site that people visit. It isn’t in the technology choices made for viewing content. And it certainly isn’t going to be controlled by any particular company.

See, though the user interface may be written in something proprietary, like Flash or Silverlight, the archived bits aren’t stored that way. Librarians are an insanely conservative bunch, made that way through hundreds of years of experience in attempting to keep old stuff and make it available for new people. The guidelines for digital preservation reflect that. We’re still working with TIFFs for a good reason: TIFF has been around for decades, and it’ll continue to be around for decades. We know we’ll be able to read them in a hundred years. In the library world, lack of shininess in a file format isn’t a bug: it’s a feature!
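To put that conservatism in concrete terms, here’s a minimal sketch of the kind of format sanity check an archive might run over its preservation masters. The choice of Python and the Pillow imaging library is mine for illustration, not anything the Library prescribes, and the file path is made up:

```python
# A minimal sketch of a preservation-master sanity check.
# Assumes Python with the Pillow library installed; the path is hypothetical.
from PIL import Image

def check_master(path):
    """Confirm a scan is a TIFF and report its basic technical metadata."""
    with Image.open(path) as img:
        if img.format != "TIFF":
            raise ValueError(f"{path}: expected TIFF, got {img.format}")
        compression = img.info.get("compression", "unknown")
        dpi = img.info.get("dpi")
        print(f"{path}: {img.size[0]}x{img.size[1]} px, "
              f"mode={img.mode}, compression={compression}, dpi={dpi}")

if __name__ == "__main__":
    check_master("masters/newspaper_page_0001.tif")  # hypothetical file
```

The boring part is the point: any number of off-the-shelf tools can open and verify these files today, and the same will be true long after whatever viewer sits in front of them is gone.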

But the real reason it doesn’t matter is that the digital library as a web site you visit to view content won’t exist in a decade. Instead, we’ll be serving out content via web-exposed APIs, opening the inner archives to anyone who cares to look. Sure, there will still be curated special presentations of content, just like the new Experience program, and they might even use some new, fancy, proprietary technology, maybe even in exchange for $200 million worth of computer equipment (adjusting for inflation here). But those special presentations will just be the tip of the content iceberg, and the rest will be there to look at in very standard and open formats.
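To make the API-first idea concrete, here’s a rough sketch of what consuming such an archive could look like. The endpoint and JSON field names are purely hypothetical stand-ins, not a real Library of Congress service:

```python
# A rough sketch of consuming a digital archive through a web-exposed API.
# The endpoint and field names below are hypothetical stand-ins,
# not a real Library of Congress service.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def search_archive(query, page=1):
    """Query a (hypothetical) open search endpoint and return parsed JSON."""
    params = urlencode({"q": query, "page": page, "format": "json"})
    with urlopen(f"https://archive.example.org/search?{params}") as resp:
        return json.load(resp)

if __name__ == "__main__":
    results = search_archive("suffrage")
    for item in results.get("items", []):
        # Each item points back at the preservation master (e.g., a TIFF)
        # and its descriptive metadata; no proprietary viewer required.
        print(item.get("title"), item.get("master_url"))
```

The particular URL doesn’t matter. What matters is that the archival bits and their metadata are reachable in open formats, with or without whatever shiny front end happens to be sitting on top of them this year.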