Making Old Links Work with mod_rewrite

One of the big concerns with moving from an old blog installation to new software is the possibility of breaking links from around the Internet to content that I have created. At first, one might think that this kind of thing would generate a massive SEP field, but there’s more at stake than just a broken link.

What is that, you ask? PageRank. The content on my site has garnered a non-trivial score in Google’s view of the world, and such a score is valued, as it determines not only how high your own pages appear in any given Google query, but also how high the sites you link to appear. It is valued so much, in fact, that people often resort to comment spamming even mildly popular sites in order to boost their own search rankings.

So making sure the old link /article.php?story=20040727104257410 correctly maps to the new link /failed_sandbox_peer_launches_or_something.html is an important migration step.

So what’s the tool of choice? Well, mod_rewrite, of course. mod_rewrite is an Apache module that performs the black magic of rewriting URLs from one form into another. In my particular case, the requirement was to map a Geeklog story ID (SID) into a Drupal node ID (NID). Surprisingly, that’s relatively easy to do.

The .htaccess file in this application’s directory contains the declarations shown below. The first directive, RewriteCond, examines the incoming query string for a variable called story and captures its value using a regular expression. The second directive, RewriteRule, looks for a request beginning with article.php and maps it to a value pulled from a mapping called geekmap. Notice the %1, which is a backreference to the SID captured by the RewriteCond. The question mark at the end drops the original query string from the redirect target. Finally, the L stops all further mod_rewrite processing, and the R=301 causes the new URL to be returned with an HTTP 301 response code, meaning a permanent redirect.

The two-part rewriting is required because a RewriteRule pattern is matched only against the URL path; mod_rewrite doesn’t support examination of the query string inside of a RewriteRule.

# Do Geeklog to alias mapping.
RewriteCond %{QUERY_STRING} story=(\d+)
RewriteRule ^article.php /${geekmap:%1|}? [L,R=301]
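To make the two-step match concrete, here is a rough Python sketch of what the rule above does to an incoming request. It is only an emulation, not how Apache implements it: the function name redirect_target and the in-memory GEEKMAP dict are made up for illustration, and it assumes the .htaccess file sits at the site root.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical in-memory stand-in for the geekmap lookup table.
GEEKMAP = {
    "20040727104257410": "failed_sandbox_peer_launches_or_something.html",
}


def redirect_target(url: str, geekmap: dict) -> Optional[str]:
    """Return the Location a 301 would carry, or None if the rule does not apply."""
    parts = urlsplit(url)

    # RewriteRule ^article.php
    # Assumption: .htaccess lives at the site root, so the leading slash
    # is not part of the path the pattern sees.
    if not re.search(r"^article.php", parts.path.lstrip("/")):
        return None

    # RewriteCond %{QUERY_STRING} story=(\d+)
    cond = re.search(r"story=(\d+)", parts.query, re.ASCII)
    if cond is None:
        return None

    # ${geekmap:%1|} looks up the captured SID; the empty default after
    # the | means an unknown SID yields an empty string.
    alias = geekmap.get(cond.group(1), "")

    # The trailing ? in the substitution drops the original query string,
    # so only the path is returned.
    return "/" + alias
```

Calling redirect_target("/article.php?story=20040727104257410&mode=print", GEEKMAP) gives the new alias with the extra parameter dropped, and an unknown SID falls through to "/" because of the empty default in the map lookup.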

That raises the question: where is the map called geekmap defined? It lives in httpd.conf, because RewriteMap is only allowed in the server or virtual host configuration, not in .htaccess.

# Add a rewrite map for Geeklog to Drupal
RewriteMap geekmap txt:/etc/apache2/sid-to-alias-rewrite-map.txt

That’s pretty self-explanatory, except for that txt part. It just specifies the simple text-file format for a rewrite map. There are other formats, which can be found in the mod_rewrite documentation.

The map file is also pretty simple. It’s just a series of tab-separated lines. The left column is the key (the old SID), and the right column is the value (the new alias).

20040727104257410	failed_sandbox_peer_launches_or_something.html
20040808110934404	three_tequila_night.html
20040809092923819	i_dont_even_know_her.html
...
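Since one malformed line can quietly break a redirect, it can be handy to sanity-check the map file before pointing Apache at it. The following is a minimal Python sketch of a reader for the key/value format described above; parse_rewrite_map and SAMPLE are names invented for this example, and the handling of comments and duplicate keys reflects my understanding of the txt map format rather than anything Apache guarantees.

```python
# Three entries taken from the map file above, tab-separated.
SAMPLE = (
    "20040727104257410\tfailed_sandbox_peer_launches_or_something.html\n"
    "20040808110934404\tthree_tequila_night.html\n"
    "20040809092923819\ti_dont_even_know_her.html\n"
)


def parse_rewrite_map(text: str) -> dict:
    """Parse whitespace-separated key/value lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 2:
            # A key with no value is not a usable entry; ignore it here.
            continue
        # Keep the first entry for a key (assumed to mirror a top-down scan).
        mapping.setdefault(fields[0], fields[1])
    return mapping


if __name__ == "__main__":
    for sid, alias in parse_rewrite_map(SAMPLE).items():
        print(sid, "->", alias)
```

A quick check such as len(parse_rewrite_map(text)) against the number of stories in the old installation would catch dropped or truncated lines.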

And that’s it! I do similar mappings for the old static page links, as well as the old RSS feed, but those are all much simpler cases. Finally, there’s some other Drupal magic on the backend to convert those long underscored URLs into node ID URLs (such as node/136), as well as further mod_rewrite magic to make the Drupal URLs pretty, but that’s beyond the scope of all this.