We’re currently working on a new version of the BagIt Library: adding some new functionality, making some bug fixes, and refactoring the interfaces pretty heavily. If you happen to be one of the people currently using the programmatic interface, the next version will likely break your code. Sorry about that.
The BagIt spec is pretty clear about what makes a bag valid or complete, and it might seem a no-brainer to strictly implement validation based on the spec. Unfortunately, the real-world is not so simple. For example, the spec is unambiguous about the required existence of the bagit.txt, but we have real bags on-disk (from before the spec existed) that lack the bag declaration and yet need to be processed. As another example, hidden files are not mentioned at all by the spec, and the current version of the code treats them in an unspecified manner. On Windows, when the bag being validated has been checked out from Subversion, the hidden .svn folders cause unit tests to fail all over the place.
It seems an easy enough feature to add some flags to make the bag processing a bit more lenient. In fact, the checkValid() method already had an overloaded version which took a boolean indicating whether or not to tolerate a missing bagit.txt. I began by creating an enum which contained two flags (TOLERATE_MISSING_DECLARATION and IGNORE_HIDDEN_FILES), and began retrofitting the enum in place of the boolean.
And then I got a whiff.
I found that, internally, the various validation methods call one another, passing the same parameters over and over. Additionally, the validation methods weren’t using any privileged internal information during processing – only public methods were being called.
I called Justin this morning to discuss refactoring the validation operations using a Strategy pattern. This would allow us to:
- Encapsulate the parameters to the algorithm, making the code easier to read and maintain. No more long lists of parameters passed from function call to function call.
- Vary the algorithm used for processing based on the needs of the caller.
- Re-use standard algorithm components (either through aggregation or inheritance), simplifying one-off cases.
He had also come to the same conclusion, although driven by a different parameter set. It’s a good sign you’re headed in the right direction when two developers independently hacking on the code come up with the same solution to the same problem.