<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Dumping Ground &#187; threading</title>
	<atom:link href="http://ardvaark.net/tag/threading/feed" rel="self" type="application/rss+xml" />
	<link>http://ardvaark.net</link>
	<description>And who cares?</description>
	<lastBuildDate>Tue, 15 May 2012 14:57:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Multi-Threading with VFS</title>
		<link>http://ardvaark.net/multi-threading-with-vfs</link>
		<comments>http://ardvaark.net/multi-threading-with-vfs#comments</comments>
		<pubDate>Thu, 14 May 2009 15:32:19 +0000</pubDate>
		<dc:creator>Brian</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Digital Libraries]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[apache commons]]></category>
		<category><![CDATA[bagit]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[vfs]]></category>

		<guid isPermaLink="false">http://ardvaark.net/multi-threading-with-vfs</guid>
		<description><![CDATA[One of the new features in the BagIt Library will be multi-threading of CPU-intensive bag processing operations, such as bag creation and verification.  Most modern processors are multi-core, but because the current version of the BagIt Library does not use those extra cores, bag operations take longer than they should.  The new version of BIL should create [...]]]></description>
			<content:encoded><![CDATA[<p>One of the new features in the <a title="Library of Congress Transfer Utilities on SourceForge" href="https://sourceforge.net/projects/loc-xferutils">BagIt Library</a> will be multi-threading of CPU-intensive bag processing operations, such as bag creation and verification.  Most modern processors are multi-core, but because the current version of the BagIt Library does not use those extra cores, bag operations take longer than they should.  The new version of BIL should create and verify bags significantly faster than the old version.  Of course, as we put more cores to work, we shift the bottleneck to the hard disk and IO bus, but it’s an improvement nonetheless.</p>
<p>Writing proper multi-threaded code is a tricky proposition, though.  Threading is a notorious minefield of subtle errors and difficult-to-reproduce bugs.  When we turned on multi-threading in our tests, we ran into some interesting issues with the <a title="Apache Commons VFS" href="http://commons.apache.org/vfs/index.html">Apache Commons VFS library</a> we use to keep track of file locations.  It turns out that VFS is not really designed to be thread-safe.  Some <a title="Email Thread: Status of current snapshot build" href="http://www.mail-archive.com/user@commons.apache.org/msg02718.html">recent list traffic</a> seems to indicate that this might be fixed sometime in the future, but it’s certainly not the case now.</p>
<p>Now, we don’t want to lose VFS – it’s a huge boon.  Its support for various serialization formats and virtual files makes modeling serialized and holey bags a lot easier.  So we had to figure out how to make VFS work cleanly across multiple threads.</p>
<p>The <a title="FileSystemManager JavaDoc" href="http://commons.apache.org/vfs/apidocs/org/apache/commons/vfs/FileSystemManager.html">FileSystemManager</a> is the root of one’s access to the VFS API.  It does a lot of caching internally, and the child objects coming from its methods often hold links back to each other via the <span class="code">FileSystemManager</span>.  If you can isolate a <span class="code">FileSystemManager</span> object per-thread, then you should be good to go.</p>
<p>The usual way of obtaining a VFS is through the <a title="VFS.getManager() Javadoc" href="http://commons.apache.org/vfs/apidocs/org/apache/commons/vfs/VFS.html#getManager()">VFS.getManager()</a> method, which returns a singleton <span class="code">FileSystemManager</span> object.  Our solution was to replace the singleton call with a <a title="ThreadLocal Javadoc" href="http://java.sun.com/javase/6/docs/api/java/lang/ThreadLocal.html">ThreadLocal</a> variable, with the <a title="ThreadLocal.initialValue() Javadoc" href="http://java.sun.com/javase/6/docs/api/java/lang/ThreadLocal.html#initialValue()">initialValue() method</a> overridden to create and initialize a new <a title="VFS StandardFileSystemManager Javadoc" href="http://commons.apache.org/vfs/apidocs/org/apache/commons/vfs/impl/StandardFileSystemManager.html">StandardFileSystemManager</a>.  The code for that looks like this:</p>
<p>
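<p class="code">// Each thread lazily gets its own manager the first time it calls fileSystemManager.get().
// "log" is assumed to be the enclosing class's Commons Logging Log instance.
private static final ThreadLocal&lt;FileSystemManager&gt; fileSystemManager = new ThreadLocal&lt;FileSystemManager&gt;() {
    @Override
    protected FileSystemManager initialValue() {
        StandardFileSystemManager mgr = new StandardFileSystemManager();
        mgr.setLogger(LogFactory.getLog(VFS.class));

        try
        {
            mgr.init();
        }
        catch (FileSystemException e)
        {
            log.fatal("Could not initialize thread-local FileSystemManager.", e);
            // Fail fast rather than hand back a manager that was never initialized.
            throw new IllegalStateException("Could not initialize thread-local FileSystemManager.", e);
        }

        return mgr;
    }
};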
</p>
</p>
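<p>To see the per-thread isolation in action without pulling in VFS, here is a minimal, self-contained sketch of the same pattern.  <span class="code">StubManager</span> is a hypothetical stand-in for <span class="code">StandardFileSystemManager</span>, and <span class="code">distinctManagers()</span> is a helper invented for this illustration; neither is part of VFS or the BagIt Library.</p>

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadManagerDemo {
    // Hypothetical stand-in for StandardFileSystemManager so the sketch runs without VFS.
    static final class StubManager {
        private boolean initialized;

        void init() {
            initialized = true;
        }

        boolean isInitialized() {
            return initialized;
        }
    }

    // Same shape as the ThreadLocal in the post: one lazily created manager per thread.
    private static final ThreadLocal<StubManager> MANAGER = new ThreadLocal<StubManager>() {
        @Override
        protected StubManager initialValue() {
            StubManager mgr = new StubManager();
            mgr.init();
            return mgr;
        }
    };

    static StubManager manager() {
        return MANAGER.get();
    }

    // Starts threadCount threads and returns how many distinct manager instances they saw.
    static int distinctManagers(int threadCount) {
        Set<StubManager> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                StubManager first = manager();
                StubManager second = manager();
                // Within one thread, repeated calls must return the same instance.
                if (first != second) {
                    throw new IllegalStateException("Manager changed within a single thread.");
                }
                seen.add(first);
            });
            threads[i].start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers.", e);
            }
        }

        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("Distinct managers: " + distinctManagers(4));
    }
}
```

<p>Calling <span class="code">distinctManagers(4)</span> should report four managers, one per thread, while repeated <span class="code">manager()</span> calls on the same thread return the same instance.  Real code would swap the stub for the VFS manager shown above.</p>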
<p>The downside is that we lose the internal VFS caching that the manager does (although it still caches inside of a thread).  But that’s a small price to pay for it working.</p>
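<p>The caching trade-off can be illustrated the same way.  This sketch assumes nothing about VFS internals: a hypothetical <span class="code">resolve()</span> method keeps a plain <span class="code">HashMap</span> per thread and counts how often the (pretend) expensive lookup really runs.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadCacheDemo {
    // Counts how many times the hypothetical expensive lookup actually executes.
    static final AtomicInteger RESOLVES = new AtomicInteger();

    // Each thread owns its own HashMap, so no locking is needed: the map is never shared.
    private static final ThreadLocal<Map<String, String>> CACHE =
            new ThreadLocal<Map<String, String>>() {
                @Override
                protected Map<String, String> initialValue() {
                    return new HashMap<String, String>();
                }
            };

    // Hypothetical lookup: a cache hit within the calling thread skips the expensive step.
    static String resolve(String path) {
        return CACHE.get().computeIfAbsent(path, p -> {
            RESOLVES.incrementAndGet();
            return "resolved:" + p;
        });
    }

    // Runs the same lookup repeatedly on several fresh threads and returns the miss count.
    static int resolvesFor(int threadCount, int lookupsPerThread) {
        RESOLVES.set(0);
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < lookupsPerThread; n++) {
                    resolve("bag/data/file.txt");
                }
            });
            threads[i].start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers.", e);
            }
        }

        return RESOLVES.get();
    }

    public static void main(String[] args) {
        System.out.println("Expensive lookups: " + resolvesFor(2, 3));
    }
}
```

<p>With two threads each looking up the same path three times, the lookup runs twice in total: once per thread instead of once overall.  That duplicated work is the cost of not sharing state between threads.</p>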
]]></content:encoded>
			<wfw:commentRss>http://ardvaark.net/multi-threading-with-vfs/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

