If you've ever had to do batch processing, then you know how
tedious it can be to write all the infrastructure code for
retries, error recovery, and long-running processing, along with
all the other tedium that surrounds a typical
batch application. For these types of applications, I use Spring Batch,
a batch processing framework from Dave Syer and the fine people at
SpringSource.
The basic idea is that you set up jobs that have
steps, which in turn have tasklets. This is the normal
use case, but by no means the only one. You use jobs and steps to
string together sequences of processing: reading input via a reader
and writing output via a writer.
Spring Batch has implementations for both reading and writing that
will likely meet most of your needs: XML, files, streams, databases,
etc. There's so much interesting stuff here, so of course I humbly
recommend you take a crack at the documentation or read my book, Spring Enterprise
Recipes.
All that said, there's no obvious way to read from an input
source and then write to multiple files. My use case is Google's
Sitemaps. These are XML files that describe the pages on your
site. You list every URL possible. If you have more than 50,000 links,
then you must split them across several files and list those files in
a Sitemap index file.
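To make the splitting rule concrete, here is a minimal sketch in plain Java (no Spring involved). It works out how many sitemap files a given number of URLs needs and renders an index file that points at them. The class name, the example.com URLs, and the sitemap-N.xml naming are my own assumptions for illustration; the element names and namespace follow the sitemaps.org protocol.

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapIndexSketch {

    // Per-file URL limit described in the text.
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    // Number of sitemap files needed for a URL count (ceiling division).
    static int filesNeeded(long urlCount) {
        return (int) ((urlCount + MAX_URLS_PER_SITEMAP - 1) / MAX_URLS_PER_SITEMAP);
    }

    // Renders a sitemap index listing each sitemap file.
    // Assumes the URLs passed in are already XML-safe.
    static String renderIndex(List<String> sitemapUrls) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : sitemapUrls) {
            xml.append("  <sitemap><loc>").append(url).append("</loc></sitemap>\n");
        }
        xml.append("</sitemapindex>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // 120,000 URLs need three files: 50,000 + 50,000 + 20,000.
        int files = filesNeeded(120_000);
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= files; i++) {
            // Hypothetical file naming, for illustration only.
            urls.add("http://www.example.com/sitemap-" + i + ".xml");
        }
        System.out.print(renderIndex(urls));
    }
}
```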
So, I wanted to read from a database and derive all the URLs possible
for content, and then write those to sitemap XML files, where each
sitemap could not exceed 50,000 entries. Spring Batch ships with an
adapter writer that serves exactly this purpose. It's called
org.springframework.batch.item.file.MultiResourceItemWriter.
You define it just like you might any other writer, except that you
wrap another writer with it.
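If the wrapping idea sounds abstract, here is a rough sketch of it in plain Java. This is not Spring Batch's code: TargetWriter, RollingWriter, and RecordingWriter are names I made up, and the "prefix.index" target naming is an assumption. It mirrors the behaviour as I understand it from the MultiResourceItemWriter Javadoc, where the limit is only checked after a whole chunk has been written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RollingWriterSketch {

    // Hypothetical stand-in for a delegate writer that can be re-pointed at a new target.
    interface TargetWriter<T> {
        void open(String target);
        void write(List<? extends T> items);
        void close();
    }

    // Wraps a delegate and moves on to a new target once the per-target limit is reached.
    static class RollingWriter<T> {
        private final TargetWriter<T> delegate;
        private final String prefix;
        private final int limit;
        private int index = 1;
        private int countInCurrent = 0;
        private boolean opened = false;

        RollingWriter(TargetWriter<T> delegate, String prefix, int limit) {
            this.delegate = delegate;
            this.prefix = prefix;
            this.limit = limit;
        }

        void write(List<? extends T> chunk) {
            if (!opened) {
                delegate.open(prefix + "." + index);
                opened = true;
            }
            delegate.write(chunk);
            countInCurrent += chunk.size();
            // The limit is checked only after the whole chunk is written,
            // so rollover happens at chunk boundaries.
            if (countInCurrent >= limit) {
                delegate.close();
                index++;
                countInCurrent = 0;
                opened = false;
            }
        }

        void close() {
            if (opened) {
                delegate.close();
                opened = false;
            }
        }
    }

    // In-memory delegate that records which item went to which target.
    static class RecordingWriter implements TargetWriter<String> {
        final List<String> log = new ArrayList<>();
        private String current;

        public void open(String target) { current = target; }

        public void write(List<? extends String> items) {
            for (String item : items) {
                log.add(current + ":" + item);
            }
        }

        public void close() { current = null; }
    }

    public static void main(String[] args) {
        RecordingWriter recorder = new RecordingWriter();
        RollingWriter<String> writer = new RollingWriter<>(recorder, "out", 2);
        writer.write(Arrays.asList("a", "b", "c")); // one chunk of three, limit of two
        writer.write(Arrays.asList("d"));
        writer.close();
        // Prints [out.1:a, out.1:b, out.1:c, out.2:d]
        System.out.println(recorder.log);
    }
}
```

Note what happens in main: the first target receives three items even though the limit is two, because a chunk is never split. If a hard cap per file matters, as it does for sitemaps, it is worth choosing a commit interval that divides the limit evenly.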
Here are the salient bits from my configuration. Most of this is
boilerplate. I don't include the configuration of the Spring Batch
environment, or the configuration of the reader, because those are
pretty typical. Note that here we configure the writer
for the job and, in turn, configure its
delegate property, which holds the real writer
implementation. In this case, there's no need to configure the
delegate writer's resource property, because the
MultiResourceItemWriter sets it for each new file.
<beans:beans xmlns="http://www.springframework.org/schema/batch"
             xmlns:beans="http://www.springframework.org/schema/beans"
             xmlns:aop="http://www.springframework.org/schema/aop"
             xmlns:tx="http://www.springframework.org/schema/tx"
             xmlns:p="http://www.springframework.org/schema/p"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="
               http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
               http://www.springframework.org/schema/batch
               http://www.springframework.org/schema/batch/spring-batch-2.0.xsd
               http://www.springframework.org/schema/aop
               http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
               http://www.springframework.org/schema/tx
               http://www.springframework.org/schema/tx/spring-tx-2.0.xsd">

  <beans:import resource="batch.xml"/>

  <job id="batchForCreatingSitemaps">
    <step id="sitemap">
      <tasklet>
        <chunk reader="reader" writer="writer"
               commit-interval="${job.commit.interval}"/>
      </tasklet>
    </step>
  </job>

  <beans:bean id="siteMapLineAggregator"
              class="com...sitemapscreator.SiteMapLineAggregator">
    <beans:property name="domain" value="${sitemaps-domain}"/>
  </beans:bean>

  <beans:bean
      class="com...sitemapscreator2.ResourceSuffixCreator"
      id="resourceSuffixCreator"/>

  <beans:bean id="writer" scope="step"
              class="org.springframework.batch.item.file.MultiResourceItemWriter">
    <beans:property name="resource"
                    value="file:#{jobParameters[outputResourcePrefix]}"/>
    <beans:property name="resourceSuffixCreator"
                    ref="resourceSuffixCreator"/>
    <beans:property name="saveState" value="true"/>
    <beans:property name="itemCountLimitPerResource" value="50000"/>
    <beans:property name="delegate">
      <beans:bean
          class="org.springframework.batch.item.file.FlatFileItemWriter">
        <beans:property name="encoding" value="UTF-8"/>
        <beans:property name="shouldDeleteIfExists" value="true"/>
        <beans:property name="lineAggregator"
                        ref="siteMapLineAggregator"/>
      </beans:bean>
    </beans:property>
  </beans:bean>

  <beans:bean id="siteMapUrlRowMapper"
              class="com...sitemapscreator.SiteMapUrlRowMapper"/>
  ...
</beans:beans>
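The configuration references two small helper beans, a suffix creator and a line aggregator, whose source isn't shown. Here is a guess at the shape such classes might take, written as plain Java so it runs without Spring on the classpath. Everything here is an assumption: the class names, the "-N.xml" suffix format, the use of a plain String path as the item type, and the http:// scheme. In a real project, the first would implement org.springframework.batch.item.file.ResourceSuffixCreator (getSuffix(int)) and the second org.springframework.batch.item.file.transform.LineAggregator (aggregate(T)).

```java
public class SitemapHelpersSketch {

    // Hypothetical suffix creator: turns the file index into a file name suffix.
    // The writer appends this suffix to the configured resource prefix.
    static class XmlSuffixCreator {
        public String getSuffix(int index) {
            return "-" + index + ".xml";
        }
    }

    // Hypothetical line aggregator: turns one item (here, a URL path) into
    // one <url> line of a sitemap file.
    static class UrlLineAggregator {
        private String domain;

        // Mirrors the "domain" property set in the XML configuration.
        public void setDomain(String domain) {
            this.domain = domain;
        }

        public String aggregate(String path) {
            return "<url><loc>http://" + domain + escapeXml(path) + "</loc></url>";
        }

        // Escapes the five characters that must be entity-encoded in sitemap URLs.
        // The ampersand goes first so later replacements are not double-escaped.
        static String escapeXml(String s) {
            return s.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;")
                    .replace("'", "&apos;");
        }
    }

    public static void main(String[] args) {
        XmlSuffixCreator suffixes = new XmlSuffixCreator();
        UrlLineAggregator aggregator = new UrlLineAggregator();
        aggregator.setDomain("www.example.com");

        System.out.println("sitemap" + suffixes.getSuffix(1));
        System.out.println(aggregator.aggregate("/posts?id=1&page=2"));
    }
}
```

A FlatFileItemWriter writes one aggregated line per item, which is why building one complete element per item fits it well.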