Archive for February, 2008
February 9, 2008
Is ORM a Dead End?
Before going on to explain why, I would also like to say that ORMs also facilitate Domain Driven Design. To gain this benefit though, your application can no longer be relational database centric. Instead your database becomes a pseudo object repository, storing object state and object graphs. This is fine for OO programmers but can look strange to those from a pure relational DBMS background. What DDD is really saying IMO is that OO driven application design calls for an OO database. An ORM and an RDBMS will do, but your DDD model is still OO, not relational.
So where does this leave us? Well hopefully acknowledging that the OO Model and the Relational Data Model are just two different models. Two different ways of modeling the world around us. Which model should we choose? Well it depends on what we are trying to model and why.
Bob Martin wrote an interesting blog post on the OO Model, ORM and the Active Record pattern. According to Uncle Bob, the OO Model tries to model the world in a way that provides immunity to data changes. The idea is that data is hidden (encapsulated) inside objects, and that externally no one knows or depends on the object's data type. Instead you communicate with objects through messages. The data types can change as long as the messages, and the object's answers to those messages (behavior), remain the same. This is the basic idea behind polymorphism, which provides immunity to implementation change, including changes to the encapsulated data type.
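As a minimal Java sketch of that idea (the names Temperature, CelsiusReading and KelvinReading are hypothetical and do not come from Bob Martin's post), two objects answer the same message while hiding different data types:

```java
// Hypothetical illustration: callers only know the message "isFreezing".
interface Temperature {
    boolean isFreezing();
}

// Hides a double holding degrees Celsius.
final class CelsiusReading implements Temperature {
    private final double celsius;
    CelsiusReading(double celsius) { this.celsius = celsius; }
    public boolean isFreezing() { return celsius <= 0.0; }
}

// Hides a long holding millikelvin; callers cannot tell the difference.
final class KelvinReading implements Temperature {
    private final long milliKelvin;
    KelvinReading(long milliKelvin) { this.milliKelvin = milliKelvin; }
    public boolean isFreezing() { return milliKelvin <= 273_150L; }
}

class EncapsulationDemo {
    // Depends on the message only, never on the hidden field or its type.
    static String describe(Temperature t) {
        return t.isFreezing() ? "freezing" : "not freezing";
    }

    public static void main(String[] args) {
        System.out.println(describe(new CelsiusReading(-5.0)));
        System.out.println(describe(new KelvinReading(300_000L)));
    }
}
```

Swapping one implementation for the other changes the encapsulated data type without touching describe, which is the immunity being claimed.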
The relational model is very different however. Its goal is to allow you to find the data you want quickly. It does this by recognising the relationships between data types and using set based maths to "join" record sets. To do this, the relational model chooses to expose all entity data, the exact opposite of encapsulation. Exposed object (entity) data is used to join records and filter result sets during querying.
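To make the contrast concrete, here is a rough Java sketch of a join and filter over exposed data. The Customer and Order types and their fields are invented for illustration, and a real database would use far better join strategies than this nested loop:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entities with fully exposed data, as the relational model assumes.
final class Customer {
    final int id;
    final String name;
    Customer(int id, String name) { this.id = id; this.name = name; }
}

final class Order {
    final int customerId;
    final double total;
    Order(int customerId, double total) { this.customerId = customerId; this.total = total; }
}

class JoinDemo {
    // Roughly equivalent in spirit to:
    //   SELECT c.name FROM customer c
    //   JOIN orders o ON o.customer_id = c.id
    //   WHERE o.total > :min
    static List<String> customersWithOrdersOver(List<Customer> customers,
                                                List<Order> orders,
                                                double min) {
        List<String> names = new ArrayList<>();
        for (Customer c : customers) {
            for (Order o : orders) {
                // Both the join condition and the filter read exposed fields directly.
                if (o.customerId == c.id && o.total > min) {
                    names.add(c.name);
                }
            }
        }
        return names;
    }
}
```

Neither the join nor the filter could be written if id, customerId and total were hidden behind behaviour-only interfaces.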
So how do we square the circle? How do we have a model where data types are hidden, but where we can also perform powerful queries? I don't think you can, and this is why IMO almost all ORM solutions end up exposing the underlying data (all those getters and setters). In a DDD application you learn to live with this, ignoring the exposed data and augmenting the data mutation methods with true OO "business" methods that provide data-encapsulating, "business domain" behaviour. But given that the data is exposed, there is nothing stopping others from accessing the data themselves, breaking encapsulation and bypassing business rules. In fact, if you intend to run queries you need exposed data to perform joins, filters etc.
So there is no squaring the circle, and your DDD isn't truly OO. What you have are data records that can also behave like objects, but due to the lack of encapsulation afforded to true objects and the opportunity this provides to violate OO semantics, you cannot say that your design is immune to data type changes. I believe that this is Uncle Bob's main complaint with the Active Record pattern.
I turn the argument on its head. You are producing an application where you are interested in data types, where you want to display those data types to your users and where you want to explore relationships between data types. This is what we would call a classic database application. In these applications immunity to data type changes is an impossibility. You cannot hide data types, because data is what your user is interested in. Your user is also interested in some behaviour that expresses business rules, but most of those rules have to do with maintaining data integrity and enforcing relationships between data entities. Your user wants to view his data.
In such an application OO encapsulation amongst domain objects serves very little purpose. Polymorphism is only useful as a means of grouping data types with common attributes, but not as a means of grouping 'objects' with common behavior, and data encapsulation becomes meaningless. So why not forget about objects and data encapsulation and use exposed mutable data types instead? Well, functional languages have been using this approach for years: a hashmap (Dictionary) with name/value pairs is a mutable data type. You can represent any data type you like by nesting hashmaps. Accepting that all data will be exposed, and that data types are likely to change, is a much better fit for database applications where users want to store, navigate and query data.
Given the hashmap as the primary abstraction, where does behaviour fit in? Well there is data agnostic behaviour such as "Create", "Retrieve", "Update", and "Delete" which applies to all data types. In addition to basic CRUD is querying behaviour like "Select From", "Join" and "Where condition". These are all data type agnostic and could be provided by a framework like LINQ or Rails ActiveRecord. Then there is data type specific behaviour like "Age" which calculates the age of any data type that contains the attribute "date of birth". This would need to be provided by the application programmer and associated with a set of data types.
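A small Java sketch of the hashmap-as-record idea follows. The attribute names ("name", "dateOfBirth", "address") and the age helper are assumptions made for this example, not part of LINQ or Rails ActiveRecord:

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.HashMap;
import java.util.Map;

class DictionaryRecordDemo {
    // Data type specific behaviour: applies to any record that carries
    // a "dateOfBirth" attribute, whatever else the record contains.
    static int age(Map<String, Object> record, LocalDate today) {
        LocalDate dateOfBirth = (LocalDate) record.get("dateOfBirth");
        if (dateOfBirth == null) {
            throw new IllegalArgumentException("record has no dateOfBirth");
        }
        return Period.between(dateOfBirth, today).getYears();
    }

    public static void main(String[] args) {
        // A nested hashmap stands in for a related data type.
        Map<String, Object> address = new HashMap<>();
        address.put("city", "London");

        Map<String, Object> person = new HashMap<>();
        person.put("name", "Alice");
        person.put("dateOfBirth", LocalDate.of(1980, 2, 9));
        person.put("address", address);

        // Data agnostic behaviour: update and retrieve work for any attribute.
        person.put("name", "Alice Smith");
        System.out.println(person.get("name"));
        System.out.println(age(person, LocalDate.of(2008, 2, 9)));
    }
}
```

The age function never asks whether the record is a person, an employee or a pet; it only requires the one attribute it depends on.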
This is why I think that Bob got it wrong in his conclusion. Data is King in data centric applications (in contrast to behaviour being King). The Active Record pattern as implemented in Rails acknowledges this fact and treats domain entities as pseudo Objects and doesn't try to pretend that they are proper objects with encapsulation. LINQ takes the same approach too.
For database applications where set based data queries are important, ORM has always been a misnomer in my view. What we have really been doing is data-structure relational mapping. With Rails and LINQ we are now moving into "dictionary relational mapping", which in my opinion is a more natural way to model data centric applications than "ORM".
February 4, 2008
Spring Batch Recent Changes and Upcoming m4 Release

We've been working really hard on Spring Batch getting ready for the Spring Portfolio 2.5 release train, and I thought it would be a good time to update everyone on what is happening. In this article I'm going to expand a bit on the domain modelling, and our decision to raise the profile of some of the core domain objects, and increase their responsibilities. I will also give a few tastes of what is coming in the next couple of releases leading up to 1.0, so people have a chance to comment if they want to.
By way of an apology: there have been some quite significant changes in the internals of Spring Batch since last time I blogged on it, so I feel like I have been negligent. In this article I don't think I can cover all the changes, but I can promise to try and keep more up to date in the future, and I definitely will keep everyone up to date as the 1.0 release approaches. (N.B. the Spring Portfolio 2.5 release train includes Spring Batch 1.0.)
Because we have fixed 70 or more issues since 1.0.0-m3, we felt that it was time to get something out there. The plan that we made last week is to release 1.0.0-m4 tomorrow (February 5), with a slightly narrower scope than planned (e.g. postponing the XML namespace implementation). Then 1.0.0-m5 will come approximately 10-14 days later, with time for at least one release candidate before final release of 1.0.0 on March 20.
Spring Batch Core
The Spring Batch Core is quite a compact API. It actually doesn't contain much if anything we necessarily expect a batch application developer to implement or extend, so it is in effect an internal API. Nevertheless, it has quite a profound effect on Spring Batch users because it shapes the way that we think about a batch job and its execution, and (more importantly for the users) its implementation, configuration and deployment.
Job and Step and the Data Model
Up to 1.0.0-m3 we had a classical case of a mismatch between our API and the Ubiquitous Language of batch. We had to keep explaining that a JobConfiguration was what most people would think of as a "Job" (same for StepConfiguration and "Step"). The "Job" is the thing you configure to run every day, but each time it runs, it has a new identity. This was a clear signal that "Job" was the name of the domain concept, so why were we calling it "JobConfiguration"? Good question. So now (in 1.0.0-m4) the thing that the user configures is a Job, and when it runs we create a new JobInstance. The same goes for Step and StepInstance. So the Entity stereotype is fulfilled by the *Instance objects - they have an id, and a primary key in the database. An example would be the [JobInstance effective February 2 2008], for the ["end-of-day" Job].
The other Entity stereotypes are JobExecution and StepExecution. These haven't changed their names or responsibilities since m3. When a JobInstance is executed we create a JobExecution (ensuring that the instance is not already executing first). At this point the [JobExecution on February 2 at 10pm], of the [JobInstance effective February 2 2008] for the ["end-of-day" Job], is launched. But then if it fails, and is restarted the next day, then we would need a new [JobExecution on February 3 at 10.12pm] for the same JobInstance (i.e. [JobInstance effective February 2 2008] for the ["end-of-day" Job]). Thus Job is 1-to-many with JobInstance, which in turn is 1-to-many with JobExecution. The same applies to Step*.
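The relationships can be sketched in Java as follows. These classes are simplified stand-ins written for this illustration; the fields and method names are assumptions and not the real Spring Batch classes:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the domain concepts; not the real Spring Batch classes.
class Job {
    final String name;
    final List<JobInstance> instances = new ArrayList<>();
    Job(String name) { this.name = name; }

    // One Job has many JobInstances, one per logical run (e.g. per effective date).
    JobInstance createInstance(String effectiveDate) {
        JobInstance instance = new JobInstance(this, effectiveDate);
        instances.add(instance);
        return instance;
    }
}

class JobInstance {
    final Job job;
    final String effectiveDate;
    final List<JobExecution> executions = new ArrayList<>();
    JobInstance(Job job, String effectiveDate) {
        this.job = job;
        this.effectiveDate = effectiveDate;
    }

    // One JobInstance has many JobExecutions, but only one may run at a time.
    JobExecution start(String startedAt) {
        for (JobExecution execution : executions) {
            if (execution.running) {
                throw new IllegalStateException("instance is already executing");
            }
        }
        JobExecution execution = new JobExecution(this, startedAt);
        executions.add(execution);
        return execution;
    }
}

class JobExecution {
    final JobInstance instance;
    final String startedAt;
    boolean running = true;
    JobExecution(JobInstance instance, String startedAt) {
        this.instance = instance;
        this.startedAt = startedAt;
    }
    void finish() { running = false; }
}

class DomainModelDemo {
    public static void main(String[] args) {
        Job endOfDay = new Job("end-of-day");
        JobInstance feb2 = endOfDay.createInstance("2008-02-02");
        JobExecution first = feb2.start("2008-02-02T22:00");
        first.finish();                       // suppose this attempt failed
        feb2.start("2008-02-03T22:12");       // restart of the same instance
        System.out.println(feb2.executions.size()); // prints 2
    }
}
```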
Job and Step and Execution
We also made a change to the API in respect to execution of the Job and Step in 1.0.0-m4. Up to 1.0.0-m3 we had a separate interface for execution of each - JobExecutor executes Job, and StepExecutor executes Step. This has, as we thought at the time we designed it, benefits of encapsulation - we imagined multiple implementations of StepExecutor all able to execute the same Step. In practice, as we learned more about the contours of the implementation, we found that it was an artificial distinction. The signal, interestingly, was the fact that we had too many "instanceof" checks in Java in our StepExecutor implementations - they always had to make special cases for different Step implementations. In the end it was clear that each Step would have to know how to execute itself. As with all such insights, it is obvious when you see it, but until then it is anything but. Thank you Eric Evans.
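The shape of that refactoring can be shown with a small Java sketch. Every name here (SelfExecutingStep, ItemStep, TaskStep, ExecutorStyles) is hypothetical and chosen for the illustration; none of it is the Spring Batch API:

```java
// Hypothetical names throughout; this is not the Spring Batch API.
interface SelfExecutingStep {
    String execute();
}

class ItemStep implements SelfExecutingStep {
    public String execute() { return "processed items"; }
}

class TaskStep implements SelfExecutingStep {
    public String execute() { return "ran task"; }
}

class ExecutorStyles {
    // Before: an external executor needs a special case for every step type,
    // and must be edited whenever a new type is added.
    static String executeWithInstanceof(Object step) {
        if (step instanceof ItemStep) {
            return "processed items";
        }
        if (step instanceof TaskStep) {
            return "ran task";
        }
        throw new IllegalArgumentException("unknown step type");
    }

    // After: each step knows how to execute itself, so the caller needs no cases.
    static String executeSelf(SelfExecutingStep step) {
        return step.execute();
    }
}
```

Adding a third kind of step requires a new branch in the first method but no change at all to the second.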
So, for example, the central interface of the Step domain is
public interface Step {
    // … properties that the Job needs to know here …
    void execute(StepExecution stepExecution)
        throws StepInterruptedException, BatchCriticalException;
}
There is no need for the Step to return anything from the execute method because the StepExecution that is passed in is updated during the progress of the step. It is passed in by the caller, and can be used if needed to monitor the progress of the execution. It can also be used to stop the execution, if the caller needs to interrupt the job (hence the StepInterruptedException), just by setting a flag setTerminateOnly(). This is something that a couple of people have asked me about, so it's probably worth mentioning here: the Step is responsible for examining the value of that flag wherever it can, so there is a mechanism built into the framework for signalling early termination of a job. The effect of setting the flag depends on the implementation of Step, but we provide a SimpleStep that will inspect the flag after every item is processed, and abort if necessary. It also accepts a StepInterruptPolicy strategy which can be used to check for other abnormal conditions (e.g. Thread.isInterrupted()).
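A minimal Java sketch of the terminate-only mechanism is given here. The StepExecution and StepInterruptedException classes are cut-down stand-ins, and ItemLoopStep is a hypothetical step invented for the example; the real SimpleStep is more involved:

```java
import java.util.List;
import java.util.function.Consumer;

// Cut-down stand-ins for illustration only; not the real Spring Batch classes.
class StepInterruptedException extends Exception {
    StepInterruptedException(String message) { super(message); }
}

class StepExecution {
    private volatile boolean terminateOnly; // may be set from another thread
    private int itemCount;

    void setTerminateOnly() { terminateOnly = true; }
    boolean isTerminateOnly() { return terminateOnly; }
    void incrementItemCount() { itemCount++; }
    int getItemCount() { return itemCount; }
}

// Hypothetical item-oriented step that inspects the flag after every item.
class ItemLoopStep {
    private final List<String> items;
    private final Consumer<String> processor;

    ItemLoopStep(List<String> items, Consumer<String> processor) {
        this.items = items;
        this.processor = processor;
    }

    // Returns nothing: progress is reported by updating the StepExecution.
    void execute(StepExecution stepExecution) throws StepInterruptedException {
        for (String item : items) {
            processor.accept(item);
            stepExecution.incrementItemCount();
            if (stepExecution.isTerminateOnly()) {
                throw new StepInterruptedException(
                    "terminated after " + stepExecution.getItemCount() + " items");
            }
        }
    }
}
```

Because the caller keeps a reference to the StepExecution it passed in, it can both read the item count while the step runs and call setTerminateOnly() to request an early stop.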
The Spring Batch Namespace
Spring Batch is almost the perfect example of where a Spring XML namespace can help to make things easier for an application developer. We haven't implemented the NamespaceHandler yet, but it is scheduled for the next milestone, 1.0.0-m5, so now is a good time for you to see how the example feels.
The motto is "make the configuration look like the domain model", and in that spirit, have a look at this example and see if it makes sense. It is a draft of the new namespace that mimics the existing fixed length import sample job, so those of you familiar with that will see the parallels. If you need to find the existing samples, just go to the Spring Batch homepage to download the release (the fixed length sample configuration can also be browsed here).
<batch>
<job id="fixedLengthImportJob" volatile="false">
<simple-step id="step1" chunk-size="50" save-restart-data="false"
allow-start-if-complete="true" reader-ref="fileInputTemplate">
<processor>
<beans:bean
class="org.springframework.batch.sample.item.processor.TradeProcessor">
<property name="writer" ref="tradeDao" />
</beans:bean>
</processor>
<simple-completion-policy skipLimit="5" />
</simple-step>
<simple-step id="step2" chunk-size="200">
<jdbc-cursor-reader data-source-ref="dataSource">
<query><![CDATA[SELECT ID FROM T_TRADE WHERE PROCESSED='N' ORDER BY ID]]></query>
</jdbc-cursor-reader>
<processor>
<beans:bean
class="org.springframework.batch.sample.item.processor.TradeUpdater">
<property name="dao" ref="tradeDao" />
</beans:bean>
</processor>
</simple-step>
<tasklet-step id="step3" chunk-size="1">
<tasklet>
<beans:bean
class="org.springframework.batch.sample.sproc.TradeSummarizer"
p:dataSource-ref="dataSource" />
</tasklet>
</tasklet-step>
</job>
<!-- INFRASTRUCTURE SETUP -->
<flat-file-reader id="fileInputTemplate"
resource="data/fixedLengthImportJob/input/20070122.teststream.ImportTradeDataStep.txt"
field-set-mapper-ref="fieldSetMapper"
validator-ref="fixedValidator">
<fixed-length-tokenizer>
<columns>
ISIN=1-12 Quantity=13-15 Price=16-20 Customer=21-29
</columns>
</fixed-length-tokenizer>
</flat-file-reader>
<beans:bean id="fixedValidator"
class="org.springframework.batch.item.validator.SpringValidator">
<property name="validator">
<bean id="tradeValidator"
class="org.springmodules.validation.valang.ValangValidator">
<property name="valang">
<value>
<![CDATA[
{ isin : length(?) < 13 : 'ISIN too long' : 'isin_length' : 12}
]]>
</value>
</property>
</bean>
</property>
</beans:bean>
<beans:bean id="tradeDao"
class="org.springframework.batch.sample.dao.JdbcTradeWriter">
<property name="jdbcTemplate" ref="jdbcTemplate" />
<property name="incrementer">
<bean parent="incrementerParent">
<property name="incrementerName" value="TRADE_SEQ" />
</bean>
</property>
</beans:bean>
<beans:bean id="fieldSetMapper"
class="org.springframework.batch.sample.mapping.TradeFieldSetMapper" />
</batch>
Notice how the details of the implementation of the Job and Step are hidden - there is a difference between the <simple-step/> and the <tasklet-step/>, but the only visible differences are those that it is necessary for the user to know about. The user doesn't need to know that there are different implementations of the Step interface backing this configuration, but the fact that there is an item-oriented and a task-oriented approach to steps is not hidden. That is what we think is important to the user. When we drafted these XML examples we took a couple of examples from the existing Spring Batch samples and hacked them around until the same configuration information was contained, but in most cases the overall size of the files dropped by 50% or more. That must mean we are on the right track.
Postscript
This has been a really quick but quite a deep dive into Spring Batch and some of the recent changes. There are a lot more interesting nuggets that I could share with you about our experience with the usage and implementation of the framework. My JavaOne presentation was accepted, which means that you can look forward to learning more in San Francisco in May, and to hearing some more examples of real-life batch processing. Also, look out for more on this blog as the 2.5 release train comes to the start of its journey (from the user's point of view that is!).
In closing I need to pay tribute to Lucas Ward and Ben Hale who have been instrumental in the Spring Batch development process. I would also like to welcome Robert Kasanicky as our newest committer - Robert has contributed some very high quality code to Spring Batch and I am sure he will continue to do so as we wrap up 1.0 and start work on the next major release. If anyone else is interested in contributing, Robert followed the normal process of hanging out in the forum and on JIRA, contributing ideas and (crucially) patches. I nominated him, and we took a vote, and now he is officially part of the team.
Feel free to comment here if there is anything in the article you need clarified. For general discussion of Spring Batch, please use the forum. The home page for Spring Batch is here, or it can easily be found from the Spring main website.