Unloading Pages
This code, as it stands, will not help much with conserving memory. Even though we wait to read a page until it is needed, we never throw pages out again! Here's a test that illustrates the problem:
public void testPagedCollectionPagesOutUnusedObjects() {
    Category saved = categoryDao.get(largeCategory.getId());
    Collection savedChildren = saved.getSubcategories();
    // Walk the whole collection, forcing every page to be read
    Iterator iter = savedChildren.iterator();
    while (iter.hasNext()) {
        iter.next();
    }
    // Only 5 of the subcategories should still be in the session cache
    assertLoadedCount(5, largeCategory.getSubcategories());
}
This test fails, because by the time the loop finishes, every subcategory of largeCategory has been loaded into the session cache. To fix it, the Iterator has to be able to unload pages as well:
public abstract class PagedLazyCollection extends AbstractCollection {
    private class PagedLazyCollectionIterator implements Iterator {
        // ....
        private void nextIterator() {
            // Throw out the page we have finished before reading the next one
            if (currentCollection != null) {
                unloadPage(currentCollection);
            }
            // ....
        }
        // ...
    }
    public abstract void unloadPage(Collection collection);
}
So SpaceCategoryDAO.lazyFindByParentId has to be extended to tell the collection how to unload objects:
protected Collection lazyFindByParentId(final Long parentId) {
    return new PagedLazyCollection(countByParent(parentId)) {
        public Collection loadPage(int offset, int pageSize) {
            return findByParentId(parentId, offset, pageSize);
        }
        public void unloadPage(Collection collection) {
            unloadFromSessionCache(collection);
        }
    };
}
SpaceCategoryDAO.unloadFromSessionCache could hardly be easier:
public void unloadFromSessionCache(Collection categories) {
    for (Iterator iter = categories.iterator(); iter.hasNext();) {
        Category element = (Category) iter.next();
        sessionCache.remove(element.getId());
    }
}
At this point, we have actually implemented a lazily loaded collection that pages objects in and out as we iterate over it. The amount of code needed is surprisingly small.
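The pieces above can be combined into one self-contained sketch that runs outside the article's codebase. This is a simplification under stated assumptions, not the downloadable implementation: it uses generics, passes the page size to the constructor, and replaces SpaceCategoryDAO with an in-memory list plus a Map standing in for the session cache. PagingDemo, its nested Category, and maxResident are hypothetical names.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical generic variant of PagedLazyCollection: subclasses say how to page in and out
abstract class PagedLazyCollection<T> extends AbstractCollection<T> {
    private final int size;
    private final int pageSize;

    PagedLazyCollection(int size, int pageSize) {
        this.size = size;
        this.pageSize = pageSize;
    }

    public abstract Collection<T> loadPage(int offset, int pageSize);

    public abstract void unloadPage(Collection<T> page);

    @Override
    public int size() { return size; }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int nextOffset = 0;         // offset of the next page to read
            private Collection<T> currentPage;  // page currently in memory, if any
            private Iterator<T> currentIterator;

            @Override
            public boolean hasNext() {
                // Move on to a new page whenever the current one is used up
                while (currentIterator == null || !currentIterator.hasNext()) {
                    if (nextOffset >= size) return false;
                    nextPage();
                }
                return true;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return currentIterator.next();
            }

            private void nextPage() {
                if (currentPage != null) unloadPage(currentPage);  // page out the finished page
                currentPage = loadPage(nextOffset, pageSize);      // page in the next slice
                currentIterator = currentPage.iterator();
                nextOffset += pageSize;
            }
        };
    }
}

public class PagingDemo {
    static final class Category {
        final long id;
        Category(long id) { this.id = id; }
    }

    // Stand-in for the DAO's session cache, plus a high-water mark of its size
    static final Map<Long, Category> sessionCache = new HashMap<>();
    static int maxResident = 0;

    public static void main(String[] args) {
        sessionCache.clear();
        maxResident = 0;
        List<Long> database = new ArrayList<>();  // stand-in for the stored rows
        for (long id = 1; id <= 23; id++) database.add(id);

        Collection<Category> children = new PagedLazyCollection<Category>(database.size(), 5) {
            @Override
            public Collection<Category> loadPage(int offset, int pageSize) {
                List<Category> page = new ArrayList<>();
                int end = Math.min(offset + pageSize, database.size());
                for (Long id : database.subList(offset, end)) {
                    Category category = new Category(id);
                    sessionCache.put(id, category);
                    page.add(category);
                }
                maxResident = Math.max(maxResident, sessionCache.size());
                return page;
            }

            @Override
            public void unloadPage(Collection<Category> page) {
                for (Category category : page) sessionCache.remove(category.id);
            }
        };
        int visited = 0;
        for (Category ignored : children) visited++;
        System.out.println(visited + " visited, at most " + maxResident + " resident");
    }
}
```

With 23 objects and a page size of 5, the loop visits all 23 while the cache never holds more than 5. After the loop, the final partial page of 3 objects is still resident, because nothing triggers its unload.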
Limitations
The code as I have shown it has several severe limitations. I will explain the strategy for addressing each of them, but if you want to see a full implementation, you will have to nag me to write a follow-up article.
- Adding objects to the collection: If we add objects to (or remove them from) Category.getSubcategories, the changes will have inconsistent effects. First, the modification will not really go through the parent category at all; it happens only by changing the parent category of the subcategories in question. That change is reflected only when we fetch new pages of the collection. If the modification affects something the paging depends on, things go wrong: we might get the wrong objects in the next page, for example skipping an object or seeing one twice. The simplest way to handle modifications to the collection involves introducing constraints. For example, if we introduced an immutable listIndex field on the subcategory, we would know that all new subcategories are added at the end.
- Removing objects from the collection: Supporting remove can be even simpler: just keep a collection of removed objects in memory, and skip these objects when iterating. Neither of these strategies will work perfectly in all situations. For example, if you have an operation that removes most of the collection, keeping the removed objects in memory is not a good idea. In the simple case, we can ignore this problem. In the not-so-simple cases, more research is needed.
- Reference integrity: If an object in a subcategory collection is also used from other places in the code, it will accidentally be thrown out of the session cache when the collection iterates past its page. This is a bona fide bug, but it is not hard to solve (use two types of session cache), and the code is rather uninteresting, so I will leave it as an exercise for the reader, if you really need it. Or you can nag me to fix it for you.
- Intelligent iteration: There are several things we might want to do with the collection in order to use it better. For example, we really need to be able to skip elements while iterating. Sadly, neither Iterator nor ListIterator supports this. To avoid extending the standard Java collection interfaces, I have not implemented my own skip method, though doing so is easy. Alternatively, we could implement lazy behavior in PagedLazyCollectionIterator.next, so that merely stepping past an element does not force its page to be read.
- Sorting and subqueries: If you want to sort the subcategories or search for a subset of them, the current structure does not support this well. There are two good approaches. The most straightforward is to use custom methods on the DAO (like findByParentIdAndNameOrderedByName()) that return paged collections, but in this case we have abandoned the domain model. If you want to keep the domain model, you need to use something more intelligent than SQL to query your objects. Hibernate criteria and TopLink expressions provide object-based APIs for building queries. Use decorators on top of the collection to modify the criteria or expressions.
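The reference-integrity item suggests using two types of session cache. One possible reading, sketched here as an assumption rather than the author's design: objects that application code fetches directly are pinned, objects that only arrived as part of a page are evictable, and unloading a page touches only the evictable level. TwoLevelSessionCache and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical two-level session cache: pinned entries survive page unloading
class TwoLevelSessionCache<K, V> {
    private final Map<K, V> pinned = new HashMap<>(); // fetched directly by application code
    private final Map<K, V> paged = new HashMap<>();  // only loaded as part of a page

    // Direct lookup (e.g. from dao.get): a hit in the paged level is promoted to pinned
    V getDirect(K id, Function<? super K, ? extends V> loader) {
        V value = pinned.get(id);
        if (value == null) {
            value = paged.remove(id);
            if (value == null) {
                value = loader.apply(id);
            }
            pinned.put(id, value);
        }
        return value;
    }

    // Called while loading a page: reuse an existing instance to keep one object per id
    V putFromPage(K id, V loaded) {
        V existing = pinned.get(id);
        if (existing != null) {
            return existing;
        }
        V previous = paged.putIfAbsent(id, loaded);
        return previous != null ? previous : loaded;
    }

    // Called from unloadPage: pinned objects are never touched here
    void unloadFromPage(K id) {
        paged.remove(id);
    }

    boolean isLoaded(K id) {
        return pinned.containsKey(id) || paged.containsKey(id);
    }
}

public class TwoLevelCacheDemo {
    public static void main(String[] args) {
        TwoLevelSessionCache<Long, String> cache = new TwoLevelSessionCache<>();
        String held = cache.getDirect(7L, id -> "category-" + id); // application keeps #7
        String fromPage = cache.putFromPage(7L, "category-7 (second copy)");
        cache.putFromPage(8L, "category-8");
        cache.unloadFromPage(7L); // the collection moves past its page
        cache.unloadFromPage(8L);
        System.out.println(held == fromPage);    // true
        System.out.println(cache.isLoaded(7L));  // true
        System.out.println(cache.isLoaded(8L));  // false
    }
}
```

In this sketch, pinned objects stay until the session ends, as they would in a cache with no paging at all. A page that meets a pinned object reuses that instance, so object identity is preserved.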
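For the intelligent-iteration item, a skip method can be cheap because the iterator knows the total size and the page size. Skipping only moves a position counter, and the next call to next() reads whichever page contains the new position. The sketch assumes a hypothetical PageLoader callback in place of the article's abstract loadPage, and it omits unloading for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical callback: returns the elements in [offset, offset + pageSize)
interface PageLoader<T> {
    List<T> loadPage(int offset, int pageSize);
}

// Hypothetical iterator that can jump ahead without reading the pages in between
class SkippablePagedIterator<T> implements Iterator<T> {
    private final PageLoader<T> loader;
    private final int size;
    private final int pageSize;
    private int position = 0;   // index of the element next() will return
    private int pageStart = -1; // offset of the page held in memory, -1 if none
    private List<T> page;

    SkippablePagedIterator(PageLoader<T> loader, int size, int pageSize) {
        this.loader = loader;
        this.size = size;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        return position < size;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int wantedStart = (position / pageSize) * pageSize;
        if (wantedStart != pageStart) {
            // Read only the page that holds the requested position
            page = loader.loadPage(wantedStart, pageSize);
            pageStart = wantedStart;
        }
        T value = page.get(position - pageStart);
        position++;
        return value;
    }

    /** Advances up to n elements without loading anything; returns how many were skipped. */
    int skip(int n) {
        int skipped = Math.max(0, Math.min(n, size - position));
        position += skipped;
        return skipped;
    }
}

public class SkipDemo {
    public static void main(String[] args) {
        List<Integer> loadedOffsets = new ArrayList<>();
        PageLoader<Integer> loader = (offset, pageSize) -> {
            loadedOffsets.add(offset);
            List<Integer> result = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + pageSize, 100); i++) {
                result.add(i);
            }
            return result;
        };
        SkippablePagedIterator<Integer> it = new SkippablePagedIterator<>(loader, 100, 10);
        it.next();                  // reads the page at offset 0
        it.skip(54);                // moves to index 55 without reading anything
        Integer value = it.next();  // reads the page at offset 50
        System.out.println(value + " " + loadedOffsets); // prints "55 [0, 50]"
    }
}
```

Here skip(54) from index 1 lands on index 55, so only the pages at offsets 0 and 50 are ever read.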
Conclusion
Implementing a lazily loaded, rich domain model isn't just possible, it is practically feasible. Even really big collections can be dealt with. The simple case is surprisingly simple, and depending upon your requirements, even the complex cases are manageable.
In this article, I have shown how you can construct a lazily loaded domain model that supports reference integrity, collections, and paged loading of very large collections. This is surprisingly easy to do. So the question is: should we abandon third-party object-relational mapping tools like Hibernate, TopLink, and JDO and just write our domain model ourselves?
Of course not. First, if you look at the full JDBC implementation in the downloadable source, you can see that there are a lot of concerns I had to deal with that an ORM would have handled for me. Creating the queries and mapping the results to objects is a tedious and error-prone job, and I don't recommend doing it yourself, even with a JDBC framework like Spring-JDBC. Second, even though there isn't much code involved in writing a lazily loaded collection, it is easy to get it wrong, so leaving the job to a third party is a good idea. I trust Ted Neward when he says that ORM is the Vietnam of computer science.
However, the ORMs available today do not even attempt to solve the one-to-very-very-many relationship. You can use the approach I showed with an ORM just as easily as with JDBC. If you create your own TopLinkCategoryDAO, use TopLink as normal, but instead of mapping the subcategories relationship, attach a paged subcategory collection to the newly created objects. Or maybe your domain model has a file name that references a file on disk. Why not make it into a lazy relationship? And if you're an ORM developer reading this article: please, the one-to-very-very-many relationship problem is a real need, and you can solve it.
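The file-name case mentioned above can be made lazy with a few lines of plain Java. In this sketch, Attachment, getContent, and isContentLoaded are hypothetical names; the file is read with java.nio.file.Files.readAllBytes on first access and kept afterward.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical domain object whose file contents form a lazy relationship
class Attachment {
    private final String fileName;
    private byte[] content; // null until first requested

    Attachment(String fileName) {
        this.fileName = fileName;
    }

    String getFileName() {
        return fileName;
    }

    boolean isContentLoaded() {
        return content != null;
    }

    byte[] getContent() {
        if (content == null) {
            try {
                content = Files.readAllBytes(Paths.get(fileName)); // first access reads the file
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return content;
    }
}

public class LazyFileDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lazy", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        Attachment attachment = new Attachment(file.toString());
        System.out.println(attachment.isContentLoaded());   // false
        System.out.println(attachment.getContent().length); // 5
        System.out.println(attachment.isContentLoaded());   // true
        Files.delete(file);
    }
}
```

The sketch is not thread-safe, and it keeps the bytes in memory once they have been read.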
You might also be using a technology like JavaSpaces, which, as far as I know, does not support an ORM-like approach to lazy loading relationships. My implementation of SpaceCategoryDAO shows an approach that can help you make your domain model more intelligent. Again, I wish the vendors in this space would start supporting lazy loading more.
Hopefully, this article has made lazy loading seem less magical to you. Java's facilities for creating proxies that lazily load a rich domain model are surprisingly powerful and easy to use.
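As one illustration of these proxy facilities, java.lang.reflect.Proxy can stand in for an interface and defer loading the real object until the first method call. This is a generic sketch and not necessarily the mechanism used earlier in the article; the Category interface and the LazyProxies.lazy helper are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical domain interface used only for this illustration
interface Category {
    Long getId();
    String getName();
}

final class LazyProxies {
    // Returns a proxy that calls loader.get() on the first method call, then delegates
    @SuppressWarnings("unchecked")
    static <T> T lazy(Class<T> type, Supplier<? extends T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // loaded on first use

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get();
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception
                }
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}

public class LazyProxyDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        Category category = LazyProxies.lazy(Category.class, () -> {
            loads[0]++;
            return new Category() {
                public Long getId() { return 42L; }
                public String getName() { return "Books"; }
            };
        });
        System.out.println(loads[0]);            // 0: nothing loaded yet
        System.out.println(category.getName());  // Books (triggers the load)
        System.out.println(category.getId());    // 42
        System.out.println(loads[0]);            // 1: loaded exactly once
    }
}
```

Every call on the proxy, including toString, equals, and hashCode, goes through invoke and triggers the load. The handler is also not thread-safe.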
Happy hacking!
Resources
Johannes Brodwall is currently lead Java architect at BBS, the company that operates Norway's banking infrastructure.