Lazy Loading is Easy: Implementing a Rich Domain Model

July 18, 2006

Lazy loading: An object that doesn't contain all of the data you need but knows how to get it. (Martin Fowler, Patterns of Enterprise Application Architecture)

I have recently explored the wonderful world of lazy loading. The original title of this article was going to be "Lazy Loading Made Easy." It originally started out as an exercise to try something complicated, but as it turns out, lazy loading already is easy.

First off: why would you want to do lazy loading? The most important reason is to be able to have a clean domain model. Consider the (pseudo) class Category:

public class Category {

    public Collection<Category> subcategories = new HashSet<Category>();

    public Category parent;

    public String name;

    public int creationYear;
}

With this class, I would be able to say stuff like:

Category category = dao.get(1);
System.out.println(category.parent.subcategories);

This should print all the siblings of category 1 (whatever that means). However, without lazy loading, I would have to do this:

Category category = dao.get(1);
Category parent = dao.get(category.parentId);
List children = new ArrayList();
for (Long childId : parent.subcategoryIds) {
    children.add(dao.get(childId));
}
System.out.println(children);

This might not look so bad in this simple example, but it adds up. In addition, look at all the places where we need the DAO! In effect, this rips the whole business logic out of the domain objects, as they normally cannot access the DAOs directly. If this code looks unpleasantly familiar, this article is for you.

In order to do the first, easy example efficiently, we have to support lazy loading.

Smoke Testing the DAO

I will start out by writing a simple test to demonstrate the implementation
of a basic DAO. We will then expand this DAO to support lazy loading.
We start with a simple test for saving and retrieving objects:

public class CategoryDaoTest extends TestCase {


    public void testSaveRetrieve() throws Exception {
        Category parent = new Category("parent", 2005);
        CategoryDao categoryDao = new SpaceCategoryDao();
        categoryDao.store(parent);

        Category saved = categoryDao.get(parent.getId());
        assertNotSame(saved, parent);
        assertEquals("name", saved.getName(), parent.getName());
    }


}

This only verifies that saving and retrieving objects works correctly without lazy loading. I have created two implementations of CategoryDao: one for JDBC and one using a simple internal structure (I call this a space, because it is vaguely inspired by JavaSpaces, http://java.sun.com/developer/technicalArticles/tools/JavaSpaces/). I show this example using the simpler space structure. For the JDBC implementation, see the downloadable source. Here are the get() and store() methods of SpaceCategoryDao:

public Category get(Long id) {
    if (!idToTuples.containsKey(id)) {
        return null;
    }
    Serializable[] tuple = (Serializable[]) idToTuples.get(id);
    return tupleToObject(tuple);
}
public Long store(Category category) {
    if (category == null) {
        return null;
    }
    if (idToTuples.containsKey(category.getId())) {
        return category.getId();
    }
    store(category.getParent());

    category.setId(new Long(nextId++));
    Serializable[] tuple = objectToTuple(category);
    idToTuples.put(category.getId(), tuple);

    for (Iterator iter = category.getSubcategories().iterator(); iter.hasNext();) {
        store((Category) iter.next());
    }
    return category.getId();
}

The objects are saved by using objectToTuple and restored by tupleToObject:

private Serializable[] objectToTuple(Category category) {
    Long parentId = category.getParent() != null ?
             category.getParent().getId() : null;
    return new Serializable[] {
            category.getId(), category.getName(),
            new Integer(category.getCreationYear()), parentId };
}


private Category tupleToObject(Serializable[] tuple) {
    Category category = new Category((String)tuple[1],
            ((Integer)tuple[2]).intValue());
    category.setId((Long) tuple[0]);
    return category;
}

We now have basic storing and loading of a single object implemented. Using test-driven development, I know that it basically works. We are now set up for solving the problem of lazy loading. First, I will show how we load the related objects without lazy loading, and then I will improve the implementation to load the parent relationship lazily.

Lazy Loading Interfaces

The code above does not deal with the relationships, but the test passes, so I write a new test:

public void testCompareParent() {
    Category saved = categoryDao.get(category.getId());
    assertNotNull("parent", saved.getParent());
    assertEquals("name", saved.getParent().getName(),
                 category.getParent().getName());
}

Note that I am now keeping the category as an instance variable in the test. This test fails, of course. Let's see how we can make it pass.

For a first attempt, we use an eager loading approach.

private Category tupleToObject(Serializable[] tuple) {
    Category category = new Category((String)tuple[1],
            ((Integer)tuple[2]).intValue());
    category.setId((Long) tuple[0]);
    category.setParent(get((Long) tuple[3]));
    return category;
}

This works, but it doesn't do lazy loading. However, we can replace the eagerly loaded parent with a dynamic proxy. To do so, I have to extract an interface (called CategoryItf) from the Category class, containing the methods that I want to call on the parent. This is a bit of a hassle, but as we shall see later, we can simplify this further. Here is the new implementation of tupleToObject:

private Category tupleToObject(Serializable[] tuple) {
    Category category = new Category((String)tuple[1],
            ((Integer)tuple[2]).intValue());
    category.setId((Long) tuple[0]);
    category.setParent(lazyGet((Long) tuple[3]));
    return category;
}

Okay, I am pulling your leg. Here is the implementation of lazyGet:

protected CategoryItf lazyGet(final Long id) {
    if (id == null) {
        return null;
    }

    // id must be final so the anonymous handler below can capture it
    return (CategoryItf) Proxy.newProxyInstance(
        CategoryItf.class.getClassLoader(),
        new Class[] { CategoryItf.class },
        new LazyLoadedObject() {
            protected Object loadObject() {
                return get(id);
            }
        });
}

In order to understand this code, you have to understand dynamic proxies, which were introduced in Java 1.3. The java.lang.reflect.Proxy.newProxyInstance() method returns a dynamically generated object that implements the interfaces given to the method call (in this case CategoryItf) and calls an invocation handler no matter which method is called on this interface. The code passes in an anonymous subclass of a custom class named LazyLoadedObject. Here is the LazyLoadedObject invocation handler:

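import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public abstract class LazyLoadedObject
    implements InvocationHandler {

    private Object target;

    public Object invoke(Object proxy,
                         Method method, Object[] args)
                  throws Throwable {
        // Load the real object on first use only
        if (target == null) {
            target = loadObject();
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Rethrow the target's own exception, not the reflection wrapper
            throw e.getCause();
        }
    }

    protected abstract Object loadObject();
}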
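To see the laziness in action, here is a small self-contained sketch of the same idea. The Named interface, the loadCount counter, and the lazyNamed helper are made up for this illustration and stand in for CategoryItf, the DAO lookup, and lazyGet; they are not part of the downloadable source.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class LazyProxyDemo {

    /** Hypothetical stand-in for the article's CategoryItf. */
    interface Named {
        String getName();
    }

    /** Counts how often the (simulated) data store is hit. */
    static int loadCount = 0;

    /** Returns a proxy that creates the real object on the first method call. */
    static Named lazyNamed(final String name) {
        InvocationHandler handler = new InvocationHandler() {
            private Object target;

            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (target == null) {
                    loadCount++; // simulated load from the data store
                    target = new Named() {
                        public String getName() {
                            return name;
                        }
                    };
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
            }
        };
        return (Named) Proxy.newProxyInstance(
                Named.class.getClassLoader(),
                new Class<?>[] { Named.class },
                handler);
    }

    public static void main(String[] args) {
        Named parent = lazyNamed("parent");
        System.out.println("loads before first call: " + loadCount);
        System.out.println(parent.getName());
        System.out.println(parent.getName());
        System.out.println("loads after two calls: " + loadCount);
    }
}
```

Creating the proxy does not touch the simulated data store; the load count is 0 before the first call and stays at 1 after two calls to getName.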

This makes the lazy loading work, but we have to implement an extra interface just for the dynamic proxy. Let's see if we can do it the way the persistence frameworks do it: with bytecode generation.







Lazy Loading Classes

In order to lazy load classes, we will need to use the cglib bytecode manipulation framework, or a similar library. cglib conveniently has an interface to be used for lazy loading, so this code is extremely short.

private Category lazyGet(final Long key) {
    if (key == null) {
        return null;
    }
    return (Category) Enhancer.create(
            Category.class, new LazyLoader() {
        public Object loadObject() throws Exception {
            return get(key);
        }
    });
}

Enhancer is a class and LazyLoader is an interface, both from cglib. This code contains almost nothing beyond what we need to express our business problem: what to do when the parent reference is accessed.

I have now implemented the simple relationship between a category and its parent category using lazy loading. Next, let's look at collections.

Lazy Loading Collections

I still have to show you, gentle reader, how to implement lazy loading for collections. After that, I will show how to make sure that category == category.getSubcategory(0).getParent(). (This is called referential integrity.) After referential integrity is in place, we will examine how to create a paged lazily loaded collection: As we iterate through the collection, we will load and unload objects as they are needed.

Like before, we start off with a unit test. The unit test introduces some new test data, but the critical part looks like this:

public void testCompareChildren() {
    Category parentSaved = categoryDao.get(parent.getId());

    Set expectedNames =
        new HashSet(Arrays.asList(
            new String[] { "sibling", "category" }));
    Set actualNames = new HashSet();
    for (Iterator iter =
            parentSaved.getSubcategories().iterator();
            iter.hasNext();) {
        Category subcategory = (Category) iter.next();
        actualNames.add(subcategory.getName());
    }

    assertEquals("children", expectedNames, actualNames);
}

This bit of logic ends up comparing the names of the children of parent to the expected names of sibling and category. Again, it's good to start by implementing this without using lazy loading, like so:

private Category tupleToObject(Serializable[] tuple) {
    Category category =
        new Category((String)tuple[1], ((Integer)tuple[2]).intValue());
    category.setId((Long) tuple[0]);
    category.setParent(lazyGet((Long) tuple[3]));
    category.setSubcategories(findByParentId(category.getId()));
    return category;
}

public Collection findByParentId(Long parentId) {
    ArrayList result = new ArrayList();
    for (Iterator iter =
            getChildIdListFor(parentId).iterator();
            iter.hasNext();) {
        Long element = (Long) iter.next();
        result.add(get(element));
    }
    return result;
}

You should try this out and make sure it works before adding lazy loading. The findByParentId method relies on an index from parent ID to child IDs (getChildIdListFor). We can keep this index up to date in the store method:

public Long store(Category category) {
    if (category == null) {
        return null;
    }
    if (idToTuples.containsKey(category.getId())) {
        return category.getId();
    }   
    store(category.getParent());

    category.setId(new Long(nextId++));
    Serializable[] tuple = objectToTuple(category);
    idToTuples.put(category.getId(), tuple);

    if (category.getParent() != null) {
        getChildIdListFor(category.getParent().getId()).
            add(category.getId());

    }

    for (Iterator iter =
            category.getSubcategories().iterator();
            iter.hasNext();) {
        store((Category) iter.next());
    }
    return category.getId();
}

private Collection getChildIdListFor(Long parentId) {
    if (!parentIdToIdList.containsKey(parentId)) {
        parentIdToIdList.put(parentId, new ArrayList());
    }
    return (Collection) parentIdToIdList.get(parentId);
}

This should be pretty simple if you're used to creating DAOs (except maybe for the index). Now, let's add lazy loading:

private Category tupleToObject(Serializable[] tuple) {
    Category category = new Category((String)tuple[1],
            ((Integer)tuple[2]).intValue());
    category.setId((Long) tuple[0]);
    category.setParent(lazyGet((Long) tuple[3]));
    category.setSubcategories(lazyFindByParentId(category.getId()));
    return category;
}

private Collection lazyFindByParentId(final Long parentId) {
    LazyLoadedObject lazySubcategories = new LazyLoadedObject() {
        protected Object loadObject() {
            return findByParentId(parentId);
        }
    };
    return (Collection) Proxy.newProxyInstance(
            Collection.class.getClassLoader(),
            new Class[] { Collection.class },
            lazySubcategories);
}

That was simple enough. The LazyLoadedObject invocation handler I constructed for dealing with interfaces is a perfect match for what we're doing here. It's considered a good thing to use interfaces when dealing with collections, so I am pretty happy with this code. Unlike the lazily loaded parent relationship, there is no need for bytecode generation here.
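The same trick can be tried outside the DAO. The following is a minimal sketch, assuming a hypothetical lazyCollection helper that takes a java.util.concurrent.Callable as the loader; the helper name and the loadCount counter are illustrative only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.Callable;

class LazyCollectionDemo {

    /** Counts how often the loader has run. */
    static int loadCount = 0;

    /** Wraps a loader in a Collection proxy that runs it on first use. */
    @SuppressWarnings("unchecked")
    static <T> Collection<T> lazyCollection(final Callable<Collection<T>> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private Collection<T> target;

            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (target == null) {
                    target = loader.call();
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
            }
        };
        return (Collection<T>) Proxy.newProxyInstance(
                LazyCollectionDemo.class.getClassLoader(),
                new Class<?>[] { Collection.class },
                handler);
    }

    public static void main(String[] args) {
        Collection<String> names = lazyCollection(() -> {
            loadCount++; // simulated query
            return Arrays.asList("sibling", "category");
        });
        System.out.println("loads before use: " + loadCount);
        System.out.println("size: " + names.size());
        System.out.println("loads after use: " + loadCount);
    }
}
```

Any method on the proxy, including size(), iterator(), and toString(), triggers the load, so the whole collection arrives on first touch. That is fine for small collections and is exactly what the paged variant later in the article avoids.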

Referential Integrity

The following tests illustrate a problem with the current implementation:

public void testReferenceIntegrity() {
    Category saved1 = categoryDao.get(category.getId());
    Category saved2 = categoryDao.get(category.getId());
    assertSame("multiply loaded objects should be the same",
               saved1, saved2);
}

public void testCollectionReferenceIntegrity() {
    Category saved = categoryDao.get(category.getId());
    Category savedSub =
        (Category) saved.getSubcategories().iterator().next();
    assertSame(savedSub.getParent(), saved);
}

Both of these tests fail. This is a real problem: we have two copies of the parent Category. If we change one, they will get out of sync. It is important that the same object is returned in both cases.

In order to implement referential
integrity, we implement a session cache. This is a Map instance
variable on the DAO. Here are the updated store, get and lazyGet
methods:

public Long store(Category category) {
    if (category == null) {
        return null;
    }
    if (idToTuples.containsKey(category.getId())) {
        return category.getId();
    }
    store(category.getParent());

    category.setId(new Long(nextId++));
    Serializable[] tuple = objectToTuple(category);
    idToTuples.put(category.getId(), tuple);
    sessionCache.put(category.getId(), category);
    if (category.getParent() != null) {
        getChildIdListFor(category.getParent().getId()).
                add(category.getId());
    }

    for (Iterator iter = category.getSubcategories().iterator();
            iter.hasNext();) {
        store((Category) iter.next());
    }
    return category.getId();
}

public Category get(Long id) {
    if (!idToTuples.containsKey(id)) {
        return null;
    }


    if (!sessionCache.containsKey(id)) {
        Serializable[] tuple = (Serializable[]) idToTuples.get(id);
        sessionCache.put(id, tupleToObject(tuple));
    }
    return (Category) sessionCache.get(id);


}

private Category lazyGet(final Long key) {
    if (key == null) {
        return null;
    }


    if (sessionCache.containsKey(key)) {
        return (Category) sessionCache.get(key);
    }


    return (Category) Enhancer.create(Category.class,
        new LazyLoader() {
        public Object loadObject() throws Exception {
            return get(key);
        }
    });
}

The tests pass, and we're done.
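Stripped of the lazy loading, the session cache is just an identity map in front of the stored tuples. Here is a minimal standalone sketch; the Item and ItemDao classes are hypothetical and only mirror the shape of Category and SpaceCategoryDao.

```java
import java.util.HashMap;
import java.util.Map;

class SessionCacheDemo {

    /** Hypothetical minimal entity. */
    static class Item {
        final Long id;
        final String name;

        Item(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Hypothetical DAO: rows are plain names keyed by ID. */
    static class ItemDao {
        private final Map<Long, String> rows = new HashMap<Long, String>();
        private final Map<Long, Item> sessionCache = new HashMap<Long, Item>();

        void insertRow(Long id, String name) {
            rows.put(id, name);
        }

        /** Returns the same Item instance for the same ID until the cache is cleared. */
        Item get(Long id) {
            if (!rows.containsKey(id)) {
                return null;
            }
            Item cached = sessionCache.get(id);
            if (cached == null) {
                cached = new Item(id, rows.get(id));
                sessionCache.put(id, cached);
            }
            return cached;
        }

        void clearSessionCache() {
            sessionCache.clear();
        }
    }

    public static void main(String[] args) {
        ItemDao dao = new ItemDao();
        dao.insertRow(Long.valueOf(1), "parent");
        Item first = dao.get(Long.valueOf(1));
        Item second = dao.get(Long.valueOf(1));
        System.out.println("same instance: " + (first == second));
        dao.clearSessionCache();
        System.out.println("same after clear: " + (first == dao.get(Long.valueOf(1))));
    }
}
```

Within one session every lookup of ID 1 yields the same object; after clearSessionCache a fresh instance is built from the stored row.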







Implementing a Lazy JDBC DAO

While creating the SpaceCategoryDao, I have also been building up a CategoryDao interface:

/**
* Save and retrieve Categories.
*/
public interface CategoryDao {

    /**
     * After this is called, get with the return
     * value should always return an object
     * identical to the argument.
     */
    Long store(Category category);

    /**
     * Returns a Category with the argument id.
     */
    Category get(Long id);


    /**
     * Returns a subset of the subcategories of
     *  the category with the specified id.
     */
    Collection findByParentId(Long parentId,
            int offset, int length);

    /**
     * Returns the count of subcategories of the
     * category with the specified id.
     */
    int countByParent(Long parentId);

    /**
     * After this is called, the schema should
     * be created in the underlying data store
     *  (if appropriate).
     */
    void initialize();

    /**
     * After this is called, no previously
     * constructed object shall be returned from
     * get. */
    void clearSessionCache();

}

I will not cover implementing this interface for JDBC in detail, but the downloadable source includes the source for JdbcCategoryDao. It is quite prosaic. I will use this interface to implement paged lazy loading in the final part of the article.

Paged Lazy Loading

I have shown that creating a lazily loaded collection is just as easy as creating a lazily loaded relationship. In order to make lazy loading work well, I have implemented a session cache for storing objects. This introduces a performance issue: if we have too many objects in the cache, we can run out of memory. This is mostly an issue for very large collections, and most object-relational mapping tools ignore it beyond providing a way to remove objects from the cache. I would like to call these one-to-very-very-many relationships, and the rest of this article is about supporting them.

There are several issues with one-to-very-very-many relationships. One problem that can be solved is that of memory consumption. In the systems I am currently working on, we have one-to-many relationships that can contain tens of thousands to a million objects. Loading all of these when the first one is needed might not be what you want. In order to fix this problem, I will use a strategy of paging: only a subset of the objects will be held in memory at the same time. For the purpose of illustration, I will use a page size of 5 (which is pointless in the real world, but it makes it easier to illustrate the technique).

public void testPagedObjects() {
    Category saved = categoryDao.get(largeCategory.getId());
    Collection savedChildren = saved.getSubcategories();
    Iterator iter = savedChildren.iterator();
    assertLoadedCount(0, largeCategory.getSubcategories());
    assertEquals("child[0].name",
        "subcategory 0",
        ((Category)iter.next()).getName());
    assertLoadedCount(5, largeCategory.getSubcategories());
}

In this code, largeCategory is a test Category object with 20 subcategories (initialized in the Test setUp). The method assertLoadedCount verifies that exactly the specified number of objects from the collection have been loaded.

private void assertLoadedCount(int expectedLoaded,
                               Collection subcategories) {
    int actuallyLoaded = 0;
    for (Iterator iter = subcategories.iterator(); iter.hasNext();) {
        Category element = (Category) iter.next();
        if (categoryDao.isLoaded(element.getId())) {
            actuallyLoaded++;
        }
    }
    assertEquals("loaded subcategories",
                 expectedLoaded, actuallyLoaded);
}

In order to support this test, I have expanded the CategoryDao interface with the method boolean isLoaded(Long id), which checks whether the object with that ID is in the session cache. This method can also be used to improve the other lazy-loading tests.

Here's a paged implementation for the SpaceCategoryDao:

protected Collection lazyFindByParentId(final Long parentId) {
    return new PagedLazyCollection(countByParent(parentId)) {
        // Called by the collection's iterator whenever a new page is needed
        public Collection loadPage(int offset, int pageSize) {
            return findByParentId(parentId, offset, pageSize);
        }
    };
}

I have introduced several new concepts. First, CategoryDao now supports a countByParent method, so we can implement java.util.Collection.size without loading all of the objects. Second, CategoryDao.findByParentId has to support paging. The implementation of these methods is trivial, but the SpaceCategoryDao version isn't really representative. So instead, I will show you a typical JDBC implementation:

public int countByParent(Long parentId) {
    String sql = "SELECT count(*) FROM category WHERE parent = ?";
    return getJdbcTemplate().queryForInt(sql, new Object[] { parentId });
}

public Collection findByParentId(Long parentId, int offset, int length) {
    return getJdbcTemplate().query(
        "SELECT * FROM category WHERE parent = ? LIMIT ? OFFSET ?",
        new Object[] { parentId, new Integer(length), new Integer(offset) },
        new RowMapper() {
        // code to create a Category from a ResultSet row omitted for brevity
        });
}
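For comparison, paging over an in-memory index like parentIdToIdList could look roughly like the sketch below. It is an assumption about how such a version might be written, not the code from the downloadable source, and it returns child IDs rather than Category objects to stay self-contained. The addChild and findIdsByParentId names are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InMemoryPagingDemo {

    /** Index from parent ID to the IDs of its children, in insertion order. */
    private final Map<Long, List<Long>> parentIdToIdList =
            new HashMap<Long, List<Long>>();

    void addChild(Long parentId, Long childId) {
        List<Long> ids = parentIdToIdList.get(parentId);
        if (ids == null) {
            ids = new ArrayList<Long>();
            parentIdToIdList.put(parentId, ids);
        }
        ids.add(childId);
    }

    /** Counterpart of SELECT count(*) ... WHERE parent = ? */
    int countByParent(Long parentId) {
        List<Long> ids = parentIdToIdList.get(parentId);
        return ids == null ? 0 : ids.size();
    }

    /** Counterpart of ... WHERE parent = ? LIMIT length OFFSET offset */
    List<Long> findIdsByParentId(Long parentId, int offset, int length) {
        List<Long> ids = parentIdToIdList.get(parentId);
        if (ids == null || length <= 0 || offset >= ids.size()) {
            return Collections.emptyList();
        }
        int from = Math.max(0, offset);
        int to = Math.min(ids.size(), from + length);
        // Copy the slice so callers cannot modify the index through it
        return new ArrayList<Long>(ids.subList(from, to));
    }

    public static void main(String[] args) {
        InMemoryPagingDemo dao = new InMemoryPagingDemo();
        for (long child = 0; child < 12; child++) {
            dao.addChild(Long.valueOf(100), Long.valueOf(child));
        }
        System.out.println(dao.countByParent(Long.valueOf(100)));
        System.out.println(dao.findIdsByParentId(Long.valueOf(100), 10, 5));
    }
}
```

The last page is simply shorter than the page size, and an offset past the end yields an empty list, which matches how LIMIT and OFFSET behave in the SQL version.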

Finally, I have introduced a PagedLazyCollection class:

public abstract class PagedLazyCollection extends AbstractCollection {

    private class PagedLazyCollectionIterator implements Iterator {
        // ....
    }

    private final int size;

    public PagedLazyCollection(int size) {
        this.size = size;
    }

    public Iterator iterator() {
        return new PagedLazyCollectionIterator();
    }

    public int size() {
        return size;
    }

    public abstract Collection loadPage(int offset, int pageSize);

}

All the magic happens in the PagedLazyCollectionIterator:

private class PagedLazyCollectionIterator implements Iterator {

    private final int pageSize = 5;

    private int offset = 0;

    // Trick to avoid null checks
    private Collection currentCollecton = new ArrayList();

    private Iterator currentIterator = currentCollecton.iterator();

    public boolean hasNext() {
        boolean onLastPage = offset >= size;
        return currentIterator.hasNext() || !onLastPage;
    }

    public Object next() {
        if (!currentIterator.hasNext()) {
            nextIterator();
        }
        return currentIterator.next();
    }


    private void nextIterator() {
        currentCollecton = loadPage(offset, pageSize);
        offset += currentCollecton.size();
        this.currentIterator = currentCollecton.iterator();
    }

    public void remove() {
        throw new UnsupportedOperationException("remove not implemented");
    }

}

The interesting bit is in PagedLazyCollectionIterator.nextIterator, where we load the next page. This code should be sufficient to get the test to pass.
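Putting the pieces together, a generic version of the paged collection can be sketched in one file. The class names PagedCollectionSketch and PagedCollectionDemo, the configurable page size, and the pageLoads list that records requested offsets are all assumptions made for this illustration.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

abstract class PagedCollectionSketch<T> extends AbstractCollection<T> {

    private final int size;
    private final int pageSize;

    PagedCollectionSketch(int size, int pageSize) {
        this.size = size;
        this.pageSize = pageSize;
    }

    /** Loads up to pageSize elements starting at offset. */
    abstract Collection<T> loadPage(int offset, int pageSize);

    public int size() {
        return size;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int offset = 0;
            private Iterator<T> current = Collections.<T>emptyList().iterator();

            public boolean hasNext() {
                return current.hasNext() || offset < size;
            }

            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                if (!current.hasNext()) {
                    Collection<T> page = loadPage(offset, pageSize);
                    if (page.isEmpty()) {
                        // The count was stale: fewer elements than promised
                        throw new NoSuchElementException();
                    }
                    offset += page.size();
                    current = page.iterator();
                }
                return current.next();
            }

            public void remove() {
                throw new UnsupportedOperationException("remove not implemented");
            }
        };
    }
}

class PagedCollectionDemo {

    /** Records the offset of every page that was requested. */
    static final List<Integer> pageLoads = new ArrayList<Integer>();

    static Collection<String> twentyItems() {
        final List<String> all = new ArrayList<String>();
        for (int i = 0; i < 20; i++) {
            all.add("subcategory " + i);
        }
        return new PagedCollectionSketch<String>(all.size(), 5) {
            Collection<String> loadPage(int offset, int pageSize) {
                pageLoads.add(offset);
                int to = Math.min(all.size(), offset + pageSize);
                return new ArrayList<String>(all.subList(offset, to));
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> iter = twentyItems().iterator();
        System.out.println(iter.next());
        System.out.println("pages requested so far: " + pageLoads);
    }
}
```

Creating the iterator loads nothing; the first call to next() requests the page at offset 0, and a full iteration over the 20 items requests offsets 0, 5, 10, and 15.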







Unloading Pages

This code, as it stands, will not help much with conserving memory. Even though we defer reading a page until it is needed, we never throw pages out again! Here's a test that illustrates the problem:

public void testPagedCollectionPagesOutUnusedObjects() {
    Category saved = categoryDao.get(largeCategory.getId());
    Collection savedChildren = saved.getSubcategories();
    Iterator iter = savedChildren.iterator();
    while (iter.hasNext()) iter.next();

    assertLoadedCount(5, largeCategory.getSubcategories());

}

This test will fail, as the whole set of subcategories of largeCategory will be loaded into the session cache at this point. In order to fix it, the Iterator has to be able to unload pages as well:

public abstract class PagedLazyCollection extends AbstractCollection {

    private class PagedLazyCollectionIterator implements Iterator {
        // ....

        private void nextIterator() {
            unloadPage(currentCollecton);
            // ....
        }

        // ...
    }
   
    public abstract void unloadPage(Collection collection);
}

So SpaceCategoryDao.lazyFindByParentId has to be expanded to tell the collection how to unload objects:

protected Collection lazyFindByParentId(final Long parentId) {
    return new PagedLazyCollection(countByParent(parentId)) {
        public Collection loadPage(int offset, int pageSize) {
            return findByParentId(parentId, offset, pageSize);
        }
        public void unloadPage(Collection collection) {
            unloadFromSessionCache(collection);
        }
    };
}

SpaceCategoryDao.unloadFromSessionCache could hardly be easier:

public void unloadFromSessionCache(Collection categories) {
    for (Iterator iter = categories.iterator(); iter.hasNext();) {
        Category element = (Category) iter.next();
        sessionCache.remove(element.getId());
    }
}

At this point, we have actually implemented a lazily loaded collection that pages objects in and out as we iterate over it. The amount of code needed is surprisingly small.
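The effect on the session cache can be checked with a small standalone simulation. This is a sketch under simplifying assumptions: the cache maps integer IDs to names, loadPage fabricates its rows instead of querying a store, and maxCached is a made-up counter that tracks the largest cache size seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class UnloadingPagesDemo {

    static final Map<Integer, String> sessionCache = new HashMap<Integer, String>();
    static int maxCached = 0;

    /** Simulated page load: puts the page's objects into the session cache. */
    static List<Integer> loadPage(int offset, int length) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int id = offset; id < offset + length; id++) {
            sessionCache.put(id, "subcategory " + id);
            ids.add(id);
        }
        maxCached = Math.max(maxCached, sessionCache.size());
        return ids;
    }

    /** Evicts a finished page from the session cache. */
    static void unloadPage(List<Integer> ids) {
        for (Integer id : ids) {
            sessionCache.remove(id);
        }
    }

    static Iterator<String> pagedIterator(final int size, final int pageSize) {
        return new Iterator<String>() {
            private int offset = 0;
            private List<Integer> currentIds = new ArrayList<Integer>();
            private Iterator<Integer> current = currentIds.iterator();

            public boolean hasNext() {
                return current.hasNext() || offset < size;
            }

            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                if (!current.hasNext()) {
                    unloadPage(currentIds); // drop the old page first
                    currentIds = loadPage(offset, Math.min(pageSize, size - offset));
                    offset += currentIds.size();
                    current = currentIds.iterator();
                }
                return sessionCache.get(current.next());
            }

            public void remove() {
                throw new UnsupportedOperationException("remove not implemented");
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> iter = pagedIterator(20, 5);
        while (iter.hasNext()) {
            iter.next();
        }
        System.out.println("largest cache size seen: " + maxCached);
        System.out.println("cached at the end: " + sessionCache.size());
    }
}
```

With 20 elements and a page size of 5, the cache never holds more than one page, and only the last page remains cached when the iteration ends.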

Limitations

The code as I have shown has several quite severe limitations. I will explain the strategy for solving these limitations, but if you want to see a full implementation, you have to nag me to write a follow-up article.

  • Adding objects to the collection: If we add or remove objects from Category.getSubcategories, the changes will have inconsistent effects. First, the modification will not happen through the category at all, but only by modifying the parent category of the subcategories in question. This will be reflected when we search for new pages in the collection. If the modification changes anything that affects the paging, we might get the wrong objects in the next page. The simplest way to handle modifications to the collection is to introduce constraints. For example, if we introduce an immutable listIndex field in the subcategory, we know that all new subcategories will be added at the end.
  • Removing objects from the collection: Supporting remove can be even simpler: just keep a collection of removed objects in memory, and skip these objects when iterating. Neither of these strategies will work perfectly in all situations. For example, if you have an operation that will remove most of the collection, keeping the removed objects in memory is not such a good idea. In the simple case, we can ignore this problem. In the not-so-simple cases, more research is needed.
  • Reference integrity: If an object in a subcategory collection is also used from other places in the code, it will accidentally be thrown out of the session cache when the collection iterates over it. This is a bona fide bug, but it is actually not too hard to solve (use two types of session cache), and the code is pretty uninteresting, so I will leave that as an exercise for the reader, if you really need it. Or you can nag me to fix it for you.
  • Intelligent iteration: There are several things we might want to do with the collection in order to use it better. For example, we really need to be able to skip ahead when iterating. Sadly, neither Iterator nor ListIterator supports this. To avoid expanding the standard Java collection interfaces, I have not implemented my own skip method. Doing so is easy, though. Alternatively, we could implement lazy behavior in PagedLazyCollectionIterator.next.
  • Sorting and subqueries: If you want to sort the subcategories or search for a subset of them, the current structure does not support this well. There are two good approaches to follow. The most straightforward approach is to just use custom methods on the DAO (like findByParentIdAndNameOrderedByName()) and return paged collections. In this case, we have abandoned the domain model. If you want to keep the domain model, you need to use something more intelligent than SQL to query your objects. Hibernate criteria or TopLink expressions provide object libraries for queries. Use decorators on top of the collection to modify the criteria or expressions.
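As an illustration of the "keep a collection of removed objects and skip them" strategy from the list above, here is one possible sketch. The SkippingIterator and SkippingIteratorDemo names are hypothetical, and the sketch assumes the elements (or their IDs) have usable equals and hashCode implementations.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/** Wraps an iterator and silently skips elements found in the removed set. */
class SkippingIterator<T> implements Iterator<T> {

    private final Iterator<T> delegate;
    private final Set<T> removed;
    private T nextItem;
    private boolean hasNextItem;
    private boolean lookedAhead;

    SkippingIterator(Iterator<T> delegate, Set<T> removed) {
        this.delegate = delegate;
        this.removed = removed;
    }

    /** Looks ahead only when asked, so wrapping does not trigger a page load. */
    public boolean hasNext() {
        if (!lookedAhead) {
            hasNextItem = false;
            while (delegate.hasNext()) {
                T candidate = delegate.next();
                if (!removed.contains(candidate)) {
                    nextItem = candidate;
                    hasNextItem = true;
                    break;
                }
            }
            lookedAhead = true;
        }
        return hasNextItem;
    }

    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        lookedAhead = false;
        T result = nextItem;
        nextItem = null;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException("remove not implemented");
    }
}

class SkippingIteratorDemo {

    /** Collects everything the skipping iterator yields. */
    static <T> List<T> remaining(Iterable<T> source, Set<T> removed) {
        List<T> result = new ArrayList<T>();
        Iterator<T> iter = new SkippingIterator<T>(source.iterator(), removed);
        while (iter.hasNext()) {
            result.add(iter.next());
        }
        return result;
    }
}
```

The wrapper leaves the underlying collection untouched, so it can sit on top of a paged iterator. As the list notes, the removed set itself stays in memory, which makes this a poor fit when most of the collection is removed.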

Conclusion

Implementing a lazily loaded, rich domain model isn't just possible, it is practically feasible. Even really big collections can be dealt with. The simple case is surprisingly simple, and depending upon your requirements, even the complex cases are manageable.

In this article, I have shown how you can construct a lazily loaded domain model that will support reference integrity, collections, and paged loading of very large collections. This is surprisingly easy to do. So the question is: should we abandon third-party object-relational mapping tools like Hibernate, TopLink, and JDO and just write our domain model ourselves?

Of course not. If you look at the full JDBC implementation in the downloadable source, you can see that there are a lot of concerns I had to deal with that an ORM would do for me. Creating the queries and mapping the results to the objects is a tedious and error-prone job, and I don't recommend doing it yourself, even with a JDBC framework like Spring-JDBC. Secondly, even though there isn't that much code involved in writing a lazily loaded collection, it is easy enough to get it wrong. Leaving the job to a third party is a good idea. I trust Ted Neward when he says that ORM is the Vietnam of computer science.

However, the ORMs available today do not even attempt to solve the one-to-very-very-many relationship. You can use the approach I showed with ORMs just as easily as with JDBC. If you create your own TopLinkCategoryDAO, use TopLink as normal, but instead of mapping subcategory, add a subcategory paged collection to the newly created objects. Maybe your domain model has a file name that references a file on disk. Why not make it into a lazy relationship? And if you're an ORM developer reading this article: please, the one-to-very-very-many relationship problem is a real need, and you can solve it.

You might also be using technology like JavaSpaces, which as far as I know does not support an ORM-like approach for lazy loading relationships. My implementation of SpaceCategoryDao shows an approach that can help you make your domain model more intelligent. Again, I wish the vendors within this space would start supporting lazy loading more.

Hopefully, this article has made lazy loading seem less magical to you. Java's capabilities to create proxies to lazily load a rich domain model are surprisingly strong and easy to use.

Happy hacking!

Johannes Brodwall is currently lead Java architect at BBS, the company that operates Norway's banking infrastructure