An Open Source Database Benchmark

by Rick Grehan
06/14/2005

Contents
The Architecture
More Under The Hood
Conclusion
Resources

PolePosition is an open source Java framework for benchmarking databases. The impetus behind PolePosition came from the observation that developers evaluating candidate databases for future applications often resorted to constructing ad hoc benchmarks rather than using "canned" benchmark tests (or relying on vendor-provided data). This is entirely understandable; to properly evaluate a database for a specific project, you would want to exercise that database in ways that correspond to the application's use of it. Put another way, if the target application will use the database in read-only fashion, you'll have little interest in a benchmark that runs the database through write operations.

PolePosition was designed with just such people in mind. Using the metaphor of a series of automobile race courses ("Circuits"), PolePosition provides a structure that simplifies the three primary tasks that a database benchmark developer might face: building the tests, adding database drivers, and reporting results.

The current version of PolePosition (which requires J2SE 5.0 support) includes six circuits, each exercising a different mix of tests and named after a different city well-known to car-racing enthusiasts. For example, the Bahrain circuit performs write, query, update, and delete operations on a database filled with simple (lightly structured) objects. The Sepang circuit, by comparison, writes, reads, and then deletes a complex object tree.

In addition, the current version of PolePosition already provides drivers for four well-known database technologies:

  • Hibernate: The object-relational persistence system.
  • JDBC: The current PolePosition assumes MySQL as the relational database behind the JDBC implementation.
  • db4o: A pure object database from db4objects, Inc.
  • JDO: The Java Data Objects persistence architecture.

Finally, PolePosition delivers excellent report generation features. Thanks largely to its abundant use of open source libraries, PolePosition's results can be displayed in tabular form, with a line graph attached, and the whole thing packaged into HTML or PDF files. Developers interested in just the numbers (perhaps to feed them through additional analysis) can access a tab-delimited file of results.

The Architecture

As mentioned earlier, PolePosition uses a race course metaphor. The classes of interest to developers fall into two broad categories: those that support the creation of new tests (circuits), and those that support the addition of new database drivers. In this article, we'll concentrate on the classes and interfaces that a developer will most likely deal with directly in order to implement a new circuit.

Because the portion of the framework that manages the creation of tests is considerably simpler than the portion that supports database drivers, we'll begin with the former.

All tests are kept in the org.polepos.circuits.* package, and each circuit must define a class and an interface. For this illustration, we're going to pretend that we're creating a new circuit (benchmark) called Laconia. The Laconia test will exercise a database's ability to read large, unstructured byte[] arrays, as might happen if a database is being used as a repository for images.

Here's our implementation of the Laconia class:


package org.polepos.circuits.laconia;
import org.polepos.framework.*;

public class Laconia extends Circuit {
        
  @Override
  public String description() {
    return "reads large byte arrays of various sizes";
  }
        
  @Override
  protected void addLaps() {
    add(new Lap("populate",false,false));
    add(new Lap("read"));
  }

  @Override
  public Class requiredDriver() {
    return LaconiaDriver.class;
  }
}

The duty of the description() method is self-explanatory. The addLaps() method assembles (in order) the individual tests that take place within a circuit. Notice that the laps are identified by strings. Each string is a method name; a driver that executes the Laconia race course must implement a populate() method and a read() method.

You'll also notice that we used a different constructor for the two Lap objects. The first constructor, which adds the populate() method to the laps, illustrates a clever feature of PolePosition. While the first false in the constructor identifies this as a Lap that must be done with a "fresh" (newly created) database--which is immaterial here because a new database will always be created for the first Lap--the second false tells PolePosition not to record the results of this Lap. This allows us to use this first Lap to set up the database (in this case, populate it with objects to read) and not have this set-up time counted against the database in the final score.
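
To make the two flags concrete, here is a minimal sketch of how a lap list could be filtered down to the laps that count toward the score. LapSketch and LapReportSketch are hypothetical stand-ins, not the framework's Lap class; the field names and the defaults assumed for the one-argument constructor are guesses made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the framework's Lap class. The field names
// and the defaults of the one-argument constructor are assumptions.
class LapSketch {
  final String name;       // doubles as the driver method name
  final boolean hot;       // false = run against a fresh database
  final boolean reported;  // false = leave this lap out of the results

  LapSketch(String name) {
    this(name, true, true); // assumed defaults
  }

  LapSketch(String name, boolean hot, boolean reported) {
    this.name = name;
    this.hot = hot;
    this.reported = reported;
  }
}

class LapReportSketch {
  // Returns the names of the laps whose timings would be reported.
  static List<String> reportedLapNames(List<LapSketch> laps) {
    List<String> names = new ArrayList<String>();
    for (LapSketch lap : laps) {
      if (lap.reported) {
        names.add(lap.name);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<LapSketch> laps = new ArrayList<LapSketch>();
    laps.add(new LapSketch("populate", false, false)); // set-up only
    laps.add(new LapSketch("read"));                   // scored
    System.out.println(reportedLapNames(laps)); // prints [read]
  }
}
```

Only the read lap survives the filter, which mirrors the intent described above: the populate lap still runs, but its time is not counted against the database.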

Finally, the requiredDriver() method returns the reference to the class that must be used to run the given circuit. When a race is being run, PolePosition automatically matches driver to circuit.

The next bit of code we have to construct is an interface that specifies the methods that a driver on the Laconia circuit must implement. That interface appears below. Note that its methods correspond to the lap names identified in the addLaps() method of the Laconia class.


package org.polepos.circuits.laconia;

public interface LaconiaDriver {
  void populate();
  void read();
}
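
Because lap names are plain strings, something has to turn each name into a method call on the driver. The following sketch shows one way that could be done with reflection. It is an illustration under stated assumptions, not PolePosition's actual dispatch code: LaconiaDriverSketch mirrors the interface above, while RecordingDriverSketch and LapDispatchSketch are hypothetical names introduced here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the LaconiaDriver interface shown above.
interface LaconiaDriverSketch {
  void populate();
  void read();
}

// Hypothetical driver that only records which laps were run.
class RecordingDriverSketch implements LaconiaDriverSketch {
  final List<String> calls = new ArrayList<String>();

  public void populate() { calls.add("populate"); }

  public void read() { calls.add("read"); }
}

class LapDispatchSketch {
  // Looks up a no-argument method named after the lap on the driver
  // interface and invokes it on the given driver instance.
  static void runLap(Class<?> driverInterface, Object driver, String lapName) {
    try {
      Method lapMethod = driverInterface.getMethod(lapName);
      lapMethod.invoke(driver);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot run lap: " + lapName, e);
    }
  }

  public static void main(String[] args) {
    RecordingDriverSketch driver = new RecordingDriverSketch();
    for (String lapName : new String[] {"populate", "read"}) {
      runLap(LaconiaDriverSketch.class, driver, lapName);
    }
    System.out.println(driver.calls); // prints [populate, read]
  }
}
```

A lap name with no matching method fails at run time rather than at compile time, which is why the lap strings and the interface's method names have to be kept in step.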

This rounds out the files we need to add to the Laconia package, and our circuit has, for the most part, been built. We need to conjure up a driver for the Laconia circuit, but before we can do that, we must define the structure of the data that a LaconiaDriver will be expected to handle.

Data objects that will be manipulated by candidate databases reside in the org.polepos.data package. We'll create a simple Picture class:


package org.polepos.data;
import org.polepos.framework.*;

public class Picture implements CheckSummable {

  private int mIDnum;      // Picture ID number
  private byte[] mContent; // Picture content
        
  /*
  * Default constructor
  */
  public Picture()
  { }
        
  /*
  * Create a new instance of Picture
  */
  public Picture(int IDnum, byte[] content)
  {  this.mIDnum = IDnum;
     this.mContent = content;
  }
        
  // Getters and Setters follow
  public int getIDnum()
  { return mIDnum; }
        
  public byte[] getContent()
  { return mContent; }
        
  public void setIDnum(int IDnum)
  { mIDnum = IDnum; }
        
  public void setContent(byte[]content)
  { mContent = content; }

  /*
  * Checksum used to verify objects
  */    
  public long checkSum() {
  // We'll return the ID number as the checksum
    return mIDnum;
  }
}

The structure of a Picture object is straightforward, including only an integer ID number (that we will ensure is unique when we implement LaconiaDriver) and a byte array. There's also a checkSum() method specified in the CheckSummable interface, which we provide simply by returning the ID number. PolePosition can use checksums to verify that a race has been properly run, but we won't be employing checksums in our example.
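
Although the example does not use checksums, it may help to see how a checksum such as the ID number could be used for verification. The sketch below is one plausible approach, not the framework's own logic: it sums the checksums of the objects read back and compares the total with the value expected for IDs 1 through n. CheckSummableSketch, PictureSketch, and CheckSumSketch are hypothetical stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the framework's CheckSummable interface.
interface CheckSummableSketch {
  long checkSum();
}

// Reduced version of the Picture class: ID number plus content.
class PictureSketch implements CheckSummableSketch {
  private final int idNum;
  private final byte[] content;

  PictureSketch(int idNum, byte[] content) {
    this.idNum = idNum;
    this.content = content;
  }

  public long checkSum() {
    return idNum; // same choice as in the article's Picture class
  }
}

class CheckSumSketch {
  // Adds up the checksums of all objects that were read back.
  static long total(List<? extends CheckSummableSketch> objects) {
    long sum = 0;
    for (CheckSummableSketch object : objects) {
      sum += object.checkSum();
    }
    return sum;
  }

  // Expected total when the IDs are exactly 1..count.
  static long expectedForIds(int count) {
    return (long) count * (count + 1) / 2;
  }

  public static void main(String[] args) {
    List<PictureSketch> readBack = new ArrayList<PictureSketch>();
    for (int id = 1; id <= 20; id++) {
      readBack.add(new PictureSketch(id, new byte[16]));
    }
    // A mismatch would suggest that objects were lost or duplicated.
    System.out.println(total(readBack) == expectedForIds(20)); // prints true
  }
}
```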

At this point, we are ready to build a Driver capable of running the Laconia race. In keeping with the racecourse metaphor, each database provides a Team of drivers, one Driver for each of the circuits that the database is able to "race on." All of the drivers for a given database are kept in the org.polepos.teams.database-name package and, by convention, the class that provides the implementation of a given database for a given circuit is named CircuitnameDatabasename.class.

So, in the teams\db4o folder, we'll define the LaconiaDb4o class:


package org.polepos.teams.db4o;

import java.util.*;
import org.polepos.circuits.laconia.*;
import org.polepos.data.*;
import com.db4o.*;
import com.db4o.query.*;

public class LaconiaDb4o extends Db4oDriver
  implements LaconiaDriver {
        
  /* 
  * Implementation of the first lap. Put objects
  * in database.
  * NOT scored.
  */
  public void populate() {
                
    // Retrieve object count and object size
    int numobjects = setup().getObjectCount();
    int sizearray = setup().getObjectSize();

    // Fill value
    byte fillval = (byte)'x';
                
    // Loop through the object count. 
    // Instantiate a Picture object of the
    //  appropriate size. Load up its mContent
    //  field, set its ID field to the increment,
    //  and store it.
    begin();
    for(int i=1; i<=numobjects; i++)  // IDs run from 1 to numobjects inclusive
    {
      byte[] barray = new byte[sizearray];
      Arrays.fill(barray,fillval);
      Picture p = new Picture(i,barray);
      store(p);
    }
    commit();   // Make sure everything's written
  }
        
  /*
  * Implementation of the second lap.
  * Read the objects.
  * Scored.
  */
  public void read() {
                
    // Read all the objects of type Picture
    readExtent(Picture.class);
  }
}

As shown above, populate() reads the property values for the number of objects and the size of the byte array, instantiates each object, fills its array, and stores it in the database. These property values are read from the Circuits.properties file (which we will show in a moment) and are provided automatically by the framework. The db4o store() method--with a single argument of the object to be persisted--is all that we need to put an object in the db4o database. And store() will traverse the object tree automatically, so the byte[] array member of the Picture object is also persisted.

The read() method is even simpler. LaconiaDb4o extends Db4oDriver (already in the framework), which provides fundamental operations likely to be used by all implemented drivers for db4o. One of those operations is readExtent(), which retrieves all of the persisted objects of the specified class.

We're almost done; we just need to specify the properties of each turn through the Laconia circuit. PolePosition gathers its properties files in the settings folder, and the particular file of interest is Circuits.properties. We add the following lines to that file:


# laconia
#
# [objects]: number of objects to create/read
# [size]: size of the byte[] array

laconia.size=200000,400000,600000,800000,1000000,1200000
laconia.objects=20,20,20,20,20,20

As you can see, there will be six turns through the Laconia circuit. Each turn will create the same number of objects, but of increasing size.
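
To show how the comma-separated values line up into turns, here is a small sketch that parses the two properties and computes the volume of data each turn would create. It uses only java.util.Properties; TurnSettingsSketch and its methods are hypothetical names, and the framework's own settings reader may work differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class TurnSettingsSketch {
  // Splits a comma-separated property value into ints, one per turn.
  static int[] parseInts(Properties props, String key) {
    String[] parts = props.getProperty(key).split(",");
    int[] values = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      values[i] = Integer.parseInt(parts[i].trim());
    }
    return values;
  }

  // Bytes of array content created in each turn: objects * size.
  static long[] totalBytes(int[] counts, int[] sizes) {
    long[] totals = new long[counts.length];
    for (int i = 0; i < counts.length; i++) {
      totals[i] = (long) counts[i] * sizes[i];
    }
    return totals;
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(
        "laconia.size=200000,400000,600000,800000,1000000,1200000\n"
        + "laconia.objects=20,20,20,20,20,20\n"));

    int[] sizes = parseInts(props, "laconia.size");
    int[] counts = parseInts(props, "laconia.objects");
    long[] totals = totalBytes(counts, sizes);

    for (int turn = 0; turn < totals.length; turn++) {
      System.out.println("turn " + (turn + 1) + ": " + counts[turn]
          + " objects x " + sizes[turn] + " bytes = " + totals[turn]);
    }
  }
}
```

With these settings the first turn handles 4,000,000 bytes of array content and the sixth handles 24,000,000, so the workload grows sixfold across the circuit.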

Since we've only defined a LaconiaDb4o driver, we should go to the RunSeason.java file and comment out the other team and circuit members of the Team[] and Circuit[] arrays. This is not entirely necessary, because PolePosition automatically matches drivers to circuits, and will discover no Laconia-capable drivers for the other databases. But if we don't, we'll have to sit through a number of races just to see the Laconia output.

The result of the Laconia run appears in Figure 1. Though the chart shows a single line, we can at least see an interesting phenomenon: db4o's scaling appears to improve once the byte arrays reach roughly 600,000 bytes. On either side of that point the scaling appears to be almost linear, but to the right of the point the slope is noticeably reduced.

Figure 1. Result of Laconia run

More Under The Hood

There's more to PolePosition than we've shown here, mainly because the framework is so well put together that there's little or no fiddling required to implement a circuit and a driver. For example, if you want to add a brand new database to the mix, you'll have to extend the abstract Car class (after all, a driver has got to have a car).

The base Car class is so simple that it's easier to show than to explain.


package org.polepos.framework;

public abstract class Car{
  protected String mWebsite;
  protected String mDescription;
    
  public abstract String name();

  public String getWebsite(){
    return mWebsite;
  }

  public String description(){
    return mDescription;
  }
}

A database will extend this class, adding methods that initialize the database system prior to running a turn around the circuit, as well as methods that "clean up" after a race has been run. Because Db4oCar had already been defined for us, we didn't have to include any code in our LaconiaDb4o class to create our database before the race (or delete it afterward).
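
As a rough illustration of what such an extension might look like, the sketch below subclasses a copy of the Car class and adds set-up and clean-up methods. Everything beyond the Car members shown above is an assumption: the names InMemoryCarSketch, openFreshDatabase(), and deleteDatabase() are invented here, the website is a placeholder, and a HashMap stands in for a real database.

```java
import java.util.HashMap;
import java.util.Map;

// Copy of the abstract Car class from the article, renamed so that it
// is not mistaken for the framework's own class.
abstract class CarSketch {
  protected String mWebsite;
  protected String mDescription;

  public abstract String name();

  public String getWebsite() { return mWebsite; }

  public String description() { return mDescription; }
}

// Hypothetical car: a HashMap stands in for a real database.
class InMemoryCarSketch extends CarSketch {
  private Map<Integer, byte[]> store;

  InMemoryCarSketch() {
    mWebsite = "http://www.example.com/"; // placeholder
    mDescription = "in-memory map standing in for a database";
  }

  public String name() { return "inmemory"; }

  // Assumed hook: create a fresh, empty database before a turn.
  void openFreshDatabase() { store = new HashMap<Integer, byte[]>(); }

  // Assumed hook: discard the database after the race.
  void deleteDatabase() { store = null; }

  boolean isOpen() { return store != null; }

  void put(int id, byte[] content) { store.put(id, content); }

  int size() { return store.size(); }
}

class CarDemoSketch {
  public static void main(String[] args) {
    InMemoryCarSketch car = new InMemoryCarSketch();
    car.openFreshDatabase();
    car.put(1, new byte[8]);
    System.out.println(car.name() + " holds " + car.size() + " object(s)");
    car.deleteDatabase();
    System.out.println("open after clean-up: " + car.isOpen());
  }
}
```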

Conclusion

We hope that, before long, there will be more to PolePosition than any single article can show, because people will have contributed to and extended the benchmark. As already stated, PolePosition includes a limited number of database interfaces at this point. New database back ends would certainly be useful, not only so that database performance could be compared, but also so that developers interested in learning a new database API can look into the PolePosition source and see how an equivalent operation is performed in database A as compared to database B. In addition, the framework as it stands now anticipates only a single-user model. Implementing some sort of multi-threaded test to simulate multiple users would be exceedingly difficult without extensive modifications to the framework. But if that could be done, PolePosition could provide enterprise-level testing.

Whatever work is involved, there are significant benefits to be had from an open source benchmark framework that is easy to extend, while at the same time providing enough control so that the results can be compared across database technologies.

Gentlemen, start your engines. Please.

Resources

Rick Grehan is currently a QA engineer for Compuware's Nashua Lab.
