
An Open Source Database Benchmark

June 14, 2005

Contents
The Architecture
More Under The Hood
Conclusion
Resources

PolePosition is an
open source Java framework for benchmarking databases. The impetus
behind PolePosition came from the observation that developers
evaluating candidate databases for future applications often
resorted to constructing ad hoc benchmarks rather than using
"canned" benchmark tests (or relying on vendor-provided data). This
is entirely understandable; to properly evaluate a database for a
specific project, you would want to exercise that database in ways
that correspond to the application's use of it. Put another way, if
the target application will use the database in read-only fashion,
you'll have little interest in a benchmark that runs the database
through write operations.

PolePosition was designed with just such people in mind. Using
the metaphor of a series of automobile race courses ("Circuits"),
PolePosition provides a structure that simplifies the three primary
tasks that a database benchmark developer might face: building the
tests, adding database drivers, and reporting results.

The current version of PolePosition (which requires J2SE 5.0
support) includes six circuits, each exercising a different mix of
tests and named after a different city well-known to car-racing
enthusiasts. For example, the Bahrain circuit performs write,
query, update, and delete operations on a database filled with
simple (lightly structured) objects. The Sepang circuit, by
comparison, writes, reads, and then deletes a complex object
tree.

In addition, the current version of PolePosition already
provides drivers for four well-known database technologies:

  • Hibernate: The object-relational persistence system.
  • JDBC: The current PolePosition assumes MySQL as the relational
    database behind the JDBC implementation.
  • db4o: A pure object database from db4objects, Inc.
  • JDO: The Java Data Objects persistence architecture.

Finally, PolePosition delivers excellent report generation
features. Thanks largely to its abundant use of open source
libraries, PolePosition's results can be displayed in tabular form,
with a line graph attached, and the whole thing packaged into
HTML or PDF files. Developers interested in just the numbers--
perhaps to feed through additional analysis--can access a
tab-delimited file of results.

The Architecture

As mentioned earlier, PolePosition uses a race course metaphor.
The classes of interest to developers fall into two broad
categories: those that support the creation of new tests, and those
that support the addition of new database drivers. In this article,
we'll concentrate on those classes and interfaces that a developer
will most likely deal with directly in order to implement a new
circuit.

Because the portion of the framework that manages the creation
of tests is considerably simpler than the portion that supports
database drivers, we'll begin with the former.

All tests are kept in the org.polepos.circuits.*
package, and each circuit must define a class and an interface. For
this illustration, we're going to pretend that we're creating a new
circuit (benchmark) called Laconia. The Laconia test will
exercise a database's ability to read large, unstructured
byte[] arrays, as might happen if a database is being
used as a repository for images.

Here's our implementation of the Laconia class:

package org.polepos.circuits.laconia;
import org.polepos.framework.*;

public class Laconia extends Circuit {
       
  @Override
  public String description() {
    return "reads large byte arrays of various sizes";
  }
       
  @Override
  protected void addLaps() {
    add(new Lap("populate",false,false));
    add(new Lap("read"));
  }

  @Override
  public Class requiredDriver() {
    return LaconiaDriver.class;
  }
}

The duty of the description() method is
self-explanatory. The addLaps() method assembles (in
order) the individual tests that take place within a circuit.
Notice that the laps are identified by strings. Each string is a
method name; a driver that executes the Laconia race course
must implement a populate() method and a
read() method.
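
The article does not show how PolePosition turns a lap name into a
method call. One plausible mechanism is Java reflection, and the
minimal sketch below illustrates the idea under that assumption.
The names SketchDriver, RecordingDriver, and LapDispatchSketch are
hypothetical and are not part of the framework.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a circuit's driver interface.
interface SketchDriver {
  void populate();
  void read();
}

// Hypothetical driver that only records which lap methods were called.
class RecordingDriver implements SketchDriver {
  final List<String> calls = new ArrayList<String>();
  public void populate() { calls.add("populate"); }
  public void read() { calls.add("read"); }
}

class LapDispatchSketch {

  // Finds the no-argument method named after the lap on the driver
  // interface and invokes it on the driver instance.
  static void runLap(Class<?> driverInterface, Object driver, String lapName) {
    try {
      Method lapMethod = driverInterface.getMethod(lapName);
      lapMethod.invoke(driver);
    } catch (Exception e) {
      throw new IllegalStateException("Cannot run lap: " + lapName, e);
    }
  }

  public static void main(String[] args) {
    RecordingDriver driver = new RecordingDriver();
    // Laps run in the order in which they were added to the circuit.
    for (String lap : new String[] {"populate", "read"}) {
      runLap(SketchDriver.class, driver, lap);
    }
    System.out.println(driver.calls);
  }
}
```

In this sketch, a lap name with no matching method fails at run time
rather than at compile time, which is one reason a circuit pairs its
lap names with a driver interface.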

You'll also notice that we used different constructors for the
two Lap objects. The first, which adds the
populate() method to the laps, illustrates a clever
feature of PolePosition. While the first false in the constructor
identifies this as a Lap that must be done with a
"fresh" (newly created) database--which is immaterial here
because a new database will always be created for the first
Lap--the second false tells PolePosition
not to record the results of this Lap. This
allows us to use this first Lap to set up the database
(in this case, populate it with objects to read) and not have this
set-up time counted against the database in the final score.
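
The effect of the second flag can be pictured with a small timing
loop. This is only a sketch of the idea, not PolePosition's actual
code: SketchLap and LapTimingSketch are hypothetical names, and the
real Lap class is not shown in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lap descriptor carrying only a name and a "reported" flag.
class SketchLap {
  final String name;
  final boolean reported;

  SketchLap(String name, boolean reported) {
    this.name = name;
    this.reported = reported;
  }
}

class LapTimingSketch {

  // Runs every lap in order, but keeps the elapsed time (in nanoseconds)
  // only for laps whose "reported" flag is true.
  static Map<String, Long> race(SketchLap[] laps, Map<String, Runnable> work) {
    Map<String, Long> results = new LinkedHashMap<String, Long>();
    for (SketchLap lap : laps) {
      long start = System.nanoTime();
      work.get(lap.name).run();
      long elapsed = System.nanoTime() - start;
      if (lap.reported) {
        results.put(lap.name, elapsed);
      }
    }
    return results;
  }

  public static void main(String[] args) {
    Runnable noop = new Runnable() { public void run() { } };
    Map<String, Runnable> work = new LinkedHashMap<String, Runnable>();
    work.put("populate", noop);
    work.put("read", noop);

    SketchLap[] laps = {
      new SketchLap("populate", false), // set-up lap, not scored
      new SketchLap("read", true)       // scored lap
    };
    System.out.println(race(laps, work).keySet());
  }
}
```

Here the populate lap still runs, so the data is in place before
read starts, but only read appears in the results.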

Finally, the requiredDriver() method returns the
reference to the class that must be used to run the given circuit.
When a race is being run, PolePosition automatically matches driver
to circuit.

The next bit of code we have to construct is an interface that
specifies the methods that a driver on the Laconia circuit must
implement. That interface appears below. Note that its methods
correspond to the lap names identified in the addLaps()
method of the Laconia class.

package org.polepos.circuits.laconia;

public interface LaconiaDriver {
  void populate();
  void read();
}

This rounds out the files we need to add to the
Laconia package, and our circuit has, for the most
part, been built. We need to conjure up a driver for the
Laconia circuit, but before we can do that, we must define the
structure of the data that a LaconiaDriver will be
expected to handle.

Data objects that will be manipulated by candidate databases
reside in the org.polepos.data package. We'll create a
simple Picture class:


package org.polepos.data;
import org.polepos.framework.*;

public class Picture implements CheckSummable {

  private int mIDnum;      // Picture ID number
  private byte[] mContent; // Picture content
        
  /*
  * Default constructor
  */
  public Picture()
  { }
        
  /*
  * Create a new instance of Picture
  */
  public Picture(int IDnum, byte[] content)
  {  this.mIDnum = IDnum;
     this.mContent = content;
  }
        
  // Getters and Setters follow
  public int getIDnum()
  { return mIDnum; }
        
  public byte[] getContent()
  { return mContent; }
        
  public void setIDnum(int IDnum)
  { mIDnum = IDnum; }
        
  public void setContent(byte[] content)
  { mContent = content; }

  /*
  * Checksum used to verify objects
  */    
  public long checkSum() {
  // We'll return the ID number as the checksum
    return mIDnum;
  }
}

The structure of a Picture object is
straightforward, including only an integer ID number (that we will
ensure is unique when we implement LaconiaDriver) and
a byte array. There's also a checkSum() method
specified in the CheckSummable interface, which we
provide simply by returning the ID number. PolePosition can use
checksums to verify that a race has been properly run, but we won't
be employing checksums in our example.
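
Although we won't use checksums here, the idea is easy to
illustrate. The sketch below is one way such a check could work,
not a description of what PolePosition does internally. It assumes
that IDs run from 1 to n, and it declares its own
SketchCheckSummable and SketchPicture types (both hypothetical) so
that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the framework's CheckSummable interface.
interface SketchCheckSummable {
  long checkSum();
}

// Reduced version of Picture that keeps only the ID number.
class SketchPicture implements SketchCheckSummable {
  private final int idNum;

  SketchPicture(int idNum) { this.idNum = idNum; }

  public long checkSum() { return idNum; }
}

class ChecksumSketch {

  // Adds up the checksums of every object in a batch.
  static long total(List<? extends SketchCheckSummable> objects) {
    long sum = 0;
    for (SketchCheckSummable o : objects) {
      sum += o.checkSum();
    }
    return sum;
  }

  // With IDs 1..n used as checksums, the expected total is n(n+1)/2.
  static long expectedForIds(int n) {
    return (long) n * (n + 1) / 2;
  }

  public static void main(String[] args) {
    List<SketchPicture> readBack = new ArrayList<SketchPicture>();
    for (int id = 1; id <= 20; id++) {
      readBack.add(new SketchPicture(id));
    }
    // A mismatch would suggest that some objects were lost or duplicated.
    System.out.println(total(readBack) == expectedForIds(20));
  }
}
```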

At this point, we are ready to build a Driver
capable of running the Laconia race. In keeping with the racecourse
metaphor, each database provides a Team of drivers,
one Driver each for the circuits that the database is
able to "race on." All of the drivers for a given database are kept in
the org.polepos.teams.database-name package
and, by convention, the class that provides the implementation of a
given database for a given circuit is named
CircuitnameDatabasename.class.

So, in the teams\db4o folder, we'll define the
LaconiaDb4o class:


package org.polepos.teams.db4o;

import java.util.*;
import org.polepos.circuits.laconia.*;
import org.polepos.data.*;
import com.db4o.*;
import com.db4o.query.*;

public class LaconiaDb4o extends Db4oDriver
  implements LaconiaDriver {
        
  /* 
  * Implementation of the first lap. Put objects
  * in database.
  * NOT scored.
  */
  public void populate() {
                
    // Retrieve object count and object size
    int numobjects = setup().getObjectCount();
    int sizearray = setup().getObjectSize();

    // Fill value
    byte fillval = (byte)'x';
                
    // Loop through the object count. 
    // Instantiate a Picture object of the
    //  appropriate size. Load up its mContent
    //  field, set its ID field to the increment,
    //  and store it.
    begin();
    // IDs run from 1 through numobjects, so the bound is inclusive.
    for(int i=1; i<=numobjects; i++)
    {
      byte[] barray = new byte[sizearray];
      Arrays.fill(barray,fillval);
      Picture p = new Picture(i,barray);
      store(p);
    }
    commit();   // Make sure everything's written
  }
        
  /*
  * Implementation of the second lap.
  * Read the objects.
  * Scored.
  */
  public void read() {
                
    // Read all the objects of type Picture
    readExtent(Picture.class);
  }
}

As shown above, populate() reads the property
values for the number of objects and the size of the byte array,
instantiates each object, fills its array, and stores it in the
database. These property values are read from the
Circuits.properties file (which we will show in a
moment) and are provided automatically by the framework. The db4o
store() method--with a single argument of the object
to be persisted--is all that we need to put an object in the db4o
database. And store() will traverse the object tree
automatically, so the byte[] array member of the
Picture object is also persisted.

The read() method is even simpler.
LaconiaDb4o extends Db4oDriver (already
in the framework), which provides fundamental operations likely to
be used by all implemented drivers for db4o. One of those
operations is readExtent(), which retrieves all of the
persisted objects of the specified class.

We're almost done; we just need to specify the properties of
each turn through the Laconia circuit. PolePosition gathers its
properties files in the settings folder, and the particular file of
interest is Circuits.properties. We add the following
lines to that file:

# laconia
#
# [objects]: number of objects to create/read
# [size]: size of the byte[] array

laconia.size=200000,400000,600000,800000,1000000,1200000
laconia.objects=20,20,20,20,20,20

As you can see, there will be six turns through the Laconia
circuit. Each turn will create the same number of objects, but of
increasing size.
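
To make the workload of each turn concrete, the sketch below parses
the two comma-separated lists and multiplies them pairwise. How
PolePosition itself parses Circuits.properties is not shown in this
article; TurnSettingsSketch and its methods are hypothetical, and
only java.util.Properties is a real API here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class TurnSettingsSketch {

  // Loads properties from a string, standing in for the settings file.
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    return props;
  }

  // Splits a comma-separated value such as "20,20,20" into integers.
  static int[] parseList(String value) {
    String[] parts = value.split(",");
    int[] out = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      out[i] = Integer.parseInt(parts[i].trim());
    }
    return out;
  }

  // Bytes of array content written per turn: array size times object count.
  static long[] payloadPerTurn(Properties props, String circuit) {
    int[] sizes = parseList(props.getProperty(circuit + ".size"));
    int[] objects = parseList(props.getProperty(circuit + ".objects"));
    if (sizes.length != objects.length) {
      throw new IllegalArgumentException("size and objects lists differ in length");
    }
    long[] totals = new long[sizes.length];
    for (int i = 0; i < sizes.length; i++) {
      totals[i] = (long) sizes[i] * objects[i];
    }
    return totals;
  }

  public static void main(String[] args) {
    Properties props = load(
        "laconia.size=200000,400000,600000,800000,1000000,1200000\n"
        + "laconia.objects=20,20,20,20,20,20\n");
    long[] totals = payloadPerTurn(props, "laconia");
    for (int turn = 0; turn < totals.length; turn++) {
      System.out.println("turn " + (turn + 1) + ": " + totals[turn] + " bytes");
    }
  }
}
```

For the settings above, the array content alone grows from 4,000,000
bytes in the first turn to 24,000,000 bytes in the sixth.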

Since we've only defined a LaconiaDb4o driver, we
should go to the RunSeason.java file and comment out
the other team and circuit members of the Team[] and
Circuit[] arrays. This is not entirely necessary,
because PolePosition automatically matches drivers to circuits,
and will discover no Laconia-capable drivers for the other
databases. But if we don't, we'll have to sit through a number of
races just to see the Laconia output.

The result of the Laconia run appears in Figure 1. Though the
chart shows a single line, we can at least see an interesting
phenomenon that suggests that db4o improves its scaling at
an object size of somewhere around 600,000 bytes. On either side of
that point, the scaling appears to be almost linear, but to the
right of the point, the slope is noticeably reduced.

Figure 1. Result of Laconia run

More Under The Hood

There's more to PolePosition than we've shown here, mainly
because the framework is so well put together that there's little
or no fiddling required to implement a circuit and a driver. For
example, if you want to add a brand new database to the mix, you'll
have to extend the abstract Car class (after all, a
driver has got to have a car).

The base Car class is so simple that it's easier to show
than to explain.

package org.polepos.framework;

public abstract class Car{
  protected String mWebsite;
  protected String mDescription;
   
  public abstract String name();

  public String getWebsite(){
    return mWebsite;
  }

  public String description(){
    return mDescription;
  }
}

A database will extend this class, adding methods that
initialize the database system prior to running a turn around the
circuit, as well as methods that "clean up" after a race has been
run. Because Db4oCar had already been defined for us,
we didn't have to include any code to create our database before
the race (or delete it afterward), in our LaconiaDb4o
class.
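
As a rough illustration of what such an extension might look like,
here is a sketch of a car for an imaginary in-memory store. The
article does not show the set-up and clean-up methods that a real
car such as Db4oCar provides, so open(), close(), and isOpen() are
hypothetical names, as are SketchCar (a local copy of the Car class
above, included so the example compiles on its own) and InMemoryCar.

```java
import java.util.HashMap;
import java.util.Map;

// Local copy of the article's Car class, renamed to avoid confusion.
abstract class SketchCar {
  protected String mWebsite;
  protected String mDescription;

  public abstract String name();

  public String getWebsite() { return mWebsite; }

  public String description() { return mDescription; }
}

// Hypothetical car whose "database" is just a map held in memory.
class InMemoryCar extends SketchCar {
  private Map<Integer, byte[]> store;

  InMemoryCar() {
    mWebsite = "(none)";
    mDescription = "hypothetical in-memory store";
  }

  public String name() { return "inmemory"; }

  // Hypothetical set-up hook: create a fresh, empty store before a turn.
  void open() { store = new HashMap<Integer, byte[]>(); }

  // Hypothetical clean-up hook: discard the store after the race.
  void close() { store = null; }

  boolean isOpen() { return store != null; }
}

class CarSketch {
  public static void main(String[] args) {
    InMemoryCar car = new InMemoryCar();
    car.open();
    System.out.println(car.name() + " open: " + car.isOpen());
    car.close();
    System.out.println(car.name() + " open: " + car.isOpen());
  }
}
```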

Conclusion

We hope that there will eventually be much more to PolePosition than
can be shown in any single article, as people
contribute to and extend the benchmark. As already stated, PolePosition
includes a limited number of database interfaces at this point. New
database back ends would certainly be useful, not only so that
database performance could be compared, but so that developers
interested in learning a new database API can look into the
PolePosition source and see how an equivalent operation is
performed in database A as compared to database B. In addition, the
framework as it stands now anticipates only a single-user model.
Implementing some sort of multi-threaded test to simulate multiple
users would be exceedingly difficult without extensive
modifications to the framework. But if that could be done,
PolePosition could provide enterprise-level testing.

Whatever work is involved, there are significant benefits to be
had from an open source benchmark framework that is easy to extend,
while at the same time providing enough control so that the results
can be compared across database technologies.

Gentlemen, start your engines. Please.

Resources

Rick Grehan is currently a QA engineer for Compuware's Nashua Lab.