An Open Source Database Benchmark
Tue, 2005-06-14
PolePosition is an open source Java framework for benchmarking databases. The impetus behind PolePosition came from the observation that developers evaluating candidate databases for future applications often resorted to constructing ad hoc benchmarks rather than using "canned" benchmark tests (or relying on vendor-provided data). This is entirely understandable; to properly evaluate a database for a specific project, you would want to exercise that database in ways that correspond to the application's use of it. Put another way, if the target application will use the database in read-only fashion, you'll have little interest in a benchmark that runs the database through write operations.
PolePosition was designed with just such people in mind. Using the metaphor of a series of automobile race courses ("Circuits"), PolePosition provides a structure that simplifies the three primary tasks that a database benchmark developer might face: building the tests, adding database drivers, and reporting results.
The current version of PolePosition (which requires J2SE 5.0 support) includes six circuits, each exercising a different mix of tests and named after a different city well-known to car-racing enthusiasts. For example, the Bahrain circuit performs write, query, update, and delete operations on a database filled with simple (lightly structured) objects. The Sepang circuit, by comparison, writes, reads, and then deletes a complex object tree.
In addition, the current version of PolePosition already provides drivers for four well-known database technologies:
Finally, PolePosition delivers excellent report generation features. Thanks largely to its abundant use of open source libraries, PolePosition's results can be displayed in tabular form, with a line graph attached, and the whole thing packaged into HTML or PDF files. Developers interested in just the numbers (perhaps to feed them through additional analysis) can access a tab-delimited file of results.
In keeping with the race-course metaphor mentioned earlier, the classes of interest to developers fall into two broad categories: those that support the creation of new tests (circuits), and those that support the addition of new database drivers. In this article, we'll concentrate on the classes and interfaces that a developer will most likely deal with directly in order to implement a new circuit.
Because the portion of the framework that manages the creation of tests is considerably simpler than the portion that supports database drivers, we'll begin with the former.
All tests are kept in subpackages of the org.polepos.circuits
package, and each circuit must define a class and an interface. For
this illustration, we're going to pretend that we're creating a new
circuit (benchmark) called Laconia. The Laconia test will
exercise a database's ability to read large, unstructured
byte[] arrays, as might happen if a database is being
used as a repository for images.
Here's our implementation of the Laconia class:
package org.polepos.circuits.laconia;
import org.polepos.framework.*;
public class Laconia extends Circuit {
    @Override
    public String description() {
        return "reads large byte arrays of various sizes";
    }
    @Override
    protected void addLaps() {
        add(new Lap("populate", false, false));
        add(new Lap("read"));
    }
    @Override
    public Class requiredDriver() {
        return LaconiaDriver.class;
    }
}
The duty of the description() method is
self-explanatory. The addLaps() method assembles (in
order) the individual tests that take place within a circuit.
Notice that the laps are identified by strings. Each string is a
method name; a driver that executes the Laconia race course
must implement a populate() method and a
read() method.
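Because laps are named by strings, a framework can look up each name as a no-argument method on the driver at run time. The following sketch (written for a current JDK) shows one way to do that with Java reflection. It is our own illustration, not PolePosition's dispatch code; LapDriver, RecordingDriver, and LapRunner are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver interface whose method names match the lap names.
interface LapDriver {
    void populate();
    void read();
}

// Hypothetical driver that only records which laps were invoked.
class RecordingDriver implements LapDriver {
    final List<String> calls = new ArrayList<>();
    public void populate() { calls.add("populate"); }
    public void read() { calls.add("read"); }
}

class LapRunner {
    // Finds the public no-argument method named after the lap and invokes it.
    static void runLap(Object driver, String lapName) {
        Method method;
        try {
            method = driver.getClass().getMethod(lapName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("driver has no lap method: " + lapName, e);
        }
        try {
            method.invoke(driver);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("cannot call lap method: " + lapName, e);
        } catch (InvocationTargetException e) {
            // Unwrap the exception thrown by the lap itself.
            throw new IllegalStateException("lap failed: " + lapName, e.getCause());
        }
    }
}
```

In this sketch a lap name with no matching method surfaces as an IllegalArgumentException, which is why a driver has to implement every lap its circuit names.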
You'll also notice that we used different constructors for the
two Lap objects. The first constructor, which adds the
populate() method to the laps, illustrates a clever
feature of PolePosition. The first false in the constructor
identifies this as a Lap that must be done with a
"fresh" (newly created) database, which is immaterial here
because a new database will always be created for the first
Lap. The second false tells PolePosition
not to record the results of this Lap. This
allows us to use the first Lap to set up the database
(in this case, populate it with objects to read) without having this
set-up time counted against the database in the final score.
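The effect of the "don't record" flag can be sketched as a loop that runs every lap but keeps the elapsed time only for the laps marked for reporting. TimedLap and LapTimer are hypothetical classes, and timing with System.nanoTime() is our assumption about how such a measurement might be taken, not a description of PolePosition's internals.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lap description: a name, the work to run, and whether to report its time.
class TimedLap {
    final String name;
    final Runnable work;
    final boolean reportResult;

    TimedLap(String name, Runnable work, boolean reportResult) {
        this.name = name;
        this.work = work;
        this.reportResult = reportResult;
    }
}

class LapTimer {
    // Runs every lap in order, but keeps elapsed times (in milliseconds)
    // only for laps whose reportResult flag is true.
    static Map<String, Long> run(List<TimedLap> laps) {
        Map<String, Long> results = new LinkedHashMap<>();
        for (TimedLap lap : laps) {
            long start = System.nanoTime();
            lap.work.run();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (lap.reportResult) {
                results.put(lap.name, elapsedMs);
            }
        }
        return results;
    }
}
```

The unreported set-up lap still runs, so the database is populated, but its time never reaches the results map.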
Finally, the requiredDriver() method returns the
Class object for the driver interface (here, LaconiaDriver)
that a driver must implement in order to run the given circuit.
When a race is being run, PolePosition automatically matches drivers
to circuits.
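One plausible way to perform that matching is to test each candidate driver against the Class object a circuit returns from requiredDriver(), using Class.isInstance(). The sketch below assumes that approach; ReadCircuitDriver, WriteCircuitDriver, ReadOnlyTeamDriver, and DriverMatcher are hypothetical stand-ins, not PolePosition classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver interfaces for two different circuits.
interface ReadCircuitDriver { void read(); }
interface WriteCircuitDriver { void write(); }

// Hypothetical team driver that can race only on the "read" circuit.
class ReadOnlyTeamDriver implements ReadCircuitDriver {
    public void read() { }
}

class DriverMatcher {
    // Keeps only the candidates that implement the circuit's required
    // driver interface (the value a circuit's requiredDriver() would return).
    static List<Object> driversFor(Class<?> requiredDriver, List<?> candidates) {
        List<Object> matches = new ArrayList<>();
        for (Object driver : candidates) {
            if (requiredDriver.isInstance(driver)) {
                matches.add(driver);
            }
        }
        return matches;
    }
}
```

A database with no driver implementing the required interface simply contributes no entrants to that circuit.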
The next bit of code we have to construct is an interface that
specifies the methods that a driver on the Laconia circuit must
implement. That interface appears below. Note that its methods
correspond to the lap names identified in the addLaps()
method of the Laconia class.
package org.polepos.circuits.laconia;
public interface LaconiaDriver {
    void populate();
    void read();
}
This rounds out the files we need to add to the
Laconia package, and our circuit has, for the most
part, been built. We need to conjure up a driver for the
Laconia circuit, but before we can do that, we must define the
structure of the data that a LaconiaDriver will be
expected to handle.
Data objects that will be manipulated by candidate databases
reside in the org.polepos.data package. We'll create a
simple Picture class:
package org.polepos.data;
import org.polepos.framework.*;
public class Picture implements CheckSummable {
    private int mIDnum;      // Picture ID number
    private byte[] mContent; // Picture content
    /*
     * Default constructor
     */
    public Picture() {
    }
    /*
     * Create a new instance of Picture
     */
    public Picture(int IDnum, byte[] content) {
        this.mIDnum = IDnum;
        this.mContent = content;
    }
    // Getters and setters follow
    public int getIDnum() {
        return mIDnum;
    }
    public byte[] getContent() {
        return mContent;
    }
    public void setIDnum(int IDnum) {
        mIDnum = IDnum;
    }
    public void setContent(byte[] content) {
        mContent = content;
    }
    /*
     * Checksum used to verify objects
     */
    public long checkSum() {
        // We'll return the ID number as the checksum
        return mIDnum;
    }
}
The structure of a Picture object is
straightforward, including only an integer ID number (that we will
ensure is unique when we implement LaconiaDriver) and
a byte array. There's also a checkSum() method
specified in the CheckSummable interface, which we
provide simply by returning the ID number. PolePosition can use
checksums to verify that a race has been properly run, but we won't
be employing checksums in our example.
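Although our example doesn't use them, checksums make a simple completeness check possible. If objects with IDs 1 through n are stored and each checkSum() returns its ID, as Picture's does, then the checksums of everything read back should total the arithmetic series n(n+1)/2. A sketch of that check, in which the hypothetical CheckSummableItem interface stands in for PolePosition's CheckSummable and ChecksumVerifier is likewise our own name:

```java
import java.util.List;

// Hypothetical stand-in for CheckSummable, with the method name used in the article.
interface CheckSummableItem {
    long checkSum();
}

class ChecksumVerifier {
    // Sums the checksums of all objects read back from the database.
    static long total(List<? extends CheckSummableItem> items) {
        long sum = 0;
        for (CheckSummableItem item : items) {
            sum += item.checkSum();
        }
        return sum;
    }

    // Expected total when IDs 1..n were stored and each checksum is its ID.
    static long expectedForIds(int n) {
        return (long) n * (n + 1) / 2;
    }
}
```

For 20 objects, for instance, the expected total is 20 × 21 / 2 = 210; a smaller total would reveal missing objects.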
At this point, we are ready to build a Driver
capable of running the Laconia race. In keeping with the racecourse
metaphor, each database provides a Team of drivers,
one Driver for each circuit that the database is
able to "race on." All of the drivers for a given database are kept in
the org.polepos.teams.database-name package
and, by convention, the class that implements a given
circuit for a given database is named
CircuitnameDatabasename.
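The naming convention is mechanical enough to express in a few lines. The helper below is hypothetical, and it assumes the package segment is simply the lowercase form of the database part of the class name, as with the db4o package and the LaconiaDb4o class.

```java
import java.util.Locale;

class TeamNaming {
    // Builds org.polepos.teams.<database>.<Circuit><Database>.
    // Assumption: the package segment is the lowercase database name.
    static String driverClassName(String circuit, String database) {
        return "org.polepos.teams." + database.toLowerCase(Locale.ROOT)
                + "." + circuit + database;
    }
}
```

For example, driverClassName("Laconia", "Db4o") yields org.polepos.teams.db4o.LaconiaDb4o.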
So, in the teams\db4o folder, we'll define the
LaconiaDb4o class:
package org.polepos.teams.db4o;
import java.util.*;
import org.polepos.circuits.laconia.*;
import org.polepos.data.*;
import com.db4o.*;
import com.db4o.query.*;
public class LaconiaDb4o extends Db4oDriver
        implements LaconiaDriver {
    /*
     * Implementation of the first lap. Put objects
     * in database.
     * NOT scored.
     */
    public void populate() {
        // Retrieve object count and object size
        int numobjects = setup().getObjectCount();
        int sizearray = setup().getObjectSize();
        // Fill value
        byte fillval = (byte) 'x';
        // Loop through the object count.
        // Instantiate a Picture object of the
        // appropriate size, fill its mContent
        // field, set its ID field to the loop
        // counter (1 through numobjects), and store it.
        begin();
        for (int i = 1; i <= numobjects; i++) {
            byte[] barray = new byte[sizearray];
            Arrays.fill(barray, fillval);
            Picture p = new Picture(i, barray);
            store(p);
        }
        commit(); // Make sure everything's written
    }
    /*
     * Implementation of the second lap.
     * Read the objects.
     * Scored.
     */
    public void read() {
        // Read all the objects of type Picture
        readExtent(Picture.class);
    }
}
As shown above, populate() reads the property
values for the number of objects and the size of the byte array,
instantiates each object, fills its array, and stores it in the
database. These property values are read from the
Circuits.properties file (which we will show in a
moment) and are provided automatically by the framework. The db4o
store() method, with a single argument of the object
to be persisted, is all that we need to put an object in the db4o
database. And store() will traverse the object tree
automatically, so the byte[] array member of the
Picture object is also persisted.
The read() method is even simpler.
LaconiaDb4o extends Db4oDriver (already
in the framework), which provides fundamental operations likely to
be used by all implemented drivers for db4o. One of those
operations is readExtent(), which retrieves all of the
persisted objects of the specified class.
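To try the two laps without installing db4o, the sketch below swaps the database for a hypothetical in-memory list. InMemoryStore, Pic, and LaconiaSketch are our own names; in particular, InMemoryStore.readExtent() merely filters a list with Class.isInstance() and only imitates the effect of Db4oDriver's readExtent().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory "database" used only to illustrate the two laps.
class InMemoryStore {
    private final List<Object> objects = new ArrayList<>();

    void store(Object o) { objects.add(o); }

    // Imitates an extent read: returns every stored object of the given class.
    <T> List<T> readExtent(Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object o : objects) {
            if (type.isInstance(o)) {
                result.add(type.cast(o));
            }
        }
        return result;
    }
}

// Minimal stand-in for the article's Picture: an ID plus byte content.
class Pic {
    final int id;
    final byte[] content;
    Pic(int id, byte[] content) { this.id = id; this.content = content; }
}

class LaconiaSketch {
    // Mirrors populate(): IDs run from 1 to count inclusive,
    // and every array is filled with the byte 'x'.
    static void populate(InMemoryStore db, int count, int size) {
        for (int i = 1; i <= count; i++) {
            byte[] content = new byte[size];
            Arrays.fill(content, (byte) 'x');
            db.store(new Pic(i, content));
        }
    }
}
```

Objects of other classes in the same store are ignored by readExtent(), which is the property the read lap relies on.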
We're almost done; we just need to specify the properties of
each turn through the Laconia circuit. PolePosition gathers its
properties files in the settings folder, and the particular file of
interest is Circuits.properties. We add the following
lines to that file:
# laconia
#
# [objects]: number of objects to create/read
# [size]: size of the byte[] array
laconia.size=200000,400000,600000,800000,1000000,1200000
laconia.objects=20,20,20,20,20,20
As you can see, there will be six turns through the Laconia circuit. Each turn will create the same number of objects, but of increasing size.
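The six turns come from pairing the two comma-separated lists position by position. PolePosition does this for you and hands the values to a driver through setup(); the sketch below only illustrates the pairing with java.util.Properties, and TurnSettings is a hypothetical class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class TurnSettings {
    // One turn around the circuit: how many objects, and how large each array is.
    static final class Turn {
        final int objects;
        final int size;
        Turn(int objects, int size) { this.objects = objects; this.size = size; }
        long totalBytes() { return (long) objects * size; }
    }

    // Pairs the i-th value of <circuit>.objects with the i-th value of <circuit>.size.
    static List<Turn> parse(String propertiesText, String circuit) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        String[] counts = list(props, circuit + ".objects");
        String[] sizes = list(props, circuit + ".size");
        if (counts.length != sizes.length) {
            throw new IllegalArgumentException("objects and size lists differ in length");
        }
        List<Turn> turns = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            turns.add(new Turn(Integer.parseInt(counts[i].trim()),
                               Integer.parseInt(sizes[i].trim())));
        }
        return turns;
    }

    private static String[] list(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.split(",");
    }
}
```

With the Laconia settings above, the first turn stores 20 × 200,000 = 4,000,000 bytes of array data and the last stores 20 × 1,200,000 = 24,000,000 bytes.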
Since we've only defined a LaconiaDb4o driver, we
should go to the RunSeason.java file and comment out
the other team and circuit members of the Team[] and
Circuit[] arrays. This is not strictly necessary,
because PolePosition automatically matches drivers to circuits,
and will discover no Laconia-capable drivers for the other
databases. But if we don't, we'll have to sit through a number of
other races just to see the Laconia output.
The result of the Laconia run appears in Figure 1. Though the
chart shows a single line, we can at least see an interesting
phenomenon: db4o's scaling appears to improve somewhere around
an object size of 600,000 bytes. On either side of that point,
the scaling appears to be almost linear, but to the right of the
point, the slope is noticeably reduced.
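To check a visual impression like this against the numbers (for example, those in the tab-delimited results file), one can compute the slope between consecutive (size, time) points and look for where it drops. The sketch below is a hypothetical helper; the measurements must come from your own run.

```java
class Scaling {
    // Slope between consecutive points: extra milliseconds per extra byte.
    // A drop in these values marks a change in scaling like the one described above.
    static double[] slopes(long[] sizes, double[] millis) {
        if (sizes.length != millis.length || sizes.length < 2) {
            throw new IllegalArgumentException("need at least two matching points");
        }
        double[] result = new double[sizes.length - 1];
        for (int i = 1; i < sizes.length; i++) {
            result[i - 1] = (millis[i] - millis[i - 1]) / (sizes[i] - sizes[i - 1]);
        }
        return result;
    }
}
```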
We could keep this walkthrough short mainly because the
framework is so well put together that implementing a circuit and
a driver requires little or no fiddling. Still, there's more to
PolePosition than we've shown here. For
example, if you want to add a brand new database to the mix, you'll
have to extend the abstract Car class (after all, a
driver has got to have a car).
The base Car class is so simple that it's easier to show
than to explain.
package org.polepos.framework;
public abstract class Car {
    protected String mWebsite;
    protected String mDescription;
    public abstract String name();
    public String getWebsite() {
        return mWebsite;
    }
    public String description() {
        return mDescription;
    }
}
Support for a database extends this class, adding methods that
initialize the database system prior to running a turn around the
circuit, as well as methods that "clean up" after a race has been
run. Because Db4oCar had already been defined for us,
we didn't have to include any code in our LaconiaDb4o
class to create our database before the race (or delete it
afterward).
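For a sense of what such a subclass looks like, here is a self-contained sketch of a Car for a hypothetical in-memory "database". It repeats the abstract Car shown above so that it compiles on its own. The openDatabase() and closeDatabase() names are our own invention, not PolePosition's actual set-up and clean-up hooks, and Db4oCar's real methods are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Copy of the article's abstract Car so this sketch compiles on its own.
abstract class Car {
    protected String mWebsite;
    protected String mDescription;
    public abstract String name();
    public String getWebsite() { return mWebsite; }
    public String description() { return mDescription; }
}

// Hypothetical Car for an in-memory list standing in for a database.
class InMemoryCar extends Car {
    private List<Object> storage;

    InMemoryCar() {
        mDescription = "plain in-memory list (illustration only)";
    }

    @Override
    public String name() { return "InMemory"; }

    // Prepares an empty "database" before a turn around the circuit.
    void openDatabase() { storage = new ArrayList<>(); }

    // Cleans up after the race so the next turn starts fresh.
    void closeDatabase() { storage = null; }

    boolean isOpen() { return storage != null; }
}
```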
We hope that, as people contribute to and extend the benchmark, PolePosition will grow well beyond what any single article can show. As already stated, PolePosition includes a limited number of database interfaces at this point. New database back ends would certainly be useful, not only so that database performance could be compared, but so that developers interested in learning a new database API can look into the PolePosition source and see how an equivalent operation is performed in database A as compared to database B. In addition, the framework as it stands now anticipates only a single-user model. Implementing some sort of multi-threaded test to simulate multiple users would be exceedingly difficult without extensive modifications to the framework. But if that could be done, PolePosition could provide enterprise-level testing.
Whatever work is involved, there are significant benefits to be had from an open source benchmark framework that is easy to extend, while at the same time providing enough control so that the results can be compared across database technologies.
Gentlemen, start your engines. Please.