Using Berkeley DB
At this point, I've told you what Berkeley DB is, I've given you some ideas on when it's an appropriate tool to use (and what problems it solves), and I've told you it's a fairly high-quality library. By now, you're probably itching for something more concrete. In this section, I'm going to cover the basics of using Berkeley DB. And then, in the next section, I'll show you how to implement session management using Berkeley DB.
Converting Objects to Data
Recall that, at its heart, a Berkeley DB database is a b-tree. It's a collection of key-value pairs. Keys map to values, and they usually do so uniquely (you can configure things so that a key can have more than one value, but the default is for each key to have a single value). What's more, internally, Berkeley DB stores both keys and values as byte arrays. It doesn't track object types or field names or any data like that; it stores instances of DatabaseEntry, which is simply a wrapper around a byte array that provides some convenience methods.
This has two consequences. The first is that you have to provide Berkeley DB with some way to transform your objects into instances of DatabaseEntry (alternatively, you can perform the transformation yourself). The second, which we'll return to later, is that the internal order of keys and values in a Berkeley DB database isn't what you might expect: it's the order defined by the "natural order" on the byte arrays.
There are two ways to create instances of DatabaseEntry:
- You can implement the EntryBinding interface (you usually do this by extending TupleBinding). EntryBinding has two methods, objectToEntry and entryToObject, and can be thought of as an object translation layer you provide to Berkeley DB.
- You can have your objects implement the Serializable interface. In this case, you will also need to create a second database to store the class definitions. But once you do so, you can simply use Sleepycat's implementation of serialization, instead of writing lots of different subclasses of TupleBinding.
The first approach, repeatedly subclassing TupleBinding, requires more programmer effort and code. The second approach, using serialization, is slower and requires you to create and maintain a second database of class definitions (i.e., it's slightly more resource-intensive). There's no good general-purpose answer to the question "Which approach should I use?" but in this article, we'll subclass TupleBinding. It's easy to do and, for small projects, is clearly the better approach.
Here's part of the code for a simple class that performs binding for a class named Session.
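import java.io.IOException;
import com.sleepycat.bind.tuple.TupleBinding;
import com.sleepycat.bind.tuple.TupleInput;
import com.sleepycat.bind.tuple.TupleOutput;

public class SessionBinding extends TupleBinding {
    public Object entryToObject(TupleInput ti) throws IOException {
        String sessionKey = ti.readString();
        // Assumes Session has a constructor that takes the session key.
        Session returnValue = new Session(sessionKey);
        // read rest of arguments using TupleInput and build the session object.
        return returnValue;
    }

    public void objectToEntry(Object object, TupleOutput to) throws IOException {
        Session session = (Session) object;
        to.writeString(session.getSessionKey());
        // write out rest of session fields using TupleOutput
    }
}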
The point to notice about this code is that it really looks a lot like serialization. It's a translation layer that converts objects to byte arrays, and vice versa. When objects are stored, objectToEntry is called. When they're retrieved, entryToObject is called.
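To make the "translation layer" idea concrete without depending on the Berkeley DB classes, here is a minimal sketch that does the same kind of work with the standard library's DataOutputStream and DataInputStream. The DemoSession class, its lastTouched field, and the method names are all made up for illustration; the real Session class isn't shown in this article, and this isn't how TupleBinding is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical session class; the fields are assumptions for this sketch.
class DemoSession {
    final String sessionKey;
    final long lastTouched;

    DemoSession(String sessionKey, long lastTouched) {
        this.sessionKey = sessionKey;
        this.lastTouched = lastTouched;
    }
}

// Plays the role of a binding: fields are written and read back in the same order.
class DemoSessionBinding {
    static byte[] objectToBytes(DemoSession session) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(session.sessionKey);
            out.writeLong(session.lastTouched);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static DemoSession bytesToObject(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String sessionKey = in.readUTF();   // must match the write order above
            long lastTouched = in.readLong();
            return new DemoSession(sessionKey, lastTouched);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The only real rule is that the read order mirrors the write order. Nothing in the byte array records field names or types, which is exactly why the binding has to know them.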
Creating a Database
Now that you know how to convert objects to and from instances of DatabaseEntry, the next step is to actually create a database (otherwise, we have no place to store the entries). Because Berkeley DB automatically persists the database (there's no "in-memory only" mode, as far as I can tell), this can be somewhat complicated, and four different Berkeley DB classes are involved: EnvironmentConfig, Environment, DatabaseConfig, and Database. Each of these objects has a different role in creating "the database." An instance of Database corresponds to a single map (a single set of key-value pairs). It's got some configuration properties (the most important of which are whether it supports transactions and whether key/data pairs have to be unique). It's configured at creation time, using an instance of DatabaseConfig.
An instance of Environment corresponds to a set of instances of Database and a location in the file system. While it isn't a perfect analogy, you can think of a Berkeley DB Database as corresponding to a single table in an SQL database. And the Environment corresponds to a collection of tables (i.e., a "database" in the SQL model).
One of the main reasons this analogy isn't perfect is that Berkeley DB also allows the existence of secondary databases. A Berkeley DB database is a map from a set of keys to a set of values. That's fine for a lot of scenarios, but in many cases, you need more than one key. For example, if you're a doctor, you might want to index instances of PatientRecord by both socialSecurityNumber and lastName. The way to solve this problem is to use a secondary database. Secondary databases are databases, but they're linked to a primary database and they serve as additional indices. We'll talk more about this later.
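If the idea of a secondary database seems abstract, the following sketch models it with two ordinary Java maps: a primary map keyed by social security number, and a secondary map from last name to the primary keys that share it. This is only an illustration of the concept. The PatientStore class and its methods are hypothetical, and Berkeley DB keeps its secondary databases in sync for you rather than asking you to write this bookkeeping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record type; only the two indexed fields from the text are modeled.
class PatientRecord {
    final String socialSecurityNumber;
    final String lastName;

    PatientRecord(String socialSecurityNumber, String lastName) {
        this.socialSecurityNumber = socialSecurityNumber;
        this.lastName = lastName;
    }
}

class PatientStore {
    // Primary "database": unique key -> record.
    private final Map<String, PatientRecord> bySsn = new HashMap<>();
    // Secondary "database": non-unique key -> primary keys.
    private final Map<String, List<String>> ssnsByLastName = new TreeMap<>();

    void put(PatientRecord record) {
        PatientRecord old = bySsn.put(record.socialSecurityNumber, record);
        if (old != null) {
            // Drop the stale index entry left behind by the replaced record.
            List<String> ssns = ssnsByLastName.get(old.lastName);
            ssns.remove(old.socialSecurityNumber);
            if (ssns.isEmpty()) {
                ssnsByLastName.remove(old.lastName);
            }
        }
        ssnsByLastName
            .computeIfAbsent(record.lastName, k -> new ArrayList<>())
            .add(record.socialSecurityNumber);
    }

    PatientRecord getBySsn(String ssn) {
        return bySsn.get(ssn);
    }

    List<PatientRecord> getByLastName(String lastName) {
        List<PatientRecord> result = new ArrayList<>();
        // The secondary index yields primary keys, which are then looked up.
        for (String ssn : ssnsByLastName.getOrDefault(lastName, Collections.<String>emptyList())) {
            result.add(bySsn.get(ssn));
        }
        return result;
    }
}
```

Notice that the secondary map never stores a record itself, only a way back to the primary. That's the sense in which a secondary database is "linked to" its primary.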
In order to create an instance of Environment and Database, you need to decide a few things. The four most important decisions are:
- Where in the file system will databases be stored?
- What should happen if the database doesn't already exist?
- Will the environment support transactions?
- Will the database support transactions?
Once you've made these decisions, creating a database is easy. Here's a code snippet that shows the basic process:
EnvironmentConfig environmentConfig = new EnvironmentConfig();
environmentConfig.setAllowCreate(true);
// perform other environment configurations
File file = new File(DATA_DB_DIR_NAME);
Environment environment = new Environment(file, environmentConfig);
DatabaseConfig databaseConfig = new DatabaseConfig();
databaseConfig.setAllowCreate(true);
// perform other database configurations
_sleepyCatDB = environment.openDatabase(null, DB_NAME, databaseConfig);
About the only really surprising thing here is the use of "property names." In addition to methods such as setAllowCreate, both EnvironmentConfig and DatabaseConfig have methods that allow you to specify a property and its value by name. Thus, for example, in the above code snippet, we could have added in:
environmentConfig.setConfigParam("java.util.logging.level", "INFO");
This is handy for a couple of reasons. The first is that these are also environmental properties. That is, Berkeley DB supports a property file format (the file is named je.properties). If the properties file exists, the values will be taken from there instead of using the values set in your code. This makes it easy to change property values without recompiling. The second is that it makes it easy to write administration tools and scripts that simply store and retrieve name-value pairs.
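The precedence rule described above (values from the properties file win over values set in code) can be sketched with java.util.Properties. This is a model of the behavior as the text describes it, not Berkeley DB's own loading code; the ConfigPrecedenceDemo class and its resolve method are hypothetical, and the file contents are passed in as a string so the example stays self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ConfigPrecedenceDemo {
    // codeValues:   properties set programmatically.
    // fileContents: text of a properties file, or null if the file doesn't exist.
    static Properties resolve(Properties codeValues, String fileContents) {
        Properties effective = new Properties();
        effective.putAll(codeValues);
        if (fileContents != null) {
            Properties fromFile = new Properties();
            try {
                fromFile.load(new StringReader(fileContents));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Applied last, so any name present in the file overrides the code value.
            effective.putAll(fromFile);
        }
        return effective;
    }
}
```

With a code value of INFO for java.util.logging.level and a file containing java.util.logging.level=FINE, the effective value is FINE. Properties that appear only in code are left alone.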
Storing and Retrieving Objects
At this point, we know how to transform objects into instances of DatabaseEntry and we know how to create instances of Environment and Database. The next step in our whirlwind tour involves storing and retrieving objects from the database. Fortunately, it's pretty simple. The general sequence for inserting and removing objects is:
- Create a transaction (if you are using transactions). You create transactions using the instance of Environment. Also note that you can have more than one simultaneous transaction.
- Create the instances of DatabaseEntry you will be using.
- Call the appropriate put or delete method on the instance of Database.
- Close the transaction (by committing it, or by aborting it if something went wrong).
Here, for example, is the code from the forthcoming session management example. In order to add a session, we convert the session key (a string, in this example) and then the session itself into instances of DatabaseEntry, and then call the put method. This causes Berkeley DB to add the byte arrays to its b-tree.
public void addSession(Session session) {
    addSession(session, null); // null transaction is "autocommit"
}

private void addSession(Session session, Transaction transaction) {
    String sessionKey = session.getSessionKey();
    try {
        DatabaseEntry key = getDatabaseEntry(sessionKey);
        DatabaseEntry value = getDatabaseEntry(session);
        _mainDB.put(transaction, key, value);
    } catch (Exception e) {
        System.out.println("Database error");
        e.printStackTrace();
    }
}

private DatabaseEntry getDatabaseEntry(String string) throws Exception {
    return new DatabaseEntry(string.getBytes(DEFAULT_CHARSET));
}

private DatabaseEntry getDatabaseEntry(Session session) throws Exception {
    DatabaseEntry returnValue = new DatabaseEntry();
    _sessionBinding.objectToEntry(session, returnValue);
    return returnValue;
}
Note: Unless you pass it a Comparator, Berkeley DB will use its own ordering on the keys. This isn't an issue for retrieval that's based on specific key values, but can be a problem when you want to retrieve a range of entries.
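To see why byte ordering matters for range retrieval, here is a small sketch that sorts string keys in a TreeMap using a byte-by-byte comparison. I'm assuming that the default order is an unsigned, lexicographic comparison of the key bytes; the article only says it's the natural order on the byte arrays, so treat the comparator as an approximation. The ByteOrderDemo class is hypothetical and doesn't touch Berkeley DB.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class ByteOrderDemo {
    // Unsigned, byte-by-byte lexicographic comparison (assumed default order).
    // When one array is a prefix of the other, the shorter one sorts first.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Returns the keys that fall in [from, to), listed in byte order.
    static List<String> keysInRange(List<String> keys, String from, String to) {
        TreeMap<byte[], String> tree = new TreeMap<>(BYTE_ORDER);
        for (String key : keys) {
            tree.put(bytes(key), key);
        }
        return new ArrayList<>(tree.subMap(bytes(from), bytes(to)).values());
    }
}
```

With the keys "1", "2", "9", "10", and "100", asking for the range from "1" up to (but not including) "3" returns "1", "10", "100", "2", which is probably not what you wanted if the keys are really numbers. Zero-padding the keys ("001", "002", and so on) makes the byte order agree with the numeric order; the other option is the Comparator mentioned in the note.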
Adding and removing records is easy. Updating a record is a little harder: the way you update an entry in Berkeley DB is by removing the old version of an object, and then adding it back. And, of course, you want to do this in a transaction (so that the two operations succeed or fail together). In the following example, we create a transaction. (The two arguments to beginTransaction are the parent transaction and an instance of TransactionConfig, respectively; most of the time you pass in null for these.) We then simply remove the object and reinsert it. Here's an example of code that performs an update.
public void updateDB(Session session) {
    // Initialized to null so the catch block compiles and can tell
    // whether a transaction was ever started.
    Transaction transaction = null;
    try {
        transaction = _environment.beginTransaction(null, null);
        removeSession(session, transaction);
        session.touch();
        addSession(session, transaction);
        transaction.commitNoSync();
    } catch (Exception e) {
        System.out.println("Database error");
        e.printStackTrace();
        if (transaction != null) {
            try {
                transaction.abort();
            } catch (Exception ignored) {
                // nothing more to do if the abort itself fails
            }
        }
    }
}
One interesting wrinkle is that this code uses the commitNoSync method. By default, Berkeley DB doesn't write to disk after every operation (i.e., what's stored on disk can be out of sync with the database in memory). In most cases, you have to explicitly tell Berkeley DB to synchronize to disk. When you are committing a transaction, you can opt for one of the following: commit, commitNoSync, or commitSync. commit lets Berkeley DB decide whether to synchronize or not (depending on the database and environment configuration); commitNoSync and commitSync let you decide explicitly whether or not the synchronization needs to occur.
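The trade-off between the sync and no-sync variants is the same one you face with ordinary file I/O, and it may be easier to see there. The sketch below appends records to a log file and only forces them to the storage device when asked to. It's an analogy using the standard library, not a description of what Berkeley DB does internally, and the SyncDemo class is hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SyncDemo {
    // Appends a record; when sync is true, also asks the OS to flush it to the device.
    static void append(Path logFile, String record, boolean sync) {
        try (FileOutputStream out = new FileOutputStream(logFile.toFile(), true)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            if (sync) {
                // Slower, but the data should survive an OS crash or power loss.
                out.getFD().sync();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one unsynced and one synced record to a temp file and returns its contents.
    static String runDemo() {
        try {
            Path logFile = Files.createTempFile("sync-demo", ".log");
            try {
                append(logFile, "first\n", false);
                append(logFile, "second\n", true);
                return new String(Files.readAllBytes(logFile), StandardCharsets.UTF_8);
            } finally {
                Files.delete(logFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both records are visible to a reader either way. The difference only shows up after a crash, which is why skipping the sync is tempting for data, like session state, that you can afford to lose.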
Final Thoughts
In this article, I introduced Sleepycat's new Java edition of Berkeley DB. You've seen examples of where it is used, reasons for using it, and some basic details on how to do so.
In the next article, we'll do a deep dive into a real-world example. In particular, we'll walk through how to implement session management using Berkeley DB.