Skip to main content

Explorations: Generics, Erasure, and Bridging

December 2, 2003

{cs.r.title}






It's traditional for new columnists to spend a paragraph or two introducing themselves, establishing their bona-fides, and talking about the overall goal of the column. In that spirit, I'll mention that I've been programming in Java for seven years now, I've written a book (Java RMI), co-authored another (Java Enterprise Best Practices), and written scads of individual articles (for O'Reilly's online magazine, ONJava, among other places). In short, even though this is my first column for java.net, I feel like a old hand at this (when Daniel Steinberg asked me if I'd write a monthly column, I thought it was a natural thing to do -- the only place we really quibbled was over the column name. I'm sorry, but "Deep Geeking" just isn't a column I want to write).

As to the goal of the column, well, I'm going to spend the next year or so shining a flashlight into some fairly obscure and dusty corners of the Java universe. The goal is to explain things that are either new, often overlooked, or just plain worthy of additional attention. I don't have any deeper agenda than that. In my first two articles, I'm going to talk about some of the more advanced aspects of the current generics implementation.

More specifically, this first installment is about erasure and bridging. Both of these are code transformations the compiler performs in order to implement the generics specification. My goal in talking about them is to help you understand exactly how the current generics implementation works, and to help you avoid some fairly common mistakes. And then, in my next column, I'm going to dive into the wildcards section of the generics specification.

Acknowledgements: Tom Hill and Martin O'Connor read early drafts of this article and provided feedback. It's almost certainly still got a few big mistakes, but they did a fine job at winnowing the number down.

Downloading and Installing the Generics Specification

The Generics Specification, also known as "generics" or "the generics implementation," is a change to the Java programming language that allows you to specify additional type information at compile time, and thereby reduces the need for casting instances to classes at runtime. Languages like Eiffel and C++ have had generics in one form or another for years, and people have been proposing to add generics to Java for as long as I can remember (the GJ project published papers as early as 1998, and the Pizza project had one published in 1997).

As far as I can tell, the argument over whether to include some form of generics is effectively over: the next major release of Java will contain an implementation of generics. However, while we know that generics will be included in JDK 1.5, the final form of the specification is still uncertain. As the current download of the generics compiler (available from Sun's early access site) puts it:

Disclaimer: This prototype is experimental code developed as part of JSR014 and made available to the developer community for use as-is. It is not a supported product. Use it at your own risk. The specification, language, and implementation are subject to change as a result of your feedback. Because these features have not yet been approved for addition to the Java language, there is no schedule for their inclusion in a product.

In particular, while the basics of the generics specification have been fairly stable for a long time, the more advanced aspects of the specification continue to evolve.

The generics specification is implemented entirely as a set of changes to the class libraries and the javac compiler. It doesn't require any JVM changes, and it doesn't need special class file formats. Because of this, you can download an early access version of the generics compiler and use it today (and the class files you generate will work on most JVMs). The best way to read this article is with the generics compiler nearby, so that you can try out the code examples or play with your own ideas as you think of them.

Here's what you need to do:

  1. Make sure you're running JDK 1.4.2 (using java -version). If you're not running JDK 1.4.2, you'll need to download and install it.

  2. Download the generics implementation. Do not download the version available from the Java Community Process web site. Even though the generics specification was originally developed as a JSR, the version currently on the JCP site is very out of date. The most recent version is available from the Javasoft Early Access web site.

  3. Unzip the download and store the files in a convenient place.

  4. Add the .jars to your bootclasspath (to see how to do this, look at the appropriate scripts in the scripts directory).

Note that the compiler you get has support for lots of other features (besides generics). In fact, all of the new language extensions for JDK 1.5 except Metadata are supported in the latest version of the generics compiler.

A Very Brief Review of the Basics

Before we start talking about more advanced aspects of generics, I'm going to remind you about the problem that generics solve and cover the basic syntax. If you haven't seen generics before, please read my article about generics first.

Let's start by looking at the collection classes defined in the java.util package. A collection is a container; it holds objects. Most of the method signatures use Object for argument types and return values, so you can store any type of object in a container. As an example, consider the following two methods from the Collection interface:

boolean add(Object o) 
boolean contains(Object o)

Having signatures like these makes the Collections library reusable across projects. If I'm writing a database access layer and you're working on a web application for marketing, we can both store our objects in instances of Vector. This flexibility is exactly what makes the Collections library dangerous to use in any given project. When an instance is retrieved from a collection, whether directly or via an Iterator, it has a declared type of Object. It is typical to cast this Object to the type you are looking for like this:

// _patients is a vector of instances of Person
Iterator i = _patients.iterator();
while (i.hasNext()) {
    Patient nextPatient = (Patient) i.next();
    fetchDoctor(nextPatient);
}

When a programmer is forced to use casts on return values from methods, there is the possibility of the Java runtime throwing unexpected instances of ClassCastException. And, since ClassCastException is an unchecked exception, it's very easy for an error in casting an object to cause an error in your code. For example, this threaded discussion came about because someone expected an Iterator but the library they were using returned an Enumeration.

If we instead use type parameters, look how much more clear and flexible the code becomes:

// _patients is a vector of instances of Patients
Iterator<Patient> i = _patients.iterator();
while (i.hasNext()) {
    Patient nextPatient = i.next();
    fetchDoctor(nextPatient);
}

There are only two differences in this code snippet. The first is that the Iterator declaration became an Iterator<Patient> declaration. And the second is that the cast was removed from Patient nextContractor = (Patient) i.next().

This sort of code transformation is at the heart of the generics specification. To begin with, you can think of the generics specification as a way to move the class casts to the declarations, in order to allow the compiler to check that all of the types match and thereby prevent class cast exceptions from being thrown at runtime.

I'm going to conclude this brief review by reminding you of three definitions:

  • Type Parameters are the things in the angle brackets. They're parameters that will get replaced by either classes or parametrized types in member declarations and in object instantiations.
  • A Parameterized Type is a class or interface, along with a set of type parameters.
  • A Raw Type is the class with all of the type parameters removed.

Erasure

Java has always had a very strong interest in backwards compatibility. Changes to the language and to the core libraries are generally required to be backwards-compatible: old bytecode must be able to run on new JVMs and old code must be able to be compiled and run on new JVMs. While this is sometimes controversial, it generally makes life easier for people who have to develop and deploy Java code.

The requirement for backwards compatibility makes implementing generics much harder than it would be otherwise. In particular, it leads to three major restrictions. First, the format of class files (as specified in the Java Language Specification) can't change very much. Second, any code that was compiled using a "pre-generics" class that has since been genericized (for example, any code that uses an ordinary, non-generic Iterator) must still work correctly. Third, any code that was written using a "pre-generics" class that has since been genericized (for example, any code that uses an ordinary, non-generic Iterator) must still compile and work correctly.

In order to implement generics under these restrictions, the generics implementation uses a technique known as erasure. The easiest way to understand erasure is to think of the compiler as performing two distinct tasks. First, it does type checking at compile time using all the type information it has (including the type parameters). Then it transforms the code, using a set of rules that remove all of the type parameters (e.g. all of the parameterized types are mapped to raw types), and inserts a set of casting operations. The complete list of transformations is beyond the scope of this article. However, to give you an idea of what erasure does, here are some of the rules:

  • If a class doesn't use type parameters in its definition, erasure doesn't change the class definition.
  • Parameterized types "drop" their type parameters.
  • Every type parameter is mapped to the appropriate bound. By this, I mean: the type parameter is erased and replaced with the strongest type the compiler can reasonably assert. Thus, in the case of <T>, T is mapped to Object. In the case of <T extends Rentable>, T is mapped to Rentable, and so on.
  • Casts are inserted wherever necessary, to ensure that the code compiles.

Consider, for example, the example of a ShoppingCart for our video store. Written using generics, the code might look like:

public class ShoppingCart<T extends Rentable> {
    private Vector<T> _contents = new Vector<T>();

    public void add(T rentable) {
        if (!_contents.contains(rentable)) {
            _contents.add(rentable);
        }
    }

    public int getTotalPurchasePrice() {
        int totalPrice = 0;
        Iterator<T> iterator = _contents.iterator();
        while(iterator.hasNext()) {
            T itemInCart = iterator.next();
            totalPrice+=itemInCart.getPrice();
        }
        return totalPrice;
    }

    public Iterator<T> getContents() {
        return _contents.iterator();
    }
}

In this code example, ShoppingCart is a parameterized type with type parameter T and T is restricted to types that extend Rentable. Under erasure, all of the type parameters are erased, and casts are introduced where necessary. So the type parameter <T> is either erased completely or replaced with Rentable, and casts to Rentable are inserted in the appropriate places. The "post-erasure" code ShoppingCart therefore looks something like the following:

public class ShoppingCart {
    private Vector _contents = new Vector(); // just a Vector

    public void add(Rentable rentable) { // T became Rentable
        if (!_contents.contains(rentable)) {
            _contents.add(rentable);
        }
    }

    public int getTotalPurchasePrice() {
        int totalPrice = 0;
        Iterator iterator = _contents.iterator();
        while(iterator.hasNext()) {
            Rentable itemInCart = (Rentable) iterator.next(); // cast inserted
            totalPrice+=itemInCart.getPrice();
        }
        return totalPrice;
    }

    public Iterator getContents() { // just an iterator
        return _contents.iterator();
    }
}

The post-erasure code shouldn't look unusual -- the process of erasure really just returns the sort of code you have to write today.

At this point, I'd like to repeat that erasure is an internal compiler step: you as a programmer will never see the erased code unless you use a decompiler and examine the class files generated by the compiler. If you want to decompile the class files, one good (if rapidly aging) decompiler is jad. I often find that the best way to understand what the compiler is doing is to reverse engineer the bytecode, and I recommend that you use a decompiler whenever you're not sure what's going on.

I should also note that the loop in getTotalPurchasePrice is a bit unseemly these days. In JDK 1.5, we'd probably use the nifty new for syntax and write something like:

public int getTotalPurchasePrice() {
    int totalPrice = 0;
    for (T itemInCart: _contents) {
        totalPrice+=itemInCart.getPrice();
    }
    return totalPrice;
}

That's actually much nicer. I wasn't a big fan of the new for loops at first, but they really grow on you.

There are other rules for erasure, concerning other language constructs (like array objects) and defining how to erase for generic methods. They're all pretty reasonable, though, and the above example gives you a good feeling for how erasure works in practice. So, rather than recapitulate the entire generics specification, we're going to move on now and talk about some of the consequences of erasure.







Erasure and Accidental Compile-Time Conflicts

One consequence of erasure is that code sometimes doesn't compile because, after erasure, there are conflicts. You probably suspect that the following two classes will have an issue:

class ShoppingCart<T extends DVD>{
    // ...
}

class ShoppingCart<T extends VideoTape>{
    // ...
}

And the reason is pretty easy to discern: under erasure, these classes have the same name. (In fact, you'd probably have run into problems before you got as far as compiling, because both of these classes want to be stored in ShoppingCart.java.)

But there are more subtle name clashes, as well. If two methods erase to the same method, then you'll get a compile time error. For example, the following code doesn't compile, either:

class TwoForOneSpecial<T extends Rentable, W extends Rentable> {
    public void add(T newRentable) {
        //...
    }
    public void add(W newRentable) {
        //...
    }
}

It fails for a pretty obvious reason: the two add methods have identical erasures, and therefore cause compile-time problems. The problem here is that the system maps both T and W to Rentable, and therefore causes a compilation error (the compiler currently states "add(T) and add(W) have the same erasure").

No such problem exists for the following class:

class GetAFreeVideoTape<T extends Rentable, W extends VideoTape> {
    public void add(T anything) {
        //...
    }
    public void add(W videotape) {
        //...
    }
}
GetAFreeVideoTape will compile because under erasure, it becomes:

class GetAFreeVideoTape {
    public void add(Rentable anything) {
        //...
    }
    public void add(Videotape videotape) {
        //...
    }
}
Erasure and Static Variables

Page three of the generics specification (the June 23, 2003 draft) contains an interesting sentence:

The scope of a type parameter is all of the declared class, except any static members or initializers, but including the type parameter section itself.

This is intriguing -- it says that you can't use type parameters in static fields or methods. That is, code like the following isn't allowed:

public class Store<T> {
    private static T STATIC_VARIABLE;
}

At first glance, this seems like a strange restriction to have. But it does makes sense, and it's a consequence of erasure.

The heart of the problem is the fact that when you use generics, you're not defining new classes. Erasure maps parameterized types to raw types. So, for example, Vector<String> and Vector<Integer> both get erased to the same class (namely, Vector) and the code that accesses these instances casts the values that it fetches from them.

Why is this problematic? Consider the following class (it won't compile, but pretend for a moment that it could):

public class GlobalEventQueue<T extends Event> {
    // stores all events in a global linked list so that
    // all events are serialized.
    private static LinkedList<T> QUEUE = new LinkedList<T> ();
    // ...

    public static synchronized T removeEvent() {
        return QUEUE.removeFirst();
    }
    public static synchronized void addEvent(T event) {
        QUEUE.add(event);
    }
}
GlobalEventQueue<T extends Event> uses a static LinkedList<T>to store events. But you can have many instances of GlobalEventQueue<T extends Event>, and you can subclass it. Suppose you did so, by creating the classes ScreenEventQueue and DiskEventQueue.

public class ScreenEventQueue {
    private GlobalEventQueue<ScreenEvent> _myQueue;
    public ScreenEventView() {
        _myQueue = new GlobalEventQueue<ScreenEvent>();
    }

    public ScreenEvent removeEvent() {
        return _myQueue.removeEvent();
    }
    public void addEvent(ScreenEvent event) {
        return _myQueue.addEvent(event);
    }
}

public class DiskEventQueue {
    private GlobalEventQueue<DiskEvent> _myQueue;
    public DiskEventQueue() {
        _myQueue = new GlobalEventQueue<DiskEvent>();
    }

    public DiskEvent removeEvent() {
        return _myQueue.removeEvent();
    }
    public void addEvent(DiskEvent event) {
        return _myQueue.addEvent(event);
    }
}

What happens if you create an instance of ScreenEventQueue and DiskEventQueue? Well, at runtime, both of these insert and remove objects from the same instance of LinkedList. And they do so by casting the values that they remove. In fact, under erasure, ScreenEventQueue and DiskEventQueue become:

public class ScreenEventQueue {
    private GlobalEventQueue _myQueue;
    public ScreenEventView() {
        _myQueue = new GlobalEventQueue();
    }

    public ScreenEvent removeEvent() {
        return (ScreenEvent) _myQueue.removeEvent();
    }
    public void addEvent(ScreenEvent event) {
        return _myQueue.addEvent(event);
    }
}

public class DiskEventQueue {
    private GlobalEventQueue _myQueue;
    public DiskEventQueue() {
        _myQueue = new GlobalEventQueue();
    }

    public DiskEvent removeEvent() {
        return (DiskEvent) _myQueue.removeEvent();
    }
    public void addEvent(DiskEvent event) {
        return _myQueue.addEvent(event);
    }
}

Which means that we can insert an instance of DiskEvent into the static LinkedList and then try to cast it as a ScreenEvent.Unless there's some synchronization logic somewhere else, we're going to eventually add an instance of DiskEvent to the queue and then try to cast it as a ScreenEvent when we remove it. At runtime, the transformed code will throw instances of ClassCastException.

More generally, if we let static members be defined using type parameters, it's impossible to avoid the possibility of a ClassCast Exception being thrown at runtime. And so the specification carefully limits the scope of type parameters to rule out code like the example above.

Bridging

So far, we've only talked about erasure. However, the current implementation of generics uses another form of code transformation as well. This process, which is referred to as bridging, consists of inserting extra methods into objects. And, like erasure, bridging is motivated by backwards compatibility.

To understand bridging, let's extend our example and have our shopping cart sort the tapes before returning them. To do this, we need to define a Comparator. In the generics specification, Comparator has become the parameterized type Comparator<T>. But it's still an interface, and it has the same methods (they've just become more strongly typed). The interface has become:

public interface Comparator<T> {
    int compare(T o1, T o2);
    boolean equals(Object obj);
}

Here's an implementation of RentableComparator.

public class RentableComparator implements Comparator<Rentable>{

    public int compare(Rentable rentable1, Rentable rentable2) {
        if (null==rentable1) {
            if (null==rentable2) {
                return 0;
            }
            return -1;
        }
        if (null==rentable2) {
            return 1;
        }
        return rentable1.getDisplayName().compareTo(rentable2.getDisplayName());
    }
}

This looks pretty similar to the code you write today, and it's exactly what you want: it's a strongly typed comparator. At compile time, the compiler can check that you're using things correctly. Incorporating RentableComparator into ShoppingCart is easy. We just use our comparator to sort the list before returning an Iterator in the getContents method.

public Iterator<T> getContents() {
    RentableComparator comparator = new RentableComparator();
    Collections.sort(_contents, comparator);
    return _contents.iterator();
}

Again, this looks exactly like the code you write today. And, if you're under deadline pressure, you might just write this code, check that it works, and move on.

But if you stop and think about backwards compatibility, this can get pretty confusing. Suppose, for example, you're also using a legacy library that puts objects into instances of Vector and then sorts them, as in the following code.

public void printSortedCollection(Collection collection, Comparator comparator) {
    Vector vector = new Vector(collection);
    Collections.sort(vector, comparator);
    Iterator i = vector.iterator();
    while (i.hasNext()) {
        System.out.println(i.next());
    }
}

The legacy library has to work, as well. When you pass in an instance of RentableComparator, the right thing will happen (the legacy library will sort the vector correctly). This happens in spite of the fact that your comparator implemented public int compare(Rentable rentable1, Rentable rentable2) and the legacy library was expecting public int compare(Object object1, Object object2).

If you're at all familiar with the way inner classes are implemented, you've already guessed the solution to this problem: the generics compiler actually inserts extra methods, called bridge methods, into the parameterically typed classes (or subclasses) to make sure that the legacy code works. In this case, the compiler will insert code that looks like the following into RentableComparator:

public int compare(Object obj, Object ob1) {
    return compare((Rentable) obj, (Rentable) obj1);
}

This is very nice. With bridging, you get the benefits of static typing in all of your code. And you get backwards compatibility with all of the old libraries you are currently using (or might use).

Final Thoughts

At this point, you're probably a little tired of learning about what the compiler is doing to your code behind the scenes. So in the final section of this column, I'm going to switch gears and talk about what the compiler can't do to your code (at least, as far as I've been able to puzzle it out). In particular, there are two things I wish it did, that it doesn't do.

The first thing on my wish list this Christmas is a typesafe equals method. I really hate code like the following (from the ItemCode class):

public boolean equals(Object object) {
    if (!(object instanceof ItemCode)) {
        return false;
    }
    int otherCode = ((ItemCode) object)._code;
        return (_code == otherCode);
}

This is correct, concise, and perfectly reasonable. But the cast check in the beginning smells bad to me. In most cases, class checks are in there for logical completeness; not because the developer expects it to happen. I'd wager that in many cases the code should really be:

public boolean equals(Object object) {
    if (!(object instanceof ItemCode)) {
        // huh? It's not? Wow. I didn't expect that to happen at all.
        // Oh well. I guess if it's a different class entirely,
        // it's not equal. So returning false is the safe thing to do.
        return false;
    }
    int otherCode = ((ItemCode) object)._code;
    return (_code == otherCode);
}

and that, therefore, all the instanceof really does is obfuscate a logical error. I'll grant you that it might be better to return false than throw a ClassCastException on a production server, but it's a not good thing to do.

Another problem with equals arises from the fact that type parameters are erased. The issue is that, at runtime, we can only check the raw type and not the parameterized type. Put another way, the problem is that the return value in the following code snippet evaluates to true.

Vector<String> vector1 = new Vector<String>();
Vector<Vector<String>> vector2 =  new Vector<Vector<String>>();
return vector1.getClass() == vector2.getClass();

The fact that two very different types (Vector<String> and Vector<Vector<String>>) have the same class makes relying on instanceof (or getClass) problematic.

Ideally, I'd like equals to somehow be a generic method, so that I could get rid of the class casts and take advantage of static typing. But I don't see any way to do it (and still take advantage of all of the code out there that calls equals). So I'm leaving this one open as a challenge -- can anyone see a way to have static typing on the arguments to equals and still preserve backwards compatibility?

Another place you still need to cast, and cast correctly, is inside of serialization code. If you use serialization to persist objects, you're either going to use default serialization (which is often unwise, for the reasons outlined in Java Enterprise Best Practices) or you're going to wind up writing code like:

FileOutputStream fos = new FileOutputStream(_persistenceFileName);
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(settings);
oos.close();

// .....

FileInputStream fis = new FileInputStream(_persistenceFileName);
ObjectInputStream ois = new ObjectInputStream(fis);
Hashtable settings = (Hashtable)ois.readObject();
o = ois.readObject();
ois.close();

Wouldn't it be nice if the compiler could check this code too?

And with that thought, I'll end this month's column. In next month's column, I'll talk about how inheritance interacts with generics and how wildcards work.

William Grosso is the vice president of engineering for Echopass.
Related Topics >> Programming   |