Explorations: Generics, Erasure, and Bridging

by William Grosso
12/02/2003

It's traditional for new columnists to spend a paragraph or two introducing themselves, establishing their bona fides, and talking about the overall goal of the column. In that spirit, I'll mention that I've been programming in Java for seven years now, I've written a book (Java RMI), co-authored another (Java Enterprise Best Practices), and written scads of individual articles (for O'Reilly's online magazine, ONJava, among other places). In short, even though this is my first column for java.net, I feel like an old hand at this. When Daniel Steinberg asked me if I'd write a monthly column, I thought it was a natural thing to do; the only place we really quibbled was over the column name. I'm sorry, but "Deep Geeking" just isn't a column I want to write.

As to the goal of the column, well, I'm going to spend the next year or so shining a flashlight into some fairly obscure and dusty corners of the Java universe. The goal is to explain things that are either new, often overlooked, or just plain worthy of additional attention. I don't have any deeper agenda than that. In my first two articles, I'm going to talk about some of the more advanced aspects of the current generics implementation.

More specifically, this first installment is about erasure and bridging. Both of these are code transformations the compiler performs in order to implement the generics specification. My goal in talking about them is to help you understand exactly how the current generics implementation works, and to help you avoid some fairly common mistakes. And then, in my next column, I'm going to dive into the wildcards section of the generics specification.

Acknowledgements: Tom Hill and Martin O'Connor read early drafts of this article and provided feedback. It's almost certainly still got a few big mistakes, but they did a fine job at winnowing the number down.

Downloading and Installing the Generics Specification

The Generics Specification, also known as "generics" or "the generics implementation," is a change to the Java programming language that allows you to specify additional type information at compile time, and thereby reduces the need for casting instances to classes at runtime. Languages like Eiffel and C++ have had generics in one form or another for years, and people have been proposing to add generics to Java for as long as I can remember (the GJ project published papers as early as 1998, and the Pizza project had one published in 1997).

As far as I can tell, the argument over whether to include some form of generics is effectively over: the next major release of Java will contain an implementation of generics. However, while we know that generics will be included in JDK 1.5, the final form of the specification is still uncertain. As the current download of the generics compiler (available from Sun's early access site) puts it:

Disclaimer: This prototype is experimental code developed as part of JSR014 and made available to the developer community for use as-is. It is not a supported product. Use it at your own risk. The specification, language, and implementation are subject to change as a result of your feedback. Because these features have not yet been approved for addition to the Java language, there is no schedule for their inclusion in a product.

In particular, while the basics of the generics specification have been fairly stable for a long time, the more advanced aspects of the specification continue to evolve.

The generics specification is implemented entirely as a set of changes to the class libraries and the javac compiler. It doesn't require any JVM changes, and it doesn't need special class file formats. Because of this, you can download an early access version of the generics compiler and use it today (and the class files you generate will work on most JVMs). The best way to read this article is with the generics compiler nearby, so that you can try out the code examples or play with your own ideas as you think of them.

Here's what you need to do:

  1. Make sure you're running JDK 1.4.2 (using java -version). If you're not running JDK 1.4.2, you'll need to download and install it.

  2. Download the generics implementation. Do not download the version available from the Java Community Process web site. Even though the generics specification was originally developed as a JSR, the version currently on the JCP site is very out of date. The most recent version is available from the Javasoft Early Access web site.

  3. Unzip the download and store the files in a convenient place.

  4. Add the .jars to your bootclasspath (to see how to do this, look at the appropriate scripts in the scripts directory).

Note that the compiler you get has support for lots of other features (besides generics). In fact, all of the new language extensions for JDK 1.5 except Metadata are supported in the latest version of the generics compiler.

A Very Brief Review of the Basics

Before we start talking about more advanced aspects of generics, I'm going to remind you about the problem that generics solve and cover the basic syntax. If you haven't seen generics before, please read my article about generics first.

Let's start by looking at the collection classes defined in the java.util package. A collection is a container; it holds objects. Most of the method signatures use Object for argument types and return values, so you can store any type of object in a container. As an example, consider the following two methods from the Collection interface:

boolean add(Object o) 
boolean contains(Object o)

Having signatures like these makes the Collections library reusable across projects. If I'm writing a database access layer and you're working on a web application for marketing, we can both store our objects in instances of Vector. But this flexibility is exactly what makes the Collections library dangerous to use in any given project. When an instance is retrieved from a collection, whether directly or via an Iterator, it has a declared type of Object. Typically, you cast this Object to the type you expect, like this:

// _patients is a Vector of instances of Patient
Iterator i = _patients.iterator();
while (i.hasNext()) {
    Patient nextPatient = (Patient) i.next();
    fetchDoctor(nextPatient);
}

When a programmer is forced to use casts on return values from methods, there is the possibility of the Java runtime throwing unexpected instances of ClassCastException. And since ClassCastException is an unchecked exception, the compiler never forces you to handle it, so a bad cast can easily go unnoticed until it fails at runtime. For example, this threaded discussion came about because someone expected an Iterator but the library they were using returned an Enumeration.
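To make the failure concrete, here is a minimal sketch of a raw Vector that ends up holding the wrong kind of object. The class names (RawCastDemo, Patient, Doctor) are hypothetical stand-ins, and the sketch assumes a released JDK with generics (Java 5 or later), so the raw-type warnings are suppressed explicitly.

```java
import java.util.Iterator;
import java.util.Vector;

// Hypothetical demo class; the names are illustrative only.
class RawCastDemo {
    // Stand-ins for the Patient type and for some unrelated type.
    static class Patient {}
    static class Doctor {}

    // Returns true if iterating over the raw Vector triggers a ClassCastException.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean castFails() {
        Vector patients = new Vector();   // raw type: accepts any Object
        patients.add(new Patient());
        patients.add(new Doctor());       // compiles without complaint, but is a bug

        try {
            Iterator i = patients.iterator();
            while (i.hasNext()) {
                Patient next = (Patient) i.next(); // fails when it reaches the Doctor
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + castFails());
    }
}
```

Nothing in the source marks the second add call as a mistake. The error only shows up when the loop reaches the offending element.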

If we instead use type parameters, look how much clearer and more flexible the code becomes:

// _patients is a Vector<Patient>
Iterator<Patient> i = _patients.iterator();
while (i.hasNext()) {
    Patient nextPatient = i.next();
    fetchDoctor(nextPatient);
}

There are only two differences in this code snippet. The first is that the Iterator declaration became an Iterator<Patient> declaration. And the second is that the cast was removed from Patient nextPatient = (Patient) i.next().

This sort of code transformation is at the heart of the generics specification. To begin with, you can think of the generics specification as a way to move the class casts to the declarations, in order to allow the compiler to check that all of the types match and thereby prevent class cast exceptions from being thrown at runtime.
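Here is a small, self-contained sketch of what "moving the casts to the declarations" buys you. The Patient class and the countPatients method are hypothetical, and the commented-out line marks the call that a generics-aware compiler rejects.

```java
import java.util.Iterator;
import java.util.Vector;

// Hypothetical demo class; the names are illustrative only.
class TypedIterationDemo {
    static class Patient {
        final String name;
        Patient(String name) { this.name = name; }
    }

    // Builds a Vector<Patient> and walks it without a single cast.
    static int countPatients() {
        Vector<Patient> patients = new Vector<Patient>();
        patients.add(new Patient("Ann"));
        patients.add(new Patient("Bob"));
        // patients.add("not a patient"); // rejected at compile time

        int count = 0;
        Iterator<Patient> i = patients.iterator();
        while (i.hasNext()) {
            Patient next = i.next(); // no cast: the declaration carries the type
            if (next.name != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("patients counted: " + countPatients());
    }
}
```

The mistake that produced a runtime exception with a raw Vector becomes a compile error here, which is the whole point of the exercise.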

I'm going to conclude this brief review by reminding you of three definitions:

  • Type Parameters are the things in the angle brackets. They're parameters that will get replaced by either classes or parameterized types in member declarations and in object instantiations.
  • A Parameterized Type is a class or interface, along with a set of type parameters.
  • A Raw Type is the class with all of the type parameters removed.
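The three terms can be seen side by side in a short sketch. Holder is a hypothetical class invented for this illustration; it is not part of any library.

```java
// Hypothetical demo class; Holder is invented for illustration.
class TermsDemo {
    // T, inside the angle brackets, is a type parameter of Holder.
    static class Holder<T> {
        private final T value;
        Holder(T value) { this.value = value; }
        T get() { return value; }
    }

    static String parameterized() {
        // Holder<String> is a parameterized type: Holder plus a type for T.
        Holder<String> h = new Holder<String>("video");
        return h.get();              // no cast needed
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String raw() {
        // Holder with no angle brackets is the raw type.
        Holder h = new Holder("video");
        return (String) h.get();     // the cast is back
    }

    public static void main(String[] args) {
        System.out.println(parameterized() + " / " + raw());
    }
}
```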

Erasure

Java has always had a very strong interest in backwards compatibility. Changes to the language and to the core libraries are generally required to be backwards-compatible: old bytecode must run on new JVMs, and old source code must still compile with new compilers and run on new JVMs. While this is sometimes controversial, it generally makes life easier for people who have to develop and deploy Java code.

The requirement for backwards compatibility makes implementing generics much harder than it would be otherwise. In particular, it leads to three major restrictions. First, the format of class files (as specified in the Java Virtual Machine Specification) can't change very much. Second, any code that was compiled against a "pre-generics" class that has since been genericized (for example, a class file that uses an ordinary, non-generic Iterator) must still run correctly without being recompiled. Third, any source code that was written against such a class must still compile and work correctly.
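The third restriction is easy to try out. In this sketch (the class and method names are hypothetical), legacyCount is written the old way against the raw Iterator type, yet it compiles and runs unchanged when handed the iterator of a Vector<String>.

```java
import java.util.Iterator;
import java.util.Vector;

// Hypothetical demo class; the names are illustrative only.
class LegacyInteropDemo {
    // "Pre-generics" style code: it only knows about the raw Iterator type.
    @SuppressWarnings("rawtypes")
    static int legacyCount(Iterator i) {
        int n = 0;
        while (i.hasNext()) {
            i.next();
            n++;
        }
        return n;
    }

    // Newer code that uses a parameterized Vector and calls the legacy method.
    static int run() {
        Vector<String> titles = new Vector<String>();
        titles.add("Casablanca");
        titles.add("Vertigo");
        return legacyCount(titles.iterator());
    }

    public static void main(String[] args) {
        System.out.println("titles counted: " + run());
    }
}
```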

In order to implement generics under these restrictions, the generics implementation uses a technique known as erasure. The easiest way to understand erasure is to think of the compiler as performing two distinct tasks. First, it does type checking at compile time using all the type information it has (including the type parameters). Then it transforms the code, using a set of rules that remove all of the type parameters (e.g. all of the parameterized types are mapped to raw types), and inserts a set of casting operations. The complete list of transformations is beyond the scope of this article. However, to give you an idea of what erasure does, here are some of the rules:

  • If a class doesn't use type parameters in its definition, erasure doesn't change the class definition.
  • Parameterized types "drop" their type parameters.
  • Every type parameter is mapped to the appropriate bound. By this, I mean: the type parameter is erased and replaced with the strongest type the compiler can reasonably assert. Thus, in the case of <T>, T is mapped to Object. In the case of <T extends Rentable>, T is mapped to Rentable, and so on.
  • Casts are inserted wherever necessary, so that the erased code remains type-correct.
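You can observe a couple of these rules from inside a running program by using reflection. The sketch below assumes a released JDK with generics (Java 5 or later); Unbounded, Bounded, and this local Rentable interface are hypothetical types defined only for the demonstration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;

// Hypothetical demo class; the nested types exist only for this illustration.
class ErasureRulesDemo {
    interface Rentable { int getPrice(); }

    static class Unbounded<T> {
        public void put(T item) {}
    }

    static class Bounded<T extends Rentable> {
        public void put(T item) {}
    }

    // Returns the runtime parameter type of the class's put method.
    static Class<?> erasedParameterType(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals("put")) {
                return m.getParameterTypes()[0];
            }
        }
        throw new IllegalStateException("no put method on " + c);
    }

    // Parameterized types drop their type parameters: both lists share one class.
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }

    public static void main(String[] args) {
        System.out.println("<T> maps to " + erasedParameterType(Unbounded.class).getName());
        System.out.println("<T extends Rentable> maps to " + erasedParameterType(Bounded.class).getName());
        System.out.println("same runtime class: " + sameRuntimeClass());
    }
}
```

The unbounded T shows up as java.lang.Object, the bounded T shows up as the Rentable interface, and the two differently parameterized lists turn out to be instances of the very same class.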

Consider, for example, a ShoppingCart for our video store. Written using generics, the code might look like:

import java.util.Iterator;
import java.util.Vector;

public class ShoppingCart<T extends Rentable> {
    private Vector<T> _contents = new Vector<T>();
		
    public void add(T rentable) {
        if (!_contents.contains(rentable)) {
            _contents.add(rentable);
        }
    }
	
    public int getTotalPurchasePrice() {
        int totalPrice = 0;
        Iterator<T> iterator = _contents.iterator();
        while(iterator.hasNext()) {
            T itemInCart = iterator.next();
            totalPrice+=itemInCart.getPrice();
        }
        return totalPrice;
    }
	
    public Iterator<T> getContents() {
        return _contents.iterator();
    }
}

In this code example, ShoppingCart is a parameterized type with type parameter T, and T is restricted to types that extend Rentable. Under erasure, all of the type parameters are erased, and casts are introduced where necessary. So the type parameter T is either erased completely or replaced with Rentable, and casts to Rentable are inserted in the appropriate places. The "post-erasure" code for ShoppingCart therefore looks something like the following:

public class ShoppingCart {
    private Vector _contents = new Vector(); // just a Vector
	
    public void add(Rentable rentable) { // T became Rentable
        if (!_contents.contains(rentable)) {
            _contents.add(rentable);
        }
    }
	
    public int getTotalPurchasePrice() {
        int totalPrice = 0;
        Iterator iterator = _contents.iterator();
        while(iterator.hasNext()) {
            Rentable itemInCart = (Rentable) iterator.next(); // cast inserted
            totalPrice+=itemInCart.getPrice();
        }
        return totalPrice;
    }
	
    public Iterator getContents() { // just an iterator
        return _contents.iterator();
    }
}

The post-erasure code shouldn't look unusual -- the process of erasure really just returns the sort of code you have to write today.

At this point, I'd like to repeat that erasure is an internal compiler step: you as a programmer will never see the erased code unless you use a decompiler and examine the class files generated by the compiler. If you want to decompile the class files, one good (if rapidly aging) decompiler is jad. I often find that the best way to understand what the compiler is doing is to reverse engineer the bytecode, and I recommend that you use a decompiler whenever you're not sure what's going on.
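If you don't have a decompiler handy, reflection gives a rough view of the same thing, because it reports the erased method signatures that the JVM actually sees. The following is a sketch under some assumptions: Rentable is a hypothetical one-method interface, the ShoppingCart here is a trimmed copy of the one above, and the code is meant for a released JDK with generics (Java 5 or later).

```java
import java.lang.reflect.Method;
import java.util.Iterator;
import java.util.Vector;

// Hypothetical demo class wrapping a trimmed copy of ShoppingCart.
class ErasedSignatureDemo {
    // Assumed shape of Rentable; it is not defined in the listings above.
    interface Rentable { int getPrice(); }

    static class ShoppingCart<T extends Rentable> {
        private Vector<T> _contents = new Vector<T>();

        public void add(T rentable) {
            if (!_contents.contains(rentable)) {
                _contents.add(rentable);
            }
        }

        public Iterator<T> getContents() {
            return _contents.iterator();
        }
    }

    // Looks up add using the erased parameter type and returns that type.
    static Class<?> addParameterType() {
        try {
            Method add = ShoppingCart.class.getMethod("add", Rentable.class);
            return add.getParameterTypes()[0];
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the runtime return type of getContents.
    static Class<?> getContentsReturnType() {
        try {
            return ShoppingCart.class.getMethod("getContents").getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("add takes " + addParameterType().getName());
        System.out.println("getContents returns " + getContentsReturnType().getName());
    }
}
```

The lookup of add succeeds only because the method's runtime signature really is add(Rentable), which matches the post-erasure listing.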

I should also note that the loop in getTotalPurchasePrice is a bit unseemly these days. In JDK 1.5, we'd probably use the nifty new for syntax and write something like:

public int getTotalPurchasePrice() {
    int totalPrice = 0;
    for (T itemInCart: _contents) {
        totalPrice+=itemInCart.getPrice();
    }
    return totalPrice;
}

That's actually much nicer. I wasn't a big fan of the new for loops at first, but they really grow on you.
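For anyone who wants to convince themselves that the two loop styles do the same work, here is a small runnable comparison. Rentable and Video are hypothetical stand-ins for the video store's types, and the sketch assumes a released JDK that supports the new for syntax (Java 5 or later).

```java
import java.util.Iterator;
import java.util.Vector;

// Hypothetical demo class; Rentable and Video are illustrative stand-ins.
class ForEachDemo {
    interface Rentable { int getPrice(); }

    static class Video implements Rentable {
        private final int price;
        Video(int price) { this.price = price; }
        public int getPrice() { return price; }
    }

    // The explicit Iterator loop, as in the original getTotalPurchasePrice.
    static int totalWithIterator(Vector<Video> contents) {
        int total = 0;
        Iterator<Video> iterator = contents.iterator();
        while (iterator.hasNext()) {
            total += iterator.next().getPrice();
        }
        return total;
    }

    // The same computation written with the new for syntax.
    static int totalWithForEach(Vector<Video> contents) {
        int total = 0;
        for (Video item : contents) {
            total += item.getPrice();
        }
        return total;
    }

    static Vector<Video> sampleCart() {
        Vector<Video> cart = new Vector<Video>();
        cart.add(new Video(3));
        cart.add(new Video(4));
        return cart;
    }

    public static void main(String[] args) {
        Vector<Video> cart = sampleCart();
        System.out.println(totalWithIterator(cart) + " == " + totalWithForEach(cart));
    }
}
```

The new for loop over a collection is essentially shorthand for the Iterator version, so both methods return the same total.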

There are other rules for erasure, concerning other language constructs (like array objects) and defining how to erase for generic methods. They're all pretty reasonable, though, and the above example gives you a good feeling for how erasure works in practice. So, rather than recapitulate the entire generics specification, we're going to move on now and talk about some of the consequences of erasure.
