Explorations: Generics, Erasure, and Bridging
Tue, 2003-12-02
It's traditional for new columnists to spend a paragraph or two introducing themselves, establishing their bona-fides, and talking about the overall goal of the column. In that spirit, I'll mention that I've been programming in Java for seven years now, I've written a book (Java RMI), co-authored another (Java Enterprise Best Practices), and written scads of individual articles (for O'Reilly's online magazine, ONJava, among other places). In short, even though this is my first column for java.net, I feel like an old hand at this (when Daniel Steinberg asked me if I'd write a monthly column, I thought it was a natural thing to do -- the only place we really quibbled was over the column name. I'm sorry, but "Deep Geeking" just isn't a column I want to write).

As to the goal of the column, well, I'm going to spend the next year or so shining a flashlight into some fairly obscure and dusty corners of the Java universe. The goal is to explain things that are either new, often overlooked, or just plain worthy of additional attention. I don't have any deeper agenda than that.

In my first two articles, I'm going to talk about some of the more advanced aspects of the current generics implementation. More specifically, this first installment is about erasure and bridging. Both of these are code transformations the compiler performs in order to implement the generics specification. My goal in talking about them is to help you understand exactly how the current generics implementation works, and to help you avoid some fairly common mistakes. And then, in my next column, I'm going to dive into the wildcards section of the generics specification.

Acknowledgements: Tom Hill and Martin O'Connor read early drafts of this article and provided feedback. It's almost certainly still got a few big mistakes, but they did a fine job at winnowing the number down.
Downloading and Installing the Generics Specification

The Generics Specification, also known as "generics" or "the generics implementation," is a change to the Java programming language that allows you to specify additional type information at compile time, and thereby reduces the need for casting instances to classes at runtime. Languages like Eiffel and C++ have had generics in one form or another for years, and people have been proposing to add generics to Java for as long as I can remember (the GJ project published papers as early as 1998, and the Pizza project had one published in 1997). As far as I can tell, the argument over whether to include some form of generics is effectively over: the next major release of Java will contain an implementation of generics.

However, while we know that generics will be included in JDK 1.5, the final form of the specification is still uncertain. As the current download of the generics compiler (available from Sun's early access site) puts it:

Disclaimer: This prototype is experimental code developed as part of JSR014 and made available to the developer community for use as-is. It is not a supported product. Use it at your own risk. The specification, language, and implementation are subject to change as a result of your feedback. Because these features have not yet been approved for addition to the Java language, there is no schedule for their inclusion in a product.

In particular, while the basics of the generics specification have been fairly stable for a long time, the more advanced aspects of the specification continue to evolve.

The generics specification is implemented entirely as a set of changes to the class libraries and the

Here's what you need to do:
Note that the compiler you get has support for lots of other features (besides generics). In fact, all of the new language extensions for JDK 1.5 except Metadata are supported in the latest version of the generics compiler.

A Very Brief Review of the Basics

Before we start talking about more advanced aspects of generics, I'm going to remind you about the problem that generics solve and cover the basic syntax. If you haven't seen generics before, please read my article about generics first.

Let's start by looking at the collection classes defined in the
Having signatures like these makes the Collections library reusable across projects. If I'm writing a database access layer and you're working on a web application for marketing, we can both store our objects in instances of
When a programmer is forced to use casts on return values from methods, there is the possibility of the Java runtime throwing unexpected instances of ClassCastException.

If we instead use type parameters, look how much clearer and more flexible the code becomes:
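As a minimal sketch of that contrast (the `VideoTape` class and the two helper methods are illustrative assumptions, not the original listing), the first method below reads from a raw `List` and has to cast, while the second declares the element type once and needs no cast at all:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class, used only for illustration.
class VideoTape {
    private final String title;
    VideoTape(String title) { this.title = title; }
    String getTitle() { return title; }
}

class CastingDemo {
    // Pre-generics style: the list hands back Objects, so every read needs a
    // cast, and that cast is only checked at runtime.
    @SuppressWarnings("rawtypes")
    static String firstTitleRaw(List tapes) {
        VideoTape tape = (VideoTape) tapes.get(0); // may throw ClassCastException
        return tape.getTitle();
    }

    // Generic style: the element type is part of the declaration, so the
    // compiler rejects a list of anything else and no cast is written.
    static String firstTitleTyped(List<VideoTape> tapes) {
        return tapes.get(0).getTitle();
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        List<VideoTape> typed = new ArrayList<VideoTape>();
        typed.add(new VideoTape("Casablanca"));
        System.out.println(firstTitleTyped(typed));

        List raw = new ArrayList();
        raw.add("not a tape"); // compiles, because a raw list accepts any Object
        try {
            firstTitleRaw(raw);
        } catch (ClassCastException e) {
            System.out.println("the mistake only shows up at runtime");
        }
    }
}
```

With the typed version, adding a `String` to the list is a compile-time error rather than a surprise in production.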
There are only two differences in this code snippet. The first is that the

This sort of code transformation is at the heart of the generics specification. To begin with, you can think of the generics specification as a way to move the class casts to the declarations, in order to allow the compiler to check that all of the types match and thereby prevent class cast exceptions from being thrown at runtime.

I'm going to conclude this brief review by reminding you of three definitions:
Erasure

Java has always had a very strong interest in backwards compatibility. Changes to the language and to the core libraries are generally required to be backwards-compatible: old bytecode must be able to run on new JVMs and old code must be able to be compiled and run on new JVMs. While this is sometimes controversial, it generally makes life easier for people who have to develop and deploy Java code.

The requirement for backwards compatibility makes implementing generics much harder than it would be otherwise. In particular, it leads to three major restrictions. First, the format of class files (as specified in the Java Virtual Machine Specification) can't change very much. Second, any code that was compiled using a "pre-generics" class that has since been genericized (for example, any code that uses an ordinary, non-generic

In order to implement generics under these restrictions, the generics implementation uses a technique known as erasure. The easiest way to understand erasure is to think of the compiler as performing two distinct tasks. First, it does type checking at compile time using all the type information it has (including the type parameters). Then it transforms the code, using a set of rules that remove all of the type parameters (e.g. all of the parameterized types are mapped to raw types), and inserts a set of casting operations.

The complete list of transformations is beyond the scope of this article. However, to give you an idea of what erasure does, here are some of the rules:
Consider, for example, the example of a
In this code example,
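A minimal sketch of the idea, assuming a hypothetical `ShoppingCart` that holds `VideoTape` items (these classes are stand-ins, not the original listing): the generic class comes first, and `ErasedShoppingCart` is a hand-written approximation of what remains once the type parameter is replaced by its bound and the casts are inserted.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical item type, used only for illustration.
class VideoTape {
    private final String title;
    VideoTape(String title) { this.title = title; }
    String getTitle() { return title; }
}

// What the programmer writes: a cart parameterized by its item type.
class ShoppingCart<T extends VideoTape> {
    private final List<T> items = new ArrayList<T>();
    void add(T item) { items.add(item); }
    T first() { return items.get(0); }
}

// Roughly what erasure leaves behind, written out by hand: T becomes its
// bound (VideoTape), List<T> becomes the raw List, and a cast appears where
// an element is read back out.
@SuppressWarnings({"rawtypes", "unchecked"})
class ErasedShoppingCart {
    private final List items = new ArrayList();
    void add(VideoTape item) { items.add(item); }
    VideoTape first() { return (VideoTape) items.get(0); }
}

class ErasureDemo {
    // Looks up the return type that the class file records for
    // ShoppingCart.first(); after erasure there is no T left to report.
    static Class<?> erasedReturnTypeOfFirst() {
        try {
            Method first = ShoppingCart.class.getDeclaredMethod("first");
            return first.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ShoppingCart<VideoTape> cart = new ShoppingCart<VideoTape>();
        cart.add(new VideoTape("Casablanca"));
        System.out.println(cart.first().getTitle());
        System.out.println(erasedReturnTypeOfFirst());
    }
}
```

The reflection helper is only there to make the erased signature visible without reaching for a decompiler.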
The post-erasure code shouldn't look unusual -- the process of erasure really just returns the sort of code you have to write today.

At this point, I'd like to repeat that erasure is an internal compiler step: you as a programmer will never see the erased code unless you use a decompiler and examine the class files generated by the compiler. If you want to decompile the class files, one good (if rapidly aging) decompiler is

I should also note that the loop in
That's actually much nicer. I wasn't a big fan of the new

There are other rules for erasure, concerning other language constructs (like array objects) and defining how to erase for generic methods. They're all pretty reasonable, though, and the above example gives you a good feeling for how erasure works in practice. So, rather than recapitulate the entire generics specification, we're going to move on now and talk about some of the consequences of erasure.

Erasure and Accidental Compile-Time Conflicts

One consequence of erasure is that code sometimes doesn't compile because, after erasure, there are conflicts. You probably suspect that the following two classes will have an issue:
And the reason is pretty easy to discern: under erasure, these classes have the same name. (In fact, you'd probably have run into problems before you got as far as compiling, because both of these classes want to be stored in

But there are more subtle name clashes, as well. If two methods erase to the same method, then you'll get a compile-time error. For example, the following code doesn't compile, either:
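As a hedged illustration of this kind of clash (the `Inventory`, `VideoTape`, and `Dvd` names are assumptions made for this sketch), the second overload is left commented out so that the file still compiles; the reflection helper then shows the single erased signature that both overloads would have shared.

```java
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical item types, used only for illustration.
class VideoTape { }
class Dvd { }

class Inventory {
    void addAll(List<VideoTape> tapes) { }

    // Uncommenting this overload makes the class fail to compile: both
    // methods erase to addAll(List), so the compiler reports a name clash.
    // void addAll(List<Dvd> dvds) { }
}

class ErasureClashDemo {
    // Returns the parameter type that the class file records for addAll.
    static Class<?> erasedParameterType() {
        for (Method m : Inventory.class.getDeclaredMethods()) {
            if (m.getName().equals("addAll")) {
                return m.getParameterTypes()[0];
            }
        }
        throw new IllegalStateException("addAll not found");
    }

    public static void main(String[] args) {
        // The element type is gone; only the raw List remains.
        System.out.println(erasedParameterType() == List.class);
    }
}
```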
It fails for a pretty obvious reason: the two

No such problem exists for the following class:
Erasure and Static Variables

Page three of the generics specification (the June 23, 2003 draft) contains an interesting sentence:

The scope of a type parameter is all of the declared class, except any static members or initializers, but including the type parameter section itself.

This is intriguing -- it says that you can't use type parameters in static fields or methods. That is, code like the following isn't allowed:
At first glance, this seems like a strange restriction to have. But it does make sense, and it's a consequence of erasure. The heart of the problem is the fact that when you use generics, you're not defining new classes. Erasure maps parameterized types to raw types. So, for example,

Why is this problematic? Consider the following class (it won't compile, but pretend for a moment that it could):
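The class the text asks you to imagine cannot be compiled, so here is a hedged stand-in that can: a hypothetical `Registry<T>` whose illegal static field appears only as a comment, next to a legal static counter. It is a sketch of the underlying fact, namely that every parameterization shares one runtime class and therefore one set of statics.

```java
// Hypothetical class, used only for illustration.
class Registry<T> {
    // static T lastCreated;   // rejected: T is not in scope in a static context

    // Legal, and shared by Registry<String>, Registry<Integer>, and so on.
    static int instances = 0;

    Registry() { instances++; }
}

class StaticsDemo {
    // Two different parameterizations still report the same Class object.
    static boolean sameRuntimeClass() {
        Registry<String> strings = new Registry<String>();
        Registry<Integer> numbers = new Registry<Integer>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        // Both constructor calls above bumped the one shared counter.
        System.out.println(Registry.instances);
    }
}
```

If `lastCreated` were allowed, a `Registry<String>` could store a value that a `Registry<Integer>` would later read back as an `Integer`.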
What happens if you create an instance of
Which means that we can insert an instance of

More generally, if we let static members be defined using type parameters, it's impossible to avoid the possibility of a

Bridging

So far, we've only talked about erasure. However, the current implementation of generics uses another form of code transformation as well. This process, which is referred to as bridging, consists of inserting extra methods into objects. And, like erasure, bridging is motivated by backwards compatibility.

To understand bridging, let's extend our example and have our shopping cart sort the tapes before returning them. To do this, we need to define a
Here's an implementation of
This looks pretty similar to the code you write today, and it's exactly what you want: it's a strongly typed comparator. At compile time, the compiler can check that you're using things correctly. Incorporating
Again, this looks exactly like the code you write today. And, if you're under deadline pressure, you might just write this code, check that it works, and move on. But if you stop and think about backwards compatibility, this can get pretty confusing. Suppose, for example, you're also using a legacy library that puts objects into instances of
The legacy library has to work, as well. When you pass in an instance of

If you're at all familiar with the way inner classes are implemented, you've already guessed the solution to this problem: the generics compiler actually inserts extra methods, called bridge methods, into the parametrically typed classes (or subclasses) to make sure that the legacy code works. In this case, the compiler will insert code that looks like the following into
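A minimal sketch of a bridge method, assuming a hypothetical `TitleComparator` over a hypothetical `VideoTape` class: the comment inside the comparator approximates the synthetic method the compiler adds, and the demo uses `Method.isBridge()` to confirm that such a method really is present in the compiled class.

```java
import java.lang.reflect.Method;
import java.util.Comparator;

// Hypothetical item type, used only for illustration.
class VideoTape {
    private final String title;
    VideoTape(String title) { this.title = title; }
    String getTitle() { return title; }
}

// The strongly typed comparator a programmer would write.
class TitleComparator implements Comparator<VideoTape> {
    public int compare(VideoTape a, VideoTape b) {
        return a.getTitle().compareTo(b.getTitle());
    }

    // The compiler adds a synthetic method roughly equivalent to:
    //
    //   public int compare(Object a, Object b) {
    //       return compare((VideoTape) a, (VideoTape) b);
    //   }
}

class BridgeDemo {
    // True if the compiled class contains a compiler-generated bridge
    // method named compare.
    static boolean hasBridge() {
        for (Method m : TitleComparator.class.getDeclaredMethods()) {
            if (m.getName().equals("compare") && m.isBridge()) {
                return true;
            }
        }
        return false;
    }

    // Simulates legacy code that only knows the raw Comparator interface.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareThroughRawType(Object a, Object b) {
        Comparator legacy = new TitleComparator();
        return legacy.compare(a, b); // dispatches to the bridge method
    }

    public static void main(String[] args) {
        System.out.println(hasBridge());
        System.out.println(
            compareThroughRawType(new VideoTape("Alien"), new VideoTape("Brazil")) < 0);
    }
}
```

Passing anything other than a `VideoTape` through the raw interface fails with a `ClassCastException` inside the bridge, which is the same failure a hand-written pre-generics comparator would produce.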
This is very nice. With bridging, you get the benefits of static typing in all of your code. And you get backwards compatibility with all of the old libraries you are currently using (or might use).

Final Thoughts

At this point, you're probably a little tired of learning about what the compiler is doing to your code behind the scenes. So in the final section of this column, I'm going to switch gears and talk about what the compiler can't do to your code (at least, as far as I've been able to puzzle it out). In particular, there are two things I wish it did, that it doesn't do.

The first thing on my wish list this Christmas is a typesafe
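For reference, the conventional pattern looks roughly like the sketch below (a hypothetical `VideoTape`, not the original listing). Because `equals` is declared to take `Object`, the method has to check the class and then cast, and the compiler accepts a comparison against any type at all.

```java
// Hypothetical class, used only for illustration.
class VideoTape {
    private final String title;

    VideoTape(String title) { this.title = title; }

    @Override
    public boolean equals(Object other) {
        // Class check first: the parameter type gives the compiler nothing
        // to verify, so the test has to happen at runtime.
        if (!(other instanceof VideoTape)) {
            return false;
        }
        VideoTape that = (VideoTape) other;
        return title.equals(that.title);
    }

    @Override
    public int hashCode() {
        return title.hashCode();
    }
}

class EqualsDemo {
    public static void main(String[] args) {
        VideoTape tape = new VideoTape("Casablanca");
        // Compiles without complaint, even though a tape is never a String.
        System.out.println(tape.equals("Casablanca"));
        System.out.println(tape.equals(new VideoTape("Casablanca")));
    }
}
```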
This is correct, concise, and perfectly reasonable. But the cast check in the beginning smells bad to me. In most cases, class checks are in there for logical completeness; not because the developer expects it to happen. I'd wager that in many cases the code should really be:
and that, therefore, all the

Another problem with
The fact that two very different types (

Ideally, I'd like

Another place you still need to cast, and cast correctly, is inside of serialization code. If you use serialization to persist objects, you're either going to use default serialization (which is often unwise, for the reasons outlined in Java Enterprise Best Practices) or you're going to wind up writing code like:
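A minimal sketch of such hand-written serialization, assuming a hypothetical `VideoTape` with two fields: `ObjectInputStream.readObject()` returns `Object`, so each field has to be cast back, and nothing checks that the casts match the order and types used in `writeObject`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class, used only for illustration.
class VideoTape implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: these fields are written and read by hand below.
    private transient String title;
    private transient Integer year;

    VideoTape(String title, Integer year) {
        this.title = title;
        this.year = year;
    }

    String getTitle() { return title; }
    Integer getYear() { return year; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(title);
        out.writeObject(year);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Unchecked by the compiler: swapping these two lines still compiles
        // and only fails when an object is actually deserialized.
        title = (String) in.readObject();
        year = (Integer) in.readObject();
    }
}

class SerializationDemo {
    // Serializes a tape to memory and reads it back.
    static VideoTape roundTrip(VideoTape tape) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(tape);
            out.close();

            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
            VideoTape copy = (VideoTape) in.readObject(); // yet another cast
            in.close();
            return copy;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        VideoTape copy = roundTrip(new VideoTape("Casablanca", 1942));
        System.out.println(copy.getTitle() + " " + copy.getYear());
    }
}
```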
Wouldn't it be nice if the compiler could check this code too?

And with that thought, I'll end this month's column. In next month's column, I'll talk about how inheritance interacts with generics and how wildcards work.