Zero and Shark: a Zero-Assembly Port of OpenJDK

May 21, 2009

May 9, 2007 was a happy day for the Java group at Red Hat. The
release of OpenJDK meant we could stop playing catch-up with the
free Java solutions we were maintaining and switch our attention to
the real deal.

There was just one problem. On Linux, OpenJDK only worked on x86
machines, but at Red Hat we needed to support PowerPC, Itanium,
and zSeries too. We knew that if we wanted OpenJDK to work on these
platforms then we would have to make it happen ourselves.

Porting OpenJDK to a new platform involves writing and maintaining
several thousand lines of assembly language. Each platform
effectively becomes its own codebase, with its own bugs, and because
each implementation is so specialized it is opaque to developers
working on other platforms. This approach is easier to manage in
the proprietary world (where every codebase has its team of assigned
developers) but it's less suited to open source software, where
developers come and go as their circumstances dictate.

At Red Hat, we wanted to avoid these problems. We started an
experimental port of OpenJDK without assembly language, using free
software libraries to bridge the gaps. This experiment evolved to
become the zero-assembly port of OpenJDK -- Zero -- and its
just-in-time compiler Shark.

The Problem

The majority of the work of porting OpenJDK to a new processor is in
porting the virtual machine, HotSpot. HotSpot is largely written in
C++, but some 10,000 lines of its most critical code are written in
assembly language. This has to be recreated for every new processor
you wish to support, an intensive task that takes on the order of
one work year per platform for a basic port. To make OpenJDK truly
portable it was clear that this assembly language core needed to be
replaced.

HotSpot operates by default in what it calls mixed mode.
Java code is initially executed using a profiling interpreter, which
stores information as it runs so that it can identify the methods in
which it is spending the most time. Once identified, these "hot"
methods are scheduled for compilation to native code. The compiler,
running in a separate thread, has a basic loop in which it takes the
hottest method, compiles it to native code, and inserts it into the
VM. Execution speed increases over time as more and more hot
methods are replaced with compiled code. The important point here
is that in this design the compiler is optional: a
functional, if slow, port could be written by getting the
interpreter alone to work.
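
The mixed-mode loop described above can be sketched roughly as follows. This is a toy model rather than HotSpot code: Method, kCompileThreshold, invoke and compile_hottest are names invented for illustration, "compilation" simply reuses the interpreted body, and the compiler step runs inline instead of in a separate thread.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model of a method: an interpreted body, an optional
// compiled body, and an invocation counter kept by the profiling path.
struct Method {
    std::function<int(int)> interpreted;
    std::function<int(int)> compiled;   // empty until compiled
    std::size_t invocations = 0;
};

// Assumed threshold; a real VM tunes this and uses richer profiles.
constexpr std::size_t kCompileThreshold = 3;

// Run a method: use compiled code if present, otherwise interpret it
// and record that it was executed.
inline int invoke(Method& m, int arg) {
    if (m.compiled) return m.compiled(arg);
    ++m.invocations;
    return m.interpreted(arg);
}

// One step of the compiler loop: pick the hottest uncompiled method
// that has crossed the threshold and install "native" code for it.
// Returns the index that was compiled, or -1 if nothing was hot enough.
inline int compile_hottest(std::vector<Method>& methods) {
    int hottest = -1;
    for (std::size_t i = 0; i < methods.size(); ++i) {
        const Method& m = methods[i];
        if (m.compiled || m.invocations < kCompileThreshold) continue;
        if (hottest < 0 || m.invocations > methods[hottest].invocations)
            hottest = static_cast<int>(i);
    }
    if (hottest >= 0) {
        // Stand-in for real compilation: reuse the interpreted body.
        methods[hottest].compiled = methods[hottest].interpreted;
    }
    return hottest;
}
```

Note that invoke works whether or not compile_hottest is ever called, which is the sense in which the compiler is optional.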

The Interpreter

I've been writing in terms of "the interpreter," but HotSpot in fact
contains two: the template interpreter and
the C++ interpreter. In the template interpreter,
each bytecode is implemented by a block of native code --
a template -- written in assembly language. These
templates are generated at interpreter startup, and are chained
together at runtime to execute the method. In the
C++ interpreter, bytecodes are implemented in C++, using a
simple loop and switch construct. Not everything required by the
interpreter can be handled in C++, however, so the C++ code is
supported by a thin assembly language layer. The way the
interpreters slot into the virtual machine is shown in Figure 1.

Figure 1. HotSpot's interpreters. Code written in C++ is shown in
green, and code written in assembly language is shown in red.
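
For readers unfamiliar with the "loop and switch" construct used by the C++ interpreter, here is a minimal sketch of the idea. The four-instruction bytecode set and the interpret function are invented for this example and have nothing to do with real Java bytecode or HotSpot's actual interpreter source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical four-instruction bytecode set, for illustration only.
enum Opcode : std::uint8_t { PUSH, ADD, MUL, RET };

// Interpret `code` with a loop-and-switch dispatcher and return the
// value on top of the operand stack when RET is reached.
inline int interpret(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc++]) {          // fetch the opcode and dispatch
        case PUSH:                     // PUSH <imm8>
            assert(pc < code.size());
            stack.push_back(code[pc++]);
            break;
        case ADD: {
            assert(stack.size() >= 2);
            int b = stack.back(); stack.pop_back();
            int a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            break;
        }
        case MUL: {
            assert(stack.size() >= 2);
            int b = stack.back(); stack.pop_back();
            int a = stack.back(); stack.pop_back();
            stack.push_back(a * b);
            break;
        }
        case RET:
            assert(!stack.empty());
            return stack.back();
        default:
            assert(false && "unknown opcode");
        }
    }
    assert(false && "fell off the end of the code");
    return 0;
}
```

Everything here is ordinary portable C++; the parts that are not (stack frames and native calls) are what the assembly language layer exists for.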

The template interpreter is the default, for the simple reason that
it's faster. The difference in speed is not so much because it is
written in assembly language but because it is the older of the two;
the design of the VM has very much evolved along with the template
interpreter, and the C++ interpreter has to jump through hoops
in order to accommodate interfaces that don't really suit it.

The C++ interpreter has one compelling advantage over the
template interpreter, however: it contains much less assembly
language. Porting it to a new platform can therefore be done much
more quickly, and in mixed mode the difference in execution speed is
largely mitigated as hot methods are replaced with compiled code
over time. Having less assembly language to replace made the
C++ interpreter the better choice for Zero.

The C++ interpreter's assembly language layer has two basic
functions that cannot be performed from within C++: manipulation of
the native stack, and calls to native functions with arbitrary
signatures. Implementing the C++ interpreter without assembly
language basically boils down to finding a way of handling these two
things. The layer does contain other code, code that could have been
written in C++ had it lived in stand-alone functions, but because
part of the layer needs to be written in assembly language, all of
it has to be.

Stack Manipulation

At the Java level, a Java VM must keep track of which method called
the current method, and which method called that method, and so on.
This is necessary for a variety of reasons, from the simple (figuring out
where a method is returning to when it returns) to the complex
(figuring out the access control context of a method to see if it is
permitted to perform some action by the current security policy).
The straightforward way to handle this is to store the information
in a call stack.

At the machine level, HotSpot is itself an application, and it has a
stack of its own where the individual C and C++ functions store
their own caller information and other data. The format of this
stack is CPU- and OS-specific, its layout defined as part of the
particular platform's Application Binary Interface (ABI).

HotSpot was originally written for i386, a platform notoriously
starved of registers, so instead of maintaining two separate stack
pointers in two separate registers, everything in HotSpot is stored
on the ABI stack. This saves a register, but it requires at least
part of the interpreter to be written in assembly language, as the
ABI stack cannot be manipulated from within C or C++. Even if it
could be, the layout of the ABI stack is platform-specific, so the
code that creates and accesses frames would still need a separate
implementation for each platform.

There's no fundamental need for the stacks to be merged like this --
it's merely a nice optimization -- so in Zero, separate stacks are
used. The Java stack is simply a block of memory managed by some
simple, portable C++, and the ABI stack is left to manage itself.
This change eliminated a swathe of assembly language but raised an
issue with the object locking code, which allocates locks on
the Java stack but tests for locks by looking for pointers
into the ABI stack. Zero works around this by allocating
the Java stack's block of memory on the ABI stack,
using alloca().
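
A rough sketch of the idea follows, assuming GCC on Linux (alloca() is non-standard). JavaStack and demo_stack_on_abi_stack are hypothetical names and do not reflect Zero's actual classes; the point is only that the Java stack is plain memory managed by portable C++, and that carving it out of the current ABI frame means a pointer into the Java stack is also a pointer into the ABI stack.

```cpp
#include <alloca.h>   // alloca(); non-standard, available with GCC on Linux
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a Java stack: a downward-growing stack of
// machine words over memory supplied by the caller.
class JavaStack {
public:
    JavaStack(std::intptr_t* base, std::size_t words)
        : base_(base), top_(base + words), sp_(base + words) {}

    void push(std::intptr_t value) {
        assert(sp_ > base_ && "Java stack overflow");
        *--sp_ = value;
    }
    std::intptr_t pop() {
        assert(sp_ < top_ && "Java stack underflow");
        return *sp_++;
    }
    // True if `p` points into the memory backing this stack.
    bool contains(const void* p) const {
        auto a = reinterpret_cast<std::uintptr_t>(p);
        return a >= reinterpret_cast<std::uintptr_t>(base_) &&
               a <  reinterpret_cast<std::uintptr_t>(top_);
    }
    std::intptr_t* sp() const { return sp_; }

private:
    std::intptr_t* base_;
    std::intptr_t* top_;
    std::intptr_t* sp_;
};

// Carve the Java stack out of the current ABI stack frame. The memory
// is only valid until this function returns.
bool demo_stack_on_abi_stack() {
    const std::size_t words = 64;
    auto* mem = static_cast<std::intptr_t*>(
        alloca(words * sizeof(std::intptr_t)));
    JavaStack stack(mem, words);
    stack.push(42);                 // e.g. a lock could be allocated here
    int local = 0;                  // ordinary ABI-stack local, for contrast
    bool slot_inside = stack.contains(stack.sp());
    bool local_outside = !stack.contains(&local);
    return slot_inside && local_outside && stack.pop() == 42;
}
```

Because the backing memory is passed in, the same JavaStack class would work over heap memory too; using alloca() is what keeps code that looks for pointers into the ABI stack working.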

Native Calls

Most methods in a Java application will be normal methods,
written in the Java programming language and executed using
interpretation or JIT compilation. In addition, however, Java also
allows for native methods, methods whose code is written in
C or C++, with the Java Native Interface (JNI) providing the bridge
between the two. The C code for a native method might look
something like this:

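#include <jni.h>

/* Native implementation of java.lang.Class.isInstance(Object).
   For an instance method, JNI passes the receiver (here the Class
   object) as the second argument. */
JNIEXPORT jboolean JNICALL
Java_java_lang_Class_isInstance(JNIEnv* env,
                                jobject cls,
                                jobject obj)
{
  /* null is never an instance of any class */
  if (obj == NULL)
    return JNI_FALSE;

  /* C-style JNI call through the function table in *env */
  return (*env)->IsInstanceOf(env, obj, (jclass) cls);
}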

Now, JNI allows for methods with any signature that the underlying
platform supports: they can have any number of arguments, with any
combination of types, and they can have any return type. This poses
a problem for HotSpot, or indeed any runtime written in C++, because
C++ can only call functions whose signature is known at compile
time, whereas with JNI, signatures are known only at runtime. As a
result, HotSpot's native calling code has traditionally had to be
written in assembly language.
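
To make the limitation concrete, here is a sketch of what a caller written purely in C++ is reduced to: one hand-written cast per signature, fixed when the VM is compiled. Sig, AnyFn, call_native and the three sample functions are hypothetical names for this illustration; this is not how HotSpot or Zero make native calls.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical descriptors for the only signatures this caller knows.
enum class Sig { Int_Void, Int_Int, Int_IntInt };

using AnyFn = void (*)();   // generic function-pointer type for storage

// Call `fn` with `args`, given a signature known only at runtime.
// Every supported signature needs its own cast, written in advance.
int call_native(AnyFn fn, Sig sig, const std::vector<int>& args) {
    switch (sig) {
    case Sig::Int_Void:
        return reinterpret_cast<int (*)()>(fn)();
    case Sig::Int_Int:
        return reinterpret_cast<int (*)(int)>(fn)(args.at(0));
    case Sig::Int_IntInt:
        return reinterpret_cast<int (*)(int, int)>(fn)(args.at(0),
                                                       args.at(1));
    }
    // Any other combination of argument and return types cannot be
    // called without adding another case and recompiling.
    throw std::invalid_argument("unsupported signature");
}

// Sample "native" functions for the three supported signatures.
int seven() { return 7; }
int twice(int x) { return 2 * x; }
int add(int a, int b) { return a + b; }
```

Three cases cover three signatures; JNI permits an unbounded number, which is why this approach cannot be made complete.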

Zero uses a free software library called libffi to handle native
calls. libffi was originally written to handle JNI calls for GIJ,
the GNU Interpreter for Java, which is exactly the problem we were
facing, so it was ideal for Zero. Assembly language is still
required to perform the call, but it's encapsulated in libffi and
the code in Zero is entirely C++.

Shark

Replacing the assembly language for stack manipulation and native
calls allows the whole of the C++ interpreter's support layer to be
rewritten in C++, and this, combined with some build system changes,
was enough to allow interpreter-only builds of OpenJDK on any Linux
system with GCC. This is great, but the resulting VM is very slow.
To get reasonable performance we needed to move beyond
interpretation and find a way to include a JIT compiler.

To implement a JIT without introducing platform-specific code, we
turned to another free software library, LLVM (Low Level Virtual
Machine). LLVM has a wide range of applicability -- it's an
infrastructure for building both compilers and virtual machines --
but the feature that was interesting for this project is that it
includes JITs that generate native functions from code expressed in
LLVM's intermediate representation (IR).

Shark is, in essence, very simple. It uses the same interface as
HotSpot's platform-specific compilers, so it slots in with very
little modification to HotSpot itself. When running in mixed mode,
HotSpot's compiler scheduler locates hot methods and invokes Shark
to compile them, one at a time. Shark translates the Java bytecode
of these methods to LLVM IR, and invokes LLVM's JIT to generate the
native code. The native code is then installed in the VM, where it
replaces the interpreted version of the method, and control returns
to the compiler scheduler.

The main difficulty for Shark is that object pointers need to be
available to HotSpot's garbage collectors (GC). HotSpot's existing
compilers have access to the native code they generate, which allows
them a certain flexibility here. They can leave pointers in
registers across GC runs, for example, because they can supply the
GC with information about which registers contain pointers. Shark
can't do this; object pointers need to be dumped to memory across GC
runs and restored afterwards. HotSpot's compilers can inline
pointers in the generated code, too, annotating the code so the GC
can locate and modify them. Again, Shark can't do this; object
pointers must be loaded from memory or passed around between
functions. These extra memory accesses impose a significant
overhead.
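
The "dump to memory, then restore" pattern can be illustrated with a toy moving collector. Everything in this sketch (Object, Frame, ToyGC, read_field_across_gc) is hypothetical and far simpler than HotSpot's real collectors or the code Shark generates; it only shows why a pointer held across a GC run has to be reloaded from a location the GC knows about.

```cpp
#include <memory>
#include <vector>

struct Object { int field; };

// Hypothetical frame: the only place this collector looks for pointers.
struct Frame {
    std::vector<Object*> oop_slots;
};

// Toy moving collector: copies every object referenced from the frame
// to a new location and rewrites the slot to point at the copy.
// (No forwarding pointers, so two slots must not alias one object.)
struct ToyGC {
    std::vector<std::unique_ptr<Object>> heap;

    Object* allocate(int value) {
        heap.push_back(std::make_unique<Object>(Object{value}));
        return heap.back().get();
    }
    void collect(Frame& frame) {
        std::vector<std::unique_ptr<Object>> to_space;
        for (Object*& slot : frame.oop_slots) {
            if (slot == nullptr) continue;
            to_space.push_back(std::make_unique<Object>(*slot));
            slot = to_space.back().get();
        }
        heap.swap(to_space);   // old copies are freed with to_space
    }
};

// Shape of code that keeps a pointer alive across a GC run: spill it
// to the frame, let the GC run, then reload before using the object.
int read_field_across_gc(ToyGC& gc, Frame& frame, Object* obj) {
    frame.oop_slots.push_back(obj);   // dump the pointer to memory
    gc.collect(frame);                // the GC may move the object
    obj = frame.oop_slots.back();     // restore the possibly new pointer
    frame.oop_slots.pop_back();
    return obj->field;
}
```

The store and load around collect are the kind of extra memory accesses the paragraph above refers to; a compiler that can tell the GC which registers hold pointers can skip them.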

Conclusion

Zero and Shark have been written following the philosophy of minimal
modification to existing HotSpot code, an approach that has had a
number of advantages. Zero's development was extremely fast, from
the initial concept in December 2007 to a functional, stable VM that
could build itself in March 2008. The approach aids stability, too. A HotSpot
build with Zero comprises 6,500 lines of new code from Zero and
450,000 lines from HotSpot: 450,000 lines of code that has enjoyed
ten years of extensive use and rigorous testing. This helped
enormously when testing Zero builds with the Java Compatibility Kit
(JCK): most of the functionality under test was handled by the
original HotSpot code, so most of the issues were already taken care
of. Finally, this approach means Zero can take advantage of new
HotSpot features or optimizations with minimal or no effort. If
someone writes a new garbage collector, for example, then Zero and
Shark can use it straight away because Zero and Shark present
themselves to HotSpot's garbage collectors in exactly the same way
as the existing HotSpot code.

This approach is not without its disadvantages. The interpreter,
for example, does not need to be split into two layers for Zero, and
a rewrite could make it considerably faster. Shark, too, is
limited; the need to keep object pointers visible to HotSpot's
garbage collectors requires a lot of memory accesses that could be
avoided with a different interface. Zero and Shark were never about
extensive HotSpot modifications, however, and to a certain extent
they discourage them. Extensive modifications are, by definition, a
lot of work, and if you're going to do a lot of work chasing
ultimate performance, then why not go the whole way and do a
conventional port? Hand-crafted assembly language will always have
the edge. The point of Zero and Shark was to deliver a portable and
stable VM with reasonable performance. When Shark is ready for
production, that's what they'll be.

Gary Benson joined Red Hat as a software engineer in Summer 2001, and has worked in the Open Source Java group and on extending and porting it to all the various platforms Red Hat supports.

Comments

quite interesting

very nice.. really.. a new perspective on what is cooking underneath the compiler...

Good Job

Thanks for the nice explanation. Eric R.

Shark as a Compiler?

If Shark can translate bytecode to native code, how hard would it be to just compile the entire jar? Could you make a native executable out of it?

Re: Shark as a Compiler?

Not really, it's pretty dependent on the rest of HotSpot being there. You could always try GCJ though...