Performance is important. For the types of large-scale
multi-user applications that Java is commonly used to develop, it
can be vital. Unfortunately, identifying performance problems
before they occur is very difficult and fixing them afterwards is
usually very costly. In this article I outline an approach to
boosting the performance of Java code using the embedded compiler
library Janino. It can be
applied to some performance problems even after all other
approaches have been exhausted.
It is a simple observation that software cannot be faster than
the layer beneath it. It follows that improving a program's
performance may require "dropping" to a lower layer to write
performance-critical sections of code. This is a standard technique
that has evolved with each new generation of languages. C
developers resort to assembly language; Java developers to C; and even
JavaScript developers to Java.
This is a bitter situation for Java developers because one of
Java's main benefits is the homogeneous environment it provides to
insulate applications from the host system. Revert to a lower-level
language and that is lost. However, it turns out that Java
developers can play this game by compiling their application logic
directly into Java. This effectively moves the evaluation of
domain-specific languages to lower-level Java bytecode, but it's
important to see this approach in the larger context.
Performance: Traditional Approaches
The implications of building slow Java applications go beyond
the observed performance of the delivered system. Poor performance
also affects developer productivity, deployment costs, and the composability
of the application into larger systems. So what are the options for
companies that have developed a Java application but need better
performance?
Employ better algorithms.
In my experience, this is the safest initial approach to
improving performance. Profiling an application and then changing
poorly performing objects usually requires only local code changes.
These are generally easy to test for compatibility with the
previous implementation.
Deploy better hardware.
This is often the option that provides the greatest initial
return in performance. Doubling the power of a single server is
often cheap compared to the developer hours to make software
faster. But this does not take into account many complicating
factors including the needs of developers, the possibility that the
application will be deployed multiple times, and the scope for
making hardware upgrades.
Depend on faster software.
It is an obvious fact that some Java code (in the form of
libraries, application servers, database drivers, etc.) performs
better than other code. Ensuring that your application depends on
only well-performing external code is obviously a good strategy for
ensuring good performance. Unfortunately, this is difficult to
determine beforehand, since the performance characteristics for any
library are dependent on its operating environment and that is
usually uncertain until the latest phases of development.
Improve the application architecture.
This is often the last resort due to the human resources it
requires. Often a system will need to be partially rewritten and
then fully reevaluated after each architectural change. For
sufficiently large projects, localized architectural changes are
possible, but then in these circumstances the risks are
commensurately higher.
It's frightening how quickly you can exhaust the obviously
available approaches, especially on complex systems where the room
for independent maneuver is small. So what's left?
The Janino Library
Janino is an open source
(BSD-licensed) embedded Java compiler. It is not a compiler that a
developer would use to build an application. Instead, it's
designed to be used within Java applications to compile Java code
on the fly.
Any Java developer who has used JSP will be familiar (if only
indirectly) with the technique of dynamically compiling Java source
code (which is embedded within web pages in the case of JSP) into
classes "on demand." They might also be unfortunate enough to be
familiar with the complications that can arise from it. The
technique has proven awkward in the past because any code that
uses it requires a JDK to be installed (not just a JRE). This can
introduce licensing issues because the JDK is not (yet) freely
distributable. In addition, platform-specific configuration
is needed to locate an appropriate JDK and
to identify the application's class files. Portable implementations
also require that compilation occur in a separate system process,
a detail that can cause performance issues of its own.
Despite these problems, open source libraries such as Jasper
have provided excellent implementations based on this approach. The
Java-based build tool Ant also
needs to cope with these complications when compiling Java code.
Nevertheless, for the reasons outlined above, the dynamic
compilation of Java code is usually regarded as a last resort--in
the case of JSP, the specification requires it--and certainly not
generally sound practice within an application.
Janino improves on this situation in three different ways:
Instead of passing the work onto javac (or an equivalent Java
compiler), Janino is a Java library that runs in the same JVM as
your application. No extra configuration or JDK installation is
necessary.
Rather than requiring access to your application's class files
and .jar files, Janino obtains classes directly from the JVM. This
means that there are no problems concerning file permissions or
build path configurations.
Janino provides easy-to-use primitives for compiling
expressions, scripts, and classes. Developers using the library
don't need to concern themselves with any of the technicalities of
loading the dynamically generated code: it's done for them. At the
simplest level, they can pass in a string containing Java code and
get back an object.
The broader implication of Janino's availability is that
compiling source code within Java applications, which was once
difficult, messy, and platform-specific, is now none of those
things. It makes possible the targeting of performance hotspots
through the technique of dynamically compiling code.
Applying the Technique
Much of the inefficiency in large applications arises from the
need to make them as flexible as possible. This means that much of
the program's functionality is devolved to individually configured
units. To take an example, an application that is responsible for
crunching data reports may depend upon a whole range of factors
that all affect the final formatting of a report. These might
include the user's locale, the server's locale, current data set,
user preferences, user permissions, and so on. This means that most
program actions need to be delegated many times. For complex
operations over large datasets, this translates to poor performance.
Dynamic code compilation can tackle problems like these by
compiling the rules once instead of evaluating them for each
operation.
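The difference between evaluating rules for each operation and resolving them once can be sketched in plain Java, without any compiler involved. The example below is only an analogy under stated assumptions: the class ReportFormattingSketch, its Config fields, and both methods are hypothetical and do not come from the application described here. Dynamic compilation takes the same idea further by turning the resolved rules into bytecode.

```java
import java.util.Locale;
import java.util.function.DoubleFunction;

// Hypothetical illustration: none of these names come from the article.
class ReportFormattingSketch {

    /** Hypothetical runtime configuration for one report. */
    static final class Config {
        final Locale locale;
        final boolean showCents;
        final boolean masked; // true if the user may not see the values

        Config(Locale locale, boolean showCents, boolean masked) {
            this.locale = locale;
            this.showCents = showCents;
            this.masked = masked;
        }
    }

    /** Consults the configuration again for every single value. */
    static String formatEachTime(Config config, double value) {
        if (config.masked) {
            return "***";
        }
        String pattern = config.showCents ? "%,.2f" : "%,.0f";
        return String.format(config.locale, pattern, value);
    }

    /** Resolves the configuration once and returns a specialized formatter. */
    static DoubleFunction<String> specialize(Config config) {
        if (config.masked) {
            return value -> "***";
        }
        final Locale locale = config.locale;
        final String pattern = config.showCents ? "%,.2f" : "%,.0f";
        return value -> String.format(locale, pattern, value);
    }

    public static void main(String[] args) {
        Config config = new Config(Locale.US, true, false);
        DoubleFunction<String> formatter = specialize(config); // rules resolved once
        for (double value : new double[] {1234.5, 0.25}) {
            System.out.println(formatter.apply(value));
        }
    }
}
```

In a real report the saving would come from the number of configuration decisions removed from the inner loop, so two or three flags, as here, are unlikely to show a measurable difference.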
The effectiveness of this approach for a given application
depends in general on the answers to two questions:
To what degree is the application configured at runtime?
To what degree is the performance problem localized in the
configured code?
If both are significant, then the application can probably
benefit from dynamic compilation; otherwise, any gains are likely to
be marginal. Naturally, the overhead of compilation is relatively
high, so the application will need to be performing a significant
amount of work to benefit. Specialized libraries that provide
"opaque" services (e.g. searching, pattern matching, or numerical
calculation) are also likely to be good candidates.
To evaluate this technique for my own applications, I produced a
factory class called BasicEvalFactory that can
pre-parse a simple language for performing basic arithmetic over a
set of numeric variables (multiplication, division, addition, and
subtraction). The implementation is intentionally very basic but
care was taken to ensure that evaluation was efficient and
pre-optimized for good performance; remember, we want to compare
Janino to code that is already optimized because that is the
scenario we are in.
The code for the BasicEvalFactory class is 400
lines long and contains a large number of inner classes. It is too long
to list here, but the source is freely available for download
together with all the code used in the evaluation (see the Resources section).
All of this code exists to satisfy this simple interface:
public interface Evaluator {
    public double evaluate(Variables vars);
}
The evaluate method calculates the value of an expression based
on the variables obtained from this interface:
public interface Variables {
    double getVariable(String name);
}
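To make the comparison concrete, here is a minimal sketch of what an interpreted evaluator behind these two interfaces might look like. It is not the BasicEvalFactory code, which is only available in the download. The class InterpretedEvalSketch and its helper methods are hypothetical, and the expression tree is built by hand rather than parsed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an interpreted evaluator; this is NOT the
// article's BasicEvalFactory, whose source is not reproduced here.
class InterpretedEvalSketch {

    // The article's two interfaces, nested to keep the example self-contained.
    interface Variables {
        double getVariable(String name);
    }

    interface Evaluator {
        double evaluate(Variables vars);
    }

    /** Leaf node: a numeric literal. */
    static Evaluator constant(final double value) {
        return vars -> value;
    }

    /** Leaf node: the variable is looked up by name on every evaluation. */
    static Evaluator variable(final String name) {
        return vars -> vars.getVariable(name);
    }

    /** Inner node: applies one arithmetic operator to two sub-expressions. */
    static Evaluator binary(char op, final Evaluator left, final Evaluator right) {
        switch (op) {
            case '+': return vars -> left.evaluate(vars) + right.evaluate(vars);
            case '-': return vars -> left.evaluate(vars) - right.evaluate(vars);
            case '*': return vars -> left.evaluate(vars) * right.evaluate(vars);
            case '/': return vars -> left.evaluate(vars) / right.evaluate(vars);
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    /** Adapts a map to the Variables interface. */
    static Variables fromMap(final Map<String, Double> values) {
        return name -> {
            Double value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("No such variable: " + name);
            }
            return value;
        };
    }

    public static void main(String[] args) {
        // Hand-built tree for (x + y)/(x - y) * 100/(x * y);
        // a parser would normally produce it.
        Evaluator x = variable("x");
        Evaluator y = variable("y");
        Evaluator expr = binary('/',
            binary('*',
                binary('/', binary('+', x, y), binary('-', x, y)),
                constant(100)),
            binary('*', x, y));

        Map<String, Double> values = new HashMap<>();
        values.put("x", 3.0);
        values.put("y", 1.0);
        System.out.println(expr.evaluate(fromMap(values)));
    }
}
```

Every evaluation walks the tree and repeats each variable lookup wherever the name appears, which is the kind of per-operation overhead that a compiled evaluator can avoid.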
I then produced a second implementation using Janino. Unlike the
first implementation, this one is short enough to be listed in its
entirety here, because I was able to simply translate the
expressions I want to evaluate into Java expressions. The power of
this technique is that Janino does the work of parsing the
expression and the JVM does the work of evaluating it, so
much less code is needed. The class provides a straightforward example of
how to use Janino:
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.codehaus.janino.SimpleCompiler;

public class JaninoEvalFactory {

    // Any run of letters in the expression is treated as a variable name.
    private static Pattern PATTERN =
        Pattern.compile("([a-zA-Z]+)");

    private static SimpleCompiler compiler =
        new SimpleCompiler();

    public static Evaluator fromString(String string) {
        // Generate one local variable declaration per distinct name.
        StringBuffer varCode = new StringBuffer();
        Matcher matcher = PATTERN.matcher(string);
        Set names = new HashSet();
        while (matcher.find()) {
            String name = matcher.group(0);
            if (names.contains(name))
                continue;
            varCode.append("double " + name
                + " = vars.getVariable(\"" + name
                + "\");");
            names.add(name);
        }

        // Wrap the declarations and the expression in a class definition.
        String source = "package janinotest.eval;\n"
            + "public class JaninoEvaluator implements Evaluator {\n"
            + "\tpublic double evaluate(Variables vars) {\n" + "\t\t"
            + varCode + "\n" + "\t\treturn " + string + ";\n" + "\t}\n"
            + "}\n";

        try {
            // Compile the source, then load and instantiate the new class.
            compiler.cook(new StringReader(source));
            Class clss = compiler.getClassLoader().loadClass(
                "janinotest.eval.JaninoEvaluator");
            Evaluator eval = (Evaluator) clss.newInstance();
            return eval;
        } catch (Exception e) {
            throw new IllegalArgumentException(e.getMessage());
        }
    }
}
The single static method in this class does the following
work:
The variables in the expression are identified using
PATTERN and are used to generate a String
containing Java code to assign them values from a
Variables instance that is to be supplied.
This is combined with the supplied expression and a basic class
definition to produce the Java source for an implementation of the
Evaluator interface.
The source is supplied to a Janino SimpleCompiler
object that cooks (compiles) the source code.
The class is then loaded via the compiler's class loader and
instantiated using reflection.
The finished object is returned.
For example, when supplied with the string
"(x + y)/(x - y) * 100/(x*y)", the resulting source
(less the package declaration) is:
public class JaninoEvaluator implements Evaluator {
    public double evaluate(Variables vars) {
        double x = vars.getVariable("x");
        double y = vars.getVariable("y");
        return (x + y)/(x - y) * 100/(x*y);
    }
}
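The source-generation step can be tried on its own, without Janino on the classpath. The sketch below reproduces only that step, using the same regular expression as the factory class. The class name EvaluatorSourceSketch and its two methods are hypothetical, and a LinkedHashSet is an assumption made here so that the declarations come out in order of first appearance.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that only generates source text; it compiles nothing.
class EvaluatorSourceSketch {

    // Same rule as the factory class: any run of letters is a variable name.
    private static final Pattern VARIABLE = Pattern.compile("[a-zA-Z]+");

    /** Returns the distinct variable names in order of first appearance. */
    static Set<String> variableNames(String expression) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = VARIABLE.matcher(expression);
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }

    /** Builds the Java source of an Evaluator for the given expression. */
    static String generateSource(String expression) {
        StringBuilder source = new StringBuilder();
        source.append("package janinotest.eval;\n");
        source.append("public class JaninoEvaluator implements Evaluator {\n");
        source.append("    public double evaluate(Variables vars) {\n");
        for (String name : variableNames(expression)) {
            source.append("        double ").append(name)
                  .append(" = vars.getVariable(\"").append(name).append("\");\n");
        }
        source.append("        return ").append(expression).append(";\n");
        source.append("    }\n");
        source.append("}\n");
        return source.toString();
    }

    public static void main(String[] args) {
        System.out.println(generateSource("(x + y)/(x - y) * 100/(x*y)"));
    }
}
```

Because the expression is pasted into the source verbatim, whatever the string contains will be compiled and executed, so this approach is only appropriate for expressions that come from a trusted source or have been validated first.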
Those adopting Janino should make themselves aware that there
are some limitations on the Java compilation Janino performs,
though very few. Key among these are the lack of support for any
Java 1.5 language features such as generics and the new
for-loop syntax.
Comparing Performance
I compared the performance of these two implementations using
three expressions of varying complexity, and the results are charted
below in Figure 1. Java was invoked without any flags. The command
line used to launch the test is given below. To run the performance
test yourself you will naturally need to adjust the classpath
appropriately.
As one might expect, the benefits of compilation become more
apparent with increasing complexity. The evaluation of the
moderate expression shows a 7x speed improvement.
These figures are not provided as a benchmark (nor do they prove
that speed increases will be realized), but they are a useful
indication of the degree of improvement that developers can expect
in some scenarios.
The three expressions tested were:
0 (trivial)
100 * x + 20 / 2 (simple)
(x + y)/(x - y) * 100/(x * y) (moderate)
Figure 1. Chart demonstrating superiority of compiled execution
It is vital that anyone who is looking to adopt Janino for their
own projects perform their own evaluations; optimizations on
established applications rarely produce simple wins. In an
evaluation I conducted previously, I evaluated the performance of
Janino for speeding up the substitution of tokens like
${token-name} within Java strings. Figure 2 below
shows a chart of the results. I compared the following
implementations:
naive-map
A simple but efficient implementation using Java's regular
expression parsing to replace tokens with strings from a map.
fast-map
An optimized implementation of naive-map that pre-stores the
decomposition of the tokenized string.
janino-map
An implementation that pre-compiles the code to generate the
string from map values.
fast-object
An optimized implementation that uses reflection to draw token
values directly from a POJO (Plain Old Java Object).
janino-object
An implementation that pre-compiles the code to generate the
string from the values of a POJO.
hardwired-object
To benchmark the best possible time, this implementation draws
values directly from a specific Java class.
Figure 2. Chart demonstrating inferiority of compiled
execution
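For readers who want a feel for the two map-based baselines, here is a minimal sketch of how they might be written. It is not the code used in the evaluation, which is in the download. The class TokenSubstitutionSketch, its Template class, and the choice to replace unknown tokens with an empty string are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementations measured in the article.
class TokenSubstitutionSketch {

    // Matches tokens of the form ${token-name}.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Assumption: a token with no value in the map becomes an empty string.
    private static String lookup(Map<String, String> values, String token) {
        String value = values.get(token);
        return value == null ? "" : value;
    }

    /** "naive-map" style: the template is re-scanned on every call. */
    static String naive(String template, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = lookup(values, matcher.group(1));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** "fast-map" style: the template is decomposed once and then reused. */
    static final class Template {
        // Invariant: literals.size() == tokens.size() + 1.
        private final List<String> literals = new ArrayList<>();
        private final List<String> tokens = new ArrayList<>();

        Template(String template) {
            Matcher matcher = TOKEN.matcher(template);
            int last = 0;
            while (matcher.find()) {
                literals.add(template.substring(last, matcher.start()));
                tokens.add(matcher.group(1));
                last = matcher.end();
            }
            literals.add(template.substring(last));
        }

        String apply(Map<String, String> values) {
            StringBuilder out = new StringBuilder(literals.get(0));
            for (int i = 0; i < tokens.size(); i++) {
                out.append(lookup(values, tokens.get(i)));
                out.append(literals.get(i + 1));
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("name", "Ada", "count", "3");
        String text = "Hello ${name}, you have ${count} messages";
        System.out.println(naive(text, values));
        System.out.println(new Template(text).apply(values));
    }
}
```

A pre-decomposed template already removes the regular-expression work from each call, which may be part of why so little is left for a compiled version to gain in this test.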
To my surprise, Java's reflective method invocation turns out to be so fast (in this instance) that the JDK-compiled code outperforms Janino's. The optimized map implementation is also faster than the compiled version. These results arise from optimizations that are made by Sun's Java compiler but not by Janino. In particular, the strategy for implementing the string concatenation operator differs between the compilers. The good news is that in response to this evaluation, Janino's string handling has since been optimized and its performance is now very competitive with that of javac.
Anyone interested in running this performance test for
themselves can do so by downloading the source (see Resources), compiling it, and executing it with the
command line below (again with appropriate modifications to the
class path).
The lesson to draw from this result is that the technique of
optimizing code through dynamic compilation is valuable, but should
only be applied in instances where performance gains can be proved.
As ever, there is no silver bullet.
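A first rough check of whether a gain exists can be made with a small timing loop like the one sketched below. This is a hypothetical harness, not the test code from the download, and the names TimingSketch, time, and compare are invented here. Simple loops like this are easily distorted by JIT compilation and garbage collection, so a dedicated tool such as JMH is a better basis for any decision that matters.

```java
import java.util.function.DoubleSupplier;

// Hypothetical harness; the article's own test code is in its download.
class TimingSketch {

    /** Runs the task repeatedly and returns the elapsed time in nanoseconds. */
    static long time(DoubleSupplier task, int iterations) {
        double sink = 0; // accumulate results so the work is less likely to be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.getAsDouble();
        }
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN"); // makes use of sink
        }
        return elapsed;
    }

    /** Warms both tasks up, then returns {firstNanos, secondNanos}. */
    static long[] compare(DoubleSupplier first, DoubleSupplier second, int iterations) {
        time(first, iterations);  // warm-up runs, results discarded
        time(second, iterations);
        return new long[] { time(first, iterations), time(second, iterations) };
    }

    public static void main(String[] args) {
        final double x = 3.0;
        final double y = 1.0;
        // Stand-ins: replace these with calls to the two evaluators under test.
        DoubleSupplier first = () -> (x + y) / (x - y) * 100 / (x * y);
        DoubleSupplier second = () -> Math.sqrt(x * x + y * y);
        long[] nanos = compare(first, second, 1_000_000);
        System.out.println("first:  " + nanos[0] + " ns");
        System.out.println("second: " + nanos[1] + " ns");
    }
}
```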
Some Closing Ideas
There are many situations that lend themselves to this
approach. This list might provide some ideas.
Use of the Java Proxy class can be replaced by dynamically
compiled classes that avoid all reflection, thereby providing a
significant performance improvement.
Requests on the application that include user-defined functions
(which would normally need to be parsed and then evaluated as a
domain-specific language) can be compiled and then evaluated as
Java code much more quickly.
Data records in which the fields are fixed, but not known at
compile time, are typically implemented using
HashMaps. Using Janino, the known fields can be used
to construct a specific private field for each record value. With
some extra work, the fields can still be exposed through the
Map interface. If the number of record types is low as
a proportion of the number of records, the memory savings can be
very significant.
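As an illustration of the last idea, the class below shows roughly what generated source for one record type might look like once it has been written out. It is written by hand here, and the record type PersonRecord with fields "id" and "name" is hypothetical. In practice, a string containing source like this would be produced from the field list and handed to Janino.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical example of the kind of class that could be generated per record type.
class GeneratedRecordSketch {

    /** A record with fixed fields "id" and "name", still usable as a Map. */
    static final class PersonRecord extends AbstractMap<String, Object> {
        // One plain field per record value instead of one hash table entry each.
        private Object id;
        private Object name;

        @Override
        public Object get(Object key) {
            if ("id".equals(key)) return id;
            if ("name".equals(key)) return name;
            return null;
        }

        @Override
        public Object put(String key, Object value) {
            Object old;
            if ("id".equals(key)) { old = id; id = value; return old; }
            if ("name".equals(key)) { old = name; name = value; return old; }
            throw new IllegalArgumentException("Unknown field: " + key);
        }

        /** Read-only snapshot of the fields; it is not a live view. */
        @Override
        public Set<Map.Entry<String, Object>> entrySet() {
            Set<Map.Entry<String, Object>> entries = new LinkedHashSet<>();
            entries.add(new AbstractMap.SimpleImmutableEntry<>("id", id));
            entries.add(new AbstractMap.SimpleImmutableEntry<>("name", name));
            return Collections.unmodifiableSet(entries);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new PersonRecord();
        record.put("id", 42);
        record.put("name", "Ada");
        System.out.println(record.get("name") + " has " + record.size() + " fields");
    }
}
```

The field set is fixed, so unknown keys are rejected in put and removal is not supported. Whether that restriction is acceptable depends on how the records are used elsewhere in the application.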
Conclusion
There are many more ways that Janino could be used in
applications and I recommend that anyone who needs better-performing Java code investigate this approach.
Another Janino usage sample
2007-02-22 12:30:37 ejboy
Scriptella ETL provides support for Janino out of the box.
In my case JavaScript or JEXL scripts may be faster to write, but Janino is obviously faster to run. Download the examples distribution and check the ODBC sample for a Janino integration demo.
See also Janino Driver JavaDoc
What about JSR199?
2007-02-15 13:13:50 riejo
Thanks for that nice article - I really like the dynamic compilation approach. Actually I am using something similar in a dbc-implementation which is part of my master's thesis. However, it left me with a question.
With the implementation of JSR 199 in Java 6, the Java compiler (javac, or any other) can be accessed dynamically at a high level of abstraction. Check out the API docs. To me it seems to be equivalent to Janino, or am I wrong?
Now I am wondering if one should use a third-party tool like Janino or stick to the JSR199 Java compiler? I guess Janino is a little more comfortable to use, but the JSR199 compiler always supports the latest Java versions.
Cheers, Johannes
What about JSR199?
2007-02-15 17:19:52 sptz45
If I am correct, with JSR-199 APIs you can only compile Java classes. You cannot compile and evaluate Java expressions, you need a surrounding Java class.
With Janino you can use the ScriptEvaluator class to compile and evaluate Java expressions (like "2+3") or scripts written in Java without any class definitions.
What about JSR199?
2007-02-15 14:08:56 tomgibara
Johannes,
I'm glad you enjoyed the article. Your question, "is Janino equivalent to JSR 199", is a tricky one to answer directly because it depends on whether it's equivalent for your purposes. The two certainly have a significant overlap in functionality. Both are intended to be used by Java developers for the purpose of compiling Java source code within applications. But there are significant differences. JSR 199 is clearly intended to be a layer over javac (or its equivalent within a Java distribution) whereas Janino is clearly intended to operate as a Java library (though it can be used independently as a command-line compiler). This means that JSR 199 will almost certainly provide a broadly more powerful compiler than Janino; this might matter for your application or it might not.
However, this additional power comes at a cost in complexity which the JSR199 API won't shield you from. To start with, your comment about JSR199 being available in all latest Java versions is alas not true. The tools package description states:
These interfaces and classes are required as part of the Java Platform, Standard Edition (Java SE), but there is no requirement to provide any tools implementing them.
and
There is no requirement for a compiler at runtime.
So availability is not guaranteed, and neither is the source/target compatibility of any compiler that might be available. Janino on the other hand provides a fixed scope of functionality for the price of a library dependency. In other words, you can depend on a specific version of the Janino library and know exactly which compilation capabilities you can depend on. Again, this might matter for your application, or it might not.
One additional, and very important benefit of Janino's operation as a library is that it has access to the classes within your application without depending on their presence externally on the file system*. I don't believe that JSR 199 caters for this at all.
I'm afraid I can't answer your second question as to which is better for your master's thesis, but hopefully the points above will provide some guidance.
* On a mildly related point - I've always believed it was a lost opportunity that javac did not support linking to classes from elsewhere than the filesystem, say via an HTTP URL - to my knowledge it's still not possible to do with javac. This limitation means that dynamic code generation must be done either before or after compilation. Unfortunate, since producing something like Java classes for DB persistence might be best done during compilation with a 'class server' that could convert the database schema into Java classes at compile time. This would keep the build process loosely coupled with the database without resorting to dynamic class generation within the application.
Tom
What about JSR199?
2007-02-16 00:58:51 riejo
Thanks for your reply. You are right about the availability - I missed that. Still, I think Sun will not change the current implementation, so I am confident that there will be a runtime compiler in Java supporting the latest version.