Source Code Analysis Using Java 6 APIs

April 10, 2008

Have you ever wondered how tools like Checkstyle
(http://checkstyle.sourceforge.net/) or FindBugs
(http://findbugs.sourceforge.net/) perform static code analysis, or
how Integrated Development Environments (IDEs) like NetBeans or
Eclipse (http://www.eclipse.org/) execute quick code fixes or find
the exact references of a field declared in your code? In many
cases, IDEs have their own APIs to parse the source code and
generate a standard tree structure, called an Abstract Syntax Tree
(AST) or "parse tree," which can be used for deeper analysis of the
source elements. The good news is that it is now possible to
accomplish these tasks, and a lot more, with the help of three APIs
introduced as part of the Java Standard Edition 6 release. The APIs
of interest to developers of Java applications that need to perform
source code analysis are the Java Compiler API (JSR 199,
http://www.jcp.org/en/jsr/detail?id=199), the Pluggable Annotation
Processing API (JSR 269, http://www.jcp.org/en/jsr/detail?id=269),
and the Compiler Tree API
(http://java.sun.com/javase/6/docs/jdk/api/javac/tree/index.html).

In this article, we explore the features of each of these APIs
and go on to develop a simple demo application that verifies
certain Java coding rules on a set of source code files supplied as
input. This utility also reports the coding violation messages as
well as the location of the violating source code as output. Consider a
simple Java class that overrides the equals() method
of the Object class. The coding rule to be verified is
that every class that overrides the equals() method
should also override the hashCode() method with the
proper signature. You can see that the TestClass
class below does not define the hashCode() method,
even though it has the equals() method.

[prettify]
import java.io.Serializable;

public class TestClass implements Serializable {
    int num;

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if ((obj == null) || (obj.getClass() != this.getClass()))
            return false;
        TestClass test = (TestClass) obj;
        return num == test.num;
    }
}
[/prettify]
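For comparison, a version of the class that satisfies the rule might look like the following sketch. The class name TestClassWithHashCode, the constructor, and the particular hash formula are illustrative assumptions, not part of the demo application; the only point is that hashCode() is derived from the same field that equals() compares.

```java
import java.io.Serializable;

// Illustrative variant of TestClass that also overrides hashCode().
class TestClassWithHashCode implements Serializable {
    private static final long serialVersionUID = 1L;

    int num;

    TestClassWithHashCode(int num) {
        this.num = num;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if ((obj == null) || (obj.getClass() != this.getClass()))
            return false;
        TestClassWithHashCode test = (TestClassWithHashCode) obj;
        return num == test.num;
    }

    // Equal objects must produce equal hash codes, so the hash is
    // computed from the same field that equals() compares.
    @Override
    public int hashCode() {
        return 31 + num;
    }
}
```

With this variant, two instances holding the same num value are equal and also report the same hash code, which is what hash-based collections rely on.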

Let us go on and analyze this class as part of the build process
with the help of these three APIs.

Invoking the Compiler from Code: The Java Compiler API

We all use the javac command-line tool for
compiling Java source files to class files. Then why do we need an
API to compile Java files? Well, the answer is quite simple: as the
name describes, this new standard API lets us invoke the compiler
from our own Java applications; i.e., you can programmatically
interact with the compiler and thereby make compilation part of
application-level services. Some typical uses of this API are
listed below.

  • The compiler API helps application servers to minimize the time
    taken to deploy applications, for example, by avoiding the overhead
    of using an external compiler for compiling the servlet sources
    generated from the JSP pages.

  • Developer tools like IDEs and code analyzers can invoke the
    compiler from within the editor or build tools, which
    significantly reduces the compile time.

The Java compiler classes are packaged under the
javax.tools package. The ToolProvider
class of this package provides a method called
getSystemJavaCompiler() that returns an instance of
some class that implements the JavaCompiler interface.
This compiler instance can be used to create a compilation task
that will perform the actual compilation. The Java source files to
be compiled will be then passed to the compilation task. For this,
the compiler API provides a file manager abstraction called
JavaFileManager, which allows Java files to be
retrieved from various sources, such as the file system, databases,
memory, and so on. In this sample, we use StandardJavaFileManager, a
file manager based on java.io.File. The standard file
manager can be acquired by calling the
getStandardFileManager() method of the
JavaCompiler instance. The code snippet for the
above-mentioned steps is shown below:

[prettify]
//Get an instance of java compiler
JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

//Get a new instance of the standard file manager implementation
StandardJavaFileManager fileManager = compiler.
        getStandardFileManager(null, null, null);

// Get the list of Java file objects; in this case we have only
// one file, TestClass.java
Iterable<? extends JavaFileObject> compilationUnits1 =
        fileManager.getJavaFileObjects("TestClass.java");
[/prettify]

A diagnostic listener can optionally be passed to the
getStandardFileManager() method to receive diagnostic
reports of any non-fatal problems. In this code snippet, we pass
null values, since we are not collecting the
diagnostics from the tool. For details of the other parameters
passed to these methods, please refer to the Java 6 API documentation
(http://java.sun.com/javase/6/docs/api/javax/tools/JavaCompiler.html).
The getJavaFileObjects() and getJavaFileObjectsFromFiles()
methods of the StandardJavaFileManager return the
JavaFileObject instances that correspond to the
supplied file names or File objects, respectively.
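When diagnostics are of interest, a DiagnosticCollector from javax.tools can be passed in place of the null listener. The following is a minimal sketch under a few assumptions: the DiagnosticsDemo and StringSource names are hypothetical, the source is held in memory so the example does not depend on a file on disk, and the code must run on a JDK (on a plain JRE, getSystemJavaCompiler() returns null).

```java
import java.io.IOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class DiagnosticsDemo {

    // Hypothetical helper: a source file held in memory instead of on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles a deliberately broken class and returns the collected diagnostics.
    static DiagnosticCollector<JavaFileObject> compileBrokenSource() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics =
                new DiagnosticCollector<JavaFileObject>();
        StandardJavaFileManager fileManager =
                compiler.getStandardFileManager(diagnostics, null, null);
        try {
            // The initializer has the wrong type, so javac reports an error.
            JavaFileObject source = new StringSource("Broken",
                    "class Broken { int n = \"not a number\"; }");
            compiler.getTask(null, fileManager, diagnostics, null, null,
                    Collections.singletonList(source)).call();
        } finally {
            try {
                fileManager.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        return diagnostics;
    }

    public static void main(String[] args) {
        for (Diagnostic<? extends JavaFileObject> d
                : compileBrokenSource().getDiagnostics()) {
            System.out.println(d.getKind() + " at line " + d.getLineNumber()
                    + ": " + d.getMessage(null));
        }
    }
}
```

Each Diagnostic carries the kind of problem, the source it came from, and its position, which is the same kind of information the demo application later reports for rule violations.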

The next step is to create the Java compilation task, which can
be obtained using the getTask() method of
JavaCompiler. At this point, the compilation
task has not yet started. The task can be triggered by invoking the
call() method of CompilationTask. The
code fragment for creating and triggering a compilation task is
shown below.

[prettify]
// Create the compilation task
CompilationTask task = compiler.getTask(null, fileManager, null,
                                        null, null, compilationUnits1);
                                        
// Perform the compilation task.
task.call();
[/prettify]

Assuming no compilation errors, this will generate the
TestClass.class file in the destination directory.
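The destination directory can be chosen explicitly by passing javac options to getTask(). The sketch below is an illustration rather than part of the demo application: the DestinationDemo and StringSource names are hypothetical, the source is kept in memory, and the java.nio.file classes it uses require Java 7 or later. It passes the standard -d option and checks the Boolean returned by call().

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DestinationDemo {

    // Hypothetical helper: a source file held in memory instead of on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles an in-memory TestClass into a fresh temporary directory
    // and returns that directory.
    static Path compileToTempDir() {
        try {
            Path outDir = Files.createTempDirectory("classes");
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            JavaFileObject source = new StringSource("TestClass",
                    "public class TestClass { int num; }");
            // A null file manager means "use the compiler's standard one".
            Boolean ok = compiler.getTask(null, null, null,
                    Arrays.asList("-d", outDir.toString()), null,
                    Collections.singletonList(source)).call();
            if (!ok) {
                throw new IllegalStateException("compilation failed");
            }
            return outDir;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Class files written to " + compileToTempDir());
    }
}
```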

Annotation Processing: The Pluggable Annotation Processing API

As we all know, Java SE 5.0 introduced support for adding and
processing metadata or annotations to elements like Java
classes, fields, methods, etc. Annotations are typically processed
by build tools or runtime environments to perform useful tasks like
controlling application behavior, generating code, and so on. Java
5 allows both compile-time and runtime processing of annotated
data. Annotation processors are utilities that can be
dynamically plugged into the compiler to analyze source files and
process annotations in them. Annotation processors can make the fullest
use of metadata information to perform many tasks, including but
not limited to the following.

  • Annotations can be used to generate deployment descriptor files
    like persistence.xml or ejb-jar.xml, in the cases of
    entity classes and enterprise beans, respectively.

  • Annotation processors can use metadata information to generate
    code. For example, a processor can generate Home and Remote
    interfaces of a properly annotated enterprise bean.

  • Annotations can be used to verify the validity of code or
    deployment units.

Java 5.0 provided an Annotation Processing Tool (APT)
and an associated mirror-based reflection API
(com.sun.mirror.*) for processing annotations and
modeling the processed information. The APT tool runs matching
annotation processors for the annotations present in the supplied
Java source files. The mirror API provides a compile-time, read-only
view of the source file. The main drawback of APT is that it is not
standardized; i.e., APT is specific to the Sun JDK.

Java SE 6 has introduced a new feature called the Pluggable
Annotation Processing framework, which provides standardized
support for writing customized annotation processors. It is called
"pluggable" because the annotation processor can be plugged into
javac dynamically and can operate on a set of
annotations that appears in the Java source file. This framework
has two parts: an API for declaring and interacting with annotation
processors -- the package javax.annotation.processing --
and an API for modeling the Java programming language -- the package
javax.lang.model.

Writing a Custom Annotation Processor

The following section explains how to write a custom annotation
processor and plug it into the compilation task. The custom
annotation processor extends AbstractProcessor (which
is the default implementation of the Processor
interface) and overrides the process() method.

The annotation processor class is decorated with two class-level
annotations, @SupportedAnnotationTypes and
@SupportedSourceVersion. The
SupportedSourceVersion annotation specifies the latest
source version the annotation processor supports. The
SupportedAnnotationTypes annotation denotes which
annotations this particular annotation processor is interested in.
For example, @SupportedAnnotationTypes("javax.persistence.*")
is to be used if the processor only
needs to process Java Persistence API (JPA) annotations. It is
interesting to note that the annotation processor is invoked even
when no annotations are present if the supported annotation types
are specified as @SupportedAnnotationTypes("*"). This
allows us to take advantage of the modelling API along with the
Tree API to do general purpose source code processing. Using these
APIs, it is possible to get a lot of useful information pertaining
to modifiers, fields, methods, and more. The code snippet of a
custom annotation processor is given below:

[prettify]
@SupportedSourceVersion(SourceVersion.RELEASE_6)
@SupportedAnnotationTypes("*")
public class CodeAnalyzerProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations,
            RoundEnvironment roundEnvironment) {
        for (Element e : roundEnvironment.getRootElements()) {
                System.out.println("Element is "+ e.getSimpleName());
                // Add code here to analyze each root element
        }
        return true;
    }
}
[/prettify]

The annotation processors are invoked depending on what
annotations are present in the source code, which processors are
configured to be available, and what annotation types the available
processors process. Annotation processing may happen in multiple
rounds. For example, in the first round, the original input Java
source files will be processed; in the second round, the files
generated by the first round of processing will be considered, and
so on. The custom processor should override the
process() method of AbstractProcessor. This
method takes two arguments:

  1. The set of annotation types, represented as TypeElements,
    that the processor is asked to process in the current round.
  2. The RoundEnvironment that encapsulates information
    about the current processing round of the annotation
    processor.

The return value of process() tells the compiler whether the
processor claims the annotation types it was given. If the method
returns true, the annotation types are claimed and
subsequent processors are not asked to process them. If it returns
false, the annotation types remain unclaimed and the
next available processor, if any, may be asked to process them.
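The round mechanism and the return value can be observed with a small processor. The following is a sketch under stated assumptions: RoundLoggingProcessor, RoundDemo, and StringSource are hypothetical names, the input is an in-memory source, and the standard -proc:only option is used so that no class files are written. The processor returns false so that it does not claim any annotations, which is a common choice for "*" processors that only inspect code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Records the root elements of each round and whether the final round was seen.
@SupportedAnnotationTypes("*")
class RoundLoggingProcessor extends AbstractProcessor {
    final List<String> rootNames = new ArrayList<String>();
    boolean sawFinalRound = false;

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations,
            RoundEnvironment roundEnv) {
        if (roundEnv.processingOver()) {
            // Last round: nothing new to analyze.
            sawFinalRound = true;
        } else {
            for (Element e : roundEnv.getRootElements()) {
                rootNames.add(e.getSimpleName().toString());
            }
        }
        return false; // do not claim any annotation types
    }
}

class RoundDemo {

    // Hypothetical helper: a source file held in memory instead of on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Runs annotation processing only (no class files) and returns the processor.
    static RoundLoggingProcessor run() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject source = new StringSource("TestClass",
                "public class TestClass { int num; }");
        JavaCompiler.CompilationTask task = compiler.getTask(null, null, null,
                Arrays.asList("-proc:only"), null,
                Collections.singletonList(source));
        RoundLoggingProcessor processor = new RoundLoggingProcessor();
        task.setProcessors(Collections.singletonList(processor));
        task.call();
        return processor;
    }

    public static void main(String[] args) {
        RoundLoggingProcessor p = run();
        System.out.println("Root elements: " + p.rootNames);
        System.out.println("Final round seen: " + p.sawFinalRound);
    }
}
```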

Plugging In the Annotation Processor

Now that the custom annotation processor is ready to use, let's
see how we can invoke this processor as part of the compilation
process. The processor can be invoked either through
the javac command-line utility or programmatically through
a standalone Java class. The javac utility of Java SE
6 provides an option called -processor that accepts
the fully qualified name of the custom annotation processor to be
plugged in. The syntax for this is as follows:

[prettify]
javac -processor demo.codeanalyzer.CodeAnalyzerProcessor TestClass.java
[/prettify]

where CodeAnalyzerProcessor is the annotation
processor class and TestClass is the input Java file
to be processed. This utility searches for
CodeAnalyzerProcessor in the classpath; hence, it is
important to place this class in the classpath.

The modified code snippet for plugging in the processor
programmatically is shown below. The setProcessors()
method of the CompilationTask allows multiple
annotation processors to be plugged into the compilation task. This
method needs to be called before the call() method.
Also note that if an annotation processor is plugged into the
compilation task, annotation processing happens first, and only then
does the rest of the compilation proceed. Annotation processing
will not happen if the source contains errors that stop the compiler
before that stage, such as syntax errors.

[prettify]
CompilationTask task = compiler.getTask(null, fileManager, null,
                                        null, null, compilationUnits1);
                                        
// Create a list to hold annotation processors
LinkedList<AbstractProcessor> processors = new LinkedList<AbstractProcessor>();

// Add an annotation processor to the list
processors.add(new CodeAnalyzerProcessor());

// Set the annotation processor to the compiler task
task.setProcessors(processors);

// Perform the compilation task.
task.call();
[/prettify]

If we execute the above code, it causes the annotation processor
to fire during the compilation of TestClass.java
printing the name "TestClass."

Accessing the Abstract Syntax Tree: The Compiler Tree API

The Abstract Syntax Tree (AST) is a read-only view of
the source that represents the Java code as a tree of nodes,
where each node represents a Java programming language construct
or Tree, and the children of each node signify
meaningful components of these trees. For example, a Java class is
represented as a ClassTree, method declarations are
represented as MethodTrees, variable declarations as
VariableTrees, annotations as
AnnotationTrees, and so on.

The Compiler Tree API provides access to the Abstract Syntax
Tree of Java source code and also provides some utilities like
TreeVisitors, TreeScanners, etc., for
performing operations on the AST. A deeper analysis of the source
content can be done using a TreeVisitor, which visits
all child tree nodes to extract the required information about
fields, methods, annotations, and other class elements. The tree
visitors are implemented in the style of visitor design pattern.
When a visitor is passed to a tree's accept method, the
visitXYZ method most applicable to that tree is invoked.

The Java Compiler Tree API provides three implementations of
TreeVisitor; namely, SimpleTreeVisitor,
TreePathScanner, and TreeScanner. The demo
application uses a TreePathScanner to extract
information about the Java source file. The
TreePathScanner is a TreeVisitor that
visits all the child tree nodes and provides support for
maintaining a path for the parent nodes. The scan()
method of the TreePathScanner needs to be invoked to
scan the tree. To visit nodes of a particular type, just override
the corresponding visitXYZ method. Inside your visit
method, call super.visitXYZ to visit descendant nodes.
The code snippet of a typical visitor class is shown below:

[prettify]
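import com.sun.source.tree.ClassTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

public class CodeAnalyzerTreeVisitor extends TreePathScanner<Object, Trees> {
    @Override
    public Object visitClass(ClassTree classTree, Trees trees) {
        // ... analyze the class declaration here ...
        return super.visitClass(classTree, trees);
    }
    @Override
    public Object visitMethod(MethodTree methodTree, Trees trees) {
        // ... analyze the method declaration here ...
        return super.visitMethod(methodTree, trees);
    }
}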
[/prettify]

You can see that the visit methods accept two arguments: the
tree representing the node (ClassTree for the class
node, MethodTree for the method node, etc.) and a
Trees object. The Trees class provides
utility methods for fetching path information of elements in the
tree. It is important to note that the Trees object
acts as a bridge between JSR 269 and the Compiler Tree API. In this
sample, there is only a single root element, which is
TestClass itself.

[prettify]
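// Members of CodeAnalyzerProcessor
private Trees trees;

@Override
public void init(ProcessingEnvironment pe) {
        super.init(pe);
        // Bridge from the annotation processing environment to the Tree API
        trees = Trees.instance(pe);
}

@Override
public boolean process(Set<? extends TypeElement> annotations,
        RoundEnvironment roundEnvironment) {
        for (Element e : roundEnvironment.getRootElements()) {
                // Use a fresh visitor per root element so that state
                // is not shared between classes
                CodeAnalyzerTreeVisitor visitor = new CodeAnalyzerTreeVisitor();
                TreePath tp = trees.getPath(e);
                // invoke the scanner
                visitor.scan(tp, trees);
        }
        return true;
}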
[/prettify]
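The visitor mechanism can also be tried outside an annotation processor. The sketch below is a simplified illustration, not the demo application's code: it uses a plain TreeScanner instead of a TreePathScanner, obtains the tree by casting the compilation task to the javac-specific JavacTask and calling parse(), and reads the source from memory. The MethodNameScanDemo, MethodNameScanner, and StringSource names are hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class MethodNameScanDemo {

    // Hypothetical helper: a source file held in memory instead of on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Collects the simple name of every method declaration it visits.
    static class MethodNameScanner extends TreeScanner<Void, List<String>> {
        @Override
        public Void visitMethod(MethodTree method, List<String> names) {
            names.add(method.getName().toString());
            // Keep descending so that nested declarations are visited too.
            return super.visitMethod(method, names);
        }
    }

    // Parses the given source text and returns the declared method names.
    static List<String> methodNames(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null,
                null, Collections.singletonList(new StringSource(className, code)));
        List<String> names = new ArrayList<String>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new MethodNameScanner().scan(unit, names);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String code = "public class TestClass {\n"
                + "  int num;\n"
                + "  public boolean equals(Object o) { return o == this; }\n"
                + "}\n";
        System.out.println(methodNames("TestClass", code));
    }
}
```

Because the source is only parsed here, element information such as superclasses is not available; that is what the Trees bridge inside an annotation processor provides.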

The following section explains how to retrieve source code
information using the Tree API and populate a common model that is
used for code verification later. The visitClass()
method is called whenever a class, interface, or enum type is
visited within the AST, with a ClassTree as the argument.
Similarly, the visitMethod() method is called for all the
methods with a MethodTree as the argument,
visitVariable() for all the variables with a
VariableTree as the argument, and so on.

[prettify]
@Override
public Object visitClass(ClassTree classTree, Trees trees) {
        // Store the details of the visited class in a model
        JavaClassInfo clazzInfo = new JavaClassInfo();

        // Get the current path of the node
        TreePath path = getCurrentPath();

        // Get the type element corresponding to the class
        TypeElement e = (TypeElement) trees.getElement(path);

        // Set the qualified class name in the model
        clazzInfo.setName(e.getQualifiedName().toString());

        // Set the superclass info
        clazzInfo.setNameOfSuperClass(e.getSuperclass().toString());

        // Set the implemented interface details
        for (TypeMirror mirror : e.getInterfaces()) {
                clazzInfo.addNameOfInterface(mirror.toString());
        }
        return super.visitClass(classTree, trees);
}
[/prettify]

The JavaClassInfo used in this code snippet is the
custom model for storing information about the Java code. After
executing this code, information pertaining to the class, such as the
fully qualified class name, superclass name, and interfaces implemented
by TestClass, is extracted and stored in our
custom model for future verification purposes.

Setting the Source Location

So far we have been busy obtaining the information about various
nodes of the AST and populating the model objects for class, method,
and field information. With this information, we can verify if the
source follows good programming practices, conforms to
specifications, and so on. This information can be quite useful to
verification tools like Checkstyle or FindBugs, but they might also
require the location details of the source token(s) that violate
the rule, so that they can report the error location to
users.

The SourcePositions object, which is part of the
Compiler Tree API, maintains the positions of all the AST nodes
within the compilation unit tree. This object provides useful
information about start and end positions of a
ClassTree, MethodTree,
VariableTree, etc., within the file. A position is
defined as a simple character offset from the start of a
CompilationUnit where the first character is at offset
0. The code snippet below shows how we can get the
character offset position of the passed Tree object
from the start of the compilation unit.

[prettify]
public static LocationInfo getLocationInfo(Trees trees, 
                                                TreePath path, Tree tree) {
        LocationInfo locationInfo = new LocationInfo();
        SourcePositions sourcePosition = trees.getSourcePositions();
        long startPosition = sourcePosition.
                        getStartPosition(path.getCompilationUnit(), tree);
        locationInfo.setStartOffset((int) startPosition);
        return locationInfo;
}
[/prettify]

However, if we need to get the position of the token that gives
the name of the class or method itself, then this would not
suffice. To find the actual token positions within the source, one
option is to search for the tokens within the char content of the
source files. We can get the char content from the
JavaFileObject corresponding to the compilation unit
first as demonstrated below.

[prettify]
//Get the compilation unit tree from the tree path
CompilationUnitTree compileTree = treePath.getCompilationUnit();

//Get the java source file which is being processed
JavaFileObject file = compileTree.getSourceFile();

// Extract the char content of the file into a string
String javaFile = file.getCharContent(true).toString();

//Convert the java file content to a  character buffer
CharBuffer charBuffer = CharBuffer.wrap (javaFile.toCharArray()); 
[/prettify]

The following code snippet locates the position of the class
name token within the source. The
java.util.regex.Pattern and
java.util.regex.Matcher classes are used to obtain the
actual position of the class name token. The content of Java source
is converted to a character buffer using
java.nio.CharBuffer. The matcher searches for the
first occurrence of the token matching the class name in the
character buffer, starting from the start position of the class
tree within the compilation unit tree.

[prettify]
LocationInfo clazzNameLoc = (LocationInfo) clazzInfo.getLocationInfo();
int startIndex = clazzNameLoc.getStartOffset();
int endIndex = -1;
if (startIndex >= 0) {
    // Search only from the start of the class tree onwards
    String strToSearch = charBuffer.subSequence(startIndex,
            charBuffer.length()).toString();
    // Quote the name so that it is matched literally
    Pattern p = Pattern.compile(Pattern.quote(clazzName));
    Matcher matcher = p.matcher(strToSearch);
    // Only update the offsets if the name was actually found
    if (matcher.find()) {
        startIndex = matcher.start() + startIndex;
        endIndex = startIndex + clazzName.length();
    }
}
clazzNameLoc.setStartOffset(startIndex);
clazzNameLoc.setEndOffset(endIndex);
clazzNameLoc.setLineNumber(compileTree.getLineMap().
        getLineNumber(startIndex));
[/prettify]
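The token search itself can be isolated into a small helper. The following is a sketch, assuming a hypothetical NameTokenLocator class; it matches the name as a whole word and returns -1 when nothing is found. Like the approach described above, it is a plain text search, so an occurrence of the name inside a comment or string literal after the start offset would also match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameTokenLocator {

    // Returns the offset of the first whole-word occurrence of name
    // at or after fromOffset, or -1 if there is none.
    static int findNameOffset(CharSequence source, String name, int fromOffset) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
        Matcher m = p.matcher(source);
        return m.find(fromOffset) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String source = "public class TestClass { TestClass other; }";
        // Search from the start of the class declaration (offset 0 here).
        System.out.println(findNameOffset(source, "TestClass", 0));
    }
}
```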

The LineMap interface of the Compiler Tree API provides
a map of character positions and line numbers within the
CompilationUnitTree. We can get the line number of the
concerned token by obtaining the LineMap from the
getLineMap() method of the CompilationUnitTree and passing
the start offset position to its getLineNumber() method.
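Putting SourcePositions and LineMap together, the line on which each method declaration starts can be reported as in the sketch below. It is an illustration under assumptions: MethodLineDemo and StringSource are hypothetical names, the source is read from memory, and the tree is obtained through the javac-specific JavacTask.parse() rather than from inside an annotation processor.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LineMap;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class MethodLineDemo {

    // Hypothetical helper: a source file held in memory instead of on disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns "name@line" for each method declared in the given source text.
    static List<String> methodLines(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null,
                null, Collections.singletonList(new StringSource(className, code)));
        final SourcePositions positions = Trees.instance(task).getSourcePositions();
        final List<String> result = new ArrayList<String>();
        try {
            for (final CompilationUnitTree unit : task.parse()) {
                final LineMap lineMap = unit.getLineMap();
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        // Character offset of the declaration, then its line.
                        long start = positions.getStartPosition(unit, method);
                        result.add(method.getName() + "@"
                                + lineMap.getLineNumber(start));
                        return super.visitMethod(method, unused);
                    }
                }.scan(unit, null);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "public class TestClass {\n"
                + "  int num;\n"
                + "  public boolean equals(Object o) { return o == this; }\n"
                + "}\n";
        System.out.println(methodLines("TestClass", code));
    }
}
```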

Verifying the Source Against Rules

Now that we have successfully retrieved the required information
from the AST, the next task is to verify if the predefined coding
standard rules are satisfied by the sources under consideration.
The coding rules are configured in an XML file and are managed by a
custom class called RuleEngine. This class fetches
rules from the XML file and fires them one by one. If a rule is
not satisfied by the class, it returns a list of
ErrorDescription objects. The
ErrorDescription object encapsulates the error
messages and the location of the error in the source code.

[prettify]
ClassFile clazzInfo = ClassModelMap.getInstance().
                getClassInfo(className);
for (JavaCodeRule rule : getRules()) {
        // apply rules one by one
        Collection<ErrorDescription> problems = rule.execute(clazzInfo);
        if (problems != null) {
                problemsFound.addAll(problems);
        }
}
[/prettify]

Each rule is implemented as a Java class; the model information
of the class to be verified is passed on to this class. The rule
class encapsulates the logic to verify the rule using this
model information. The implementation of a sample rule
(OverrideEqualsHashCode) is shown below. This rule
mandates that a class that overrides the equals()
method should also override the hashCode() method.
Here we iterate through the methods of the class and check whether it
follows the equals() and hashCode()
contract. In TestClass, the hashCode()
method is absent while the equals() method is present,
causing the rule to return the ErrorDescription model
containing the appropriate error message and the location details
of the error.

[prettify]
public class OverrideEqualsHashCode extends JavaClassRule {
    @Override
    protected Collection<ErrorDescription> apply(ClassFile clazzInfo) {
        boolean hasEquals = false;
        boolean hasHashCode = false;
        Location errorLoc = null;
        for (Method method : clazzInfo.getMethods()) {
            String methodName = method.getName();
            ArrayList paramList = (ArrayList) method.getParameters();
            if ("equals".equals(methodName) && paramList.size() == 1) {
                if ("java.lang.Object".equals(paramList.get(0))) {
                    hasEquals = true;
                    errorLoc = method.getLocationInfo();
                }
            } else if ("hashCode".equals(methodName) &&
                method.getParameters().size() == 0) {
                hasHashCode = true;
            }
        }
        if (hasEquals) {
            if (hasHashCode) {
                return null;
            } else {
                StringBuilder errorMsg = new StringBuilder();
                errorMsg.append(CodeAnalyzerUtil.
                                getSimpleNameFromQualifiedName(clazzInfo.getName()));
                errorMsg.append(" : The class that overrides equals() should ");
                errorMsg.append("override hashCode()");
                Collection<ErrorDescription> errorList = new
                                                ArrayList<ErrorDescription>();
                errorList.add(setErrorDetails(errorMsg.toString(),
                                                        errorLoc));
                return errorList;
            }
        }
        return null;
    }
}
[/prettify]
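The core of the rule can be exercised without the rest of the demo application. The sketch below uses a hypothetical, stripped-down model (EqualsHashCodeCheck and MethodInfo are assumptions standing in for the article's model and rule classes) and returns an empty list instead of null when the rule is satisfied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class EqualsHashCodeCheck {

    // Hypothetical minimal method model: a name plus the fully
    // qualified names of the parameter types.
    static class MethodInfo {
        final String name;
        final List<String> parameterTypes;

        MethodInfo(String name, String... parameterTypes) {
            this.name = name;
            this.parameterTypes = Arrays.asList(parameterTypes);
        }
    }

    // Returns one message if equals(Object) is declared without hashCode(),
    // and an empty list otherwise.
    static List<String> check(String className, List<MethodInfo> methods) {
        boolean hasEquals = false;
        boolean hasHashCode = false;
        for (MethodInfo m : methods) {
            if ("equals".equals(m.name) && m.parameterTypes.equals(
                    Collections.singletonList("java.lang.Object"))) {
                hasEquals = true;
            } else if ("hashCode".equals(m.name) && m.parameterTypes.isEmpty()) {
                hasHashCode = true;
            }
        }
        List<String> problems = new ArrayList<String>();
        if (hasEquals && !hasHashCode) {
            problems.add(className
                    + " : The class that overrides equals() should override hashCode()");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<MethodInfo> methods = Collections.singletonList(
                new MethodInfo("equals", "java.lang.Object"));
        System.out.println(check("TestClass", methods));
    }
}
```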

Running the Sample

You can download the binary file of this demo application from
the Resources section. Save this file to
any local directory. Use the following command to execute the
application from command prompt:

[prettify]
java -classpath \lib\tools.jar;.; demo.codeanalyzer.main.Main 
[/prettify]

Summary

This article discussed how the new Java 6 APIs can be utilized
to invoke the compiler from Java code and how to parse and analyze
source code using pluggable annotation processors and tree
visitors. Using standard Java APIs instead of IDE-specific
parsing/analyzing logic makes code reuse possible across different
tools and environments. We have only scratched the surface of the
three compiler-related APIs here; you can find many more useful
features by taking a deeper plunge into these APIs.

Resources

Seema Richard is a Java architect at the Trivandrum-based software company UST Global.

Comments

Compilation error: "The method getJavaFileObjectsFromFiles(Iterable<? extends File>) in the type StandardJavaFileManager is not applicable for the arguments (String)" on the following line of code: Iterable<? extends JavaFileObject> compilationUnits1 = fileManager.getJavaFileObjectsFromFiles("TestClass.java");

Broken source code link!
Very interesting article to parse java code!
Use fileManager.getJavaFileObjects("TestClass.java") to fix compile error.

Source Code Link and Binary(JAR) Link are broken......

Could you please fix code listings? They are broken (instead of listings I can see [prettify] and all code in one line, unreadable). Thanks a lot in advance!

The code listings have been formatted correctly.

How can I access comments in the source code while visiting the AST?

My example:

public class Test {

    public static String test() {
       String res = "Result"; //this is line 1 of method test
       res += " of test method"; /*this is line 2*/
       return res;  /**@line 3*/
    }

}

All three examples (line 1, 2 and 3 of test()) can work, I need to access the comment, also the ability to modify it so I can write modified code onto another file.
The third one would be using javadoc, on the return tree I wish to access the javadoc and be able to collect the tag name (line) and the value associated (3) and then be able to modify the value.

How can I do this (specially the third one)?

In CodeAnalyzerProcessor, I had to move the creation of the visitor inside the loop, otherwise information accumulates in a single JavaClassInfo object that is shared between all classes. Thanks for an interesting article.