Source Code Analysis Using Java 6 APIs

by Seema Richard, Deepa Sobhana
04/10/2008

Have you ever thought of how tools like Checkstyle or FindBugs perform a static code analysis, or how Integrated Development Environments (IDEs) like NetBeans or Eclipse execute quick code fixes or find the exact references of a field declared in your code? In many cases, IDEs have their own APIs to parse the source code and generate a standard tree structure, called an Abstract Syntax Tree (AST) or "parse tree," which can be used for deeper analysis of the source elements. The good news is that it is now possible to accomplish the said tasks plus a lot more with the help of three new APIs introduced in Java as part of the Java Standard Edition 6 release. The APIs that might be of interest to developers of Java applications that need to perform source code analysis are the Java Compiler API (JSR 199), the Pluggable Annotation Processing API (JSR 269), and the Compiler Tree API.

In this article, we explore the features of each of these APIs and go on to develop a simple demo application that verifies certain Java coding rules on a set of source code files supplied as input. This utility also reports the coding violation messages, along with the location of the offending source code, as output. Consider a simple Java class that overrides the equals() method of the Object class. The coding rule to be verified is that every class that overrides the equals() method should also override the hashCode() method with the proper signature. You can see that the TestClass class below does not define the hashCode() method, even though it has the equals() method.


public class TestClass implements Serializable {
 int num;

 @Override
  public boolean equals(Object obj) {
        if (this == obj)
                return true;
        if ((obj == null) || (obj.getClass() != this.getClass()))
                return false;
        TestClass test = (TestClass) obj;
        return num == test.num;
  }
}

Let us go on and analyze this class as part of the build process with the help of these three APIs.

Invoking the Compiler from Code: The Java Compiler API

We all use the javac command-line tool for compiling Java source files to class files. Then why do we need an API to compile Java files? Well, the answer is quite simple: as the name describes, this new standard API lets us invoke the compiler from our own Java applications; i.e., you can programmatically interact with the compiler and thereby make compilation part of application-level services. Some typical uses of this API are listed below.

  • The compiler API helps application servers to minimize the time taken to deploy applications, for example, by avoiding the overhead of using an external compiler for compiling the servlet sources generated from the JSP pages.

  • Developer tools like IDEs and code analyzers can invoke the compiler from within the editor or build tools, which significantly reduces the compile time.

The Java compiler classes are packaged under the javax.tools package. The ToolProvider class of this package provides a method called getSystemJavaCompiler() that returns an instance of some class that implements the JavaCompiler interface. This compiler instance can be used to create a compilation task that will perform the actual compilation. The Java source files to be compiled are then passed to the compilation task. For this, the compiler API provides a file manager abstraction called JavaFileManager, which allows Java files to be retrieved from various sources, such as the file system, databases, memory, and so on. In this sample, we use StandardJavaFileManager, a file manager based on java.io.File. The standard file manager can be acquired by calling the getStandardFileManager() method of the JavaCompiler instance. The code snippet for the above-mentioned steps is shown below:


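// Imports needed by this snippet
import java.io.File;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

//Get an instance of java compiler
JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();

//Get a new instance of the standard file manager implementation
StandardJavaFileManager fileManager = compiler.
        getStandardFileManager(null, null, null);
        
// Get the list of java file objects, in this case we have only 
// one file, TestClass.java. The method expects java.io.File objects.
Iterable<? extends JavaFileObject> compilationUnits1 = 
        fileManager.getJavaFileObjectsFromFiles(
                Arrays.asList(new File("TestClass.java")));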

A diagnostic listener can optionally be passed to the getStandardFileManager() method to produce diagnostic reports of any non-fatal problems. In this code snippet, we pass null values, since we are not collecting the diagnostics from the tool. For details of the other parameters passed to these methods, please refer to the Java 6 API. The getJavaFileObjectsFromFiles() method of the StandardJavaFileManager returns all the JavaFileObject instances that correspond to the supplied Java source files.

The next step is to create the Java compilation task, which can be obtained using the getTask() method of JavaCompiler. At this point, the compilation task has not yet started. The task can be triggered by invoking the call() method of CompilationTask. The code fragment for creating and triggering a compilation task is shown below.


// Create the compilation task
CompilationTask task = compiler.getTask(null, fileManager, null,
                                        null, null, compilationUnits1);
                                        
// Perform the compilation task.
task.call();

Assuming no compilation errors, this will generate the TestClass.class file. Because no output directory option (-d) is passed to the task, the class file is written next to the source file.
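The snippets above pass null for the diagnostic listener. As a minimal sketch of the alternative, the following class collects compiler errors with a DiagnosticCollector. The class name InMemoryCompileSketch, the StringSource helper, and the choice to compile from a String and write class files to a temporary directory are assumptions made for illustration; they are not part of the demo application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical helper: compiles one in-memory class and reports its errors. */
final class InMemoryCompileSketch {

    /** A source file whose content lives in a String instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns one "line N: message" entry per compiler error; empty if none. */
    static List<String> compileErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Happens when running on a runtime that does not ship the compiler
            throw new IllegalStateException("No system Java compiler available");
        }
        DiagnosticCollector<JavaFileObject> diagnostics =
                new DiagnosticCollector<JavaFileObject>();
        try {
            // Assumption: class files go to a throwaway directory
            Path out = Files.createTempDirectory("sketch-classes");
            JavaCompiler.CompilationTask task = compiler.getTask(
                    null, null, diagnostics,
                    Arrays.asList("-d", out.toString()), null,
                    Arrays.asList(new StringSource(className, code)));
            task.call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> errors = new ArrayList<String>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": "
                        + d.getMessage(Locale.ENGLISH));
            }
        }
        return errors;
    }
}
```

Calling InMemoryCompileSketch.compileErrors("TestClass", source) returns an empty list for a class that compiles cleanly; a tool could show the returned entries to the user instead of letting the compiler print to the standard error stream.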

Annotation Processing: The Pluggable Annotation Processing API

As we all know, Java SE 5.0 introduced support for adding and processing metadata or annotations to elements like Java classes, fields, methods, etc. Annotations are typically processed by build tools or runtime environments to perform useful tasks like controlling application behavior, generating code, and so on. Java 5 allows both compile-time and runtime processing of annotated data. Annotation processors are utilities that can be dynamically plugged into the compiler to analyze source files and process annotations in them. Annotation processors can make the fullest use of metadata information to perform many tasks, including but not limited to the following.

  • Annotations can be used to generate deployment descriptor files like persistence.xml or ejb-jar.xml, in the cases of entity classes and enterprise beans, respectively.

  • Annotation processors can use metadata information to generate code. For example, a processor can generate Home and Remote interfaces of a properly annotated enterprise bean.

  • Annotations can be used to verify the validity of code or deployment units.

Java 5.0 provided an Annotation Processing Tool (APT) and an associated mirror-based reflection API (com.sun.mirror.*) for processing annotations and modeling the processed information. The APT tool runs matching annotation processors for the annotations present in the supplied Java source files. The mirror API provides a compile-time, read-only view of the source file. The main drawback of APT is that it is not standardized; i.e., APT is specific to the Sun JDK.

Java SE 6 has introduced a new feature called the Pluggable Annotation Processing framework, which provides standardized support for writing customized annotation processors. It is called "pluggable" because the annotation processor can be plugged into javac dynamically and can operate on a set of annotations that appears in the Java source file. This framework has two parts: an API for declaring and interacting with annotation processors -- the package javax.annotation.processing -- and an API for modeling the Java programming language -- the package javax.lang.model.

Writing a Custom Annotation Processor

The following section explains how to write a custom annotation processor and plug it into the compilation task. The custom annotation processor extends AbstractProcessor (which is the default implementation of the Processor interface) and overrides the process() method.

The annotation processor class will be decorated with two class-level annotations, @SupportedAnnotationTypes and @SupportedSourceVersion. The SupportedSourceVersion annotation specifies the latest source version the annotation processor supports. The SupportedAnnotationTypes annotation denotes which annotations this particular annotation processor is interested in. For example, @SupportedAnnotationTypes ("javax.persistence.*") is to be used if the processor only needs to process Java Persistence API (JPA) annotations. It is interesting to note that the annotation processor is invoked even though no annotations are present if the supported annotation types are specified as @SupportedAnnotationTypes("*"). This allows us to take advantage of the modelling API along with the Tree API to do general purpose source code processing. Using these APIs, it is possible to get a lot of useful information pertaining to modifiers, fields, methods, and more. The code snippet of a custom annotation processor is given below:


@SupportedSourceVersion(SourceVersion.RELEASE_6)
@SupportedAnnotationTypes("*")
public class CodeAnalyzerProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations,
            RoundEnvironment roundEnvironment) {
        for (Element e : roundEnvironment.getRootElements()) {
                System.out.println("Element is "+ e.getSimpleName());
                // Add code here to analyze each root element
        }
        return true;
    }
}

The annotation processors are invoked depending on what annotations are present in the source code, which processors are configured to be available, and what annotation types the available processors process. Annotation processing may happen in multiple rounds. For example, in the first round, the original input Java source files will be processed; in the second round, the files generated by the first round of processing will be considered, and so on. The custom processor should override the process() of AbstractProcessor. This method takes two arguments:

  1. The set of annotation types, represented as TypeElements, that the processor is asked to process in the current round.
  2. The RoundEnvironment that encapsulates information about the current processing round of the annotation processor.

The return value of process() tells the compiler whether the processor claims the annotation types it was given. If process() returns true, the annotations are claimed and subsequent processors are not asked to process them. If it returns false, the annotations remain unclaimed and the next available processor, if any, may process them.

Plugging In the Annotation Processor

Now that the custom annotation processor is ready to use, let's see how we can invoke this processor as part of the compilation process. The processor can be invoked either through the javac command-line utility or programmatically through a standalone Java class. The javac utility of Java SE 6 provides an option called -processor that accepts the fully qualified name of the custom annotation processor to be plugged in. The syntax for this is as follows:


javac -processor demo.codeanalyzer.CodeAnalyzerProcessor TestClass.java

where CodeAnalyzerProcessor is the annotation processor class and TestClass is the input Java file to be processed. This utility searches for CodeAnalyzerProcessor in the classpath; hence, it is important to place this class in the classpath.

The modified code snippet for plugging in the processor programmatically is shown below. The setProcessors() method of the CompilationTask allows multiple annotation processors to be plugged into the compilation task. This method needs to be called before the call() method. Also note that if an annotation processor is plugged into the compilation task, annotation processing runs first, and only then does the rest of the compilation proceed. Annotation processing may not happen at all if the source contains errors that prevent the compiler from parsing it.


CompilationTask task = compiler.getTask(null, fileManager, null,
                                        null, null, compilationUnits1);
                                        
// Create a list to hold annotation processors
LinkedList<AbstractProcessor> processors = new LinkedList<AbstractProcessor>();

// Add an annotation processor to the list
processors.add(new CodeAnalyzerProcessor());

// Set the annotation processor to the compiler task
task.setProcessors(processors);

// Perform the compilation task.
task.call();

If we execute the above code, the annotation processor fires during the compilation of TestClass.java and prints "Element is TestClass".
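The steps of this section can be put together in one self-contained sketch. It is written under a few assumptions that are not in the demo application: the names ProcessorRunSketch and NameCollector are hypothetical, the source comes from a String rather than a file, the processor stores element names in a list instead of printing them, and the -proc:only option is used so that no class files are written.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical driver that plugs a processor into a compilation task. */
final class ProcessorRunSketch {

    /** Records the simple name of every root element it is shown. */
    @SupportedAnnotationTypes("*")
    static final class NameCollector extends AbstractProcessor {
        final List<String> names = new ArrayList<String>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            // Avoids a version warning when run on a newer compiler
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations,
                               RoundEnvironment round) {
            for (Element e : round.getRootElements()) {
                names.add(e.getSimpleName().toString());
            }
            return false; // this sketch does not claim any annotations
        }
    }

    /** Runs the processor over one in-memory source file. */
    static List<String> rootElementNames(String className, final String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available");
        }
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        // -proc:only runs annotation processing without generating class files
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, Arrays.asList("-proc:only"), null,
                Arrays.asList(source));
        NameCollector collector = new NameCollector();
        task.setProcessors(Arrays.asList(collector)); // must precede call()
        task.call();
        return collector.names;
    }
}
```

Because the supported annotation types are declared as "*", the processor is invoked even for a source file that contains no annotations at all.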

Accessing the Abstract Syntax Tree: The Compiler Tree API

The Abstract Syntax Tree (AST) is the read-only view of the source that represents the Java code as a tree of nodes, where each node represents a Java programming language construct or Tree, and the children of each node signify meaningful components of these trees. For example, a Java class is represented as a ClassTree, method declarations are represented as MethodTrees, variable declarations as VariableTrees, annotations as AnnotationTree and so on.

The Compiler Tree API provides access to the Abstract Syntax Tree of Java source code and also provides some utilities like TreeVisitors, TreeScanners, etc., for performing operations on the AST. A deeper analysis of the source content can be done using a TreeVisitor, which visits all child tree nodes to extract the required information about fields, methods, annotations, and other class elements. The tree visitors are implemented in the style of visitor design pattern. When a visitor is passed to a tree's accept method, the visitXYZ method most applicable to that tree is invoked.

The Java Compiler Tree API provides three implementations of TreeVisitor; namely, SimpleTreeVisitor, TreePathScanner, and TreeScanner. The demo application uses a TreePathScanner to extract information about the Java source file. The TreePathScanner is a TreeVisitor that visits all the child tree nodes and provides support for maintaining a path for the parent nodes. The scan() method of the TreePathScanner needs to be invoked to scan the tree. To visit nodes of a particular type, just override the corresponding visitXYZ method. Inside your visit method, call super.visitXYZ to visit descendant nodes. The code snippet of a typical visitor class is shown below:


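import com.sun.source.tree.ClassTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

public class CodeAnalyzerTreeVisitor extends TreePathScanner<Object, Trees>  {
    @Override
    public Object visitClass(ClassTree classTree, Trees trees) {
        // ... analyze the class node here ...
        // Continue the scan into the members of the class
        return super.visitClass(classTree, trees);
    }
    @Override
    public Object visitMethod(MethodTree methodTree, Trees trees) {
        // ... analyze the method node here ...
        // Continue the scan into the body of the method
        return super.visitMethod(methodTree, trees);
    }
} 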

You can see that the visit methods accept two arguments: the tree representing the node (ClassTree for the class node, MethodTree for the method node, etc.) and a Trees object. The Trees class provides utility methods for fetching path information of elements in the tree. It is important to note that the Trees object acts as a bridge between JSR 269 and the Compiler Tree API. The fragment below, taken from the annotation processor, obtains the Trees instance in init() and then passes the tree path of each root element to the scanner. In this sample, there is only a single root element, which is TestClass itself.


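// Fields of the annotation processor (CodeAnalyzerProcessor)
private Trees trees;
private final CodeAnalyzerTreeVisitor visitor = new CodeAnalyzerTreeVisitor();

@Override
public void init(ProcessingEnvironment pe) {
        super.init(pe);
        // Obtain the bridge between JSR 269 and the Compiler Tree API
        trees = Trees.instance(pe);
}

@Override
public boolean process(Set<? extends TypeElement> annotations,
        RoundEnvironment roundEnvironment) {
        for (Element e : roundEnvironment.getRootElements()) {
                TreePath tp = trees.getPath(e);
                // invoke the scanner
                visitor.scan(tp, trees);
        }
        return true;
}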

The following section explains how source code information is retrieved using the Tree API and stored in a common model that is used later for code verification. The visitClass() method is called whenever a class, interface, or enum type is visited within the AST, with a ClassTree as the argument. Similarly, the visitMethod() method is called for all the methods with a MethodTree as the argument, visitVariable() for all the variables with a VariableTree as the argument, and so on.


@Override
public Object visitClass(ClassTree classTree, Trees trees) {
         //Storing the details of the visiting class into a model
         JavaClassInfo clazzInfo = new JavaClassInfo();

        // Get the current path of the node     
        TreePath path = getCurrentPath();

        //Get the type element corresponding to the class
        TypeElement e = (TypeElement) trees.getElement(path);

        //Set qualified class name into model
        clazzInfo.setName(e.getQualifiedName().toString());

        //Set extending class info
        clazzInfo.setNameOfSuperClass(e.getSuperclass().toString());

        //Set implementing interface details
        for (TypeMirror mirror : e.getInterfaces()) {
                clazzInfo.addNameOfInterface(mirror.toString());
        }
        return super.visitClass(classTree, trees);
  }

The JavaClassInfo used in this code snippet is the custom model for storing information about the Java code. After executing this code, information pertaining to the class like the fully qualified class name, superclass name, interfaces implemented by TestClass, etc., are extracted and stored in our custom model for future verification purposes.
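To try the visitor pattern in isolation, the following sketch scans a parsed source file and lists the classes and methods it finds. Note the assumptions: it uses JavacTask.parse() with a plain TreeScanner instead of the annotation processor and TreePathScanner used by the demo application, and the names TreeScanSketch and MemberCollector are hypothetical.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical example: list the classes and methods declared in a source file. */
final class TreeScanSketch {

    /** Visitor that appends a short description of each class and method node. */
    static final class MemberCollector extends TreeScanner<Void, List<String>> {
        @Override
        public Void visitClass(ClassTree node, List<String> found) {
            found.add("class " + node.getSimpleName());
            return super.visitClass(node, found); // descend into the members
        }

        @Override
        public Void visitMethod(MethodTree node, List<String> found) {
            found.add("method " + node.getName());
            return super.visitMethod(node, found); // descend into the body
        }
    }

    /** Parses the given source (without compiling it) and scans its tree. */
    static List<String> describe(String className, final String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available");
        }
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        // The system compiler's tasks can be cast to JavacTask to reach the trees
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Arrays.asList(source));
        List<String> found = new ArrayList<String>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new MemberCollector().scan(unit, found);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

Since the source is only parsed, no Element or TypeMirror information is available in this sketch. Details such as the qualified class name and the superclass still require the Trees bridge shown above.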

Setting the Source Location

So far we have been busy obtaining the information about various nodes of the AST and populating the model objects for class, method, and field information. With this information, we can verify if the source follows good programming practices, conforms to specifications, and so on. This information can be quite useful to verification tools like Checkstyle or FindBugs, but they might also require the location details of the source token(s) that violates the rule, so that they provide the error location details to users.

The SourcePositions object, which is part of the Compiler Tree API, maintains the positions of all the AST nodes within the compilation unit tree. This object provides useful information about start and end positions of a ClassTree, MethodTree, VariableTree, etc., within the file. A position is defined as a simple character offset from the start of a CompilationUnit where the first character is at offset 0. The code snippet below shows how we can get the character offset position of the passed Tree object from the start of the compilation unit.


public static LocationInfo getLocationInfo(Trees trees, 
                                                TreePath path, Tree tree) {
        LocationInfo locationInfo = new LocationInfo();
        SourcePositions sourcePosition = trees.getSourcePositions();
        long startPosition = sourcePosition.
                        getStartPosition(path.getCompilationUnit(), tree);
        locationInfo.setStartOffset((int) startPosition);
        return locationInfo;
}
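As a rough, self-contained illustration of SourcePositions, the sketch below reports the line on which each method declaration starts. It rests on assumptions that are not part of the demo application: the source is parsed from a String through JavacTask, the Trees instance is obtained from the task rather than from a ProcessingEnvironment, and the name SourcePositionSketch is hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LineMap;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical example: map each method name to the line where it starts. */
final class SourcePositionSketch {

    static Map<String, Long> methodLines(String className, final String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available");
        }
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Arrays.asList(source));
        final SourcePositions positions = Trees.instance(task).getSourcePositions();
        final Map<String, Long> lines = new LinkedHashMap<String, Long>();
        try {
            for (final CompilationUnitTree unit : task.parse()) {
                final LineMap lineMap = unit.getLineMap();
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree node, Void unused) {
                        // Character offset of the declaration, counted from 0
                        long offset = positions.getStartPosition(unit, node);
                        // Line numbers returned by the LineMap start at 1
                        lines.put(node.getName().toString(),
                                  lineMap.getLineNumber(offset));
                        return super.visitMethod(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The offset returned for a method refers to the start of the whole declaration, not to the method name itself.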

However, if we need to get the position of the token that gives the name of the class or method itself, then this would not suffice. To find the actual token positions within the source, one option is to search for the tokens within the char content of the source files. We can get the char content from the JavaFileObject corresponding to the compilation unit first as demonstrated below.


//Get the compilation unit tree from the tree path
CompilationUnitTree compileTree = treePath.getCompilationUnit();

//Get the java source file which is being processed
JavaFileObject file = compileTree.getSourceFile();

// Extract the char content of the file into a string
String javaFile = file.getCharContent(true).toString();

//Convert the java file content to a  character buffer
CharBuffer charBuffer = CharBuffer.wrap (javaFile.toCharArray()); 

The following code snippet locates the position of the class name token within the source. The java.util.regex.Pattern and java.util.regex.Matcher classes are used to obtain the actual position of the class name token. The content of Java source is converted to a character buffer using java.nio.CharBuffer. The matcher searches for the first occurrence of the token matching the class name in the character buffer, starting from the start position of the class tree within the compilation unit tree.


LocationInfo clazzNameLoc = (LocationInfo) clazzInfo.
                        getLocationInfo();
 int startIndex = clazzNameLoc.getStartOffset();
 int endIndex = -1;
 if (startIndex >= 0) {
   // Search only the text from the start of the class tree onwards
   String strToSearch = charBuffer.subSequence(startIndex, 
   charBuffer.length()).toString();
   // Quote the name so that it is matched literally
   Pattern p = Pattern.compile(Pattern.quote(clazzName));
   Matcher matcher = p.matcher(strToSearch);
   // Update the offsets only if the name was actually found
   if (matcher.find()) {
     startIndex = matcher.start() + startIndex;
     endIndex = startIndex + clazzName.length();
   }
  } 
 clazzNameLoc.setStartOffset(startIndex);
 clazzNameLoc.setEndOffset(endIndex);
 clazzNameLoc.setLineNumber(compileTree.getLineMap().
              getLineNumber(startIndex));

The LineMap class of the Compiler Tree API provides a map of character positions and line numbers within the CompilationUnitTree. We can get the line number of the concerned token by obtaining the LineMap from the getLineMap() method of the CompilationUnitTree and passing the start offset position to its getLineNumber() method.
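The token search described above can also be tried on its own. The following is a minimal sketch, assuming a hypothetical helper named TokenLocatorSketch. Unlike the snippet above, it matches whole words only, so that a search for "Foo" does not stop inside "Food", and it searches the original text from a given offset instead of cutting out a substring first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for locating a name token inside source text. */
final class TokenLocatorSketch {

    /**
     * Returns the offset of the first whole-word occurrence of token at or
     * after fromOffset, or -1 if there is none.
     */
    static int findToken(CharSequence source, String token, int fromOffset) {
        if (fromOffset < 0 || fromOffset > source.length()) {
            return -1;
        }
        // Pattern.quote makes the token match literally; \b requires word
        // boundaries on both sides of it
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(token) + "\\b");
        Matcher matcher = pattern.matcher(source);
        // find(int) searches the full text starting at the given index, so
        // the returned offset needs no further adjustment
        return matcher.find(fromOffset) ? matcher.start() : -1;
    }
}
```

The end offset of the token is the returned start offset plus token.length(). Word boundaries treat the dollar sign as a non-word character, so names that contain it would need a different boundary test.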

Verifying the Source Against Rules

Now that we have successfully retrieved the required information from the AST, the next task is to verify if the predefined coding standard rules are satisfied by the sources under consideration. The coding rules are configured in an XML file and are managed by a custom class called RuleEngine. This class fetches rules from the XML file and fires them one by one. If a rule is not satisfied by the class, it returns a list of ErrorDescription objects. The ErrorDescription object encapsulates the error messages and the location of the error in the source code.


ClassFile clazzInfo = ClassModelMap.getInstance().
                getClassInfo(className);
for (JavaCodeRule rule : getRules()) {
        // apply rules one by one
        Collection<ErrorDescription> problems = rule.execute(clazzInfo);
        if (problems != null) {
                problemsFound.addAll(problems);
        }
}

Each rule is implemented as a Java class; the model information of the class to be verified is passed on to this class. The rule class encapsulates the logic that verifies the rule against this model information. The implementation of a sample rule (OverrideEqualsHashCode) is shown below. This rule mandates that a class that overrides the equals() method should also override the hashCode() method. Here we iterate through the methods of the class and check whether it follows the equals() and hashCode() contract. In TestClass, the hashCode() method is absent while the equals() method is present, causing the rule to return the ErrorDescription model containing the appropriate error message and the location details of the error.


public class OverrideEqualsHashCode extends JavaClassRule {
    @Override
    protected Collection<ErrorDescription> apply(ClassFile clazzInfo) {
        boolean hasEquals = false;
        boolean hasHashCode = false;
        Location errorLoc = null;
        for (Method method : clazzInfo.getMethods()) {
            String methodName = method.getName();
            ArrayList paramList = (ArrayList) method.getParameters();
            if ("equals".equals(methodName) && paramList.size() == 1) {
                if ("java.lang.Object".equals(paramList.get(0))) {
                    hasEquals = true;
                    errorLoc = method.getLocationInfo();
                }
            } else if ("hashCode".equals(methodName) &&
                method.getParameters().size() == 0) {
                hasHashCode = true;
            }
        }
        if (hasEquals) {
            if (hasHashCode) {
                return null;
            } else {
                StringBuffer errrMsg = new StringBuffer();
                errrMsg.append(CodeAnalyzerUtil.
                                getSimpleNameFromQualifiedName(clazzInfo.getName()));
                // A string literal cannot span lines, so it is split in two
                errrMsg.append(" : The class that overrides " +
                                        "equals() should ");
                errrMsg.append("override hashCode()");
                Collection<ErrorDescription> errorList = new 
                                                ArrayList<ErrorDescription>();
                errorList.add(setErrorDetails(errrMsg.toString(), 
                                                        errorLoc));
                return errorList;
            }
        }
        return null;
    }
}
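The same rule can also be expressed directly against the javax.lang.model API, without the custom model classes. The sketch below is one possible shape under stated assumptions, not the demo application's implementation: the names EqualsHashCodeSketch and Rule are hypothetical, violations are collected as plain strings rather than ErrorDescription objects, and only top-level classes are checked.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.element.VariableElement;
import javax.lang.model.util.ElementFilter;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical example: the equals()/hashCode() rule as an annotation processor. */
final class EqualsHashCodeSketch {

    @SupportedAnnotationTypes("*")
    static final class Rule extends AbstractProcessor {
        final List<String> violations = new ArrayList<String>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations,
                               RoundEnvironment round) {
            // Only top-level types are inspected in this sketch
            for (TypeElement type : ElementFilter.typesIn(round.getRootElements())) {
                boolean hasEquals = false;
                boolean hasHashCode = false;
                for (ExecutableElement method
                        : ElementFilter.methodsIn(type.getEnclosedElements())) {
                    String name = method.getSimpleName().toString();
                    List<? extends VariableElement> params = method.getParameters();
                    if (name.equals("equals") && params.size() == 1
                            && params.get(0).asType().toString()
                                     .equals("java.lang.Object")) {
                        hasEquals = true;
                    } else if (name.equals("hashCode") && params.isEmpty()) {
                        hasHashCode = true;
                    }
                }
                if (hasEquals && !hasHashCode) {
                    violations.add(type.getSimpleName()
                            + " : The class that overrides equals() should"
                            + " override hashCode()");
                }
            }
            return false; // leave all annotations unclaimed
        }
    }

    /** Runs the rule over one in-memory source file and returns the violations. */
    static List<String> check(String className, final String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available");
        }
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        // -proc:only runs the processor without generating class files
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, Arrays.asList("-proc:only"), null,
                Arrays.asList(source));
        Rule rule = new Rule();
        task.setProcessors(Arrays.asList(rule));
        task.call();
        return rule.violations;
    }
}
```

For a class like TestClass, which defines equals(Object) but no hashCode(), check() returns a single message that starts with the class name. A class that defines both methods yields an empty list.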

Running the Sample

You can download the binary file of this demo application from the Resources section. Save this file to any local directory. Use the following command to execute the application from command prompt:


java -classpath <JAVA_HOME>\lib\tools.jar;.; demo.codeanalyzer.main.Main <comma separated list of source files to be verified>

Summary

This article discussed how the new Java 6 APIs can be utilized to invoke the compiler from Java code and how to parse and analyze source code using pluggable annotation processors and tree visitors. Using standard Java APIs instead of IDE-specific parsing/analyzing logic makes code reuse possible across different tools and environments. We have only scratched the surface of the three compiler-related APIs here; you can find many more useful features by taking a deeper plunge into these APIs.

Resources
