Getting Groovy with XML

by Jack Herrington
08/12/2004


Contents
Java Straight to the DOM
Simplifying with XPath
Groovy Take One
Groovy and the DOM
Wrapping the XML DOM
Groovy and Object Orientation
Conclusions
Resources

XML sucks. Oh, wait, XML rocks. Well, it actually does a lot of both. It rocks because of all of the editors, validators, and tools written for it. XML has all but replaced any notion of a new custom text-based data language. But it also sucks because it's hard to use. Using a DOM to read and manipulate XML is a pain, and SAX is even worse. XPath helps a little, but even XSLT, the ultimate XML processing tool, is hard to learn, follows an uncommon functional programming paradigm, and is overkill for small problems.

Is there something that we can do to take the pain out of XML? The E4X committee, which is made up of representatives from a bunch of big companies (Mozilla.org, Microsoft, Macromedia, etc.), seems to think so. They have an extension layer proposal for JavaScript (ECMAScript) that will make building and accessing an XML DOM as easy as working with objects using the "dot" notation.

Here is the example XML data file that I will use for all of the examples in this article:


<transactions> 
    <account id="a"> 
        <transaction amount="500" /> 
        <transaction amount="1200" /> 
    </account> 
    <account id="b"> 
        <transaction amount="600" /> 
        <transaction amount="800" /> 
        <transaction amount="2000" /> 
    </account> 
</transactions> 

Wouldn't it be great if you had this XML document attached to a variable named doc? You could say this:

var id = doc.transactions.account[0].id 

And have the id variable set to a. That is what E4X is all about. It's about making XML access simple and easy to understand. The only problem is that E4X hasn't been approved yet, and isn't shipping. So can we make XML simpler today?

When I asked myself that question, I thought of applying my new favorite embedded language, Groovy, to the task. You can judge for yourself how far I have simplified the task, but I hope in the meantime that you will learn something about XML and a lot about Groovy.

What fixes do I have in mind? First, we will use dot notation to traverse the DOM tree instead of calling accessors. Second, any node access by name will default to the first matching child of that node, so you won't have to say which child you want when there is only one. And finally, we will make XPath access simpler through a native method on every node.
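To make the "first matching child" rule concrete, here is a minimal Java sketch of the idea. The firstChild helper and the FirstChildSketch class are hypothetical names made up for this illustration; they are not part of the DOM API and are not the Groovy wrapper this article goes on to build.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only: not part of the DOM API or of the article's code.
class FirstChildSketch {

    // Parse an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Return the first child element with the given tag name, or null if there is none.
    static Element firstChild(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<transactions>"
            + "<account id=\"a\"><transaction amount=\"500\"/></account>"
            + "<account id=\"b\"><transaction amount=\"600\"/></account>"
            + "</transactions>");

        // No index is given, so each step picks the first matching child.
        Element account = firstChild(firstChild(doc, "transactions"), "account");
        System.out.println(account.getAttribute("id")); // prints "a"
    }
}
```

Even with the helper, every step is still a method call with a string argument, which is the gap that dot notation is meant to close.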

In twelve-step programs they have you admit your addiction in order to begin to deal with it. In the same spirit, to understand how bad using the DOM is, we need to start with a hand-coded example.

Java Straight to the DOM

The example I will use throughout this article is to take the original XML data file and add up all of the transaction amounts by account. We will call this function calculateAmounts, and it should return a hash or a map that has an entry for each account with the correct total. In this case, that means 1700 for account a and 3400 for account b.

The most direct way to do this is to walk the DOM from Java:


import java.util.Hashtable; 
import java.io.File; 
import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
import org.w3c.dom.Document; 
import org.w3c.dom.NodeList; 
import org.w3c.dom.Element; 
import org.w3c.dom.Node; 

public class GroovyXML1 
{ 
    public static Hashtable calculateAmounts( String fileName )
        throws Exception
    { 

We first read in the XML file:


        // Read the XML 
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance(); 
        DocumentBuilder builder =
            factory.newDocumentBuilder(); 
        Document doc = builder.parse( new File( fileName ) ); 

        // Initialize the list of account values 
        Hashtable accountValues = new Hashtable(); 

Next we iterate through the account nodes:


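        // Get the children of the root element (the account nodes).
        // getDocumentElement() still finds the root when a comment
        // or processing instruction precedes it.
        NodeList accountNodes =
            doc.getDocumentElement().getChildNodes(); 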
        for(int accountNodeIndex = 0;
            accountNodeIndex < accountNodes.getLength();
            accountNodeIndex++ ) 
        { 

One of the problems with DOM access is that we have to account for the white-space nodes that are in the tree. This conditional ensures that we only look at the element nodes.


            // Go only through the account Element nodes 
            if ( accountNodes.item(accountNodeIndex).getNodeType() ==
                                    Node.ELEMENT_NODE ) 
            { 

Because Node doesn't have a convenient accessor to get attributes, we need to cast the Node to an Element before we can get the account id.


              Element accountElement =
                  (Element)accountNodes.item( accountNodeIndex); 
              // Get the account ID 
              String accountID =
                  accountElement.getAttribute( "id" ); 

Now we need to iterate through the transaction nodes to add up the amounts.


              // Go through the transaction nodes
              // within the account node 
              int amount = 0; 
              NodeList transactionNodes =
                  accountElement.getChildNodes(); 

              for( int transIndex = 0;
                   transIndex < transactionNodes.getLength();
                   transIndex++ ) 
              { 
                 // Go through just the elements 
                 if ( transactionNodes.item( transIndex ).getNodeType() ==
                            Node.ELEMENT_NODE ) 
                 { 
                    // Add the amount to the amount counter 
                    Element transaction =
                        (Element)transactionNodes.item( transIndex ); 
                    // parseInt avoids creating a temporary Integer object
                    amount += Integer.parseInt(
                        transaction.getAttribute( "amount" ) ); 
                 } 
              } 

And the final step in the processing is to add the amount to the hash table. Because hash tables only take objects, we need to wrap the total in an Integer object before we can add it to the output.


              // Add the account total to the hash table 
              accountValues.put( accountID, Integer.valueOf( amount ) ); 
           } 
        } 

        return accountValues; 
    } 

With the totals in hand, we can print them to check that we did the math correctly.


    public static void main( String[] args)
        throws Exception
    { 
      System.out.println( "Using XML DOM" ); 
      Hashtable out = calculateAmounts( "test_data.xml" ); 
      System.out.println( "a = " + out.get( "a") ); 
      System.out.println( "b = " + out.get( "b") ); 
    } 
} 
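
Two DOM details from the walkthrough above are easy to check in isolation: the white-space text nodes that sit between elements, and attribute access on a plain Node. The sketch below is illustrative only; the DomDetailsSketch class and its helper methods are names invented for this example. Note that Node does offer getAttributes(), which returns a NamedNodeMap, but casting to Element and calling getAttribute() is shorter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration only: class and method names are invented here.
class DomDetailsSketch {

    // Parse an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Count every direct child, including white-space text nodes.
    static int countChildren(Node parent) {
        return parent.getChildNodes().getLength();
    }

    // Count only the direct children that are elements.
    static int countElementChildren(Node parent) {
        NodeList kids = parent.getChildNodes();
        int count = 0;
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Read an attribute through the Node interface, without casting to Element.
    // Returns null for non-element nodes and for missing attributes.
    static String attribute(Node node, String name) {
        NamedNodeMap attrs = node.getAttributes();
        if (attrs == null) {
            return null;
        }
        Node attr = attrs.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<transactions>\n  <account id=\"a\"/>\n  <account id=\"b\"/>\n</transactions>");
        Element root = doc.getDocumentElement();

        System.out.println("all children: " + countChildren(root));            // 5
        System.out.println("element children: " + countElementChildren(root)); // 2

        // Index 0 is a white-space text node, so the first account is at index 1.
        Node firstAccount = root.getChildNodes().item(1);
        System.out.println("id: " + attribute(firstAccount, "id"));            // a
    }
}
```

The line breaks and indentation between the two account elements account for the three extra children, which is why the element-type check is needed in both loops.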

Of the twenty-odd lines in calculateAmounts that do the work, only two are the algorithm itself; the rest is DOM plumbing. So it's no surprise that we can't see the algorithm forest for all of the infrastructure trees.
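
For contrast, here is roughly what the algorithm looks like once the XML is out of the picture. This is a sketch on in-memory data, not code from the article: the TotalsSketch class and its totals method are invented for the illustration, the amounts are copied by hand from the sample file, and it uses generics, which arrived in Java releases later than the one the listing above targets. My reading is that the "two lines" are the summing step and the storing step.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: the same totals, computed without any DOM code.
class TotalsSketch {

    // Sum the amounts for each account, preserving the order of the input map.
    static Map<String, Integer> totals(Map<String, List<Integer>> amountsByAccount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : amountsByAccount.entrySet()) {
            int sum = 0;
            for (int amount : entry.getValue()) {
                sum += amount;                 // the summing step
            }
            result.put(entry.getKey(), sum);   // the storing step
        }
        return result;
    }

    public static void main(String[] args) {
        // Amounts copied by hand from the sample XML file.
        Map<String, List<Integer>> data = new LinkedHashMap<>();
        data.put("a", Arrays.asList(500, 1200));
        data.put("b", Arrays.asList(600, 800, 2000));

        System.out.println(totals(data)); // prints {a=1700, b=3400}
    }
}
```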

Perhaps things would be better if we used XPath.
