XML Manipulation using XMLTask

November 1, 2006

XMLTask is an external task for the popular build tool Ant that permits complex manipulations of XML in a simple and consistent fashion, without having to deal with XML style sheets (XSL). XMLTask can be used for many common tasks that developers face, including manipulating J2EE and Spring descriptors, creating XHTML websites, and driving workflows via XML configuration files.

Why Use XMLTask?

Developers often have to automate changes to XML documents during a build or deployment. The traditional way of doing this in Ant is to use the standard replace task, or the style/xslt task together with an XML style sheet.

The replace task works well for simple search/replace operations (for example, changing property files) but lacks the ability to perform powerful searches and complex replacement operations. Additionally, it is not XML-aware and can result in a non-"well-formed" XML document or a document with an inconsistent character set (this is particularly important when manipulating internationalized content such as XHTML).

Unlike replace, the style/xslt task performs XML-aware changes but at the price of greater complexity. The developer has to develop an additional XML style sheet as part of their build/deployment. Often this is not a trivial task.

XMLTask was developed to provide a mechanism for changing XML files in a simple and consistent fashion, providing tools to help with commonly encountered requirements in Ant builds. XMLTask allows developers to identify sections of XML, then insert, remove, and cut/copy/paste content. Content can be read from files, and moved between XML documents.

For the purposes of this article, we'll take a common example of XMLTask usage and modify a servlet descriptor for a deployment of a servlet-based application.

If you want to follow the article with a copy of XMLTask, you should download it before continuing here. XMLTask will work with any version of Ant 1.5 and above. The examples used below are available for download. See the Resources section for the appropriate links.

XMLTask Basics

XMLTask requires referencing in your Ant build.xml file like any other external task. Simply use:

<taskdef name="xmltask" 
            classname="com.oopsconsultancy.xmltask.ant.XmlTask"/>
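
If the XMLTask jar is not already on Ant's classpath, the standard Ant <taskdef> classpath attribute can point to it. The following is a minimal sketch; the location lib/xmltask.jar is an assumption, so substitute wherever you saved the download.

```xml
<!-- lib/xmltask.jar is an assumed location, not a path required by XMLTask -->
<taskdef name="xmltask"
         classname="com.oopsconsultancy.xmltask.ant.XmlTask"
         classpath="lib/xmltask.jar"/>
```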

XMLTask requires a source file to read, and a destination file to write to. (These can be the same file, but this practice is discouraged unless you have an appropriate backup!) You can specify source and destination files using the appropriate attributes:

 <!-- modify the servlet --> 
<target name="servlet">
   <xmltask source="src/web.xml" dest="target/web.xml">
      <!-- nothing to do here yet -->
   </xmltask>
</target>

Given the above, XMLTask will simply read in the file src/web.xml, perform no modifications, and write the resultant XML out to target/web.xml. The input file src/web.xml will not be changed.

Now you need to tell XMLTask what to do with the contents of src/web.xml. You give XMLTask a set of instructions for changing the XML content, and identify the elements that XMLTask will operate on by using the standard XPath syntax--a path-like notation for finding XML content. A complete explanation of XPath is outside the scope of this document, but the examples below use straightforward XPath expressions and should be readily understandable. Some tutorials are referenced in the Resources section for those who wish to investigate XPath in greater depth.

Each XMLTask instruction works by specifying an XPath to an XML element, and what to do for that element. If an XPath resolves to more than one XML element, then XMLTask will apply the instruction to each XML element matched. If an XPath resolves to no XML elements, then that instruction will not be applied at all. You can instruct XMLTask to fail the build when an XPath matches nothing by setting the failWithoutMatch attribute to true.
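
As a rough sketch of the failWithoutMatch setting, the fragment below uses the <remove> instruction (covered later in this article) with a path that may match nothing. With the attribute set, a missing 'UnitTests' mapping should stop the build instead of being silently skipped.

```xml
<xmltask source="src/web.xml" dest="target/web.xml"
         failWithoutMatch="true">
   <!-- if no servlet-mapping named UnitTests exists, the build fails here -->
   <remove path="/web-app/servlet-mapping[servlet-name/text() = 'UnitTests']"/>
</xmltask>
```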

Let's look at some simple usages of XMLTask.

Inserting Content

You can use XMLTask to insert text and XML content specified either in the Ant build script or in external files. Let's take a standard servlet descriptor (web.xml) without an application name:

 <?xml version="1.0" encoding="UTF-8"?> 
<web-app id="ApplicationName" version="2.4">
   <display-name></display-name>
   ....
</web-app>

You can use XMLTask to insert a display name. You can imagine scenarios where you would want to do this dynamically--generating a development and production version of your servlet application, inserting the build time/date, etc.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <insert path="/web-app/display-name"
          xml="Development Application"/>
</xmltask>

which results in:

 <?xml version="1.0" encoding="UTF-8"?> 
<web-app id="ApplicationName" version="2.4">
   <display-name>Development Application</display-name>
   ....
</web-app>

You can see that the <insert> instruction identifies where the insertion must occur (by specifying a path to the XML to be changed: /web-app/display-name) and what should be inserted. <insert> can insert simple text, XML specified in the Ant script, or the content of a file. To illustrate both the more complex XML insertion and specifying the position of insertion, let's insert a new chunk of XML into web.xml to configure a servlet. The XML is contained within a CDATA section within the <insert> instruction.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <insert path="/web-app/servlet[1]/servlet-class"
             position="after">
      <![CDATA[
      <init-param>
         <param-name>parameter 1</param-name>
         <param-value>value for parameter 1</param-value>
      </init-param>
      ]]>
   </insert>
</xmltask>

which will insert the new <init-param> section specified after the <servlet-class> section of the first servlet definition. The resultant descriptor will look like:

 <?xml version="1.0" encoding="UTF-8"?> 
<web-app id="XmlTaskDemo" version="2.4">
   <display-name/>
   <servlet>
      <servlet-name>UnitTests</servlet-name>
      <servlet-class>com.oopsconsultancy.servlet.tests.UnitTests</servlet-class>
      <init-param>
         <param-name>parameter 1</param-name>
         <param-value>value for parameter 1</param-value>
      </init-param>
      ....
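
The same instruction can place content ahead of the matched element instead. The sketch below assumes that position also accepts the value before (check the XMLTask documentation for your version); the <description> text is purely illustrative.

```xml
<xmltask source="src/web.xml" dest="target/web.xml">
   <!-- assumption: "before" is a valid position value alongside "after" -->
   <insert path="/web-app/servlet[1]/servlet-class"
             position="before">
      <![CDATA[
      <description>Runs the unit tests</description>
      ]]>
   </insert>
</xmltask>
```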

You can insert the contents of a file (called, in this example, insertion.xml).

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <insert path="/web-app/servlet[1]/servlet-class"
         position="after" file="insertion.xml"/>
</xmltask>

The XML that is inserted can be an XML document fragment and consequently doesn't have to have a root node. So a valid file for insertion can look like:

 <!-- common servlet initialisations --> 
<init-param>
   <param-name>parameter 1</param-name>
   <param-value>value for parameter 1</param-value>
</init-param>
<init-param>
   <param-name>parameter 2</param-name>
   <param-value>value for parameter 2</param-value>
</init-param>

Removing Content

You can remove content just as easily.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <remove path="/web-app/servlet-mapping[servlet-name/text() = 'UnitTests']"/>
</xmltask>

This tells XMLTask to remove the <servlet-mapping> section that has a <servlet-name> section of 'UnitTests'. Note that if you'd specified the instruction as simply:

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <remove path="/web-app/servlet-mapping"/>
</xmltask>

then you'd remove every <servlet-mapping> section.

<remove> will tell XMLTask to remove the XML element defined and everything below it. A common requirement is to simply remove the text content within an XML element. You can do that just as easily.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <remove path="/web-app/display-name/text()"/>
</xmltask>

which will remove the configured display name, but leave behind the XML elements defining it.

Replacing XML

You can replace XML easily using the <replace> instruction.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <replace path="/web-app/servlet[1]/servlet-class/text()"
         withText="com.oopsconsultancy.examples.NewServlet"/>
</xmltask>

This will remove the text under the <servlet-class> element in the first servlet and replace it with the new servlet class specification, com.oopsconsultancy.examples.NewServlet. Note that you could use <insert> here, but it would add the new text alongside the existing text under the <servlet-class> element rather than replacing it.

To modify an attribute, you can use an XPath expression with an attribute identifier.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <replace path="/web-app/@id" withText="New Application"/>
</xmltask>

This performs a replacement on the 'id' attribute of the <web-app> element (the '@' symbol is used by XPath to indicate an attribute).
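
XMLTask distributions also document a dedicated <attr> instruction for setting attributes. The sketch below assumes that instruction, with its path, attr, and value attributes, is available in your version; if it is not, the <replace> form above achieves the same result.

```xml
<xmltask source="src/web.xml" dest="target/web.xml">
   <!-- assumption: <attr> takes the element path, the attribute name, and the new value -->
   <attr path="/web-app" attr="id" value="New Application"/>
</xmltask>
```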

Comments and XML

A trivial way to change XML configurations is to have one master configuration containing all the required elements, with those not required commented out. It's a simple task to then uncomment those sections required, while having the ease of maintaining only one file. XMLTask allows you to create multiple XML documents from a master very easily, by allowing you to determine which elements are commented/uncommented during a build.

Given a web.xml that contains two different versions of a servlet configuration, both disabled by default:

 <?xml version="1.0" encoding="UTF-8"?> 
<web-app id="XmlTaskDemo" version="2.4">
   <display-name>Application</display-name>
  
   <!--
   <servlet>
      <servlet-name>Pricing Model 1</servlet-name>
      <servlet-class>com.oopsconsultancy.pricing.model1</servlet-class>
      ...
   </servlet>
   -->
  
   <!--
   <servlet>
      <servlet-name>Pricing Model 2</servlet-name>
      <servlet-class>com.oopsconsultancy.pricing.model2</servlet-class>
      ...
   </servlet>
   -->

XMLTask can enable any of these by uncommenting the appropriate element.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <uncomment path="/web-app/comment()[2]"/>
</xmltask>

This will uncomment the second commented section. Note that to select a comment, you have to use the XPath comment() node test.
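
One way to choose which section to enable at build time is Ant's standard if attribute on a target. In this sketch the target name and the property use.model2 are hypothetical; running Ant with -Duse.model2=true would enable the second pricing model, and omitting the property leaves the target skipped.

```xml
<!-- use.model2 is a hypothetical property name, set for example with -Duse.model2=true -->
<target name="enable-model-2" if="use.model2">
   <xmltask source="src/web.xml" dest="target/web.xml">
      <uncomment path="/web-app/comment()[2]"/>
   </xmltask>
</target>
```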

Combining Instructions

XMLTask can chain these simple instructions together to provide a means to perform complex modifications of a file. For instance, in the example below, you can read the source web.xml, insert content, change attributes and remove content, and write out to a new target web.xml.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <replace path="/web-app/@id" withText="New XMLTask Demo"/>
   <insert path="/web-app/servlet[1]/servlet-class"
               position="after">
      <![CDATA[
      <init-param>
         <param-name>parameter 1</param-name>
         <param-value>${dev.property}</param-value>
      </init-param>
      ]]>
   </insert>
   <remove path="/web-app/servlet[1]/load-on-startup"/>
</xmltask>

The above changes the application id to 'New XMLTask Demo', inserts a servlet initialization parameter, and removes the <load-on-startup> element and contents. Note that if the Ant property dev.property (referenced in the insertion) has been assigned a value, it will be expanded during the <insert> instruction. Each XMLTask instruction operates on the XML resulting from the last instruction--not the original document. So, for example, removing the first servlet definition will then mean that the second servlet definition becomes the first, and so on.
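
The index shift described above can be sketched as follows. The replacement name 'Renamed' is illustrative only.

```xml
<xmltask source="src/web.xml" dest="target/web.xml">
   <!-- removes the first servlet definition in src/web.xml -->
   <remove path="/web-app/servlet[1]"/>
   <!-- servlet[1] now matches what was originally the second servlet -->
   <replace path="/web-app/servlet[1]/servlet-name/text()"
         withText="Renamed"/>
</xmltask>
```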

To diagnose sets of instructions and understand what effect each instruction has, you can switch on reporting.

 <xmltask source="src/web.xml" dest="target/web.xml" 
         report="true">

This prints the XML document out after each XMLTask instruction.

Cut/Copy/Paste and Buffers

XMLTask can cut, copy, and paste among XML documents. The XML cut or copied is stored in a buffer, which can be used to paste into the same document or a different one loaded by a new XMLTask instance.

A common usage scenario is to use a document as a template, copy one section of it and then replicate through the document by pasting the section back in, and changing each pasted section.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <!-- copy the first servlet -->
   <copy path="/web-app/servlet[1]" buffer="servlet-buffer"/>
   <!-- paste a copy back in after the first servlet -->
   <paste path="/web-app/servlet[1]" position="after"
         buffer="servlet-buffer"/>
   <!-- and change the name -->
   <replace path="/web-app/servlet[2]/servlet-name/text()"
         withText="UnitTests-2"/>
</xmltask>

The above copies the first servlet definition within the src/web.xml into a buffer called servlet-buffer. It then pastes it back into the web.xml from 'servlet-buffer' after the first servlet definition and changes the name.

Any number of buffers can be defined, each with an arbitrary name, and they will persist for the lifetime of the Ant build. So a buffer can be written to in one XMLTask instance, and written out during another.

 <xmltask source="src/web.xml"> 
   <copy path="/web-app/servlet[1]" buffer="servlet definition"/>
</xmltask>

<xmltask source="src/web-2.xml" dest="target/web-2.xml">
   <paste path="/web-app/servlet[1]" position="after"
         buffer="servlet definition"/>
</xmltask>

The above uses two XMLTasks to copy and paste between different files. The first XMLTask copies some XML from src/web.xml. Note that because you're simply copying into a buffer (called 'servlet definition') then you don't need to tell XMLTask to write out to a destination file.

The second XMLTask reads src/web-2.xml, and pastes the contents of the buffer into it (after the first servlet definition). This then gets written out as target/web-2.xml.

Buffers can be appended to from multiple <cut> or <copy> instructions. For example, you may want to select a number of servlet definitions and write them into a new web.xml. The first stage (the copying) would look like this:

<!-- we want to copy the 1st, 3rd and 4th servlet definitions --> 
<xmltask source="src/web.xml">
   <copy path="/web-app/servlet[1]" buffer="sbuf"/>
   <copy path="/web-app/servlet[3]" buffer="sbuf" append="true"/>
   <copy path="/web-app/servlet[4]" buffer="sbuf" append="true"/>
</xmltask>
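
A possible second stage, sketched under the assumption that a skeleton descriptor exists at src/web-template.xml (a hypothetical file containing a <display-name> element), pastes the accumulated servlet definitions into the new web.xml:

```xml
<!-- src/web-template.xml is a hypothetical skeleton descriptor -->
<xmltask source="src/web-template.xml" dest="target/web.xml">
   <paste path="/web-app/display-name" position="after"
         buffer="sbuf"/>
</xmltask>
```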

To see the contents of a buffer (for diagnostic purposes) you can print it to standard output.

<xmltask source="src/web.xml"> 
   <copy path="/web-app/servlet[1]" buffer="sbuf"/>
   <print buffer="sbuf"/>
</xmltask>

XMLTask can read its initial input from a buffer, and write the results of a set of instructions to a buffer. You can simply specify a sourcebuffer instead of a source file, and/or a destbuffer instead of a dest file. The below creates an XML document from scratch, and writes it to a buffer (for later copying/pasting).

 <xmltask destbuffer="dest buffer"> 
   <insert path="/">
      <![CDATA[
         <web-app id="new webapp!"/>
      ]]>
   </insert>
</xmltask>

XMLTask doesn't require a source document, since you're creating a new document. The <insert> creates a root element called <web-app>, and that gets written to the buffer 'dest buffer' when XMLTask completes. This buffer is now available for further XMLTask targets to use.
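
Going the other way, a later XMLTask could read that buffer as its input by using sourcebuffer. This is a sketch only; the display name is illustrative, and the <insert> relies on its default position to place the new element inside <web-app>.

```xml
<xmltask sourcebuffer="dest buffer" dest="target/web.xml">
   <!-- adds a child element to the root created in the earlier task -->
   <insert path="/web-app">
      <![CDATA[
         <display-name>Generated Application</display-name>
      ]]>
   </insert>
</xmltask>
```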

XMLTask can write to Ant properties in the same fashion (by specifying property instead of buffer in the cut, copy, and paste instructions). However, there are a couple of caveats to be aware of. The first is that Ant properties are immutable. Consequently they can't be re-written or appended to. The second issue is that properties store text only. If you have specific character encoding requirements with your XML, then converting document fragments to text may cause difficulties later on. It's advisable to use XMLTask buffers in preference to properties wherever possible for these reasons.
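
As a small sketch of the property form, the following copies the display name text into an Ant property and echoes it. The target name and the property name app.display.name are hypothetical. Because Ant properties are immutable, the copy has no effect if the property has already been set.

```xml
<target name="show-display-name">
   <xmltask source="src/web.xml">
      <!-- app.display.name is a hypothetical property name -->
      <copy path="/web-app/display-name/text()"
            property="app.display.name"/>
   </xmltask>
   <echo>Display name: ${app.display.name}</echo>
</target>
```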

Handling DTDs

When manipulating XML documents that have or require document type definitions (DTDs), you need to tell XMLTask how to handle these. DTD configuration falls into two categories:

  1. Telling XMLTask where to find a DTD. XMLTask doesn't perform validation using a DTD, but will still require a DTD specified in an input XML document so it can resolve XML entities.
  2. If required, telling XMLTask what DTD its output conforms to.

For example, a Servlet spec 2.3 web.xml document will begin:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE
         web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
         "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app id="XMLTask Demo" version="2.3">
....

XMLTask will attempt to load this DTD from http://java.sun.com. If you're behind a firewall then this will fail, and your build will exit with a (usually unexpected) java.net.ConnectException. Fortunately Ant provides a mechanism to disable this remote lookup, and XMLTask can take advantage of it.

You can use the Ant <xmlcatalog> to specify a local location for the above DTD:

 <xmlcatalog id="dtd"> 
   <dtd
      publicId="-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
      location="./servlet-2.3.dtd"/>
</xmlcatalog>

This specifies a catalog of XML DTDs with the id dtd. For the DTD with the public ID from Sun Microsystems, it points to a local copy in the file ./servlet-2.3.dtd (simply save a copy from the given URL using your browser).

You now need to tell XMLTask about this.

 <xmltask source="src/web.xml" dest="target/web.xml"> 
   <xmlcatalog refid="dtd"/>
</xmltask>

XMLTask will use the local copy of the DTD in ./servlet-2.3.dtd in preference to the remote copy.

Because XMLTask is performing transformations on the document you may need to tell it what DTD schema the resultant document should be output with (remember that XMLTask will not perform a validation--the optional <xmlvalidate> task is appropriate for this). If XMLTask is outputting a modified document that conforms to the DTD referenced in the source document, then you can tell XMLTask to re-use this:

<!-- we're amending a 2.3 web.xml document --> 
<xmltask source="src/web.xml" dest="target/web.xml" preserveType="true">

If you want to output a document with a new DTD, then you have to specify the public and system IDs used by that DTD.

<!-- we're creating a 2.3 web.xml document from scratch --> 
<xmltask
   source="src/web.xml" dest="target/web.xml"
   public="-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
   system="http://java.sun.com/dtd/web-app_2_3.dtd">
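
Since XMLTask does not validate, the output can be checked afterwards with Ant's optional <xmlvalidate> task, as mentioned above. This sketch reuses the dtd catalog defined earlier so that validation also avoids the remote lookup; the target name is hypothetical.

```xml
<target name="validate-servlet">
   <!-- validates the generated descriptor against the local DTD copy -->
   <xmlvalidate file="target/web.xml">
      <xmlcatalog refid="dtd"/>
   </xmlvalidate>
</target>
```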

Paths and Namespaces

XML documents containing multiple XML namespaces are increasingly common. XMLTask can handle these, but you need to be a little more careful when creating your XPath specifications. For instance, a servlet 2.4 web.xml file will begin:

<?xml version="1.0" encoding="UTF-8"?> 
<web-app id="XMLTask Demo"
   xmlns="http://java.sun.com/xml/ns/j2ee" version="2.4"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://java.sun.com/dtd/web-app_2_4.xsd">
   <display-name>Application Display Name</display-name>
   <servlet>
   ....

and the default namespace is set to "http://java.sun.com/xml/ns/j2ee". Although the remainder of the web.xml looks perfectly normal, you need to modify your XPath specifications to take account of multiple namespaces, and specify XML elements appropriately. A full discussion of XPaths and namespaces is beyond the scope of this document--however, the solution is straightforward and simply requires you to specify the local name of each XML element in the path.

<insert path="/web-app/display-name" xml="New Application Name"/>

becomes:

<insert path="/:web-app/:display-name" xml="New Application Name"/>

Scoping each XML element name with a preceding ':' is sufficient to tell the XPath mechanism that you're interested in the local name of the element.
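
Applying the same rule to the other instructions, every element step in a path gets the ':' prefix, while text() and attribute steps are left as they are. A sketch against the 2.4 descriptor above:

```xml
<xmltask source="src/web.xml" dest="target/web.xml">
   <!-- each element name in the path is prefixed with ':' -->
   <replace path="/:web-app/:display-name/text()"
         withText="New Application Name"/>
   <remove path="/:web-app/:servlet[1]/:load-on-startup"/>
</xmltask>
```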

Driving Ant Builds via XMLTask

XMLTask has the facility to read XML files during an Ant build and execute Ant targets according to the contents of these files. This facility makes it straightforward for you to drive an Ant build by using a configuration file, an XML file output from an earlier process or similar.

For example, you may have a file, build-configuration.xml, representing a set of builds required.

<builds> 
   <build>development</build>
   <build>systemtest</build>
   <build>production</build>
</builds>

This specifies that you need a development version of your software, a system test version, and a production version. XMLTask can read this configuration and call Ant targets to build your software according to the above configuration.

<!-- drives the build --> 
<target name="main">
   <xmltask source="build-configuration.xml">
      <call path="/builds/build" target="compile-and-release">
         <param name="build.version" path="text()"/>
      </call>
   </xmltask>
</target>

<!-- performs a build -->
<target name="compile-and-release">
   <echo>Building ${build.version}</echo>
   ....
</target>

The XMLTask reads the input file build-configuration.xml and calls the Ant target 'compile-and-release' for each <build> element it encounters. For each 'compile-and-release' target called, the property 'build.version' is set to the text contents of the <build> element. In this example this would be development, systemtest, and production. By changing the contents of build-configuration.xml, you can change the build output. The above configuration would result in three separate builds.
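
Because each <param> path is evaluated relative to the matched element, more than one value can be passed per call. The sketch below assumes a variant of the configuration file in which each <build> carries a host attribute; that attribute and the build.host property are hypothetical.

```xml
<!-- Assumed input, build-configuration.xml (host attribute is hypothetical):
     <builds>
        <build host="devbox">development</build>
        <build host="livebox">production</build>
     </builds> -->
<target name="main">
   <xmltask source="build-configuration.xml">
      <call path="/builds/build" target="compile-and-release">
         <param name="build.version" path="text()"/>
         <param name="build.host" path="@host"/>
      </call>
   </xmltask>
</target>

<target name="compile-and-release">
   <echo>Building ${build.version} for ${build.host}</echo>
</target>
```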

Conclusion

XMLTask allows the straightforward modification of XML documents by using the power of XPath to identify XML elements to change, but without the pain of learning XSL. You've seen during this article how to perform insertions, replacements, and deletions to permit simple maintenance of XML files, including (but not limited to) J2EE descriptors, Spring configurations, and XHTML documents. XMLTask can be used to parse XML configurations and drive Ant builds. It's worthwhile visiting the XMLTask home page to see more features not addressed here, and for further examples.

Resources


Brian Agnew is the founder and principal consultant with OOPS Consultancy Ltd, located in London, UK.