by S. W. Eran Chinthaka
| Event | Valid Methods |
|---|---|
| All states | getProperty(), hasNext(), require(), close(), getNamespaceURI(), isStartElement(), isEndElement(), isCharacters(), isWhiteSpace(), getNamespaceContext(), getEventType(), getLocation(), hasText(), hasName() |
| START_ELEMENT | next(), getName(), getLocalName(), hasName(), getPrefix(), getAttributeXXX(), isAttributeSpecified(), getNamespaceXXX(), getElementText(), nextTag() |
| ATTRIBUTE | next(), nextTag(), getAttributeXXX(), isAttributeSpecified() |
| NAMESPACE | next(), nextTag(), getNamespaceXXX() |
| END_ELEMENT | next(), getName(), getLocalName(), hasName(), getPrefix(), getNamespaceXXX(), nextTag() |
| CHARACTERS | next(), getTextXXX(), nextTag() |
| CDATA | next(), getTextXXX(), nextTag() |
| COMMENT | next(), getTextXXX(), nextTag() |
| SPACE | next(), getTextXXX(), nextTag() |
| START_DOCUMENT | next(), getEncoding(), getVersion(), isStandalone(), standaloneSet(), getCharacterEncodingScheme(), nextTag() |
| END_DOCUMENT | close() |
| PROCESSING_INSTRUCTION | next(), getPITarget(), getPIData(), nextTag() |
| ENTITY_REFERENCE | next(), getLocalName(), getText(), nextTag() |
| DTD | next(), getText(), nextTag() |
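One way to read the table is as a guard: check the current event type before calling a state-specific method. The following is a minimal sketch of that idea, assuming the StAX implementation bundled with the JDK; the class `StateGuard` and its methods are hypothetical names used only for illustration, not part of the StAX API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of the StAX API.
class StateGuard {

    // Calls getLocalName() only in the states where the table lists it as
    // valid, and returns null in every other state.
    static String localNameOrNull(XMLStreamReader reader) {
        switch (reader.getEventType()) {
            case XMLStreamConstants.START_ELEMENT:
            case XMLStreamConstants.END_ELEMENT:
            case XMLStreamConstants.ENTITY_REFERENCE:
                return reader.getLocalName();
            default:
                return null;
        }
    }

    // Walks the whole document and records the guarded result for each event.
    static List<String> localNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                reader.next();
                names.add(localNameOrNull(reader));
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(localNames("<Name>Introducing StAX</Name>"));
    }
}
```

Checking `getEventType()` first keeps the calling code independent of how a particular implementation reacts to a method called in the wrong state.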
Now let's see how we can play around a bit with the StAX API. We need to download the StAX API .jar and an implementation of the StAX API. Both are available from Ibiblio; the StAX API .jar is stax-api-1.0.jar. Several implementations are available; let's use Woodstox, which is available as wstx-asl-2.9.2.jar.
Let's first print events from sample1.xml, which can be found in the sample code folder in the Resources section.
<article:Article xmlns:article="http://www.article.org"
                 xmlns:author="http://author.org">
    <!-- This sample1.xml is used for samples in
         "Introducing StAX" article -->
    <Name>Introducing StAX</Name>
    <author:Author>Eran Chinthaka</author:Author>
    <?This_is_some_processing_instruction?>
</article:Article>
First you need to create an instance of XMLStreamReader. The StAX API provides XMLInputFactory to create an instance of XMLStreamReader.
FileInputStream fileInputStream =
    new FileInputStream(fileLocation);
XMLStreamReader xmlStreamReader =
    XMLInputFactory.newInstance().
        createXMLStreamReader(fileInputStream);
Then we need to ask the parser to proceed through each event. XMLStreamReader provides an iterator-like API to check the existence of a next event.
while (xmlStreamReader.hasNext()) {
    printEventInfo(xmlStreamReader);
}
xmlStreamReader.close();
This code iterates until xmlStreamReader has no further events to report. Note that closing the xmlStreamReader instance is not strictly required, but it is considered good programming practice because it frees the resources held by the reader.
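The loop above can be tried without any file on disk. Here is a minimal, self-contained sketch that reads from a string and records the integer code of each event instead of printing it; the class `EventLoop` and the method `eventCodes` are hypothetical names, and the JDK's bundled StAX implementation is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the names are illustrative only.
class EventLoop {

    // Pulls every event from the parser and returns the event codes in order.
    static List<Integer> eventCodes(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Integer> codes = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                codes.add(reader.next()); // the parser advances only when asked
            }
        } finally {
            // Frees the reader's resources; per the Javadoc, this does not
            // close the underlying input source.
            reader.close();
        }
        return codes;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(eventCodes("<a><!--c--><b>hi</b></a>"));
    }
}
```

The codes correspond to the constants in `XMLStreamConstants`, for example 1 for START_ELEMENT and 8 for END_DOCUMENT.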
Now we need to get the events from the parser and call appropriate methods to extract information about the XML.
int eventCode = reader.next();
switch (eventCode) {
    case XMLStreamConstants.START_ELEMENT : // integer value 1
        System.out.println("event = START_ELEMENT");
        System.out.println("Localname = " + reader.getLocalName());
        break;
    case XMLStreamConstants.END_ELEMENT : // integer value 2
        System.out.println("event = END_ELEMENT");
        System.out.println("Localname = " + reader.getLocalName());
        break;
    case XMLStreamConstants.PROCESSING_INSTRUCTION : // integer value 3
        System.out.println("event = PROCESSING_INSTRUCTION");
        System.out.println("PIData = " + reader.getPIData());
        break;
    ..............................
    ..............................
    ..............................
The interesting thing to note here is that the client must explicitly advance the parser by calling reader.next(). The parser proceeds to the next event only after that call. This is the main difference between pull and push parsing. In push parsing, as with SAX, once the parser starts sending events, the client application has no control over it. In pull parsing, as seen here, the client application decides the pace of parsing at its own discretion.
Say you want to process only one element of the XML, if present. With this approach you put a simple if statement in the START_ELEMENT handling code and you are done. If you do not want to process any XML after that, you can simply close the stream and forget about it, rather than having to parse all of the XML.
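A minimal sketch of this early-exit pattern follows. It returns the text of the first element with a given local name and stops pulling events as soon as it is found. The class `FirstMatch` and the method `firstElementText` are hypothetical names, and the target element is assumed to contain only text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the names are illustrative only.
class FirstMatch {

    // Returns the text of the first element whose local name matches,
    // or null if the document contains no such element.
    static String firstElementText(String xml, String localName)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    // Assumes text-only content; the rest of the document
                    // is never requested from the parser.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<article:Article xmlns:article=\"http://www.article.org\""
                + " xmlns:author=\"http://author.org\">"
                + "<Name>Introducing StAX</Name>"
                + "<author:Author>Eran Chinthaka</author:Author>"
                + "</article:Article>";
        System.out.println(firstElementText(xml, "Author"));
    }
}
```

Because the comparison uses the local name only, an element such as author:Author matches regardless of its prefix; a stricter version would also compare the namespace URI.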
One typical example of this kind of processing is when you relay pieces of XML. Most of the time the intermediary node will look for a particular XML element and will then forward it to the proper destination, without requiring parsing of the whole XML chunk.
When you run the above piece of code against sample1.xml, the output will be as follows:
event = START_ELEMENT
Localname = Article
========================
event = COMMENT
Comment = This sample1.xml is used for samples in "Introducing StAX" article
========================
event = START_ELEMENT
Localname = Name
========================
event = CHARACTERS
Characters = Introducing StAX
========================
event = END_ELEMENT
Localname = Name
========================
event = START_ELEMENT
Localname = Author
========================
event = CHARACTERS
Characters = Eran Chinthaka
========================
event = END_ELEMENT
Localname = Author
========================
event = PROCESSING_INSTRUCTION
PIData =
========================
event = END_ELEMENT
Localname = Article
========================
event = END_DOCUMENT
Document Ended
========================
Now let's try to write the same XML to the output using the XMLStreamWriter interface.
In just the same way as we create XMLStreamReader to read XML using the XMLInputFactory, we need to create an instance of XMLStreamWriter using the XMLOutputFactory.
XMLStreamWriter writer = XMLOutputFactory.newInstance().
createXMLStreamWriter(outStream);
Then this writer can be used to write events. For example:
writer.writeStartElement("Name");
writer.writeEndElement();
writer.writeComment("This sample1.xml is used for samples in \"Introducing StAX\" article");
writer.writeNamespace("author", "http://author.org");
writer.writeCharacters("Introducing StAX");
writer.writeProcessingInstruction("This_is_a_processing_instruction");
Having written these events to the XMLStreamWriter, you must flush and close the writer.
writer.flush();
writer.close();
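Put together in document order, the calls might look like the sketch below, which writes a document shaped like sample1.xml into a string. This is an illustration under assumptions rather than the article's original sample code: the class `SampleWriter` and the method `write` are hypothetical names, and the default (non-repairing) XMLOutputFactory of the JDK is assumed.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; the names are illustrative only.
class SampleWriter {

    static final String ARTICLE_NS = "http://www.article.org";
    static final String AUTHOR_NS = "http://author.org";

    // Writes a document equivalent to sample1.xml and returns it as a string.
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("article", "Article", ARTICLE_NS);
        // Namespace declarations belong right after the start element.
        writer.writeNamespace("article", ARTICLE_NS);
        writer.writeNamespace("author", AUTHOR_NS);
        writer.writeComment(
                " This sample1.xml is used for samples in \"Introducing StAX\" article ");
        writer.writeStartElement("Name");
        writer.writeCharacters("Introducing StAX");
        writer.writeEndElement(); // closes Name
        writer.writeStartElement("author", "Author", AUTHOR_NS);
        writer.writeCharacters("Eran Chinthaka");
        writer.writeEndElement(); // closes author:Author
        writer.writeProcessingInstruction("This_is_some_processing_instruction");
        writer.writeEndElement(); // closes article:Article
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

The writer does not indent its output, so the result is a single line; the exact form of the XML declaration can vary between implementations.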
StAX contains two distinct APIs for working with XML: the cursor API and the iterator API. What we have discussed so far is the cursor API. As you can see, the cursor points to one thing at a time and only ever moves forward, never backward. The iterator API, on the other hand, represents the XML stream as a sequence of event objects. Its base interface is XMLEvent, and there is a subinterface for each event type. The XMLEventReader interface has the following methods for interacting with the XML infoset.
public XMLEvent nextEvent() throws XMLStreamException;
public boolean hasNext();
public XMLEvent peek() throws XMLStreamException;
More information on the iterator API can be found in Sun's online tutorial.
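As a brief illustration of the three methods listed above, the following sketch collects the local names of all start elements with an XMLEventReader. The class `IteratorDemo` and the method `startElementNames` are hypothetical names; the JDK's bundled StAX implementation is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class; the names are illustrative only.
class IteratorDemo {

    // Returns the local names of all start elements, in document order.
    static List<String> startElementNames(String xml) throws XMLStreamException {
        XMLEventReader events = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (events.hasNext()) {
                // peek() looks at the upcoming event without consuming it.
                boolean startsElement = events.peek().isStartElement();
                // nextEvent() returns that same event and moves past it.
                XMLEvent event = events.nextEvent();
                if (startsElement) {
                    StartElement start = event.asStartElement();
                    names.add(start.getName().getLocalPart());
                }
            }
        } finally {
            events.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(startElementNames("<a><b>x</b><c/></a>"));
    }
}
```

Unlike the cursor, each XMLEvent is a standalone object, so it can be stored in a collection or passed to other code after the reader has moved on.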
Most of the applications that process XML benefit from stream parsing and most of the time do not require the entire DOM model in memory. Having mentioned that as the main advantage we have in pull parsing, let's look at the other aspects.
In the end, how does StAX compare with some of the existing XML parsing technologies available today? A table in Jeff Ryan's "Does StAX Belong in Your XML Toolbox?" does a good job of assessing the various approaches, in terms of API style, ease of use, CPU/memory use, and more. Check it out.
This approach of XML processing gives more control to the client application than to the parser, enabling much faster and more memory-efficient processing. This is becoming a standard across different domains of XML processing. For example, Apache Axis2, one of the prominent SOAP processing engines, improved its performance four times, on average, over its predecessor by using a StAX-based XML processing model called Axiom. Axiom is much more memory-efficient and performant than the existing object models available today, due to the usage of StAX as its XML parsing technology.
S. W. Eran Chinthaka is a pioneering member of the Apache Axis2, AXIOM, and Synapse projects, working full-time with WSO2 Inc.
We have moved to StAX from SAX, much easier to work with. From personal experience I can say that:
So far we are using parser that comes with JWSDP 1.6 and it seems to work OK.