Data is everywhere. It is on your computer, on the internet, squirreled away on some corporate server. We have telephone records, class schedules, blogs, personal websites, and our personal favorite websites. There are Flickr, MySpace, and online genealogical tools. This article introduces simple techniques and APIs available in the SwingX-WS project to help you write rich client applications in Java that take advantage of all these data sources for creating Swing mashups, or Smashups.
Accessing Web Content
While there are many great sources of online data, many of them do not provide public APIs for accessing that data. When was the last time you used a public API for accessing your bank records? And yet all this data is available, if only we could access it! Well, we can: it's called screen scraping, and it is d-i-r-t-y. Essentially, it means viewing the source of a website, figuring out what information in that page you want, and then parsing out the data.
There be sharks in them there waters
Before digging too deeply, I want to point out that relying on screen scraping for data access is extremely perilous. Website authors might change their sites at any time, breaking production applications. For this reason, it is not recommended to rely on screen scraping for mission-critical applications. Rather, it is a fun and interesting technique for creating small applications (often ones that won't be used by anybody besides yourself or your mom). The technique requires that when a website changes and breaks your parsing code, you quickly fix the code and redeploy an updated application.
However, there is a silver lining. In this article, I talk a lot about screen scraping, but these same techniques can be applied to quick scripting of Swing UIs that use XML as their data sources. Typically, we end up writing a SAX or StAX parser, or maybe a mound of code for parsing a DOM tree. This article introduces techniques that can be used to make parsing of XML files a lot easier.
Accessing Web Content with URL
The first step in writing a Smashup is accessing web content. There are several approaches we could take. Let's start with the easiest: java.net.URL. The humble URL class has been a part of the JDK since version 1.0. For HTTP GET requests, URL is an excellent choice.
//read and download the page text
URL url = new URL("http://www.java.net");
InputStream in = url.openStream();
StringBuffer buffer = new StringBuffer();
byte[] data = new byte[1024];
int length = -1;
while ((length = in.read(data)) != -1) {
    //each chunk is decoded with the platform default charset
    buffer.append(new String(data, 0, length));
}
in.close();
System.out.println(buffer);
//read an image
Image img = ImageIO.read(new URL("http://java.net/images/header_jnet_new.jpg"));
Clearly, reading page text could benefit from a helper class, but otherwise our faithful old friend URL is a simple, straightforward way to access web content.
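As a rough sketch of what such a helper class might look like (the name PageReader and both of its methods are hypothetical, not part of the JDK or SwingX-WS), the loop can be hidden behind a single call. This version collects the raw bytes first and decodes them once, because decoding chunk by chunk can split a multibyte character across two reads. The caller supplies the charset, which is an assumption about the page being read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the JDK or SwingX-WS.
final class PageReader {
    private PageReader() {}

    /** Reads the stream to its end and decodes the bytes once, using the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] data = new byte[1024];
        int length;
        while ((length = in.read(data)) != -1) {
            out.write(data, 0, length);
        }
        return new String(out.toByteArray(), charset);
    }

    /** Opens the URL, reads the whole page, and closes the stream even on failure. */
    static String readPage(URL url, Charset charset) throws IOException {
        try (InputStream in = url.openStream()) {
            return readAll(in, charset);
        }
    }
}
```

With this in place, the page text example shrinks to one statement, such as PageReader.readPage(new URL("http://www.java.net"), StandardCharsets.UTF_8).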
Apache Jakarta HttpClient
But URL doesn't help at all when it comes to HTTP POST. Enter the Apache Jakarta HttpClient project. HttpClient is an excellent low-level library for working with HTTP servers (or in other words, nearly every server on the planet). Using some low-level constructs such as the HttpClient and HttpMethod classes, you can access web resources via the GET, POST, DELETE, PUT, HEAD, or TRACE HTTP methods. You can also authenticate, deal with proxy servers, manage cookies, manage headers, and manage parameters.
For fairness in comparison, here is the minimal amount of HttpClient code that equals the functionality of the above URL code example:
//read and download the page text
HttpClient client = new HttpClient();
HttpMethod method = new GetMethod("http://www.java.net");
client.executeMethod(method);
System.out.println(method.getResponseBodyAsString());
method.releaseConnection();
//read an image
method = new GetMethod("http://java.net/images/header_jnet_new.jpg");
client.executeMethod(method);
Image img = ImageIO.read(method.getResponseBodyAsStream());
method.releaseConnection();
This code should be fairly self-explanatory. In some ways this is an improvement over URL:
No looping and byte arrays. Yay!
If I need to use HTTP POST instead of GET, all I need to do is create a PostMethod instead of a GetMethod.
I can set a flag on GetMethod to automatically follow HTTP redirects.
There are also some problems. All this configuration of the GetMethod (such as setFollowRedirects, configuring proxy support, authentication, etc.) is begging for a higher-level framework to abstract away the tedious details of establishing a connection. There is also no support for HTML concepts such as HTML forms, since HttpClient is, well, about HTTP. It was also more work to download an Image than it was to use good ol' URL.
Enter SwingX-WS
The org.jdesktop.http package of the SwingX-WS project provides a set of higher-level constructs for interacting with HTTP-based servers. These classes were written for two reasons: first, because HttpClient needs a higher-level framework for simplifying the common case; second, because SwingX-WS shouldn't be tied by API to a third-party library, especially one that is likely to be succeeded by another project in the not-too-distant future. Again, here is the sample code, which achieves the same end result as the previous examples:
//read and download the page text
Session s = new Session();
Response r = s.get("http://www.java.net");
System.out.println(r.getBody());
//read an image
r = s.get("http://java.net/images/header_jnet_new.jpg");
Image img = ImageIO.read(r.getBodyAsStream());
This code introduces two new classes: Session and Response. Session represents an HTTP session from the client's perspective. For example, if I were implementing a tabbed web browser, I would have one Session per tab. Each Session maintains its own cookie policy. Each may also support several simultaneous connections. Each Session also maintains its own password/authentication state.
Session has several convenience methods which make executing a GET, POST, or other request very simple. Note that all these methods block.
get(String uri): Executes a GET request on the given URI.
get(String uri, Parameter... params): Executes a GET request using the specified parameters on the given URI.
post(String uri): Executes a POST request on the given URI.
post(String uri, Parameter... params): Executes a POST request using the specified parameters on the given URI.
execute(Method method, String uri): Executes the given method on the given URI.
execute(Method method, String uri, Parameter... params): Executes the given HTTP method (such as Method.POST, Method.GET, Method.DELETE, etc.) on the given URI, using the specified parameters.
A Riddle
So what happens if the following is executed?
Session s = new Session();
s.post("http://mycompany.com/servlet?fruit=\"apple\"", new Parameter("color", "green"));
Answer: what you would hope. The fruit="apple" parameter is actually extracted from the URL and included with the "color" parameter in the body of the post.
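To make the riddle concrete, here is a minimal sketch of the kind of merging described above, written against the JDK only. It is not the SwingX-WS implementation: the class PostBodySketch and its methods are hypothetical, and the choices to use UTF-8 and to let explicit parameters override query-string parameters of the same name are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: this is not how SwingX-WS is known to be implemented.
final class PostBodySketch {
    private PostBodySketch() {}

    /** Returns the URI with any query string removed. */
    static String stripQuery(String uri) {
        int q = uri.indexOf('?');
        return q < 0 ? uri : uri.substring(0, q);
    }

    /** Collects the query-string parameters first, then the explicit ones, in order. */
    static Map<String, String> mergeParameters(String uri, Map<String, String> explicit) {
        Map<String, String> merged = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q >= 0 && q < uri.length() - 1) {
            for (String pair : uri.substring(q + 1).split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                merged.put(decode(name), decode(value));
            }
        }
        merged.putAll(explicit); // assumption: explicit parameters win on a name clash
        return merged;
    }

    /** Encodes the parameters as an application/x-www-form-urlencoded body. */
    static String toFormBody(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return body.toString();
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For the riddle's input, stripQuery yields the bare servlet address, and the merged map holds fruit followed by color, ready to be written as the POST body.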
SwingX-WS Requests
In addition to these convenience methods, you can gain greater control over the HTTP request by using a Request object together with the execute(Request req) method of Session.
Request encapsulates the details of making an HTTP request. It includes a flag for whether to automatically follow redirects, as well as properties for the method to use and the URL to request from. It also provides the API for reading and setting the parameters and headers for the HTTP request. Finally, there are several methods for setting the request body, including:
setBody(String)
setBody(byte[])
setBody(SimpleDocument) (a type of DOM document)
setBody(InputStream)
When a Request is executed, a Response is produced. In addition to providing access to the response body (as a byte array, String, InputStream, or Reader), Response also provides the response status code, headers, and base URL (the base URL that originated this response).
Asynchronous HTTP Requests
The Session and associated APIs are very useful, and very powerful. However, although they support multithreaded access, all of the get, post, and execute methods of the Session class block. If you were to call these methods from a Swing event handler, the entire GUI would freeze while the call was being made!
There are many ways to call a blocking method from your Swing code, including using SwingWorker, Spin, or Foxtrot. However, this problem is not unique to Swing. In the world of thin-client web browser applications, this has long been a usability problem. Each request from the web client to the server involved "blocking" the client while a new page was requested from the server. Recently, DHTML has been renamed Ajax, and with this came a whole new paradigm for web programming. Instead of blocking the entire web app while communicating with the server, Ajax allows the developer to execute long running tasks on a background thread by using the XMLHttpRequest JavaScript object. The interesting part of the story is that Ajax applications have essentially the same issues as rich clients: they both need to perform I/O on a background thread.
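For the SwingWorker route, a minimal sketch might look like the following. It uses javax.swing.SwingWorker as shipped in the JDK; the BackgroundCall class and its run method are hypothetical names introduced here, and the error handling is deliberately thin.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;
import javax.swing.SwingWorker;

// Hypothetical helper for illustration; not part of SwingX-WS.
final class BackgroundCall {
    private BackgroundCall() {}

    /**
     * Runs blockingCall on a worker thread, then hands its result to onDone
     * on the event dispatch thread (EDT).
     */
    static <T> SwingWorker<T, Void> run(final Callable<T> blockingCall,
                                        final Consumer<T> onDone) {
        SwingWorker<T, Void> worker = new SwingWorker<T, Void>() {
            @Override
            protected T doInBackground() throws Exception {
                // Executes off the EDT, so the GUI stays responsive.
                return blockingCall.call();
            }

            @Override
            protected void done() {
                // done() is invoked on the EDT once doInBackground has finished.
                try {
                    onDone.accept(get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    // The blocking call failed; a real application would report e.getCause().
                }
            }
        };
        worker.execute();
        return worker;
    }
}
```

From an event handler, a blocking call such as s.get("http://www.java.net").getBody() could then be wrapped as the first argument, with textArea::setText as the second, so the GUI update happens on the EDT.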
In the SwingX-WS project, we recently introduced the XmlHttpRequest class. This class is based on the W3C working draft specification www.w3.org/TR/XMLHttpRequest, and is similar to the XMLHttpRequest object found in web browsers today. Those of you familiar with Ajax, rejoice!
In addition to enjoying widespread use, XmlHttpRequest is incredibly easy to use:
final XmlHttpRequest req = new XmlHttpRequest();
req.addOnReadyStateChangedListener(new PropertyChangeListener() {
    public void propertyChange(PropertyChangeEvent evt) {
        if (XmlHttpRequest.ReadyState.LOADED == evt.getNewValue()) {
            //update my Swing GUI here
            textArea.setText(req.getResponseText());
        }
    }
});
req.open(Method.GET, "http://www.java.net");
//called from the Swing event handling code,
// this starts the background process
req.send();
First, create an XmlHttpRequest object. Then attach a PropertyChangeListener that will listen to the "readyState" property of the XmlHttpRequest object. The ready state indicates what state the XmlHttpRequest is in. States include:
UNINITIALIZED
OPEN
SENT
RECEIVING
LOADED
In this example, I listen for the LOADED ready state, indicating that the data is fully downloaded from the server. I then read the entire text by calling getResponseText(). Because this is an XmlHttpRequest instance (rather than the superclass AsyncHttpRequest), I can also call the getResponseXML() method to get a DOM document representing the response.
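The listener mechanics can be tried without a network connection. The sketch below is a stand-in, not the SwingX-WS class: ReadyStateSketch and its methods are hypothetical, and only the state names and the "readyState" property name are taken from the description above. It shows a bound property firing one PropertyChangeEvent per transition, which is what the listener in the example reacts to.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Stand-in for illustration; this is not the SwingX-WS XmlHttpRequest class.
final class ReadyStateSketch {
    enum ReadyState { UNINITIALIZED, OPEN, SENT, RECEIVING, LOADED }

    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
    private ReadyState state = ReadyState.UNINITIALIZED;

    /** Registers a listener that is notified only of "readyState" changes. */
    void addReadyStateListener(PropertyChangeListener listener) {
        pcs.addPropertyChangeListener("readyState", listener);
    }

    ReadyState getReadyState() {
        return state;
    }

    /** Stores the new state and notifies listeners with the old and new values. */
    void setReadyState(ReadyState next) {
        ReadyState old = state;
        state = next;
        pcs.firePropertyChange("readyState", old, next);
    }

    /** Walks through one complete request lifecycle, firing an event per step. */
    void simulateRequest() {
        setReadyState(ReadyState.OPEN);
        setReadyState(ReadyState.SENT);
        setReadyState(ReadyState.RECEIVING);
        setReadyState(ReadyState.LOADED);
    }
}
```

A listener that compares evt.getNewValue() against ReadyState.LOADED, as in the example above, would fire its body exactly once per simulated request.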
There is also the JsonHttpRequest class, which has the getResponseJSON and getResponseMap methods. These methods parse the response and construct either a JSONObject or a map containing the data.
The XmlHttpRequest and associated classes make it trivial to access web content on a background thread, and update the Swing UI on the proper event dispatch thread.
HTML Forms
Earlier I mentioned that one of the downsides to using HttpClient directly is that it doesn't support any HTML-isms, such as HTML forms. SwingX-WS contains a package, org.jdesktop.html.form, which fills that gap. It provides support both for parsing HTML forms out of HTML code, and for setting values in that form and submitting it back to the web server!
The core API is the Form interface. It defines the model portion of the HTML form element. For example, there is no id or name property in the Form interface, because they have more to do with HTML than the form concept itself.
The Form interface has methods for retrieving the "action" (the URL to navigate to when the form is submitted) and the HTTP method to use on submit. A Form is composed of a set of Inputs. Each Input represents an HTML <input> element. An Input has a name and value. The name property is read only, but the value property is read/write and takes a String.
HTML forms also support the concept of a group of inputs tied together by name (these are represented as radio buttons in the form), and a combo-box- or list-type selection component. The RadioInput and Select interfaces are used to represent these constructs.
Let's see how these pieces fit together. The following code fragment will log in to java.net using your username and password.
Session s = new Session();
Response r = s.get("https://www.java.net/servlets/TLogin");
Form form = Forms.getFormById(r, "loginform");
form.getInput("loginID").setValue(username);
form.getInput("password").setValue(password);
r = form.submit(s);
First I create the session and get the HTML page associated with the java.net login servlet. Next, I call into the Forms utility class to parse an HTML form from the response and create an implementation of the Form interface. The rest simply sets the input values and submits the form. By the way, when the form is submitted, java.net returns a session cookie. You can inspect this cookie in the response. The important point is that this is all handled for you automatically.
The Forms class contains several utility methods to help you parse out an HTML form from an HTML document:
getForm(Response r, String expression): Uses the given XPath expression to find the <form> tag in the HTML document to use in creating a Form.
getFormByName(Response r, String name): Creates a Form based on the HTML form with the given name.
getFormById(Response r, String id): Creates a Form based on the HTML form with the given ID.
getFormByIndex(Response r, int n): Creates a Form based on the nth form in the HTML file.
Note: The Forms methods accept XML and also attempt to cope with malformed HTML. JTidy is used behind the scenes to try to fix the HTML prior to parsing it.
These classes make it trivial to interact with existing HTML documents containing Forms.
Simplified DOM and XPath
I've shown how to very easily get access to web content including both XML and HTML documents. But actually accessing the content of these document types is no easy matter! HTML is often malformed, and even when using tools such as TagSoup or JTidy, parsing the resulting XML document can be very verbose.
XPath was created for this very reason. Elliotte Rusty Harold wrote a great article, "The Java XPath API," that briefly outlines how and why you would want to use XPath for evaluating DOM documents. Rather than recapitulate those arguments, if you aren't yet a believer in XPath, I suggest you read that article.
Although XPath has simplified the job of extracting data from DOM documents, its usage in Java isn't as concise as I'd like. To use XPath in Java, you have to do the following:
Session s = new Session();
Response r = s.get("http://www.google.com");
DocumentBuilder builder =
    DocumentBuilderFactory.newInstance().newDocumentBuilder();
ByteArrayInputStream in =
    new ByteArrayInputStream(r.getBody().getBytes());
Document dom = builder.parse(in);
in.close();
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression exp =
    xpath.compile("//a"); //gets all the anchor tags in the document
//a compiled expression is evaluated directly; the result must be cast
NodeList nodes = (NodeList) exp.evaluate(dom, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    Node n = nodes.item(i);
    //evaluate returns Object, so cast to Node before reading the attribute
    String href =
        ((Node) xpath.evaluate("@href", n, XPathConstants.NODE))
            .getTextContent();
    String text = n.getTextContent();
    System.out.println(text + "(" + href + ")");
}
By contrast, the following code snippet performs the same task as that above, but uses the helper classes of the org.jdesktop.dom package.
Session s = new Session();
Response r = s.get("http://www.google.com");
SimpleDocument dom =
    SimpleDocumentBuilder.simpleParse(r.getBody());
for (Node n : dom.getElements("//a")) {
    //evaluate @href relative to the current anchor node
    String href = dom.getString("@href", n);
    String text = n.getTextContent();
    System.out.println(text + "(" + href + ")");
}
In the third statement, I create a new SimpleDocument by asking the SimpleDocumentBuilder to parse the response body. In the for loop that follows, I iterate over all the elements of the DOM tree that match the specified XPath expression!
The benefit of this code is not merely the reduced number of lines. Rather, it is in the conceptual weight of the code. How many different concepts are required to understand the traditional DOM/XPath code segment? There are two factories, a builder, an optional compilation step, the NodeList API, and XPathConstants. In the second code block, you need to know two new classes, and the DOM API.
SimpleDocument can be created two ways. First, you can create it via its constructor, which requires a delegate DOM document. SimpleDocument simply wraps a normal org.w3c.dom.Document object, and provides convenience methods atop it. Second, you can create a SimpleDocument by calling one of the static simpleParse methods on SimpleDocumentBuilder, or one of its non-static parse methods.
SimpleDocument and SimpleDocumentBuilder both extend the normal DOM API, and can therefore be used anywhere traditional DOM is required.
Let's look a moment at some of the key API in SimpleDocument:
toXML(): A simple method to convert a SimpleDocument into the equivalent XML String.
getElements(String expression): Returns all the DOM elements that match the given XPath expression.
getElement(String expression): Returns the element matching the given XPath expression.
getString(String expression): Returns a String (the getTextContent() of the DOM element) matching the given XPath expression.
The getElements and getString methods on SimpleDocument are convenience methods that delegate to org.jdesktop.xpath.XPathUtils. SimpleDocument manages a cache of compiled XPath expressions, so you don't have to worry about compiling them. If you want to manage compilation manually, you can skip the SimpleDocument helper methods and instead use XPathUtils directly.
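To illustrate the caching idea with nothing but the JDK, here is a minimal sketch built on javax.xml.xpath. It is not the SwingX-WS XPathUtils or SimpleDocument code: the CachedXPath class, its method names, and the plain HashMap cache are assumptions made for the example, and the class is not thread-safe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper built on javax.xml.xpath; not the SwingX-WS XPathUtils class.
final class CachedXPath {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> cache = new HashMap<>();

    /** Compiles each distinct expression once and reuses it afterwards. */
    private XPathExpression compiled(String expression) throws XPathExpressionException {
        XPathExpression exp = cache.get(expression);
        if (exp == null) {
            exp = xpath.compile(expression);
            cache.put(expression, exp);
        }
        return exp;
    }

    /** Returns every node matching the expression, relative to the context node. */
    List<Node> getElements(String expression, Node context) throws XPathExpressionException {
        NodeList nodes = (NodeList) compiled(expression).evaluate(context, XPathConstants.NODESET);
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i));
        }
        return result;
    }

    /** Returns the string value of the expression, relative to the context node. */
    String getString(String expression, Node context) throws XPathExpressionException {
        return (String) compiled(expression).evaluate(context, XPathConstants.STRING);
    }

    /** Number of distinct expressions compiled so far. */
    int cacheSize() {
        return cache.size();
    }

    /** Parses well-formed XML text into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

Evaluating "//a" once and "@href" for each matching anchor compiles only two expressions, no matter how many anchors the document contains.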
Conclusion
Working with HTML and XML content in Java rich clients can be very productive. With the inclusion of a little bit of new API and leveraging existing open source projects, I've demonstrated how you can very easily download and work with various types of content, including XML, HTML, and JSON. Using XPath and SimpleDocument, you can easily extract data from XML documents and use this data in your Swing applications.
Thanks! I should give props to Jesse Wilson, who suggested an API very similar to what I ended up doing for Session, Request, and Response.
how do you compare with "event bus" ?
2006-10-13 06:32:54 dmdevito
EventBus is a Swing-oriented, single-process publish/subscribe event routing library. How do you compare with it? It looks like EventBus is a more general solution, while you are providing more helper classes for a specific, but more common, domain (i.e., HTML). Can you give me your thoughts on the comparison? Thanks.
-- Dominique
how do you compare with "event bus" ?
2006-10-13 07:55:30 rbair
I don't see how the two relate. Are you specifically talking about the XmlHttpRequest and related classes? They are definitely intended to mirror their peers in the web world. You could always write a standard PropertyChangeListener that would propagate the ready state change event on the event bus, if you like.
how do you compare with "event bus" ?
2006-10-13 08:31:05 dmdevito
Slide 49 of the "Hop on Board the Swinging EventBus!" presentation says: "EventBus is perfect for updating client from server." So, I see XmlHttpRequest and the EventBus project as related.
Yes, I was specifically talking about the XmlHttpRequest class. Using this class implies following an event-driven style, and so does the "event bus" way of programming. The difference I see is that EventBus does not imply a specific transport layer, while you have chosen to mimic the HTML way of the web world. That is the major difference I see between your approach and the "event bus" approach.
Are you going to implement XmlHttpRequest on top of a more event-oriented approach à la EventBus? Or are you just focusing on the quite common, and so widely adopted, HTML way?
how do you compare with "event bus" ?
2006-10-13 08:36:25 rbair
I see them as entirely separate. There are many event buses out there (heck, I expect JSR 296 might end up with one). I'd prefer not to introduce any more complexity into the basic API design than is necessary. If you are using an event bus to decrease complexity of large applications -- well you can definitely plug XmlHttpRequest into the event bus to participate in that system.
I use java.util.Scanner to harvest the content of a webpage. For instance:
InputStream in = url.openStream();
String data = new java.util.Scanner(in).useDelimiter("\\Z").next();
in.close();
To me, SOAP is a web service, but not the only one. Often you don't have a SOAP endpoint to talk to. I may be abusing the accepted term, but I think of REST also as a web service.
trying to login using a form over SSL
2006-10-12 12:07:56 codecraig
I am trying to log in using a form over SSL, like your example, and I get this error:
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
any idea
trying to login using a form over SSL
2006-10-12 12:57:35 rbair
What form are you trying to log into?
It seems like this error may be related to the server you are attempting to log in to. Java is really uptight, and won't use SSL with any server that doesn't have a real (i.e., not self-signed) certificate, such as one purchased from Thawte or Verisign.
Try logging into java.net (which also uses https).
Incidentally, I'd love it if the mechanism used by WebStart were available for the rest of us. Instead of tanking when a self-signed cert is encountered, there should be a hook I can use to prompt the user and get their permission to connect to the host anyway.
trying to login using a form over SSL
2006-10-12 13:25:26 codecraig
thats probably the problem. thanks!
The last example does not work
2006-10-12 09:54:06 jglick
It cannot parse HTML, it needs XML.
[Fatal Error] :11:3: The element type "meta" must be terminated by the matching end-tag "</meta>".
Exception in thread "main" org.xml.sax.SAXParseException: The element type "meta" must be terminated by the matching end-tag "</meta>".
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
at org.jdesktop.dom.SimpleDocumentBuilder.parse(SimpleDocumentBuilder.java:97)
at org.jdesktop.dom.SimpleDocumentBuilder.parse(SimpleDocumentBuilder.java:49)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.jdesktop.dom.SimpleDocumentBuilder.parse(SimpleDocumentBuilder.java:104)
at org.jdesktop.dom.SimpleDocumentBuilder.simpleParse(SimpleDocumentBuilder.java:238)
at org.jdesktop.dom.SimpleDocumentBuilder.simpleParse(SimpleDocumentBuilder.java:284)
at swingxwstest.Main.main(Main.java:11)
Anyway I don't quite follow what the line
String href = dom.getString("@href");
is doing. The loop variable n is not used.
The last example does not work
2006-10-12 10:16:20 rbair
Sorry, that one slipped by me. Replace "http://www.google.com" with "http://csszengarden.com". Also, you are right, it should be : String href = dom.getString("@href", n);
This last line is asking for the String value of the href attribute of the node, n.
The last example does not work
2007-09-12 01:47:54 bodat
Hey, SimpleDocumentBuilder.simpleParse(param) always throws an org.xml.sax.SAXParseException. Any help would be useful.
This is fun!
2006-10-12 07:12:18 wsnyder6
I can think of a lot of neat applications of this project!
I like the JXForm idea for submitting data, but what about something for just viewing data?
One thing I would like to do is take URL responses (Documents) and bind them to Swing components. [In fact, I wrote a binding framework (similar to the initial SwingLabs .5 API) and used JXPath to do this.]
This is fun!
2006-10-12 08:39:06 rbair
Hey!
Definitely. Imagine something like:
label.bind("//div[@id='welcome']/p/text()");
I'm also thinking that a custom TableModel, TreeModel, and ListModel might be nice while we're waiting for JSR 295.
This is fun!
2007-07-16 21:31:00 3ufblogg
This is fun!
2006-10-12 09:50:55 wsnyder6
Actually, if you need help with binding, I'm game. I've got some code that constructs DocumentDataModels - and allows you to spin off chunks of the underlying Document into child DataModels - useful for Tree/Table/List bindings.
But the downside is, I wrote it during some down time at work, so while I can take the ideas, I can't quite take the code....
This is fun!
2006-10-12 09:19:44 wsnyder6
Yes! Definitely like that.
And definitely would be cool to see some extensions (or an alternate implementation of JSR295) that would allow that.