
Taking a Tour of ROME

February 2, 2006

Contents
The Main Streets of ROME
Be Cautious of Uncertain Turns
Making Your Own Routes
Let Us Not Circle Over the Same Paths
No Need to Revisit Known Routes
More Roads to be Explored
Resources

On the java.net project page for ROME, the famous line from Ambrose Bierce is quoted:

All roads, howsoe'er they diverge, lead to Rome

In this case, it's all feeds that may be reached by ROME. The ROME in question is a Java library that provides a single interface to web syndication feeds while abstracting the differences between RSS and Atom. ROME version 0.8 contains many bug fixes and support for Atom 1.0. With it you can read, create, merge, filter, and otherwise mash up your favorite syndicated streams.

ROME uses the JDOM library to parse the XML and build the objects that the developer works with. JDOM, in turn, can use the XML parser it has built into it or use others that conform to the JAXP specification. For the sample code in this article, JDOM was set up to use the Xerces parser from the Apache project (distributed with JDOM).

The Main Streets of ROME

To begin with, let's look at a fairly simple case of using the classes and methods from ROME. The code in FeedReader.java (all code is downloadable from the link in Resources, below) shows a simple Java class that reads a feed from a URL on the command line. It parses the feed and presents some simple information about it: title, author, description, publication date, copyright, etc. It lists the URIs of any syndication modules (such as Dublin Core) that the feed uses, and then the titles of the entries (or articles, if you prefer). Lastly, it shows the URL of the image that the feed references, if one is given.

We start with a quick look at the imports section, considering only the imports provided by ROME itself. These five classes come from three of the six packages that ROME uses:

com.sun.syndication.feed
Provides the parent class for the RSS and Atom beans.
com.sun.syndication.feed.atom
Provides the implementation classes for the core elements of Atom feeds.
com.sun.syndication.feed.module
Provides the beans for handling syndication modules. The example code uses the Module interface from this package.
com.sun.syndication.feed.rss
As with the Atom-related package above, this provides the implementation classes for RSS feeds.
com.sun.syndication.feed.synd
This package holds most of the bean classes that an application will actually use. The interfaces and concrete classes that provide access to feeds while abstracting their details are here. The sample code uses SyndFeed and SyndEntryImpl from this package.
com.sun.syndication.io
Lastly, this package provides input and output classes for reading and parsing the feeds themselves, before they are instantiated as objects of the classes in the previous packages. The sample code uses the XmlReader class from here, which handles the character-set issues in the XML reading/parsing process. It also uses the SyndFeedInput class, which uses the XmlReader to actually pull in the contents of the feed.

The heart of the program starts around line 21:

final URL feedUrl = new URL(args[0]);

final SyndFeedInput input = new SyndFeedInput();
final SyndFeed feed =
    input.build(new XmlReader(feedUrl));

A SyndFeedInput object is created to read and parse the feed, and it is handed an on-the-fly instance of the XmlReader class as its input source. The XmlReader class is very handy here, since it tries to handle all character-encoding issues for you. SyndFeed is the interface to every type of feed that ROME supports. With a SyndFeed handle, you can treat all feeds identically.

The next lines use accessor methods of the SyndFeed handle to pick out interesting parts of the feed for display to the screen:

System.out.println("Title: " + feed.getTitle());
System.out.println("Author: " + feed.getAuthor());
// and so on...

Most of the classes that an application will use follow the JavaBean pattern, with member data accessible via getter and setter methods. Accessors for multi-valued members, such as the syndication modules and the feed entries, return objects that can be managed via the List interface. The for loops over the syndication modules and the entries take advantage of this.

for (final Iterator iter =
     feed.getModules().iterator();
     iter.hasNext();)
{
    System.out.println("\t" +
        ((Module)iter.next()).getUri());
}

The Module interface does for extension modules what SyndFeed does for feeds, exposing data via bean-style "get" methods. The URI of a syndication module uniquely identifies it, so that is what the sample program displays. The fact that all components are managed as beans makes them easy to use and exchange, for example in a server-side aggregator.

Be Cautious of Uncertain Turns

There are a few different types of exceptions that may get thrown in the process of reading and parsing a feed. The sample code takes a short-cut approach by just putting the whole block of main logic in a try-catch construct. The URL class may complain if the input is badly formed, and the XmlReader and SyndFeedInput classes have their error cases, as well. Some of these are lower-level I/O exceptions that get propagated upwards.

Depending on how you are using the ROME libraries, you may want to have finer control over the exception handling. Or you can follow the example here for a simpler approach.
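
As a rough illustration of what finer control might look like, the sketch below separates a badly formed URL from a failure while fetching or parsing. It uses only the JDK: the FeedParser interface and the describe method are hypothetical stand-ins for the ROME read-and-parse step, and are not part of ROME or of the downloadable sample code.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

public class FeedErrorHandling {
    /** Hypothetical stand-in for the step that reads and parses a feed. */
    public interface FeedParser {
        String parse(URL url) throws IOException;
    }

    /** Reports each failure category separately instead of using one catch-all. */
    public static String describe(String address, FeedParser parser) {
        final URL url;
        try {
            url = new URL(address);
        } catch (MalformedURLException e) {
            // The command-line argument was not a usable URL.
            return "bad URL: " + address;
        }
        try {
            return "ok: " + parser.parse(url);
        } catch (IOException e) {
            // Lower-level I/O problems propagated up from the reader.
            return "I/O failure: " + e.getMessage();
        }
    }
}
```

In a real program the second try block would also need to catch whatever parsing exceptions the feed library declares. Those are left out here because the sketch deliberately avoids depending on ROME.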

Making Your Own Routes

ROME is by no means limited to just reading feeds. It provides beans for creating them, as well. To demonstrate this, we'll partially recreate the "inbox" functionality of the del.icio.us social bookmarking service.

The inbox feature allows a user to choose several feeds--from individual tags such as "java" or from other users such as "rjray"--and combine them into a single feed. It effectively merges all of the separate RSS channels into one, which is itself made available as RSS. The code in DeliciousMerger.java reproduces this, with a difference: it outputs all of the entries, not just the 30 most-recent ones.

Diving straight into the code:

final StringBuffer tagList =
    new StringBuffer(args[0]);
for (int argidx = 1; argidx < args.length;
     argidx++)
{
    tagList.append(", ");
    tagList.append(args[argidx]);
}
newFeed.setTitle("Combined del.icio.us Tags: " +
    tagList);
newFeed.setDescription("Aggregation of tags: " +
    tagList);
newFeed.setFeedType("rss_1.0");
newFeed.setAuthor("DeliciousMerger");
newFeed.setLink("http://del.icio.us");

The loop at the top builds a string that combines all of the tags passed on the command line, for use in the title and description of the new feed. The five setter calls that follow start the creation of the new feed by setting the title, description, feed type, author, and base URL. The type is worth looking at more closely: at present, ROME can produce feeds in several flavors of RSS, as well as Atom 0.3 and Atom 1.0. The type strings for the options are:

  • rss_0.9
  • rss_0.91
  • rss_0.92
  • rss_0.93
  • rss_0.94
  • rss_1.0
  • rss_2.0
  • atom_0.3
  • atom_1.0

In the example code, an RSS 1.0 feed is being created.
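
The tag-joining loop at the top of the listing above predates some later conveniences in the JDK. As a minimal sketch, assuming Java 8 or later, the same comma-separated list can be produced with String.join. The class and method names here are made up for illustration and do not appear in DeliciousMerger.java.

```java
public class TagListSketch {
    /** Joins the command-line tags with ", ", like the StringBuffer loop. */
    public static String joinTags(String[] tags) {
        return String.join(", ", tags);
    }

    public static void main(String[] args) {
        System.out.println("Combined del.icio.us Tags: " + joinTags(args));
    }
}
```

One behavioral difference: with no arguments, the original loop fails on args[0], while String.join simply returns an empty string.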

The following lines are all that are needed to collect the entries from each feed (per the tags given on the command line) into a single list:

feedUrl = new URL(urlBase + args[idx]);
feed = input.build(new XmlReader(feedUrl));
entries.addAll(feed.getEntries());

The first two lines are virtually identical to the first example, while the third line takes advantage of the List-style return value of getEntries to simplify collecting entries.

Further down, the code does a little shuffling around to convert the contents of the ArrayList into an ordinary array of SyndEntry objects:

SyndEntry[] entriesArray =
    new SyndEntry[entries.size()];
entriesArray =
    (SyndEntry[])entries.toArray(entriesArray);
Arrays.sort(entriesArray,
    merger.new OrderByDate());

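First, the toArray method from the List interface is used to get the array representation. The odd calling syntax, passing in a SyndEntry array, tells toArray what type of array to create and return; without the argument, the result would be a plain Object array that could not be cast to SyndEntry[].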

The sorting itself uses a small inner class (defined further down) that implements the Comparator interface, in order to sort the array of entries by their dates. This turns the list from several segments that were sorted individually into one single list sorted completely. The built-in comparison logic of the Date class does the real work of sorting for us.

The next line after sorting simply sets the entries for the new feed by calling setEntries with the array of entries converted back into a List. Going from the list to the array and back was just for the sake of sorting.
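
The round trip from list to array and back can be sketched with the JDK alone. In the example below, Entry is a hypothetical stand-in for SyndEntry, and the newest-first ordering is an assumption; the article does not say which direction the real OrderByDate class sorts in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

public class SortByDateSketch {
    /** Hypothetical stand-in for SyndEntry: a title and a publication date. */
    public static final class Entry {
        private final String title;
        private final Date published;

        public Entry(String title, Date published) {
            this.title = title;
            this.published = published;
        }

        public String getTitle() { return title; }
        public Date getPublishedDate() { return published; }
    }

    /** Orders entries newest first by delegating to Date.compareTo. */
    public static final class OrderByDate implements Comparator<Entry> {
        @Override
        public int compare(Entry a, Entry b) {
            return b.getPublishedDate().compareTo(a.getPublishedDate());
        }
    }

    /** List to typed array, sort, then back to a List. */
    public static List<Entry> sortedByDate(List<Entry> entries) {
        // The array argument tells toArray which element type to create.
        Entry[] array = entries.toArray(new Entry[entries.size()]);
        Arrays.sort(array, new OrderByDate());
        return new ArrayList<Entry>(Arrays.asList(array));
    }
}
```

Collections.sort would order the list in place and avoid the array entirely; the sketch keeps the array step only to mirror the sample code.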

After all of this, writing the new feed is almost anti-climactic. It's almost easier than SyndFeedInput was, since it is being sent to the console:

output.output(newFeed,
    new PrintWriter(System.out));

With newFeed capable of turning itself into XML, all the SyndFeedOutput object needs is a Writer to send it to; here, that is a PrintWriter wrapped around System.out.

Let Us Not Circle Over the Same Paths

The DeliciousMerger class combines everything just as the del.icio.us inbox feature does, but, like the inbox, it also repeats entries. Since del.icio.us is social in nature, a link that pops up in one feed has often been bookmarked by other users with the same tag, causing it to reappear. Let's fix that.

The code in DeliciousMerger2.java is based very closely on DeliciousMerger.java. Where it differs is in the for loop about halfway down the code:

for (final Iterator iter =
     feed.getEntries().listIterator();
     iter.hasNext(); )
{
    final SyndEntry entry =
        (SyndEntry)iter.next();
    if (! seenUrls.containsKey(entry.getLink()))
    {
        entries.add(entry);
        seenUrls.put(entry.getLink(), entry);
    }
}

Here, rather than adding all the entries blindly, we use a HashMap to keep track of each URL as it is seen. If a URL is already present in the map, its entry doesn't get added to the new list a second (or third) time. Those lines (plus the declaration of seenUrls and the extra imports) are the only differences between the two classes.
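
The same filtering idea can be tried out without ROME. The sketch below is a variation rather than the article's code: it swaps the HashMap for a HashSet, since only the keys matter, and Entry is a hypothetical stand-in for SyndEntry.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    /** Hypothetical stand-in for SyndEntry: only a title and a link. */
    public static final class Entry {
        private final String title;
        private final String link;

        public Entry(String title, String link) {
            this.title = title;
            this.link = link;
        }

        public String getTitle() { return title; }
        public String getLink() { return link; }
    }

    /** Keeps the first entry seen for each link and drops later repeats. */
    public static List<Entry> uniqueByLink(List<Entry> merged) {
        Set<String> seenUrls = new HashSet<String>();
        List<Entry> unique = new ArrayList<Entry>();
        for (Entry entry : merged) {
            // Set.add returns false when the link is already present.
            if (seenUrls.add(entry.getLink())) {
                unique.add(entry);
            }
        }
        return unique;
    }
}
```

Because the first occurrence wins, the order in which the feeds are read decides which copy of a repeated link survives.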

No Need to Revisit Known Routes

Because ROME needs the JDOM package anyway, you have it at your fingertips should you need to parse any XML of your own. This makes it almost as easy to filter out URLs that you have already saved as it was to eliminate duplicates. You can fetch your full set of bookmarks from del.icio.us and use them to pre-populate the seenUrls map.

DeliciousMerger3.java does this by adding a static method called readDelBookmarkFile, which is placed towards the bottom (before the private inner class we use for sorting). Since this article isn't about JDOM, we'll go lightly on this part. The command-line argument list now expects the first argument to be the name of the file to which you saved your complete bookmark list. The parsing of this file is very simplistic, and only looks for the bare minimum of elements and attributes needed to get the data we want. Since JDOM gives us the matching child elements (we want the ones named post) in a handy List, sticking them in the map is as easy as looping over the feed entries elsewhere:

for (final Iterator iter = children.iterator();
     iter.hasNext(); )
{
    element = (Element)iter.next();
    key = element.getAttributeValue("href");
    if ((key != null) &&
        (key.length() > 0))
    {
        marks.put(key, key);
    }
}

Because we're parsing with JDOM, reading the file may throw a JDOMException, not just an IOException. The catch block checks for this. The creation of the string that combines the tags into a comma-separated list starts at argument 2 rather than 1, since the first argument is now the bookmarks filename.
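
For readers who want to experiment without JDOM on the classpath, the sketch below does a similar job with the DOM parser that ships with the JDK. This is a swapped-in technique, not the code from DeliciousMerger3.java. The class and method names are hypothetical, and the input format is assumed from the description above: post elements carrying an href attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class BookmarkReaderSketch {
    /** Collects the non-empty href attribute of every post element. */
    public static Set<String> readBookmarks(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(in).getDocumentElement();
        // Unlike a children-only lookup, this matches post elements at any depth.
        NodeList posts = root.getElementsByTagName("post");
        Set<String> marks = new HashSet<String>();
        for (int i = 0; i < posts.getLength(); i++) {
            // getAttribute returns "" when the attribute is missing.
            String href = ((Element) posts.item(i)).getAttribute("href");
            if (!href.isEmpty()) {
                marks.add(href);
            }
        }
        return marks;
    }

    /** Convenience overload for in-memory XML; wraps the checked exceptions. */
    public static Set<String> readBookmarks(String xml) {
        try {
            return readBookmarks(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalArgumentException("could not parse bookmarks", e);
        }
    }
}
```

The resulting set could be used to pre-populate the collection of seen URLs before the feeds are merged, so that links you have already saved are filtered out along with the duplicates.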

More Roads to be Explored

This should give you a good start on using ROME. It doesn't end here; ROME has even more features, such as creating feeds, injecting module information, etc. The examples get you going, and hopefully provide room to experiment and expand. You could implement command-line options to control the number of elements produced by the merger classes, or sub-class the beans to allow extra annotations (what the source was, or information for CSS/XHTML rendering).

ROME also has a plugin model. The ROME project's Wiki page provides links to some current plugin projects. These provide support for RSS modules such as site content, iTunes podcasting extensions, and Creative Commons license information, and can be used as examples for writing your own.

But that is a road for another day.

Resources
