More RSS for Java

October 30, 2003

In my previous article, we looked at the creation of a simple RSS feed using the Informa API. In this article, we are going to refine the earlier solution and look at some other aspects of the Informa API and the RSS specification, and take a brief look at the OPML format.

Handling Updates the RSS Way

One of the main problems with the solution proposed in the first article is that whenever a feed was requested, the RSS file was downloaded from the source server and parsed before being displayed. A more sensible approach would be to cache loaded feeds and periodically update them. Now, we could just take a guess and assume that a feed will update every hour and handle the update ourselves, and indeed this would be a sensible approach to take, if not for the fact that some RSS feeds can provide this information for us.

RSS defines three update-related elements: <skiphours> and <skipdays>, which date from the 0.91 specification, and <ttl>, which was added in 2.0. The <ttl> element specifies the time-to-live for the feed – the number of minutes that a feed can be cached before it should be refreshed. The <skiphours> and <skipdays> elements define the hours and days during which the feed will not be updated. Let's look at a fragment of an RSS file to see how these elements might be used:

  <ttl>60</ttl>
  <skiphours>
    <hour>0</hour>
    <hour>1</hour>
    <hour>2</hour>
    <hour>3</hour>
    <hour>4</hour>
    <hour>5</hour>
    <hour>6</hour>
    <hour>22</hour>
    <hour>23</hour>
  </skiphours>
  <skipdays>
    <day>Saturday</day>
    <day>Sunday</day>
  </skipdays>

Here, the <ttl> value of 60 specifies that the feed may be cached for 60 minutes. The <hour> elements nested inside the <skiphours> element tell us that no feed will be generated between 10 at night and six in the morning, and the <skipdays> element tells us that no feed will be generated on Saturday or Sunday.
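To make the interaction between these three elements concrete, here is a minimal sketch of how an aggregator might decide whether to refresh a cached feed. The RefreshPolicy class and its method names are hypothetical and are not part of Informa; the sketch also assumes the caller passes times in the same time zone that the feed's hours refer to.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper (not part of Informa) that applies ttl, skiphours and skipdays. */
class RefreshPolicy {
  private final Duration ttl;
  private final Set<Integer> skipHours;
  private final Set<DayOfWeek> skipDays;

  RefreshPolicy(int ttlMinutes, Set<Integer> skipHours, Set<DayOfWeek> skipDays) {
    this.ttl = Duration.ofMinutes(ttlMinutes);
    this.skipHours = skipHours;
    this.skipDays = skipDays;
  }

  /** True when the publisher has declared that the feed is not updated at this time. */
  boolean isSkipped(LocalDateTime now) {
    return skipDays.contains(now.getDayOfWeek())
        || skipHours.contains(now.getHour());
  }

  /** True when the cached copy has outlived its ttl and we are outside the skip windows. */
  boolean shouldRefresh(LocalDateTime lastFetched, LocalDateTime now) {
    boolean expired = !now.isBefore(lastFetched.plus(ttl));
    return expired && !isSkipped(now);
  }

  public static void main(String[] args) {
    // Values taken from the RSS fragment above
    RefreshPolicy policy = new RefreshPolicy(
        60,
        Set.of(0, 1, 2, 3, 4, 5, 6, 22, 23),
        EnumSet.of(DayOfWeek.SATURDAY, DayOfWeek.SUNDAY));
    LocalDateTime lastFetched = LocalDateTime.of(2003, 10, 30, 9, 0);
    System.out.println(
        policy.shouldRefresh(lastFetched, lastFetched.plusMinutes(90)));
  }
}
```

The ttl check and the skip check are kept separate so that a feed whose cache has expired during a skipped hour is simply picked up on the first request after the skip window ends.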

Seems simple so far? Well, RSS gives us another way to specify this kind of information in the form of an external module. Modules enable an RSS file (conforming to version 1.0 or later of the RSS specification) to be extended with additional elements that add extra functionality to a feed. The Syndication module (sy) adds elements that describe how often a feed is updated. Let's look at a snippet from an RSS file using the Syndication module:

  <rss version="2.0"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
  …
  <sy:updatePeriod>daily</sy:updatePeriod>
  <sy:updateFrequency>4</sy:updateFrequency>
  <sy:updateBase>2003-08-03T12:00</sy:updateBase>
  …

The xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" line imports the Syndication module into the sy namespace (much as you would import a taglib into a JSP); all Syndication module tags will now be prefixed with sy:. The <sy:updatePeriod>daily</sy:updatePeriod> and <sy:updateFrequency>4</sy:updateFrequency> tags let us know that the file is updated four times daily, with <sy:updateBase>2003-08-03T12:00</sy:updateBase> giving us a base time from which to work out the updates.
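As a rough illustration of the arithmetic involved, the sketch below turns an update period and frequency into an interval, and then uses the base time to find the next scheduled update. The SyndicationInterval class is hypothetical and not part of Informa, and treating a month as 30 days and a year as 365 days is a simplifying assumption of this sketch.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical helper (not part of Informa) for Syndication module arithmetic. */
class SyndicationInterval {

  /** Time between updates: one period divided by the number of updates per period. */
  static Duration between(String updatePeriod, int updateFrequency) {
    if (updateFrequency < 1) {
      throw new IllegalArgumentException("updateFrequency must be positive");
    }
    Duration period;
    switch (updatePeriod) {
      case "hourly":  period = Duration.ofHours(1);  break;
      case "daily":   period = Duration.ofDays(1);   break;
      case "weekly":  period = Duration.ofDays(7);   break;
      case "monthly": period = Duration.ofDays(30);  break; // approximation
      case "yearly":  period = Duration.ofDays(365); break; // approximation
      default:
        throw new IllegalArgumentException("Unknown period: " + updatePeriod);
    }
    return period.dividedBy(updateFrequency);
  }

  /** First scheduled update strictly after now, counting in whole intervals from base. */
  static LocalDateTime nextUpdate(LocalDateTime base, Duration interval,
                                  LocalDateTime now) {
    if (now.isBefore(base)) {
      return base;
    }
    long elapsedIntervals =
        Duration.between(base, now).toMillis() / interval.toMillis();
    return base.plus(interval.multipliedBy(elapsedIntervals + 1));
  }

  public static void main(String[] args) {
    // Values taken from the snippet above: daily, four times, base 2003-08-03T12:00
    Duration interval = between("daily", 4);
    LocalDateTime base = LocalDateTime.of(2003, 8, 3, 12, 0);
    LocalDateTime now = LocalDateTime.of(2003, 8, 3, 13, 0);
    System.out.println(interval);
    System.out.println(nextUpdate(base, interval, now));
  }
}
```

With the values from the snippet, the interval works out to six hours, so an aggregator checking at 13:00 on the base day would not need to look again until 18:00.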

So why do we have two different ways of telling us the same thing? Well, the Syndication module was introduced because it was felt that the existing <skiphours> and <skipdays> elements were overly verbose and not really flexible enough. These older elements are unlikely to be removed from the specification, but it's likely that newer RSS feeds will start to phase out their use in favor of the features provided by the Syndication module.

Handling Subscription Updates with Informa

The tag detailed in the previous article had a major flaw – each time the tag was requested, the feed was downloaded and parsed. We are going to add a new class to handle the selective updating of feeds based on the provided update information.

Support for <ttl>, <skiphours>, and <skipdays> may be added in a future version of Informa, but it already supports the newer Syndication module tags. You should be familiar with the basics of the ChannelIF interface from the previous article. The 0.3.0 Informa release added new methods to match the tags provided by the Syndication module -- getUpdateBase(), getUpdateFrequency(), and getUpdatePeriod(). In and of itself, this wouldn't give us much in the way of functionality -- we still need something that can act on these attributes. Luckily, the latest version of Informa adds a new FeedManager class to manage multiple feeds, which also handles the caching of requested feeds. It's worth noting that if the feed in question doesn't supply time-to-live information, then sensible defaults are used.

Let's take a look at some of the FeedManager methods in a little more detail.

public static void setChannelBuilder(ChannelBuilderIF chBuilder)

public ChannelBuilderIF getChannelBuilder()

These methods set and return the ChannelBuilder to be used when loading a feed. As you may remember from the previous article, different ChannelBuilderIF implementations can be used to create storage for the feed (0.3.0 added a Hibernate implementation). We won't have to worry too much about this at present -- by default, the FeedManager uses the default implementation, which creates an in-memory feed.

public FeedIF addFeed(String feedUri)
    throws FeedManagerException

public void removeFeed(String feedUri)

The addFeed method adds a feed to the manager and returns it. If there is a problem finding or parsing the file specified, then a FeedManagerException will be thrown.

public FeedIF getFeed(String feedUri)
    throws FeedManagerException

Finally, the getFeed method returns a managed feed. So far, this looks like a simple store for a series of RSS feeds. However, behind the scenes, the FeedManager class is actually taking into account any defined time-to-live parameters. After the initial request for a feed, the FeedManager maintains a cache of the feed. If another request is made for the feed before it should be refreshed, the cached copy is returned; otherwise, the feed is reloaded and parsed.
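The caching behavior described above can be sketched in a few lines of plain Java. This is not Informa's actual implementation; TtlCache, its loader function, and its injectable clock are all hypothetical names used only to illustrate the idea of returning a cached copy until the time-to-live has passed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical time-to-live cache; an illustration only, not Informa's FeedManager. */
class TtlCache<V> {

  /** A cached value together with the moment it was loaded. */
  private static final class Entry<T> {
    final T value;
    final Instant loadedAt;

    Entry(T value, Instant loadedAt) {
      this.value = value;
      this.loadedAt = loadedAt;
    }
  }

  private final Map<String, Entry<V>> entries = new HashMap<>();
  private final Function<String, V> loader;
  private final Duration ttl;
  private final Supplier<Instant> clock;

  TtlCache(Function<String, V> loader, Duration ttl, Supplier<Instant> clock) {
    this.loader = loader;
    this.ttl = ttl;
    this.clock = clock;
  }

  /** Returns the cached copy while it is fresh; otherwise reloads and caches it. */
  synchronized V get(String uri) {
    Instant now = clock.get();
    Entry<V> cached = entries.get(uri);
    if (cached != null && now.isBefore(cached.loadedAt.plus(ttl))) {
      return cached.value;
    }
    V fresh = loader.apply(uri);
    entries.put(uri, new Entry<>(fresh, now));
    return fresh;
  }

  public static void main(String[] args) {
    // The loader here just fabricates a string; a real one would download and parse
    TtlCache<String> cache = new TtlCache<>(
        uri -> "parsed copy of " + uri,
        Duration.ofMinutes(60),
        Instant::now);
    System.out.println(cache.get("http://example.com/feed.rss"));
  }
}
```

Passing the clock in as a Supplier is a design choice made for this sketch so that expiry can be exercised in a test without waiting for real time to pass.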

It's worth noting here the use of the FeedIF interface -- a FeedIF contains metadata concerning a Channel, such as the last time the data was updated or the copyright of the Channel. The FeedManager class will update the lastUpdated property of the FeedIF when it reloads the channel. This makes it easy for you to display the last-update times in your applications.

Now we'll look at how our custom tag from the last article will look after being updated to use the FeedManager:

public class ManagedRssFeedTag extends TagSupport {
  private String uri;
  private String var;
  private ChannelIF channel;
  private static FeedManager feedManager =
      new FeedManager();

  /**
  * Returns the name of the bean used to store
  * the Rss Feed
  */
  public String getVar() {
    return var;
  }

  /**
  * Specifies the name of the bean that will
  * hold the channel
  */
  public void setVar(String var) {
    this.var = var;
  }

  /**
  * Retrieves the URI for the feed
  */
  public String getUri() {
    return uri;
  }

  /**
  * Sets the URI for the feed
  */
  public void setUri(String uri) {
    this.uri = uri;
  }

  /**
  * Asks the shared FeedManager for the feed and
  * stores it under the name given in var
  */
  public int doStartTag() throws JspException {
    try {
      // Let the FeedManager load and cache the feed
      FeedIF feed = feedManager.addFeed(getUri());
      pageContext.setAttribute(getVar(), feed);
    } catch (FeedManagerException e) {
      throw new JspException(e);
    }

    return EVAL_BODY_INCLUDE;
  }
}

Comparing this to the previous version of this tag, you can see that the only major code changes are that we are now using the FeedManager to get the feed, rather than parsing it directly in the doStartTag method, and that we are now dealing with a FeedIF object rather than the previous use of the ChannelIF code. This greatly simplifies the tag's code itself, and also has the effect of giving us the efficiencies provided by using the FeedManager.

The changes to the JSP are likewise quite trivial:

<rss:readFeed 
uri="http://freshmeat.net/backend/fm-releases-software.rdf"
var="feed">
 
  <IMG src="<c:out value="${feed.channel.image.location}"/>">
  <A HREF="<c:out value="${feed.site}"/>">
  <strong><c:out value="${feed.title}"/></strong></A>
 
  <ol>
    <c:forEach var="item" items="${feed.channel.items}">
      <li><A HREF="<c:out value="${item.link}"/>">
      <c:out value="${item.title}"/></A></li>
    </c:forEach>
  </ol>
  [Last Updated: <c:out value="${feed.lastUpdated}"/>]
</rss:readFeed>

Here we display a list of the posts made in the channel as before; however, to access the channel itself, we are now going via the feed bean. As we have the metadata associated with the channel, we also display the time the channel was last loaded -- as we covered before, the FeedManager will only reload the feed according to the time-to-live parameters specified in the RSS file itself.

Let's see how our finished tag looks in a web page. Figure 1 shows the latest software releases from Freshmeat:

Screenshot of the tag in action
Figure 1. Output from RSS tag

OPML Overview

The Outline Processor Markup Language (OPML) was developed by UserLand Software in 2000. This XML format has been designed to allow the transfer of outline-structured information, such as MP3 playlists, and can even be used for editing documents (see outliners.com for more detail). One of the major uses for the OPML format has been to manage a collection of RSS feeds -- this is unsurprising, given UserLand Software's involvement in the RSS specification. We will look at processing an OPML file used to describe a list of RSS feeds to which someone has subscribed. For example, the FeedDemon aggregator program allows the export and import of OPML files -- this can be handy when moving between different aggregator services, or keeping desktop aggregators in sync between the office and home.

Let's take a look at an OPML file produced from FeedDemon:

<?xml version="1.0"?>
<opml version="1.1">
  <head><title>Favorite Channels</title></head>
  <body>
    <outline text="Java.net"
    title="Java.net"
    type="rss"
    version="RSS"
    xmlUrl="http://today.java.net/pub/q/news_rss?x-ver=1.0"
    htmlUrl="http://today.java.net/"
    description="Java.net"/>
    <outline text="Wired News"
    title="Wired News"
    type="rss"
    version="RSS"
    xmlUrl="http://www.wired.com/news_drop/netcenter/netcenter.rdf"
    htmlUrl="http://www.wired.com/"
    description="Wired"/>
  </body>
</opml>

As you can see, it's very simple indeed. Each feed is described by an outline element. The type attribute lets us know what kind of data we're referencing -- here it's RSS, but it could be another format, such as MP3 audio. The xmlUrl attribute gives us a pointer to where the XML format of the data can be found (the URL of the RSS feed), with the htmlUrl attribute describing where the HTML version of the feed can be found. Finally, the text attribute details how the outline should be displayed, with the description attribute giving more detailed information.
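Because OPML is plain XML, these attributes can be read with nothing more than the DOM parser that ships with the JDK. The following is a minimal sketch under that assumption; the OpmlOutlines class and its rssFeeds method are hypothetical names, and the sketch only looks at the text, type, and xmlUrl attributes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that reads outline elements with the JDK's DOM parser. */
class OpmlOutlines {

  /** Maps each RSS outline's text attribute to its xmlUrl, in document order. */
  static Map<String, String> rssFeeds(String opmlXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(opmlXml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse OPML", e);
    }

    Map<String, String> feeds = new LinkedHashMap<>();
    NodeList outlines = doc.getElementsByTagName("outline");
    for (int i = 0; i < outlines.getLength(); i++) {
      Element outline = (Element) outlines.item(i);
      // Only outlines that declare themselves as RSS are of interest
      if ("rss".equals(outline.getAttribute("type"))) {
        feeds.put(outline.getAttribute("text"),
                  outline.getAttribute("xmlUrl"));
      }
    }
    return feeds;
  }

  public static void main(String[] args) {
    String opml =
        "<?xml version=\"1.0\"?>"
        + "<opml version=\"1.1\"><head><title>Favorite Channels</title></head>"
        + "<body><outline text=\"Wired News\" type=\"rss\" "
        + "xmlUrl=\"http://www.wired.com/news_drop/netcenter/netcenter.rdf\"/>"
        + "</body></opml>";
    rssFeeds(opml).forEach((text, url) -> System.out.println(text + ", " + url));
  }
}
```

Checked parser exceptions are wrapped in an unchecked one here purely to keep the calling code short; a real application would probably want to report malformed files more carefully.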

Informa OPML Support

Informa adds some simple support for loading OPML files. The OPMLParser is capable of parsing an OPML file and producing a series of FeedIF objects. Let's look at a simple piece of code to load and list the feeds and their URLs referenced in the file above.

try {
  Collection feeds =
    OPMLParser.parse("file:///C:/feeds.opml");
  for (Iterator iter = feeds.iterator();
       iter.hasNext();
       ) {
    FeedIF feed = (FeedIF) iter.next();
    System.out.print("Feed: " + feed.getTitle());
    System.out.println(", " +
        feed.getLocation().toString());
  }
} catch (Exception e) {
  e.printStackTrace();
}

As we saw earlier, using the FeedManager class can greatly simplify the handling of multiple feeds. The FeedManager class adds a convenience method to allow us to add all of the feeds detailed in an OPML file into the manager at once.

The addFeeds method takes as its argument the URI of an OPML file. It parses the file, identifies all of the referenced RSS feeds, and then parses and loads them into the FeedManager:

public Collection addFeeds(String opmlFileUri)

We will be using this method to create a new JSP tag.

A Blog Roll Tag

We are now going to use the FeedManager class to create a Blog Roll tag. A Blog Roll is a list of blogs that someone reads regularly; adding such a tag to a web site can be a great way of personalizing a portal. Before we look closely at its implementation, let's look at how our tag is going to be used:

<rss:blogRoll opmlUri="file:///C:/feeds.opml" var="myFeeds">
  <ul>
    <c:forEach var="feed" items="${myFeeds}" >
      <li>
        <a href="<c:out value="${feed.site}"/>">
        <c:out value="${feed.title}" /></a>,
        Last Updated: <c:out value="${feed.lastUpdated}" />
      </li>
    </c:forEach>
  </ul>
</rss:blogRoll>

The blogRoll tag creates as its output a Collection holding FeedIF beans, each one representing an RSS Channel and some metadata concerning the Channel itself. The required var attribute specifies the name of the created Collection. Once we've loaded the feeds, we could process them in a number of ways. Here, we use the JSTL forEach tag to loop through the feeds, creating a list of links to the home pages for the feeds. For good measure, we also print the dates of the feeds, so we can see when each one was last updated.

The implementation of this tag is almost trivially simple.

public class BlogRollTag extends TagSupport {
  private static FeedManager feedManager =
    new FeedManager();
  private String opmlUri;
  private String var;
  private ChannelIF channel;

  /**
   * Returns the name of the bean used to store the Feeds
   */
  public String getVar() {
    return var;
  }

  /**
   * Specifies the name of the bean that will
   * hold the feeds
   */
  public void setVar(String var) {
    this.var = var;
  }

  /**
   * Retrieves the URI of the OPML file to read
   */
  public String getOpmlUri() {
    return opmlUri;
  }

  /**
   * Sets the URI of the OPML file to read
   */
  public void setOpmlUri(String uri) {
    this.opmlUri = uri;
  }

  /**
   * Loads the feeds listed in the OPML file and
   * stores them under the name given in var
   */
  public int doStartTag() throws JspException {
    try {
      Collection feeds =
        feedManager.addFeeds(getOpmlUri());
      pageContext.setAttribute(getVar(), feeds);
    } catch (FeedManagerException e) {
      throw new JspException(e);
    }

    return EVAL_BODY_INCLUDE;
  }
}

The call to FeedManager.addFeeds accesses the OPML file specified in the opmlUri attribute, loads all of the referenced feeds (or simply retrieves the already loaded feeds in the FeedManager itself), and returns the collection. We then store the returned Collection so it can be accessed in the page.

The result is shown in Figure 2.

The Blog Roll Tag example
Figure 2. The Blog Roll tag

Note that, depending on how many feeds you have in your OPML file, the tag could take a while to run the first time. This is because each RSS file is being downloaded, parsed, and added to the FeedManager. Subsequent calls to load the same OPML file will be very quick, as the FeedManager only reloads the feeds after a certain period of time. When the feeds get reloaded, the Last Updated time will change accordingly. A sensible approach might be to load the OPML file into your FeedManager on startup to eliminate this delay when the user first accesses the page.

Conclusion

We have looked at Informa's FeedManager class and seen how it can ease the management of multiple feeds. We have also taken a brief look at the OPML file format and how it can be used with Informa to manage a series of blogs. Next time, we'll look at some of the features of Informa that are focused more on the authors of feed aggregators, and see how they can be used to enrich a web application.

Sam Newman is a Java programmer. Check out his blog at magpiebrain.com.