
Learn about JavaFX's APIs for Reading RSS and Atom Newsfeeds

November 30, 2009

JavaFX 1.2 introduced many interesting APIs, including APIs for reading RSS and Atom newsfeeds. If you haven't worked with
these APIs, you'll discover that they greatly simplify the task of integrating a newsfeed reader into a JavaFX application.

This article introduces you to the RSS and Atom APIs. You first explore their common foundation, and then tour each API's key
classes. Finally, you gain insight into how these APIs work by exploring the FeedTask class's newsfeed-polling
implementation.

Common Foundation

The RSS and Atom APIs are offshoots of a common foundation that's rooted in the abstract javafx.async.Task
class. This class makes it possible to start, stop, and track an activity (task) that runs on a background thread.

Task provides onStart and onDone variables that identify functions to be invoked at
the start/end of the task, and other variables that report task progress and disposition (success or failure). This class
also provides abstract start(): Void and stop(): Void functions to initiate and terminate task
execution.

The abstract javafx.data.feed.FeedTask class extends Task. In addition to inheriting
Task's variables, and overriding its start() and stop() functions,
FeedTask provides the following functions and variables:

  • poll(): Void: Poll the newsfeed location for updated content, which is fetched, parsed, and
    delivered to the application.
  • update(): Void: Poll the newsfeed location. All content is fetched, parsed, and delivered to the
    application.
  • headers (of type javafx.io.http.HttpHeader[]) identifies a sequence of HTTP request headers that
    are to be sent to location each time this newsfeed is polled. This variable defaults to null.
  • interval (of type javafx.lang.Duration) specifies the amount of time that must elapse before the
    newsfeed is once more polled for updates. You must specify a positive value for this variable, which defaults to
    0.0. (I wonder if it wouldn't be better to choose a positive value, such as 60s, to be the polling
    default, and perhaps allow 0.0 to indicate that polling isn't desired.)
  • location (of type String) specifies the newsfeed's address. This variable defaults to the empty
    string ("").
  • onException (of type function(:Exception):Void) identifies a function that's invoked when an
    exception occurs during the current poll. This variable defaults to null.
  • onForeignEvent (of type function(:javafx.data.pull.Event):Void) identifies a function that's
    invoked to handle extension elements, which are newsfeed elements whose namespace URI is not Atom or RSS. For
    example, given an Atom newsfeed whose feed element's start tag is specified as
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">,
    parsing a subsequent <opensearch:totalResults>1911</opensearch:totalResults> element results in three
    foreign events (for the start tag, text, and end tag) because the namespace for totalResults is
    http://a9.com/-/spec/opensearch/1.1/ (as specified by the opensearch: prefix) instead of
    http://www.w3.org/2005/Atom. This variable defaults to null.
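To make the foreign-event example concrete, here is a minimal sketch in plain Java that uses the standard StAX parser rather than the JavaFX feed parser. The ForeignEventSketch class and its foreignEvents() method are hypothetical names of my own. The sketch simply treats any element outside the Atom namespace (and anything nested inside such an element) as foreign, which is a simplification of whatever FeedTask does internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only: plain Java StAX, not the JavaFX feed parser.
class ForeignEventSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns one entry per parse event that falls outside the Atom namespace.
    static List<String> foreignEvents(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver each run of text as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            int foreignDepth = 0; // greater than 0 while inside a foreign element
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    if (foreignDepth > 0 || !ATOM_NS.equals(reader.getNamespaceURI())) {
                        foreignDepth++;
                        events.add("START " + reader.getLocalName());
                    }
                } else if (type == XMLStreamConstants.CHARACTERS && foreignDepth > 0) {
                    events.add("TEXT " + reader.getText());
                } else if (type == XMLStreamConstants.END_ELEMENT && foreignDepth > 0) {
                    events.add("END " + reader.getLocalName());
                    foreignDepth--;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed feed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String feed = "<feed xmlns=\"http://www.w3.org/2005/Atom\" "
            + "xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\">"
            + "<title>Demo</title>"
            + "<opensearch:totalResults>1911</opensearch:totalResults>"
            + "</feed>";
        foreignEvents(feed).forEach(System.out::println);
    }
}
```

For the sample feed, only the opensearch:totalResults element is reported, as a start event, a text event, and an end event. The Atom title element is skipped because its namespace matches the feed's default namespace.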

The common foundation is also rooted in the abstract javafx.data.feed.Base class, which is the base class for
RSS and Atom classes that describe various newsfeed elements. RSS's RSS and Atom's Feed top-level
element classes are examples of Base subclasses.

Base provides a namespaces variable (of type javafx.data.Pair[]) that contains the
namespace definitions in effect for the element. The name member of each Pair specifies the
namespace prefix; the value member specifies the namespace URI.

Base also provides a parent variable (of type Base) that identifies the parent
(enclosing) element. For example, the parent variable of Atom's Entry element class refers to its
containing Feed instance. If there's no parent (as is the case with Feed), this variable contains
null.

Finally, Base provides several functions that are useful when you need to create a custom feed parser. Because
this task is beyond the scope of this article, I refer you to Rakesh Menon's Custom Feed Parsers
blog post for more information and an example.

RSS API Overview

The RSS (Resource Description Framework Site Summary, Really Simple Syndication, Rich Site Summary)
API consists of 10 classes that are located in the javafx.data.feed.rss package. Central to this package is the
RssTask class.


RSS versions supported by the API
The RSS API handles newsfeeds that conform to versions 0.91 (with non-optional item elements) through 2.0.11
(the most recent version at time of writing) of the RSS specification.

The RssTask entry-point class extends FeedTask, and provides the following variables for installing
a custom factory, for reporting the newsfeed's channel element's non-item content, and for
reporting the content of each of the channel element's item elements:

  • factory (of type Factory) identifies the factory that's used to create objects that represent
    newsfeed elements. You only need to install your own factory when creating a custom feed parser.
  • onChannel (of type function(:Channel):Void) identifies a function that's invoked to report the
    channel element's non-item elements -- the RSS channel element contains
    item and non-item elements, and is itself contained within the top-level
    rss element. This variable defaults to null.
  • onItem (of type function(:Item):Void) identifies a function that's invoked to report the current
    item element. This variable defaults to null.
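The split between channel-level content and per-item content can be illustrated outside of JavaFX. The following is a rough sketch in plain Java (StAX), not the RssTask implementation. RssSplitSketch, its fields, and its read() method are hypothetical, and the sketch looks only at title and link elements. It also assumes that the channel has no image element, whose own title and link would otherwise be mistaken for channel values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only: separates channel-level values from item values.
class RssSplitSketch {
    final Map<String, String> channel = new LinkedHashMap<>(); // non-item content
    final List<String> itemTitles = new ArrayList<>();          // one entry per item

    static RssSplitSketch read(String xml) {
        RssSplitSketch result = new RssSplitSketch();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean inItem = false;
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item")) {
                        inItem = true;
                    } else if (name.equals("title") || name.equals("link")) {
                        // getElementText() consumes the text and the end tag.
                        String text = reader.getElementText();
                        if (!inItem) {
                            result.channel.put(name, text);
                        } else if (name.equals("title")) {
                            result.itemTitles.add(text);
                        }
                    }
                } else if (type == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("item")) {
                    inItem = false;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed feed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        RssSplitSketch r = read("<rss version=\"2.0\"><channel><title>News</title>"
            + "<item><title>First</title></item></channel></rss>");
        System.out.println("Channel: " + r.channel);
        System.out.println("Items: " + r.itemTitles);
    }
}
```

In RssTask terms, the map plays the role of the Channel object handed to onChannel, and each list entry plays the role of an Item handed to onItem.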

The Channel class extends the abstract RSS class, which represents the top-level
rss element, and which provides members for accessing the factory that's creating objects, for accessing the
task that's parsing the newsfeed, and more. In turn, RSS extends Base.

Channel also provides the following variables for accessing channel-oriented
(non-item-specific) content:

  • categories (of type Category[]) identifies the categories (in terms of domains and text values) to
    which this channel belongs.
  • copyright (of type String) specifies a copyright notice for channel content.
  • description (of type String) presents a phrase or sentence that describes this
    channel.
  • docs (of type String) specifies a URL that points to documentation for the format used in the RSS
    file. This might simply be a pointer to a Web page, and is useful for letting people, who encounter this RSS file in the
    future, understand the file's purpose (much like code comments).
  • generator (of type String) identifies the program that was used to generate this
    channel.
  • image (of type Image) identifies an image (in terms of description, height, link, title, URL, and
    width) that can be displayed with the channel content.
  • language (of type String) identifies the language in which the channel was
    written.
  • lastBuildDate (of type javafx.date.DateTime) specifies when this channel's content
    was last changed.
  • link (of type String) provides the URL to the Website that corresponds to this
    channel.
  • pubDate (of type DateTime) identifies the date when this channel was published.
  • title (of type String) provides this channel's title.
  • ttl (of type Duration) provides the number of minutes in which the news-reader can cache this
    channel before it must poll the newsfeed to refresh channel content.

Unsupported channel elements
For whatever reason, the RSS API doesn't support the channel element's cloud,
textInput, skipHours, and skipDays elements. These elements are not
represented by javafx.data.xml.QName constants in the RSS class, and they are not represented by
variables in the Channel class.

As with Channel, the Item class, which describes one of the channel's
item elements, extends RSS. It provides the following variables:

  • author (of type String) provides the email address of this item's author.
  • categories (of type Category[]) identifies the categories to which this item
    belongs.
  • comments (of type String) specifies the URL of a Web page containing comments about this
    item.
  • description (of type String) provides a description of this item.
  • enclosure (of type Enclosure) describes a media object (in terms of length, MIME type, and URL)
    that's attached to this item.
  • guid (of type Guid) specifies, for this item, a globally unique identifier (in
    terms of text and an indicator of whether or not this text permanently points to the full item described by this
    item).
  • link (of type String) provides this item's URL.
  • pubDate (of type DateTime) identifies the date when this item was published.
  • source (of type Source) identifies the originating channel (in terms of the name
    of the channel and an XMLization of that channel) for this item.
  • title (of type String) provides this item's title.

I've created a NetBeans RSSDemo project whose Main.fx source code demonstrates RssTask
in terms of its interval, location, onStart, onChannel,
onItem, onException, onForeignEvent, and onDone variables.

/*
* Main.fx
*/

package rssdemo;

import java.lang.Exception;

import javafx.data.feed.rss.Channel;
import javafx.data.feed.rss.Item;
import javafx.data.feed.rss.RssTask;

import javafx.data.pull.Event;

def MAX_POLLS = 3;

var counter = 0;

def task:RssTask = RssTask
{
    interval: 15s

    // The following location demonstrates a basic RSS newsfeed.

    location: "http://javajeff.mb.ca/rss/javajeff.xml"

    // The following location demonstrates onException().

    // location: "http://developers.sun.com/rss/sdn_features.xml"

    // The following location demonstrates onForeignEvent().

    // location: "http://feeds.dzone.com/javalobby/frontpage?format=xml"

    // The following location demonstrates IllegalArgumentException (must use
    // AtomTask for Atom feeds).

    // location: "http://feeds.sophos.com/en/atom1_0-sophos-company-news.xml"
 
    onStart: function (): Void
    {
        println ("Task is starting");

        if (++counter > MAX_POLLS)
        {
            task.stop ();
            FX.exit ()
        }
    }

    onChannel: function (c: Channel): Void
    {
        println ("Channel: {c}")
    }

    onItem: function (i: Item): Void
    {
        println ("Item: {i}")
    }

    onException: function (e: Exception): Void
    {
        println ("Exception: {e}");
        task.stop ();
        FX.exit ()
    }

    onForeignEvent: function (e: Event): Void
    {
        println ("Event: {e}")
    }

    onDone: function (): Void
    {
        println ("Completed poll #{counter}")
    }
}
task.start ()

The source code introduces a constant that specifies the maximum number of times to poll the newsfeed, and a variable that
counts the number of polls that have been made so far. The idea is to limit the number of times the newsfeed is polled so
that the application won't run indefinitely.

After invoking the RssTask instance's start() function, which starts the newsfeed-polling
operation, the newsfeed located at the address assigned to location is polled every 15 seconds. The
onStart() callback is invoked at the start of each poll.

This callback tests to see if the counter has exceeded the maximum number of polls. If so, stop() is invoked to
stop the polling, and FX.exit() is invoked to kill the background thread that's associated with the
RssTask instance, allowing the application to exit.

Perhaps you're wondering why I placed if (++counter > MAX_POLLS) in onStart() rather than in
onDone(). I did this because onDone() isn't always called at the end of each poll. (You'll
discover why this happens later in the article.)

It's possible that an exception might be thrown as a result of the newsfeed being read or parsed. If this happens, the
onException() callback invokes stop() to stop the polling task, and then invokes
FX.exit() to kill the background thread and terminate the application.

This simple framework serves as a starting point for exploring the RSS API. As an exercise, expand onChannel()
and onItem() to output the values of their Channel and Item arguments' various
variables.

Atom API Overview

In contrast to RSS, the Atom API consists of 12 classes that are located in the
javafx.data.feed.atom package. Central to this package is the AtomTask class.


Atom versions supported by the API
The Atom API handles newsfeeds that conform to version 1.0 (the most recent version at time of writing) of the Atom specification.

The AtomTask entry-point class extends FeedTask, and provides the following variables for
installing a custom factory, for reporting the newsfeed's feed element's non-entry content,
and for reporting the content of each of the feed element's entry elements:

  • factory (of type Factory) identifies the factory that's used to create objects that represent
    newsfeed elements. You only need to install your own factory when creating a custom feed parser.
  • onFeed (of type function(:Feed):Void) identifies a function that's invoked to report the
    feed element's non-entry elements -- the Atom feed element contains
    entry and non-entry elements, and is itself the top-level element. This variable defaults
    to null.
  • onEntry (of type function(:Entry):Void) identifies a function that's invoked to report the current
    entry element. This variable defaults to null.

The Feed class extends the abstract Atom class (inheriting members for accessing the newsfeed's
base URI, for accessing the factory that's creating objects, and more), which extends Base.

Feed also provides the following variables for accessing feed-oriented
(non-entry-specific) content:

  • authors (of type Person[]) identifies the authors (in terms of email address, name, additional
    person-specific text, and the Internationalized Resource Identifier (IRI) associated with the person) of this
    feed.
  • categories (of type Category[]) identifies the categories (in terms of a human-readable label,
    category name, and categorization scheme IRI) to which this feed belongs.
  • contributors (of type Person[]) identifies the persons who have contributed to this
    feed.
  • generator (of type Generator) identifies the program (in terms of human-readable name, program URI,
    and program version number) that was used to generate this feed. This information can be used to debug an
    Atom newsfeed.
  • icon (of type Id) identifies this feed's iconic image (in terms of a URI to the
    image).
  • id (of type Id) specifies a universally unique and permanent identifier (in terms of a URI) for
    this feed.
  • links (of type Link[]) specifies links (in terms of href,
    hreflang, length, rel, title, and type
    XML attributes, and text associated with the link) from this feed to Web resources.
  • logo (of type Id) identifies this feed's non-iconic image.
  • rights (of type Content) specifies the rights (in terms of src,
    text, and type XML attributes) held in and over this feed.
  • subtitle (of type Content) provides this feed's subtitle.
  • title (of type Content) provides this feed's title.
  • updated (of type Date) specifies when this feed's content was last changed.

As with Feed, the Entry class, which describes one of the feed's
entry elements, extends Atom. In addition to sharing most of the same variables as
Feed, Entry provides the following unique variables:

  • content (of type Content) specifies this entry's content.
  • published (of type Date) specifies when this entry was published.
  • source (of type Feed) identifies this entry's feed source.
  • summary (of type Content) specifies a short summary, abstract, or excerpt for this
    entry.

I've created an AtomDemo NetBeans project for demonstrating AtomTask. This project's
Main.fx source code is very similar to RSSDemo's Main.fx source code.

/*
* Main.fx
*/

package atomdemo;

import java.lang.Exception;

import javafx.data.feed.atom.AtomTask;
import javafx.data.feed.atom.Entry;
import javafx.data.feed.atom.Feed;

import javafx.data.pull.Event;

def MAX_POLLS = 3;

var counter = 0;

def task:AtomTask = AtomTask
{
    interval: 15s

    // The following location demonstrates a basic Atom newsfeed.

    location: "http://photos.dailycamera.com/hack/feed.mg?Type=gallery&Data=9573834_9ysrR&format=atom10"
   
    // The following location demonstrates onForeignEvent().

    // location: "http://blogsearch.google.com/blogsearch/feeds?bc_lang=en&hl=en&output=atom"

    // The following location demonstrates IllegalArgumentException (must use
    // RssTask for RSS feeds).

    // location: "http://javajeff.mb.ca/rss/javajeff.xml"

    onStart: function (): Void
    {
        println ("Task is starting");

        if (++counter > MAX_POLLS)
        {
            task.stop ();
            FX.exit ()
        }
    }

    onFeed: function (f: Feed): Void
    {
        println ("Feed: {f}")
    }

    onEntry: function (e: Entry): Void
    {
        println ("Entry: {e}")
    }

    onException: function (e: Exception): Void
    {
        println ("Exception: {e}");
        task.stop ();
        FX.exit ()
    }

    onForeignEvent: function (e: Event): Void
    {
        println ("Event: {e}")
    }

    onDone: function (): Void
    {
        println ("Completed poll #{counter}")
    }
}
task.start ()

This simple framework serves as a starting point for exploring the Atom API. Consider expanding onFeed() and
onEntry() to output the values of their Feed and Entry arguments' various variables.

Behind the Scenes with FeedTask

The important task of polling an RSS or Atom newsfeed occurs in FeedTask and a related class. I recently
decompiled these classes to explore how newsfeeds are polled, and share my findings in this section to deepen your
understanding of RssTask and AtomTask.

FeedTask creates an instance of the java.util.Timer class in its static initializer. This instance
starts a background thread and works with an instance of FeedTask's nested SubscriptionTask class
(a java.util.TimerTask subclass) to support newsfeed-polling.

FeedTask's overridden start() function schedules the SubscriptionTask instance for
execution by invoking Timer's public void schedule(TimerTask task, long delay, long period) method
with the following arguments:

  • The SubscriptionTask instance is passed to task.
  • The long integer 0L is passed to delay.
  • The value of FeedTask's interval variable is passed to period.

Approximately every period milliseconds, the SubscriptionTask instance's public void run() method is
invoked. This method invokes the SubscriptionTask-specific doPoll() method with a true argument.
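The scheduling pattern just described can be tried in isolation. Below is a minimal sketch in plain Java that schedules a TimerTask with a zero delay and a fixed period, and cancels it after a few runs. PollingTimerSketch and runPolls() are hypothetical names, and the counter increment merely stands in for the doPoll(true) call. None of this is the decompiled FeedTask code.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: mimics the Timer/TimerTask polling pattern.
class PollingTimerSketch {
    // Polls every periodMillis until maxPolls polls have run; returns the poll count.
    static int runPolls(long periodMillis, final int maxPolls) {
        Timer timer = new Timer("feed-poller", true); // starts a background thread
        final AtomicInteger polls = new AtomicInteger();
        final CountDownLatch finished = new CountDownLatch(1);
        TimerTask subscription = new TimerTask() {
            @Override
            public void run() {
                int n = polls.incrementAndGet(); // stand-in for doPoll(true)
                if (n >= maxPolls) {
                    cancel();                    // no further executions of this task
                    finished.countDown();
                }
            }
        };
        // Same argument shape as described above: task, delay of 0L, period.
        timer.schedule(subscription, 0L, periodMillis);
        try {
            if (!finished.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Polling did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.cancel(); // release the timer's background thread
        }
        return polls.get();
    }

    public static void main(String[] args) {
        System.out.println("Polls run: " + runPolls(100L, 3));
    }
}
```

Because the delay is 0L, the first poll happens immediately. Subsequent polls follow at roughly the requested period, which corresponds to FeedTask's interval variable.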

The doPoll() method first resets FeedTask's inherited started, stopped,
failed, and done Boolean variables to false. It also nulls out the inherited
causeOfFailure variable, and assigns -1 to the inherited progress and
maxProgress variables.

doPoll() next instantiates the javafx.io.http.HttpRequest class, which is the vehicle used to
obtain newsfeed content, and initializes the following HttpRequest variables prior to executing this task:

  • location: The value of FeedTask's location variable is assigned to this variable.
  • onStarted: A function is assigned to this variable, and is invoked when the request starts to execute. The
    function is responsible for invoking onStart().
  • onResponseHeaders: A function is assigned to this variable to retrieve and save the values of the HTTP
    ETag and Last-Modified response headers. These values are needed to ensure that only
    changed newsfeed content will be returned in the next poll request.
  • onToRead: A function is assigned to this variable to obtain the total number of bytes to read, which is assigned
    to maxProgress.
  • onRead: A function is assigned to this variable to obtain the number of bytes read so far, which is assigned to
    progress.
  • onInput: A function is assigned to this variable to parse the request content via an internal
    parse(is) method call (where is is onInput()'s java.io.InputStream
    argument). If parsing results in a thrown exception, the exception object is assigned to causeOfFailure,
    true is assigned to failed, and onException() is invoked. Finally, true
    is assigned to done, and onDone() is invoked. (The onInput() function isn't invoked,
    and hence onDone() isn't invoked, when only changed content is requested but that content isn't available.)
  • onException: A function is assigned to this variable to report a problem with the request itself (and not
    parsing). If the request fails, the exception object is assigned to causeOfFailure, true is
    assigned to failed, and onException() is invoked.
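The interplay between onInput, onException(), and onDone() is easier to see in a small simulation. The following plain Java sketch models a single poll. PollOutcomeSketch, its Parser interface, and its poll() method are hypothetical, and I'm assuming that "changed content isn't available" corresponds to an HTTP 304 (Not Modified) response with no body to parse.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation only: records which callbacks a poll would trigger.
class PollOutcomeSketch {
    interface Parser {
        void parse(String body) throws Exception;
    }

    final List<String> callbacks = new ArrayList<>();
    boolean failed;
    boolean done;
    Exception causeOfFailure;

    void poll(int httpStatus, String body, Parser parser) {
        // State is reset at the start of every poll.
        failed = false;
        done = false;
        causeOfFailure = null;
        callbacks.add("onStart");

        if (httpStatus == 304) {
            // Assumption: nothing to read, so the onInput step never runs
            // and onDone() is never reached.
            return;
        }
        try {
            parser.parse(body);          // stand-in for parse(is)
        } catch (Exception e) {
            causeOfFailure = e;
            failed = true;
            callbacks.add("onException");
        }
        done = true;                     // reached whether or not parsing failed
        callbacks.add("onDone");
    }

    public static void main(String[] args) {
        PollOutcomeSketch sketch = new PollOutcomeSketch();
        sketch.poll(200, "<rss/>", body -> { });
        sketch.poll(304, null, body -> { });
        System.out.println(sketch.callbacks);
    }
}
```

In this model, the second poll contributes only an onStart entry. This is the reason the demos count polls in onStart() instead of onDone().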

Continuing, doPoll() ensures that only updated newsfeed content is returned by setting the request's
If-Modified-Since and If-None-Match headers to the previously saved
Last-Modified and ETag values, respectively.
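The conditional-request headers can be sketched as a small helper in plain Java. ConditionalHeadersSketch and requestHeaders() are hypothetical names, and the onlyUpdated parameter stands in for doPoll()'s Boolean argument. I'm also assuming that no conditional header is sent when a value hasn't been saved yet (as on the first poll).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: builds the conditional headers for one poll request.
class ConditionalHeadersSketch {
    // lastModified and etag are the values saved from the previous response, or null.
    static Map<String, String> requestHeaders(boolean onlyUpdated,
                                              String lastModified,
                                              String etag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (onlyUpdated) {
            if (lastModified != null) {
                headers.put("If-Modified-Since", lastModified);
            }
            if (etag != null) {
                headers.put("If-None-Match", etag);
            }
        }
        // When onlyUpdated is false, no conditional headers are added,
        // so the server returns the entire content.
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(requestHeaders(true, "Sat, 28 Nov 2009 10:00:00 GMT", "\"abc\""));
        System.out.println(requestHeaders(false, "Sat, 28 Nov 2009 10:00:00 GMT", "\"abc\""));
    }
}
```

The first call corresponds to a timer-driven poll or a poll() call; the second corresponds to an update() call.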


Obtaining a newsfeed's updated versus entire content
When true is passed to doPoll(), which happens when this method is called from
SubscriptionTask's run() method or FeedTask's poll() function,
doPoll() sets If-Modified-Since and If-None-Match so that only updated content
is returned. In contrast, when you invoke FeedTask's update() function, which invokes
doPoll() with a false argument, those request headers will not be set, and the entire content will
be returned.

doPoll() now iterates over FeedTask's headers variable, assigning each stored
HttpHeader instance to the HttpRequest instance by invoking the latter instance's
setHeader() function.

Finally, doPoll() invokes the HttpRequest instance's start() function to execute this
task, resulting in retrieved and parsed content. doPoll() then returns to the run() method. If
doPoll() throws an exception, run() invokes onException().


A parsing tidbit
For brevity, I don't discuss parsing beyond the parse(is) method call. However, if you decide to explore the
parsing implementation, here's a tidbit to save you some head-scratching: The parse(InputStream) method
initializes the javafx.data.pull.PullParser instance's impl_skippedElements variable to the
qualified names of Atom's summary, content, rights,
title, and subtitle elements, and RSS's description,
title, and copyright elements, to ensure that the parser treats any HTML or other markup
that's embedded in these elements as literal text.

At some point, you'll probably invoke FeedTask's overridden stop() function. This function invokes
the SubscriptionTask instance's inherited public boolean cancel() method to cancel the
newsfeed-polling task (but not kill the Timer instance's background thread).
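The difference between cancelling a TimerTask and stopping a Timer's background thread can be observed with a few lines of plain Java. TimerThreadSketch and its methods are hypothetical names used only for this illustration. The sketch cancels its own timer at the end so that it can exit cleanly, which is a step that stop() does not perform according to the description above.

```java
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch only: task cancellation leaves the timer thread running.
class TimerThreadSketch {
    // Reports whether a live thread with the given name exists.
    static boolean isThreadAlive(String name) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(name) && t.isAlive()) {
                return true;
            }
        }
        return false;
    }

    // Returns { task was cancelled, timer thread still alive afterwards }.
    static boolean[] cancelTaskOnly(String threadName) {
        Timer timer = new Timer(threadName, true); // thread starts in the constructor
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                // Never reached in this sketch: the first run is a minute away.
            }
        };
        timer.schedule(task, 60_000L, 60_000L);
        boolean cancelled = task.cancel();          // cancels the task only
        boolean threadAlive = isThreadAlive(threadName);
        timer.cancel();                             // cleanup for the sketch itself
        return new boolean[] { cancelled, threadAlive };
    }

    public static void main(String[] args) {
        boolean[] result = cancelTaskOnly("feed-timer-demo");
        System.out.println("Task cancelled: " + result[0]);
        System.out.println("Timer thread alive after task.cancel(): " + result[1]);
    }
}
```

The surviving thread is why the demos call FX.exit() after task.stop().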

Conclusion

Enough theory! Now that you've gained knowledge of JavaFX's RSS and Atom APIs, you might want to create your own newsfeed
reader. To help you with this task, I present a practical example that handles RSS and Atom newsfeeds in my forthcoming
companion to this article.

Jeff Friesen is a freelance software developer and educator specializing in Java technology. Check out his site at javajeff.mb.ca.