
Managing Timed Tasks Within a Cluster Utilizing The StopLight Framework

June 16, 2005

Contents
Towards a Solution
Tasks
Task Management
Task Monitoring
   Task Monitoring Strategy
Detection and Management of Dead Tasks
How Does it All Work?
Configuring and Using the StopLight Framework
Conclusion
Resources

The increase in demand for large-scale, enterprise application
solutions has led to the development of application clustering
techniques and technologies. Clustering applications across
multiple servers provides applications with the ability to handle
large volumes of traffic, and performance can be increased by adding
additional servers to the cluster. In addition to providing
scalability, application clusters make the system more robust by
allowing for automatic system fail over when a server fails. This
way, when one server goes down, the application continues to run,
albeit with slightly decreased performance. While it is true that
the current generation of application servers makes it relatively
pain-free to create a cluster, there are still several significant,
if often overlooked, design issues that must be taken into account
once a system is clustered.

Perhaps the most significant of these issues is how to handle
recurring tasks that should not execute concurrently. Scheduled or
recurring tasks are used to execute procedures that need to run at
certain fixed times or at fixed intervals. Typical examples of
scheduled tasks are report generation tasks and tasks that send
data to external systems that are only available within a certain
timeframe. In order to understand why clustering affects the
application's design with regard to the handling of scheduled tasks,
it is useful to consider an example.

For this article, a generic e-commerce web
application will be used as an example. In order to allow
management to analyze sales trends, profits, inventory, etc., the
system has been set up to periodically compile a set of reports and
email them to management. Clearly, management does not want to
receive multiple emails containing the same reports; yet, this is
what will happen if the application contains a basic scheduled task
and then the application is clustered. When the appointed time to
run the report comes up, all machines in the cluster will generate
the same report and send it to management. This can be seen
visually in Figure 1.

Figure 1. Concurrently executing tasks

Obviously, this is not the desired outcome. What is needed is
something that instead allows only a single task within the cluster
to execute, while still retaining the benefits of the cluster, such
as high availability and scalability.

Towards a Solution

Let's look at what features we would like to see in timed tasks
executing within a cluster. The StopLight framework
(https://stoplight.dev.java.net), a project
hosted on java.net, addresses the issue of managing clustered tasks
by dividing the problem into four sections: tasks, task
management, task monitoring, and heartbeat
monitoring. The task management portion of the framework
provides for the registration and scheduling of tasks, while the
task monitoring portion of the framework is responsible, through
the use of an external semaphore, for determining if a given
instance of a task may execute at a given time. The
heartbeat monitoring portion of the framework is tasked with
determining if a particular server within the cluster is still
alive and running or if the server has failed. Figure 2 provides a
high-level view of the framework deployed into a cluster.

Figure 2. StopLight deployment

Next, we'll examine the various components of the framework so
that we can then understand how those components work together
within the StopLight framework. Last, we'll discuss installing and
configuring StopLight.

Tasks

Because StopLight is a clustered task management framework, the
definitions of tasks are central to all of its functionality.
Tasks are defined in interfaces found in the
com.clarkrichey.stopLight.task package. The
Task interface is shown below.


public interface Task extends Runnable {
/** This is the viewable name for this Task
 *  @return The viewable name for this Task
 */
String getName();
    
/** Used to get the interval that should pass
 * between executions of this task.
 *  This interval is specified in milliseconds
 *  @return The interval between run times
 *  in milliseconds
 */
long getRunInterval();
  
/** Should be called to initialize a Task to its
 * base state. Must be called
 * before the Task is executed for the first time
 */
void initialize();
    
/** Used to cancel execution of the Task */
void cancel();    
    
/** Used to get a read-only view of Task
 *  information
 *  @return an instance of TaskInfo containing
 *  the information for this
 *  Task
 */
TaskInfo getTaskInfo();
}

The Task interface defines the basic
characteristics of all tasks. Several methods are defined for the
purpose of Task identification, such as getName() and
getIcon() (the latter does not appear in the listing above). The
methods initialize() and run() perform the work of the Task. The
initialize() method is guaranteed to be called only
once during the lifetime of a Task, when the task is
first registered with a TaskManager. The
run() method is executed when the Task is
scheduled to execute (as defined by the task's
runInterval property) and the instance of the
Task has been selected as the single instance of that
Task in the cluster to execute at this time. The
cancel() method is guaranteed to only be called once
during the lifetime of a Task, when the task is
removed from the TaskManager's list of
Tasks scheduled for execution.

The interface RestartableTask extends
Task in order to define tasks that may be safely
restarted in the event that they terminate abnormally.
RestartableTask defines one additional method,
reset(). This method is called when a
Task has been terminated abnormally and is now being
resurrected and is eligible to be executed again. It is important
to note that the initialize() method is not called
when the Task is resurrected, so it falls upon the
reset() method to provide any initialization, as well
as any cleanup, that may be needed as a result of the abnormal
termination.
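To make the division of labor between initialize() and reset() concrete, here is a minimal sketch. The SketchTask and SketchRestartableTask interfaces are trimmed-down stand-ins written for this example (they are not the framework's own types), and ExportTask is a hypothetical task that stages records before sending them.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the article's Task and RestartableTask interfaces.
// Only the lifecycle methods discussed above are reproduced.
interface SketchTask extends Runnable {
    void initialize();
    void cancel();
}

interface SketchRestartableTask extends SketchTask {
    void reset();
}

// Hypothetical task that stages records before "sending" them.
class ExportTask implements SketchRestartableTask {
    final List<String> staged = new ArrayList<>();
    int completedRuns;

    // Called once, when the task is first registered.
    public void initialize() { completedRuns = 0; }

    public void run() {
        staged.add("record"); // stage the work
        staged.clear();       // a clean run leaves nothing staged
        completedRuns++;
    }

    public void cancel() { staged.clear(); }

    // initialize() is not called on resurrection, so reset() has to clean up
    // whatever the crashed run left behind.
    public void reset() { staged.clear(); }
}

class RestartableTaskDemo {
    /** Simulates leftovers from a crash, then a reset followed by a clean run. */
    static int stagedAfterRecovery() {
        ExportTask task = new ExportTask();
        task.initialize();
        task.staged.add("half-written record"); // left by an abnormal termination
        task.reset();
        task.run();
        return task.staged.size();
    }

    public static void main(String[] args) {
        System.out.println("Staged records after recovery: " + stagedAfterRecovery());
    }
}
```

The point of the sketch is only that reset() must leave the task in a state from which run() can start cleanly; what "clean" means depends on the task.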

AbstractTask is an abstract class that is provided
as a convenience to developers in the creation of their own
concrete Tasks. AbstractTask provides
default implementations for all of the methods required by the Task
interface, with the exception of initialize(),
run(), and cancel(). The implementation
of those methods is left to the developers of concrete
Tasks. AbstractRestartableTask provides
the same convenience to developers of
RestartableTasks, requiring only the additional
implementation of the reset() method.

The code for a "Hello World" task that simply prints "Hello
World" along with the current time every time it is executed is
listed below. This provides a simple example of creating a task by
extending AbstractTask.


import com.clarkrichey.stopLight.task.*;
import java.util.Calendar;

public class HelloWorldTask extends AbstractTask{
    
    public HelloWorldTask() {
        this.description = "A simple task";
        this.name = "HelloWorldTask";
        // execute every 10 seconds
        this.runInterval = 10000;
        // no associated icon
        this.icon = null;
    }
    
    public void cancel() {
        // no need to do anything
    }
    
    public void initialize() {
        // no need to do anything
    }
    
    public void run() {
        Calendar now = Calendar.getInstance();
        // getTime() yields a readable Date rather
        // than the Calendar's verbose toString()
        System.out.println("Hello World! It's " +
                now.getTime());
    }
}

The BasicTask class is provided as an additional
convenience to developers. BasicTask extends
AbstractTask and is constructed by passing a
Runnable to its constructor along with the task's run
interval. The BasicTask class delegates to the
run() method of its runnable when run()
is called. The BasicTask takes no action when either
cancel() or initialize() is called. If
the Task being deployed requires action to be taken
when these methods are invoked, then the use of the
BasicTask is not appropriate and it will be necessary
to either extend AbstractTask or directly implement
the Task interface.

Below is the code for the HelloWorld class, which functions
exactly the same way as the HelloWorldTask shown
above. However, instead of extending AbstractTask, the
HelloWorld class simply implements
Runnable. This class can then be passed in to the
constructor for BasicTask, along with its run
interval.


import java.util.Calendar;

public class HelloWorld implements Runnable {
    
    /** Creates a new instance of HelloWorld */
    public HelloWorld() {
    }
    
    public void run() {
        Calendar now = Calendar.getInstance();
        System.out.println("Hello World! It's " +
                now.getTime());
    }
}
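For readers wondering what a wrapper like BasicTask might look like on the inside, the following is a rough sketch of the delegation idea. It is not the framework's source: LifecycleTask is a reduced stand-in for the Task interface, and DelegatingTask is a hypothetical name chosen for this example.

```java
// Reduced stand-in for the lifecycle part of the article's Task interface.
interface LifecycleTask extends Runnable {
    long getRunInterval();
    void initialize();
    void cancel();
}

// Hypothetical BasicTask-like class: wraps any Runnable and ignores
// the lifecycle callbacks.
class DelegatingTask implements LifecycleTask {
    private final Runnable delegate;
    private final long runIntervalMillis;

    DelegatingTask(Runnable delegate, long runIntervalMillis) {
        this.delegate = delegate;
        this.runIntervalMillis = runIntervalMillis;
    }

    public long getRunInterval() { return runIntervalMillis; }

    public void initialize() { /* intentionally empty */ }

    public void cancel() { /* intentionally empty */ }

    // All of the real work is handed to the wrapped Runnable.
    public void run() { delegate.run(); }
}

class DelegatingTaskDemo {
    /** Runs the wrapped Runnable twice and reports how often it executed. */
    static int runTwice() {
        int[] counter = {0};
        DelegatingTask task = new DelegatingTask(() -> counter[0]++, 10_000L);
        task.initialize();
        task.run();
        task.run();
        task.cancel();
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("Delegate executions: " + runTwice());
    }
}
```

Because initialize() and cancel() do nothing here, a Runnable that needs setup or teardown is a poor fit for this style of wrapper, which is the same limitation described above for BasicTask.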

Task Management

Classes directly responsible for the management of tasks are
found in the com.clarkrichey.stopLight.management
package. The TaskManager interface is listed below.
The TaskManager interface describes classes that are
responsible for scheduling the execution of tasks. This interface
contains methods for registering and removing a
Task, as well as methods for setting the monitoring
strategy to be used and for retrieving an instance of a registered
Task. While the registerTask(),
getTask(), and getTasks() methods are
self-explanatory, the rest of the methods defined by this interface
require some explanation.

The removeTask() method will remove the specified
Task from the TaskManager's list of tasks
to be executed, but will not interrupt the Task if it
is currently executing. The removeTaskNow() method
will remove the specified Task from the
TaskManager's list of tasks to be executed, and will
terminate the Task's execution if it is currently
running.
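The article does not show how BasicTaskManager implements these two methods. One plausible way to obtain the same semantics on top of a ScheduledThreadPoolExecutor is the mayInterruptIfRunning flag of Future.cancel(), as the sketch below illustrates. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class RemovalSemanticsSketch {

    /** Removes a task while it is running; returns true if that run was interrupted. */
    static boolean removeWhileRunning(boolean interruptIfRunning) {
        ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        Runnable task = () -> {
            started.countDown();
            try {
                Thread.sleep(200); // simulated work
            } catch (InterruptedException e) {
                interrupted.set(true);
            } finally {
                finished.countDown();
            }
        };

        try {
            ScheduledFuture<?> handle =
                    scheduler.scheduleAtFixedRate(task, 0, 10, TimeUnit.SECONDS);
            started.await();                   // wait until the task is mid-run
            // false behaves like removeTask(), true like removeTaskNow()
            handle.cancel(interruptIfRunning);
            finished.await();
            return interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cancel(false) interrupted the run: " + removeWhileRunning(false));
        System.out.println("cancel(true) interrupted the run: " + removeWhileRunning(true));
    }
}
```

In both cases the task is never scheduled again; the only difference is whether the run already in progress is allowed to finish.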

The methods setTaskMonitoringStrategy() and
getTaskMonitoringStrategy() are used, respectively, for setting
and getting the strategy to be used by the
TaskWrapper to determine if a Task has
terminated abnormally. While the TaskManager is
responsible for the scheduling of Tasks, the
determination of a Task's health is dictated by the
TaskMonitoringStrategy that is being used. Further
information on the TaskMonitoringStrategy can be found
in the "Task Monitoring Strategy" section below.


import java.util.List;

public interface TaskManager
extends StopLightManagedComponent {
    
/** Used to register a Task with the TaskManager
 *  @param taskName The unique name of the Task
 *  @param taskToRegister The Task that will
 *  be managed by the TaskManager
 */
void registerTask(String taskName,
 Task taskToRegister);
    
/** Used to get a copy of the List of Tasks
 * being managed by this
 *  TaskManager
 *  @return The List of Tasks being managed by
 *  this TaskManager
 */
List<Task> getTasks();
    
/** Used to get a particular Task that was
 * registered with the TaskManager
 *  @param taskName The name of the Task to
 *  be retrieved
 *  @return The requested Task. Null if
 * the task is not found
 */
Task getTask(String taskName);
   
/** Used to remove a Task that was registered
 *  with this
 *  TaskManager. Running Tasks are allowed to 
 *  complete their execution
 *  @param taskName The name of the Task to be removed
 */
void removeTask(String taskName);
    
/** Used to remove a Task that was registered 
 *  with this TaskManager. Running Tasks 
 *  are terminated without being allowed to
 *  complete their execution
 *  @param taskName The name of the Task to be removed
 */
void removeTaskNow(String taskName);
    
/** Used to set the TaskMonitoringStrategy that
 *   will be used by the TaskManager to 
 *   determine if a Task is alive or not.
 *  Calling setTaskMonitoringStrategy will 
 *  replace any existing TaskMonitoringStrategy 
 *  with the new TaskMonitoringStrategy passed in
 *  @param s The TaskMonitoringStrategy
 *  to use
 */
void setTaskMonitoringStrategy(TaskMonitoringStrategy s);
    
/** Retrieves the current TaskMonitoringStrategy
 *  @return The current TaskMonitoringStrategy
 */
TaskMonitoringStrategy getTaskMonitoringStrategy();
}

BasicTaskManager is the default implementation of
TaskManager that is provided with the StopLight
framework. The BasicTaskManager utilizes a
ScheduledThreadPoolExecutor
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ScheduledThreadPoolExecutor.html)
as the means of
scheduling the Tasks and a TaskWrapper
class in order to inject StopLight logic in between the
ScheduledThreadPoolExecutor and the actual running of
the Task. The details of how this process works are
discussed in the "Task Monitoring Strategy" section
as well as in the "How Does it All Work?" section. While the
framework allows for other TaskManagers to be created,
the relatively simple nature of the TaskManager
interface and the BasicTaskManager implementation make
it much more likely that developers will want to create their own
TaskMonitors while reusing the default
BasicTaskManager.
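As a rough illustration of the scheduling side, the sketch below hands a wrapper Runnable to a ScheduledThreadPoolExecutor at a fixed interval. Whether BasicTaskManager uses fixed-rate or fixed-delay scheduling is not stated here, so scheduleAtFixedRate is an assumption, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class IntervalSchedulingSketch {

    /** Schedules a wrapped task and reports whether it ran the expected number of times. */
    static boolean runsRepeatedly(long intervalMillis, int expectedRuns) {
        ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);
        CountDownLatch runs = new CountDownLatch(expectedRuns);

        Runnable task = runs::countDown; // stands in for the real Task
        Runnable wrapper = () -> {
            // Framework logic (lock checks and so on) would sit here,
            // between the scheduler and the task itself.
            task.run();
        };

        try {
            scheduler.scheduleAtFixedRate(wrapper, 0, intervalMillis, TimeUnit.MILLISECONDS);
            return runs.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Ran three times: " + runsRepeatedly(10, 3));
    }
}
```

The scheduler only ever sees the wrapper, which is what lets a framework decide on each firing whether the underlying task should actually run.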

The code snippet below illustrates the creation of a
BasicTask using the HelloWorld class illustrated
earlier and then the subsequent use of the BasicTaskManager to register
the task. The StopLightConfigurationManager has not
yet been discussed, and will be introduced later.


HelloWorld hello = new HelloWorld();
BasicTask myTask = new BasicTask(hello, 10000);
StopLightConfigurationManager manager = 
    StopLightConfigurationManager.getInstance();
TaskManager tm = manager.getTaskManager();
tm.registerTask("HelloWorld", myTask);

Task Monitoring

So far we have examined tasks and task management, two concepts
that are central to the StopLight framework. However, in order to
understand how the framework actually works, we need to take a look
at the task monitor as well. The classes directly involved in task
monitoring are found in the
com.clarkrichey.stopLight.taskMonitor package and are
illustrated in Figure 3 below. TaskMonitor classes, as
defined by the TaskMonitor interface, are responsible
for ensuring that a given task is only executed on a single
instance of the clustered application at any given time. Put
differently, it is up to the TaskMonitor to make sure
that two instances of the same Task are never
executing at the same time. In order to achieve this, the
TaskMonitor contains methods for attempting to acquire
a lock on a task, as well as a method for releasing that lock and
another method for acquiring information on the current lock holder
for a particular Task. The StopLight framework ships
with an implementation of TaskMonitor,
DatabaseTaskMonitor, that uses a database to store
lock information. Information about configuring the
DatabaseTaskMonitor can be found in the section
entitled "Configuring and Using the StopLight Framework".
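The locking contract described above can be sketched with an in-memory stand-in. The LockMonitor interface below is a simplified, hypothetical version of the monitor contract (the real TaskMonitor method signatures may differ), and a map inside one JVM is of course no substitute for the shared database that a real cluster needs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified monitor contract for illustration only.
interface LockMonitor {
    boolean acquireLock(String taskName, String ownerAddress);
    void releaseLock(String taskName);
    String getLockHolder(String taskName); // null when the task is unlocked
}

class InMemoryLockMonitor implements LockMonitor {
    private final ConcurrentMap<String, String> holders = new ConcurrentHashMap<>();

    public boolean acquireLock(String taskName, String ownerAddress) {
        // putIfAbsent is atomic: exactly one caller sees null and wins the lock.
        return holders.putIfAbsent(taskName, ownerAddress) == null;
    }

    public void releaseLock(String taskName) {
        holders.remove(taskName);
    }

    public String getLockHolder(String taskName) {
        return holders.get(taskName);
    }
}

class LockMonitorDemo {
    /** Two nodes compete for the same task lock; the second wins only after a release. */
    static String describe() {
        LockMonitor monitor = new InMemoryLockMonitor();
        boolean first = monitor.acquireLock("reports", "10.0.0.1");
        boolean second = monitor.acquireLock("reports", "10.0.0.2");
        String holder = monitor.getLockHolder("reports");
        monitor.releaseLock("reports");
        boolean third = monitor.acquireLock("reports", "10.0.0.2");
        return first + "," + second + "," + holder + "," + third;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The important property is that acquisition is a single atomic test-and-set; a database-backed monitor would need an equivalent guarantee from the database.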

Figure 3. Task monitor classes

Task Monitoring Strategy

A classic Strategy pattern
(http://today.java.net/pub/a/today/2004/10/29/patterns.html)
is used in order to determine the logic used by the
TaskWrapper in deciding the health of a task. The
TaskWrapper contains a reference to its
TaskMonitoringStrategy, along with a reference to the
Task it is wrapping. If the call made by the
TaskWrapper to the acquireLock() method
of its TaskMonitor returns false,
indicating that another instance of the wrapped Task
is already executing, the TaskWrapper uses its
TaskMonitoringStrategy in order to determine the
health of the Task that was reported to be executing
by the task monitor.

TaskMonitoringStrategy is an interface that defines
a single overloaded method, isAlive(). Each overload takes
MonitoringInformation, TaskInfo, and either a boolean or an
Exception as parameters and returns a value of the enumeration
MonitoringResult. The
MonitoringInformation passed in to the method is
acquired from the TaskMonitor and the
TaskInfo is acquired from the Task
itself. When the overloaded method accepting a boolean as a parameter
is called, that boolean is the result of a call to the
isAlive() method of the HeartbeatLocator.
A true value indicates that the lock on the
Task is believed to be held by a living
Task, while a false result indicates that
it is believed that the lock on the Task is held by a
dead Task. When the overloaded method accepting an
Exception is called, the Exception passed
in is the Exception that was thrown by the
isAlive() method of the HeartbeatLocator.
More detail on the HeartbeatLocator can be found in
the next section.

Detection and Management of Dead Tasks

It is the responsibility of classes implementing the
HeartbeatLocator interface to detect Tasks that have
terminated abnormally during their run cycle. Classes implementing
HeartbeatLocator are used to attempt to determine if
an instance of a Task that is holding the execute lock
for a given Task type is still executing or if the
Task has terminated abnormally without releasing the
execute lock. The StopLight framework ships with a default
implementation of HeartbeatLocator that uses a servlet
for the purpose of determining Task "live-ness." This
HeartbeatLocatorServlet implementation and the other
classes in the HeartbeatLocator package are detailed in Figure 4.
The details of how and when heartbeat detection occurs are covered
in the next section.

Figure 4. HeartbeatLocator classes
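A servlet-based heartbeat check boils down to asking another node a question over HTTP. The sketch below shows one way such a probe could be written with HttpURLConnection. It is an assumption-laden illustration rather than the framework's code: the URL layout, the timeout, and the choice to treat every I/O failure as "not alive" (instead of surfacing the exception to the caller) are all simplifications made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.URI;

class HttpHeartbeatSketch {

    /** Returns true if an HTTP GET to the given URL answers with status 200. */
    static boolean isAlive(String heartbeatUrl, int timeoutMillis) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(heartbeatUrl).toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            try {
                return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            return false; // refused, timed out, or otherwise unreachable
        }
    }

    /** Finds a local port that nothing is listening on once this method returns. */
    static int unusedLocalPort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Probes a port with no listener, imitating a node that has gone down. */
    static boolean probeClosedPort() {
        return isAlive("http://127.0.0.1:" + unusedLocalPort() + "/heartbeat", 1000);
    }

    public static void main(String[] args) {
        System.out.println("Dead node reported alive: " + probeClosedPort());
    }
}
```

A production probe would also need to distinguish a dead server from a slow network, which is exactly the kind of judgment the monitoring strategy is meant to encapsulate.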

How Does it All Work?

The functioning of the StopLight framework is best understood
through a series of sequence diagrams that illustrate how the
framework behaves for a variety of use cases. Figure 5 below
illustrates the standard case where a Task
successfully acquires the execution lock, executes, and then
releases the lock. The ScheduledThreadPoolExecutor
calls the run() method of the TaskWrapper
class that is used by the TaskManager to wrap the
underlying Task.

The purpose of the TaskWrapper is to inject
StopLight framework logic between the scheduled execution of the
Task as signaled by the
ScheduledThreadPoolExecutor and the actual execution
of the local instance of the Task. This is done to
ensure that only one instance of the Task is executing
at a given time within the cluster. The TaskWrapper
calls the acquireLock() method of the TaskMonitor in
order to attempt to acquire the execution lock for that
particular type of Task. The
TaskMonitor accesses the external semaphore (a
database, in the case of the default implementation) to determine if
the lock is available, and if it is available, to then acquire the
lock on behalf of the requestor and return true, as is
illustrated in this use case.

When the TaskWrapper receives the true
response from its call to acquireLock(), it then
proceeds to execute the wrapped Task's
run() method, allowing the Task to
execute. Once the run() method completes, the
TaskWrapper calls releaseLock() on the
TaskMonitor. The TaskMonitor then
accesses the external semaphore in order to release the lock for
that type of Task. Once this has completed, the
executing thread terminates. The process repeats when the
Task is next scheduled to run. Figure 5 shows this
process.
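The standard case can be condensed into a few lines of code. The sketch below uses hypothetical names (ExecutionLock, LockingWrapper) and a lock that simply records what happened. Releasing the lock in a finally block is a design choice of this example, made so that a task that throws does not leave the lock behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal view of the lock operations used by the wrapper.
interface ExecutionLock {
    boolean acquireLock(String taskName);
    void releaseLock(String taskName);
}

class LockingWrapper implements Runnable {
    private final String taskName;
    private final Runnable task;
    private final ExecutionLock lock;

    LockingWrapper(String taskName, Runnable task, ExecutionLock lock) {
        this.taskName = taskName;
        this.task = task;
        this.lock = lock;
    }

    @Override
    public void run() {
        if (!lock.acquireLock(taskName)) {
            return; // lock held elsewhere; liveness checks are outside this sketch
        }
        try {
            task.run();
        } finally {
            lock.releaseLock(taskName); // release even if the task throws
        }
    }
}

class LockingWrapperDemo {
    /** Runs one scheduled firing and returns the order of events. */
    static String traceOneRun() {
        List<String> events = new ArrayList<>();
        ExecutionLock recordingLock = new ExecutionLock() {
            public boolean acquireLock(String taskName) {
                events.add("acquire");
                return true;
            }
            public void releaseLock(String taskName) {
                events.add("release");
            }
        };
        new LockingWrapper("reports", () -> events.add("run"), recordingLock).run();
        return String.join(",", events);
    }

    public static void main(String[] args) {
        System.out.println(traceOneRun());
    }
}
```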

Figure 5. Task acquires lock and executes

An alternative use case to the one above is that when the
Task attempts to execute, it is unable to do so because
another instance of that Task has already begun
executing on one of the other machines in the cluster. The sequence
of events that occur in this scenario is very similar to that
previously discussed, with the exception that the call to
acquireLock() returns false, since another
instance of the Task has already acquired the
lock.

When the TaskWrapper is unable to acquire the lock,
it needs to attempt to determine if it was unable to acquire the
lock because another instance of the Task was able to
acquire the lock before this instance, or if the lock was never
released from some previous execution due to the abnormal
termination of a running instance of the Task. In
order to make this determination, the TaskWrapper
calls getMonitoringInformation() on the
TaskMonitor in order to get all available information
about the current holder of the lock. The
java.net.InetAddress of the lock's current holder is
then extracted from the returned MonitoringInformation and
is passed along to the HeartbeatLocator in the call to
isAlive(). The HeartbeatLocator then
attempts to ping the clustered application running at the specified
address. The results of this attempt are returned to the
TaskWrapper. This sequence of events is depicted in
Figure 6 below.

The TaskWrapper next passes the results from the
HeartbeatLocator, along with the
MonitoringInformation returned by the
TaskMonitor, to the isAlive() method of
the TaskMonitoringStrategy. The business logic
embedded in the TaskMonitoringStrategy uses this
information to determine how to proceed. It may return one of three
possible results: ISDEAD, ISALIVE, or
RETRY. If RETRY is returned, then this
entire process will repeat from the point where the
TaskWrapper attempts to acquire the lock. If
ISALIVE is returned, then the executing thread
terminates and the process repeats when the Task is
next scheduled to run. The consequences of ISDEAD
being returned are explored in the next paragraph. The default
implementation of TaskMonitoringStrategy returns
ISALIVE if the HeartbeatLocator returned
true and ISDEAD if the HeartbeatLocator
returned false. Figure 6 illustrates how this works.
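The default mapping described above is small enough to sketch in full. The LivenessStrategy interface below is a reduced, hypothetical form of TaskMonitoringStrategy: the MonitoringInformation and TaskInfo parameters are omitted for brevity. The article does not say what the default strategy returns when the heartbeat check throws, so returning RETRY in that case is purely an assumption of this example.

```java
// Result values as named in the article.
enum MonitoringResult { ISDEAD, ISALIVE, RETRY }

// Reduced, hypothetical strategy contract (extra parameters omitted).
interface LivenessStrategy {
    MonitoringResult isAlive(boolean heartbeatAnswered);
    MonitoringResult isAlive(Exception heartbeatFailure);
}

class DefaultLikeStrategy implements LivenessStrategy {

    // Mirrors the default behavior described above: true means alive, false means dead.
    public MonitoringResult isAlive(boolean heartbeatAnswered) {
        return heartbeatAnswered ? MonitoringResult.ISALIVE : MonitoringResult.ISDEAD;
    }

    // Assumption: an inconclusive check asks for another attempt rather
    // than declaring the lock holder dead.
    public MonitoringResult isAlive(Exception heartbeatFailure) {
        return MonitoringResult.RETRY;
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        LivenessStrategy strategy = new DefaultLikeStrategy();
        System.out.println(strategy.isAlive(true));
        System.out.println(strategy.isAlive(false));
        System.out.println(strategy.isAlive(new IllegalStateException("no answer")));
    }
}
```

A custom strategy could be stricter, for example by consulting how long the lock has been held before declaring its holder dead.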

Figure 6. Task fails to acquire lock

If the use case described above occurred not as a result of
another task acquiring the lock prior to the
acquireLock() method being called locally, but rather
as the result of the lock not being released due to the abnormal
termination of a previously executing Task, then the
sequence of events depicted in Figure 7 would occur. This sequence
is identical to the sequence of events described above up to the
point where the TaskMonitoringStrategy returns
ISDEAD. Once ISDEAD is returned, then
there are two possible outcomes. First, if the Task
that we are attempting to execute is not an instance of a
RestartableTask, then the executing thread terminates
and the process repeats when the Task is next
scheduled to run.

However, if the Task is a
RestartableTask, then the following sequence of events
occurs, as illustrated in steps 6-8 in Figure 7. First, the
releaseLock() method of TaskMonitor is
invoked in order to release the lock that was left when the running
Task terminated abnormally. Next, the
reset() method of the RestartableTask is called in
order to allow the RestartableTask the opportunity to
clean up any artifacts that may have been left over when the
previously running task terminated abnormally. Lastly, the entire
process of attempting to acquire the lock begins again (as
illustrated in the previous use cases) in order to ensure that only
one instance of the Task runs at a time.
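Putting the three use cases together, the decision flow for a single scheduled firing might be sketched as follows. All names are hypothetical. The maxAttempts bound is an addition of this example (the article does not say whether retries are limited), and the Verdict enum simply mirrors the three results a monitoring strategy can return.

```java
import java.util.function.Supplier;

class RecoveryFlowSketch {
    enum Verdict { ISDEAD, ISALIVE, RETRY }

    interface Lock {
        boolean acquire();
        void release();
    }

    interface Restartable extends Runnable {
        void reset();
    }

    /** Handles one scheduled firing; returns true if the task body ran on this node. */
    static boolean fire(Runnable task, Lock lock, Supplier<Verdict> holderCheck, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (lock.acquire()) {
                try {
                    task.run();
                } finally {
                    lock.release();
                }
                return true;
            }
            Verdict verdict = holderCheck.get();
            if (verdict == Verdict.ISALIVE) {
                return false; // someone else is legitimately running the task
            }
            if (verdict == Verdict.ISDEAD) {
                if (!(task instanceof Restartable)) {
                    return false; // not safe to restart; wait for the next firing
                }
                lock.release();               // clear the stale lock
                ((Restartable) task).reset(); // clean up after the crashed run
            }
            // RETRY, or ISDEAD on a restartable task: try to acquire again
        }
        return false;
    }

    /** Starts with a stale lock whose holder is reported dead. */
    static boolean demo(boolean restartable) {
        boolean[] held = {true}; // lock left behind by a crashed node
        Lock lock = new Lock() {
            public boolean acquire() {
                if (held[0]) {
                    return false;
                }
                held[0] = true;
                return true;
            }
            public void release() { held[0] = false; }
        };
        Runnable plain = () -> { };
        Restartable restartableTask = new Restartable() {
            public void run() { }
            public void reset() { }
        };
        return fire(restartable ? restartableTask : plain, lock, () -> Verdict.ISDEAD, 3);
    }

    public static void main(String[] args) {
        System.out.println("Restartable task ran: " + demo(true));
        System.out.println("Plain task ran: " + demo(false));
    }
}
```

Note that the restartable task does not run directly after reset(); it goes back through lock acquisition, so another node that clears the stale lock first can still win.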

Figure 7. Task fails to acquire lock due to dead task

Configuring and Using the StopLight Framework

The StopLight framework is designed to be used with only minor
setup and configuration. In order to use the framework, complete the
following steps:

  1. Place the StopLight.jar file in your application's
    classpath.
  2. Create the required database tables. The SQL scripts for
    creating the necessary tables are contained in the file
    StopLight.sql.
  3. Modify web.xml to declare the
    com.clarkrichey.stopLight.heartbeat.HeartbeatLocatorServlet
    class as a servlet that is loaded at startup.
  4. Modify web.xml to set heartbeatPath as an
    initialization parameter to the
    HeartbeatLocatorServlet. The value of the
    heartbeatPath parameter should be the context path of
    the web application, plus the URL pattern of the
    HeartbeatLocatorServlet.
  5. Create new or modify the default configuration files (described
    in detail below).
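Steps 3 and 4 might translate into a web.xml fragment along the following lines. The servlet class and the heartbeatPath parameter name come from the steps above; the servlet name, the /heartbeat URL pattern, and the /shop context path are placeholders invented for this example and should be replaced with the values of the actual application.

```xml
<servlet>
    <!-- servlet-name is an arbitrary label chosen for this example -->
    <servlet-name>heartbeatLocator</servlet-name>
    <servlet-class>com.clarkrichey.stopLight.heartbeat.HeartbeatLocatorServlet</servlet-class>
    <init-param>
        <param-name>heartbeatPath</param-name>
        <!-- context path (/shop) plus the URL pattern mapped below -->
        <param-value>/shop/heartbeat</param-value>
    </init-param>
    <!-- load the servlet when the application starts -->
    <load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
    <servlet-name>heartbeatLocator</servlet-name>
    <url-pattern>/heartbeat</url-pattern>
</servlet-mapping>
```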

The StopLightConfigurationManager provides client
code with handles to all of the externally usable portions of the
StopLight framework. The initialize() method of the
StopLightConfigurationManager should be called
prior to any attempt to retrieve references to these objects.
If client code attempts to retrieve an object from the
StopLightConfigurationManager before invoking
initialize(), the StopLightConfigurationManager calls
initialize() itself.

Once the StopLightConfigurationManager has been
initialized, Tasks may be registered with the
TaskManager that is made available via the
StopLightConfigurationManager. Once a
Task is registered with the TaskManager,
no further steps are typically necessary.

StopLight is packaged with two default configuration files.
First, the default configuration file,
StopLightConfiguration.xml, is located in the
com/clarkrichey/stopLight directory within the
StopLight.jar. This configuration file defines the
implementation classes that will be loaded by the
StopLightConfigurationManager for the TaskManager,
TaskMonitor, HeartbeatLocator, and
TaskMonitoringStrategy interfaces. The second
configuration file, DatabaseMonitorConfiguration.xml, is
located in the com/clarkrichey/stopLight/monitoring/database
directory within StopLight.jar. This configuration file
defines the connection string, user name, password, and database
driver class required by the DatabaseTaskMonitor.

In order to run StopLight with different implementation classes,
either modify the StopLightConfiguration.xml file that is
packaged with the .jar, or create a new configuration file. A new
configuration file must also be named
StopLightConfiguration.xml, but it may be placed at any
arbitrary location that is accessible from the application at run
time. In order to tell the StopLight framework to use the alternate
configuration file, set the StopLightConfigDir system
property to the absolute path to the directory containing the
alternate configuration file. It is important to note that all
values present in the StopLightConfiguration.xml file must
be present in the alternate configuration file, even if only one
value has changed.
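As a small illustration, the system property can be supplied on the command line (for example with -DStopLightConfigDir=...) or set programmatically. The directory in the sketch below is a made-up example, and the sketch assumes the property needs to be in place before the framework is first initialized; how StopLight actually reads the property is not shown here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ConfigDirSketch {

    /** Builds the path at which the alternate configuration file is expected. */
    static Path resolveConfigFile(String configDir) {
        // The file name is fixed; only the directory is configurable.
        return Paths.get(configDir, "StopLightConfiguration.xml");
    }

    public static void main(String[] args) {
        // Hypothetical directory; equivalent to -DStopLightConfigDir=/opt/shop/config
        System.setProperty("StopLightConfigDir", "/opt/shop/config");
        System.out.println(resolveConfigFile(System.getProperty("StopLightConfigDir")));
    }
}
```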

If the DatabaseTaskMonitor class that is provided
as the default TaskMonitor implementation is to be
used, then the default DatabaseMonitorConfiguration.xml file
must be modified or an alternate one created. The process for
creating a new DatabaseMonitorConfiguration.xml file is the
same as the process described for replacing the
StopLightConfiguration.xml file.

The only development necessary for using the StopLight framework
is the creation of the Task classes that will be
managed by the framework. The creation of new tasks can be done
through any of the means described previously in the "Tasks" section,
such as by extending AbstractTask or creating a
BasicTask.

Conclusion

While the StopLight framework provides a complete and extensible
set of services for managing tasks within a cluster, there is still
additional work that could be done on the framework. Additional
work on the StopLight framework is being conducted in the java.net
StopLight project, which is released under the GNU Lesser General
Public License. Additional
documentation for the StopLight framework, particularly information
on extending the framework, may also be found at that site. The
following list enumerates some of the planned enhancements to the
framework.

  • Modification of the DatabaseTaskMonitor to support
    the use of a JNDI-based datasource for connection to the
    database.
  • Inclusion of a new implementation of the
    TaskManager interface to support a more robust task
    scheduling system, such as the Quartz project
    (http://sourceforge.net/projects/quartz/).
    Quartz provides a much more robust and flexible mechanism for
    specifying when tasks should run, but it does not provide any
    mechanisms for preventing a task in a cluster from running on
    multiple servers concurrently.

Resources

Clark D. Richey, Jr. is a Principal Consultant for Raba Technologies and the founder of JUGaccino, a Maryland-based Java User's Group.