
Managing Timed Tasks Within a Cluster Utilizing The StopLight Framework

June 16, 2005





Contents
Towards a Solution
Tasks
Task Management
Task Monitoring
   Task Monitoring Strategy
Detection and Management of Dead Tasks
How Does it All Work?
Configuring and Using the StopLight Framework
Conclusion
Resources

The increase in demand for large-scale enterprise applications has led to the development of application clustering techniques and technologies. Clustering an application across multiple servers allows it to handle large volumes of traffic, and capacity can be increased by adding servers to the cluster. In addition to providing scalability, a cluster makes the system more robust by allowing automatic failover when a server fails: when one server goes down, the application continues to run, albeit with somewhat decreased performance. While the current generation of application servers makes it relatively painless to create a cluster, there are still several significant, if often overlooked, design issues that must be taken into account once a system is clustered.

Perhaps the most significant of these issues is how to handle recurring tasks that must not execute concurrently. Scheduled or recurring tasks run procedures that need to execute at fixed times or at fixed intervals. Typical examples are report generation tasks and tasks that send data to external systems that are only available within a certain time window. To understand why clustering affects the way an application handles scheduled tasks, it is useful to consider an example.

This article uses a generic e-commerce web application as an example. To allow management to analyze sales trends, profits, inventory, and so on, the system has been set up to periodically compile a set of reports and email them to management. Clearly, management does not want to receive multiple emails containing the same reports, yet this is exactly what will happen if the application contains a basic scheduled task and is then clustered. When the appointed time to run the report arrives, every machine in the cluster will generate the same report and send it to management. This is shown in Figure 1.

Figure 1. Concurrently executing tasks

Obviously, this is not the desired outcome. What is needed is a mechanism that allows only a single instance of the task to execute within the cluster, while still retaining the benefits of the cluster, such as high availability and scalability.

Towards a Solution

Let's look at how timed tasks can be managed within a cluster. The StopLight framework, a project hosted on java.net, addresses the issue of managing clustered tasks by dividing the problem into four parts: tasks, task management, task monitoring, and heartbeat monitoring. The task management portion of the framework handles the registration and scheduling of tasks, while the task monitoring portion uses an external semaphore to determine whether a given instance of a task may execute at a given time. The heartbeat monitoring portion determines whether a particular server within the cluster is still alive or has failed. Figure 2 provides a high-level view of the framework deployed into a cluster.

Figure 2. StopLight deployment

Next, we'll examine the various components of the framework and then see how they work together. Finally, we'll discuss installing and configuring StopLight.

Tasks

Because StopLight is a clustered task management framework, the definition of a task is central to all of its functionality. Tasks are defined by interfaces found in the com.clarkrichey.stopLight.task package. The Task interface is shown below.


public interface Task extends Runnable {
    /** This is the viewable name for this Task
     *  @return The viewable name for this Task
     */
    String getName();

    // Further identification methods, such as getIcon(),
    // are not shown in this listing.

    /** Used to get the interval that should pass
     *  between executions of this task.
     *  This interval is specified in milliseconds
     *  @return The interval between run times
     *  in milliseconds
     */
    long getRunInterval();

    /** Should be called to initialize a Task to its
     *  base state. Must be called
     *  before the Task is executed for the first time
     */
    void initialize();

    /** Used to cancel execution of the Task
     */
    void cancel();

    /** Used to get a read-only view of Task
     *  information
     *  @return an instance of TaskInfo containing
     *  the information for this Task
     */
    TaskInfo getTaskInfo();
}

The Task interface defines the basic characteristics of all tasks. Several methods, such as getName() and getIcon(), identify the Task. The methods initialize() and run() perform the work of the Task. The initialize() method is guaranteed to be called only once during the lifetime of a Task, when the task is first registered with a TaskManager. The run() method is executed when the Task is scheduled to execute (as defined by the task's runInterval property) and this instance of the Task has been selected as the single instance of that Task in the cluster to execute at this time. The cancel() method is likewise guaranteed to be called only once during the lifetime of a Task, when the task is removed from the TaskManager's list of Tasks scheduled for execution.

The RestartableTask interface extends Task in order to define tasks that may be safely restarted if they terminate abnormally. RestartableTask defines one additional method, reset(). This method is called when a Task that terminated abnormally is being resurrected and is eligible to be executed again. It is important to note that the initialize() method is not called when the Task is resurrected, so it falls to the reset() method to provide any initialization, as well as any cleanup, that may be needed as a result of the abnormal termination.

AbstractTask is an abstract class that is provided as a convenience to developers creating their own concrete Tasks. AbstractTask provides default implementations for all of the methods required by the Task interface, with the exception of initialize(), run(), and cancel(). The implementation of those methods is left to the developers of concrete Tasks. AbstractRestartableTask provides the same convenience to developers of RestartableTasks, requiring only the additional implementation of the reset() method.
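To make the division of labor concrete, the following is a minimal sketch of how an AbstractTask and an AbstractRestartableTask might be shaped, together with a restartable task whose reset() discards partial output. It is not StopLight's source: the Task and RestartableTask interfaces are reduced stand-ins so the file compiles on its own, the protected fields are inferred from the HelloWorldTask example, and getDescription() and the ReportTask class are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AbstractTaskSketch {

    // Reduced stand-ins for the StopLight interfaces.
    interface Task extends Runnable {
        String getName();
        long getRunInterval();
        void initialize();
        void cancel();
    }

    interface RestartableTask extends Task {
        void reset();
    }

    // Supplies the identification methods from protected fields;
    // initialize(), run() and cancel() are left to subclasses.
    static abstract class AbstractTask implements Task {
        protected String name;
        protected String description;
        protected long runInterval;

        public String getName() { return name; }
        public String getDescription() { return description; } // hypothetical
        public long getRunInterval() { return runInterval; }
    }

    // Same convenience for restartable tasks; subclasses add reset().
    static abstract class AbstractRestartableTask extends AbstractTask
            implements RestartableTask {
    }

    // Hypothetical task that builds a report line by line.
    static class ReportTask extends AbstractRestartableTask {
        final List<String> lines = new ArrayList<>();

        ReportTask() {
            name = "ReportTask";
            description = "Builds the sales report";
            runInterval = 60_000L;
        }

        public void initialize() { lines.clear(); }
        public void cancel() { }

        public void run() {
            lines.add("header");
            lines.add("sales figures");
            lines.add("footer");
        }

        // initialize() is not called on resurrection, so reset() must
        // discard the partial output left by the abnormal termination.
        public void reset() { lines.clear(); }
    }

    public static void main(String[] args) {
        ReportTask task = new ReportTask();
        task.initialize();
        task.lines.add("header");   // simulate a run that died midway
        task.reset();               // what the framework calls before retrying
        task.run();
        System.out.println(task.getName() + " " + task.lines);
    }
}
```

Without reset() clearing the list, the resurrected run would leave a duplicated "header" line behind, which is exactly the kind of artifact reset() exists to clean up.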

The code for a "Hello World" task that simply prints "Hello World" along with the current time every time it is executed is listed below. This provides a simple example of creating a task by extending AbstractTask.


import com.clarkrichey.stopLight.task.*;
import java.util.Calendar;

public class HelloWorldTask extends AbstractTask {

    public HelloWorldTask() {
        this.description = "A simple task";
        this.name = "HelloWorldTask";
        // execute every 10 seconds
        this.runInterval = 10000;
        // no associated icon
        this.icon = null;
    }

    public void cancel() {
        // no need to do anything
    }

    public void initialize() {
        // no need to do anything
    }

    public void run() {
        Calendar now = Calendar.getInstance();
        // getTime() yields a readable date; printing the Calendar
        // itself would dump all of its internal fields
        System.out.println("Hello World! It's " + now.getTime());
    }
}

The BasicTask class is provided as an additional convenience to developers. BasicTask extends AbstractTask and is constructed by passing a Runnable to its constructor along with a run interval. When run() is called, BasicTask delegates to the run() method of its Runnable. BasicTask takes no action when either cancel() or initialize() is called. If the Task being deployed requires action to be taken when these methods are invoked, then BasicTask is not appropriate, and it will be necessary either to extend AbstractTask or to implement the Task interface directly.
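The delegation BasicTask performs is the classic adapter idea. Here is a minimal sketch of it, assuming a reduced Task interface; the constructor signature (name, Runnable, interval) is hypothetical, and the real BasicTask's parameter list may differ.

```java
public class BasicTaskSketch {

    // Reduced stand-in for the StopLight Task interface.
    interface Task extends Runnable {
        String getName();
        long getRunInterval();
        void initialize();
        void cancel();
    }

    // Adapter: turns any Runnable into a Task by delegation.
    static final class BasicTask implements Task {
        private final String name;
        private final Runnable target;
        private final long runInterval;

        BasicTask(String name, Runnable target, long runInterval) {
            if (target == null) {
                throw new IllegalArgumentException("target must not be null");
            }
            this.name = name;
            this.target = target;
            this.runInterval = runInterval;
        }

        public String getName() { return name; }
        public long getRunInterval() { return runInterval; }

        // No-ops: tasks that need setup or teardown should extend
        // AbstractTask instead of using BasicTask.
        public void initialize() { }
        public void cancel() { }

        // All real work is delegated to the wrapped Runnable.
        public void run() { target.run(); }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Task task = new BasicTask("Counter", () -> calls[0]++, 10_000L);
        task.initialize();
        task.run();
        task.run();
        System.out.println(task.getName() + " ran " + calls[0] + " times");
    }
}
```

Running main prints "Counter ran 2 times": the Task does nothing itself except forward run() to the lambda.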

Below is the code for the HelloWorld class, which functions exactly the same way as the HelloWorldTask shown above. However, instead of extending AbstractTask, the HelloWorld class simply implements Runnable. This class can then be passed in to the constructor for BasicTask, along with its run interval.


import java.util.Calendar;

public class HelloWorld implements Runnable {

    /** Creates a new instance of HelloWorld */
    public HelloWorld() {
    }

    public void run() {
        Calendar now = Calendar.getInstance();
        System.out.println("Hello World! It's " + now.getTime());
    }
}

Task Management

Classes directly responsible for the management of tasks are found in the com.clarkrichey.stopLight.management package. The TaskManager interface, listed below, describes classes that are responsible for scheduling the execution of tasks. This interface contains methods for registering and removing a Task, for setting the monitoring strategy to be used, and for retrieving registered Tasks. While the registerTask(), getTask(), and getTasks() methods are self-explanatory, the rest of the methods defined by this interface require some explanation.

The removeTask() method will remove the specified Task from the TaskManager's list of tasks to be executed, but will not interrupt the Task if it is currently executing. The removeTaskNow() method will remove the specified Task from the TaskManager's list of tasks to be executed, and will terminate the Task's execution if it is currently running.

The methods setTaskMonitoringStrategy() and getTaskMonitoringStrategy() respectively set and get the strategy that the TaskWrapper uses to determine whether a Task has terminated abnormally. While the TaskManager is responsible for scheduling Tasks, the determination of a Task's health is dictated by the TaskMonitoringStrategy in use. Further information on the TaskMonitoringStrategy can be found in the "Task Monitoring Strategy" section below.


import java.util.List;

public interface TaskManager
        extends StopLightManagedComponent {

    /** Used to register a Task with the TaskManager
     *  @param taskName The unique name of the Task
     *  @param taskToRegister The Task that will
     *  be managed by the TaskManager
     */
    void registerTask(String taskName,
                      Task taskToRegister);

    /** Used to get a copy of the List of Tasks
     *  being managed by this TaskManager
     *  @return The List of Tasks being managed by
     *  this TaskManager
     */
    List<Task> getTasks();

    /** Used to get a particular Task that was
     *  registered with the TaskManager
     *  @param taskName The name of the Task to
     *  be retrieved
     *  @return The requested Task, or null if
     *  the task is not found
     */
    Task getTask(String taskName);

    /** Used to remove a Task that was registered
     *  with this TaskManager. Running Tasks are
     *  allowed to complete their execution
     *  @param taskName The name of the Task to be removed
     */
    void removeTask(String taskName);

    /** Used to remove a Task that was registered
     *  with this TaskManager. Running Tasks
     *  are terminated without being allowed to
     *  complete their execution
     *  @param taskName The name of the Task to be removed
     */
    void removeTaskNow(String taskName);

    /** Used to set the TaskMonitoringStrategy that
     *  will be used by the TaskManager to
     *  determine if a Task is alive or not.
     *  Calling setTaskMonitoringStrategy will
     *  replace any existing TaskMonitoringStrategy
     *  with the new TaskMonitoringStrategy passed in
     *  @param s The TaskMonitoringStrategy to use
     */
    void setTaskMonitoringStrategy(TaskMonitoringStrategy s);

    /** Retrieves the current TaskMonitoringStrategy
     *  @return The current TaskMonitoringStrategy
     */
    TaskMonitoringStrategy getTaskMonitoringStrategy();
}

BasicTaskManager is the default implementation of TaskManager provided with the StopLight framework. BasicTaskManager uses a ScheduledThreadPoolExecutor to schedule the Tasks and a TaskWrapper class to inject StopLight logic between the ScheduledThreadPoolExecutor and the actual running of the Task. This process is discussed in greater detail in the "Task Monitoring Strategy" and "How Does it All Work?" sections. While the framework allows other TaskManagers to be created, the relatively simple nature of the TaskManager interface and the BasicTaskManager implementation make it much more likely that developers will create their own TaskMonitors while reusing the default BasicTaskManager.
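As an illustration of how a ScheduledThreadPoolExecutor can back the TaskManager contract, here is a minimal sketch. It is not BasicTaskManager's source: the SimpleTaskManager class is hypothetical, it schedules Tasks directly rather than through a TaskWrapper, and the choice of scheduleAtFixedRate (rather than scheduleWithFixedDelay) is an assumption. It shows one plausible way to get the removeTask()/removeTaskNow() distinction: cancelling the ScheduledFuture without or with interruption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskManagerSketch {

    // Reduced stand-in for the StopLight Task interface.
    interface Task extends Runnable {
        long getRunInterval();
        void initialize();
        void cancel();
    }

    static final class SimpleTaskManager {
        private final ScheduledThreadPoolExecutor executor =
                new ScheduledThreadPoolExecutor(2);
        private final Map<String, Task> tasks = new ConcurrentHashMap<>();
        private final Map<String, ScheduledFuture<?>> futures =
                new ConcurrentHashMap<>();

        void registerTask(String taskName, Task task) {
            if (tasks.putIfAbsent(taskName, task) != null) {
                throw new IllegalArgumentException("Duplicate name: " + taskName);
            }
            task.initialize(); // once, at registration
            // StopLight would schedule a TaskWrapper around the Task here;
            // this sketch schedules the Task itself.
            futures.put(taskName, executor.scheduleAtFixedRate(
                    task, 0, task.getRunInterval(), TimeUnit.MILLISECONDS));
        }

        Task getTask(String taskName) { return tasks.get(taskName); }

        List<Task> getTasks() { return new ArrayList<>(tasks.values()); }

        // No further runs; a run already in progress finishes normally.
        void removeTask(String taskName) { remove(taskName, false); }

        // No further runs, and a run in progress is interrupted. The Task
        // only stops early if its code responds to interruption.
        void removeTaskNow(String taskName) { remove(taskName, true); }

        private void remove(String taskName, boolean interrupt) {
            ScheduledFuture<?> future = futures.remove(taskName);
            Task task = tasks.remove(taskName);
            if (future != null) future.cancel(interrupt);
            if (task != null) task.cancel(); // once, at removal
        }

        void shutdown() { executor.shutdownNow(); }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        SimpleTaskManager tm = new SimpleTaskManager();
        tm.registerTask("counter", new Task() {
            public long getRunInterval() { return 50; }
            public void initialize() { }
            public void cancel() { System.out.println("cancelled"); }
            public void run() { runs.incrementAndGet(); }
        });
        Thread.sleep(300);
        tm.removeTask("counter");
        System.out.println("runs so far: " + runs.get());
        tm.shutdown();
    }
}
```

Note that Java offers no safe way to forcibly kill a running thread, so "terminating" a Task in removeTaskNow() in practice means interrupting it and relying on the Task to notice.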

The code snippet below illustrates the creation of a BasicTask using the HelloWorld class illustrated earlier and then the subsequent use of the BasicTaskManager to register the task. The StopLightConfigurationManager has not yet been discussed, and will be introduced later.


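HelloWorld hello = new HelloWorld();
// wrap the Runnable, running it every 10 seconds
BasicTask myTask = new BasicTask(hello, 10000);
StopLightConfigurationManager manager =
    StopLightConfigurationManager.getInstance();
TaskManager tm = manager.getTaskManager();
// register the Task (not the Runnable) under a unique name
tm.registerTask("HelloWorld", myTask);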

Task Monitoring

So far we have examined tasks and task management, two concepts that are central to the StopLight framework. However, to understand how the framework actually works, we need to look at the task monitor as well. The classes directly involved in task monitoring are found in the com.clarkrichey.stopLight.taskMonitor package and are illustrated in Figure 3 below. TaskMonitor classes, as defined by the TaskMonitor interface, are responsible for ensuring that a given task is executed on only a single instance of the clustered application at any given time. Put differently, it is up to the TaskMonitor to make sure that two instances of the same Task never execute at the same time. To achieve this, the TaskMonitor defines a method for attempting to acquire a lock on a task, a method for releasing that lock, and a method for retrieving information about the current holder of the lock for a particular Task. The StopLight framework ships with an implementation of TaskMonitor, DatabaseTaskMonitor, that stores lock information in a database. Information about configuring the DatabaseTaskMonitor can be found in the section entitled "Configuring and Using the StopLight Framework".
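The lock contract can be illustrated with a minimal sketch. The method signatures below are assumptions (in particular, passing the requester's address to acquireLock()), and the InMemoryTaskMonitor is a hypothetical single-JVM stand-in: an atomic putIfAbsent plays the role that a shared, external store such as the database used by DatabaseTaskMonitor must play in a real cluster.

```java
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TaskMonitorSketch {

    // Who holds the lock for a task type, and since when.
    static final class MonitoringInformation {
        final InetAddress holder;
        final long acquiredAtMillis;

        MonitoringInformation(InetAddress holder, long acquiredAtMillis) {
            this.holder = holder;
            this.acquiredAtMillis = acquiredAtMillis;
        }
    }

    interface TaskMonitor {
        boolean acquireLock(String taskName, InetAddress requester);
        void releaseLock(String taskName);
        MonitoringInformation getMonitoringInformation(String taskName);
    }

    // Single-JVM stand-in: putIfAbsent is atomic, so exactly one caller
    // wins. A cluster needs the same guarantee from an external store.
    static final class InMemoryTaskMonitor implements TaskMonitor {
        private final ConcurrentMap<String, MonitoringInformation> locks =
                new ConcurrentHashMap<>();

        public boolean acquireLock(String taskName, InetAddress requester) {
            MonitoringInformation mine = new MonitoringInformation(
                    requester, System.currentTimeMillis());
            return locks.putIfAbsent(taskName, mine) == null;
        }

        public void releaseLock(String taskName) { locks.remove(taskName); }

        public MonitoringInformation getMonitoringInformation(String taskName) {
            return locks.get(taskName); // null when nobody holds the lock
        }
    }

    public static void main(String[] args) throws Exception {
        TaskMonitor monitor = new InMemoryTaskMonitor();
        InetAddress nodeA = InetAddress.getByName("10.0.0.1");
        InetAddress nodeB = InetAddress.getByName("10.0.0.2");
        System.out.println(monitor.acquireLock("report", nodeA)); // true
        System.out.println(monitor.acquireLock("report", nodeB)); // false
        System.out.println(monitor.getMonitoringInformation("report").holder);
        monitor.releaseLock("report");
        System.out.println(monitor.acquireLock("report", nodeB)); // true
    }
}
```

The important property is that "check whether the lock is free" and "take the lock" happen as one atomic step; doing them as two separate operations would let two nodes both see a free lock and both run the task.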

Figure 3. Task monitor classes

Task Monitoring Strategy

The TaskWrapper uses a classic Strategy pattern to decide the health of a task. The TaskWrapper holds a reference to its TaskMonitoringStrategy, along with a reference to the Task it wraps. If the TaskWrapper's call to the acquireLock() method of its TaskMonitor returns false, indicating that another instance of the wrapped Task is already executing, the TaskWrapper uses its TaskMonitoringStrategy to determine the health of the Task that the task monitor reported to be executing.

TaskMonitoringStrategy is an interface that defines a single overloaded method, isAlive(). Both versions take a MonitoringInformation and a TaskInfo, plus either a boolean or an Exception, and return a value of the MonitoringResult enumeration. The MonitoringInformation passed in to the method is acquired from the TaskMonitor, and the TaskInfo is acquired from the Task itself. When the overloaded method accepting a boolean is called, that boolean is the result of a call to the isAlive() method of the HeartbeatLocator. A true value indicates that the lock on the Task is believed to be held by a living Task, while false indicates that the lock is believed to be held by a dead Task. When the overloaded method accepting an Exception is called, the Exception passed in is the one thrown by the isAlive() method of the HeartbeatLocator. More detail on the HeartbeatLocator can be found in the next section.
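The shape of this interface can be sketched as follows. MonitoringInformation and TaskInfo are empty placeholders here. The boolean overload follows the default behavior described in "How Does it All Work?" (ISALIVE for true, ISDEAD for false); what the default does with an Exception is not stated, so returning ISALIVE in that case is purely an assumption of this sketch.

```java
import java.io.IOException;

public class MonitoringStrategySketch {

    enum MonitoringResult { ISALIVE, ISDEAD, RETRY }

    // Placeholders: the real classes carry lock-holder and task details.
    static final class MonitoringInformation { }
    static final class TaskInfo { }

    interface TaskMonitoringStrategy {
        // heartbeat: the value HeartbeatLocator.isAlive() returned
        MonitoringResult isAlive(MonitoringInformation info, TaskInfo task,
                                 boolean heartbeat);
        // failure: the exception HeartbeatLocator.isAlive() threw
        MonitoringResult isAlive(MonitoringInformation info, TaskInfo task,
                                 Exception failure);
    }

    static final class DefaultStrategy implements TaskMonitoringStrategy {
        // Boolean case: matches the default described in the article.
        public MonitoringResult isAlive(MonitoringInformation info,
                                        TaskInfo task, boolean heartbeat) {
            return heartbeat ? MonitoringResult.ISALIVE : MonitoringResult.ISDEAD;
        }

        // Exception case (assumption): an unanswered ping is not proof of
        // death, so report ISALIVE and skip this cycle rather than risk two
        // instances running at once.
        public MonitoringResult isAlive(MonitoringInformation info,
                                        TaskInfo task, Exception failure) {
            return MonitoringResult.ISALIVE;
        }
    }

    public static void main(String[] args) {
        TaskMonitoringStrategy s = new DefaultStrategy();
        MonitoringInformation mi = new MonitoringInformation();
        TaskInfo ti = new TaskInfo();
        System.out.println(s.isAlive(mi, ti, true));                       // ISALIVE
        System.out.println(s.isAlive(mi, ti, false));                      // ISDEAD
        System.out.println(s.isAlive(mi, ti, new IOException("timeout"))); // ISALIVE
    }
}
```

A custom strategy could instead return RETRY for an Exception, for example to ride out a transient network error before giving up.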

Detection and Management of Dead Tasks

Classes implementing the HeartbeatLocator interface are responsible for detecting Tasks that have terminated abnormally during their run cycle. A HeartbeatLocator attempts to determine whether the instance of a Task that holds the execution lock for a given Task type is still executing, or whether it terminated abnormally without releasing the lock. The StopLight framework ships with a default implementation of HeartbeatLocator that uses a servlet to determine Task "liveness." This HeartbeatLocatorServlet implementation and the other classes in the heartbeat package are detailed in Figure 4. The details of how and when heartbeat detection occurs are covered in the next section.
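A servlet-based heartbeat check boils down to an HTTP request with short timeouts. The sketch below is a hypothetical client side of such a check, not the HeartbeatLocatorServlet's actual protocol: the HttpHeartbeatLocator class, the port, the path, and the "HTTP 200 means alive" rule are all assumptions. Connection failures are allowed to surface as an IOException, matching the text's distinction between isAlive() returning a value and isAlive() throwing.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.URL;

public class HeartbeatLocatorSketch {

    interface HeartbeatLocator {
        // true: answered OK; false: answered with another status;
        // IOException: no usable answer at all
        boolean isAlive(InetAddress host) throws IOException;
    }

    // Pings http://<host>:<port><heartbeatPath> and treats HTTP 200 as alive.
    static final class HttpHeartbeatLocator implements HeartbeatLocator {
        private final int port;
        private final String heartbeatPath;
        private final int timeoutMillis;

        HttpHeartbeatLocator(int port, String heartbeatPath, int timeoutMillis) {
            this.port = port;
            this.heartbeatPath = heartbeatPath;
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public boolean isAlive(InetAddress host) throws IOException {
            URL url = new URL("http", host.getHostAddress(), port, heartbeatPath);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(timeoutMillis); // a dead node must not hang us
            conn.setReadTimeout(timeoutMillis);
            try {
                // Connection failures propagate as IOException, so callers can
                // tell "no answer" apart from "answered, but not OK".
                return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical port and path; use your application's heartbeatPath.
        HeartbeatLocator locator =
                new HttpHeartbeatLocator(8080, "/myApp/heartbeat", 2000);
        try {
            System.out.println("alive: "
                    + locator.isAlive(InetAddress.getLoopbackAddress()));
        } catch (IOException e) {
            System.out.println("no answer: " + e);
        }
    }
}
```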

Figure 4. HeartbeatLocator classes

How Does it All Work?

The functioning of the StopLight framework is best understood through a series of sequence diagrams that illustrate how the framework behaves for a variety of use cases. Figure 5 below illustrates the standard case where a Task successfully acquires the execution lock, executes, and then releases the lock. The ScheduledThreadPoolExecutor calls the run() method of the TaskWrapper class that is used by the TaskManager to wrap the underlying Task.

The purpose of the TaskWrapper is to inject StopLight framework logic between the scheduled execution of the Task as signaled by the ScheduledThreadPoolExecutor and the actual execution of the local instance of the Task. This is done to ensure that only one instance of the Task is executing at a given time within the cluster. The TaskWrapper calls the acquireLock() method of the TaskMonitor in order to attempt to acquire the execution lock for that particular type of Task. The TaskMonitor accesses the external semaphore (a database, in the case of the default implementation) to determine if the lock is available, and if it is available, to then acquire the lock on behalf of the requestor and return true, as is illustrated in this use case.

When the TaskWrapper receives the true response from its call to acquireLock(), it then proceeds to execute the wrapped Task's run() method, allowing the Task to execute. Once the run() method completes, the TaskWrapper calls releaseLock() on the TaskMonitor. The TaskMonitor then accesses the external semaphore in order to release the lock for that type of Task. Once this has completed, the executing thread terminates. The process repeats when the Task is next scheduled to run. Figure 5 shows this process.

Figure 5. Task acquires lock and executes

In an alternative use case, the Task attempts to execute but cannot, because another instance of that Task has already begun executing on another machine in the cluster. The sequence of events in this scenario is very similar to the one previously discussed, except that the call to acquireLock() returns false, since another instance of the Task already holds the lock.

When the TaskWrapper is unable to acquire the lock, it needs to determine whether this is because another instance of the Task acquired the lock first, or because the lock was never released after a previous execution due to the abnormal termination of a running instance of the Task. To make this determination, the TaskWrapper calls getMonitoringInformation() on the TaskMonitor to get all available information about the current holder of the lock. The java.net.InetAddress of the lock's current holder is then extracted from the returned MonitoringInformation and passed to the HeartbeatLocator in a call to isAlive(). The HeartbeatLocator then attempts to ping the clustered application running at the specified address, and the result of this attempt is returned to the TaskWrapper. This sequence of events is depicted in Figure 6 below.

The TaskWrapper next passes the result from the HeartbeatLocator, along with the MonitoringInformation returned by the TaskMonitor and the Task's TaskInfo, to the isAlive() method of the TaskMonitoringStrategy. The business logic embedded in the TaskMonitoringStrategy uses this information to determine how to proceed. It may return one of three possible results: ISDEAD, ISALIVE, or RETRY. If RETRY is returned, the entire process repeats from the point where the TaskWrapper attempts to acquire the lock. If ISALIVE is returned, the executing thread terminates and the process repeats when the Task is next scheduled to run. The consequences of ISDEAD being returned are explored below. The default implementation of TaskMonitoringStrategy returns ISALIVE if the HeartbeatLocator returned true and ISDEAD if it returned false. Figure 6 illustrates how this works.

Figure 6. Task fails to acquire lock

If the failure to acquire the lock occurred not because another task acquired the lock first, but because the lock was never released after the abnormal termination of a previously executing Task, then the sequence of events depicted in Figure 7 would occur. This sequence is identical to the one described above up to the point where the TaskMonitoringStrategy returns ISDEAD. Once ISDEAD is returned, there are two possible outcomes. First, if the Task that we are attempting to execute is not an instance of RestartableTask, then the executing thread terminates and the process repeats when the Task is next scheduled to run.

However, if the Task is a RestartableTask, then the following sequence of events occurs, as illustrated in steps 6-8 in Figure 7. First, the releaseLock() method of TaskMonitor is invoked to release the lock that was left behind when the running Task terminated abnormally. Next, the reset() method of the RestartableTask is called to give it the opportunity to clean up any artifacts left over from the abnormal termination. Lastly, the entire process of attempting to acquire the lock begins again (as illustrated in the previous use cases) to ensure that only one instance of the Task runs at a time.
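The three use cases can be tied together in a minimal sketch of the decision flow a TaskWrapper's run() method goes through. This is not StopLight's TaskWrapper source: the interfaces are reduced stand-ins (lockHolder() replaces getMonitoringInformation(), and the strategy drops its MonitoringInformation and TaskInfo parameters), MAX_ATTEMPTS is a hypothetical guard against endless RETRY results, and releasing the lock in a finally block on ordinary exceptions is a design choice of this sketch.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskWrapperSketch {
    enum MonitoringResult { ISALIVE, ISDEAD, RETRY }

    // Reduced stand-ins for the StopLight types.
    interface Task extends Runnable { String getName(); }
    interface RestartableTask extends Task { void reset(); }
    interface TaskMonitor {
        boolean acquireLock(String taskName);
        void releaseLock(String taskName);
        InetAddress lockHolder(String taskName); // simplified MonitoringInformation
    }
    interface HeartbeatLocator { boolean isAlive(InetAddress host) throws Exception; }
    interface Strategy {
        MonitoringResult isAlive(boolean heartbeat);
        MonitoringResult isAlive(Exception failure);
    }

    static final class TaskWrapper implements Runnable {
        private static final int MAX_ATTEMPTS = 3; // guards against endless RETRY
        private final Task task;
        private final TaskMonitor monitor;
        private final HeartbeatLocator locator;
        private final Strategy strategy;

        TaskWrapper(Task task, TaskMonitor monitor, HeartbeatLocator locator, Strategy strategy) {
            this.task = task;
            this.monitor = monitor;
            this.locator = locator;
            this.strategy = strategy;
        }

        @Override
        public void run() {
            String name = task.getName();
            for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
                if (monitor.acquireLock(name)) {
                    try {
                        task.run();
                    } finally {
                        // A crashed server never gets here; that stale lock is
                        // what the heartbeat check below detects.
                        monitor.releaseLock(name);
                    }
                    return;
                }
                InetAddress holder = monitor.lockHolder(name);
                if (holder == null) continue;          // lock freed meanwhile
                MonitoringResult result = check(holder);
                if (result == MonitoringResult.ISALIVE) return; // runs elsewhere
                if (result == MonitoringResult.ISDEAD) {
                    if (!(task instanceof RestartableTask)) return;
                    monitor.releaseLock(name);         // clear the stale lock
                    ((RestartableTask) task).reset();  // clean up, then compete again
                }
                // RETRY, or after a reset: loop back to acquireLock()
            }
        }

        private MonitoringResult check(InetAddress holder) {
            try {
                return strategy.isAlive(locator.isAlive(holder));
            } catch (Exception e) {
                return strategy.isAlive(e);
            }
        }
    }

    static final class MapMonitor implements TaskMonitor {
        final Map<String, InetAddress> locks = new HashMap<>();
        public synchronized boolean acquireLock(String n) {
            return locks.putIfAbsent(n, InetAddress.getLoopbackAddress()) == null;
        }
        public synchronized void releaseLock(String n) { locks.remove(n); }
        public synchronized InetAddress lockHolder(String n) { return locks.get(n); }
    }

    static final class DefaultStrategy implements Strategy {
        public MonitoringResult isAlive(boolean hb) {
            return hb ? MonitoringResult.ISALIVE : MonitoringResult.ISDEAD;
        }
        public MonitoringResult isAlive(Exception e) { return MonitoringResult.ISALIVE; }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RestartableTask task = new RestartableTask() {
            public String getName() { return "report"; }
            public void run() { log.add("run"); }
            public void reset() { log.add("reset"); }
        };
        MapMonitor monitor = new MapMonitor();
        monitor.locks.put("report", InetAddress.getLoopbackAddress()); // stale lock
        new TaskWrapper(task, monitor, host -> false, new DefaultStrategy()).run();
        System.out.println(log); // [reset, run]
    }
}
```

The demo seeds a stale lock and a locator that reports the holder as dead, so the wrapper releases the lock, calls reset(), wins the lock on the next attempt, and runs the task.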

Figure 7. Task fails to acquire lock due to dead task

Configuring and Using the StopLight Framework

The StopLight framework is designed to be used with only minor setup and configuration. To use the framework, follow these steps:

  1. Place the StopLight.jar file in your application's classpath.
  2. Create the required database tables. The SQL scripts for creating the necessary tables are contained in the file StopLight.sql.
  3. Modify web.xml to declare the com.clarkrichey.stopLight.heartbeat.HeartbeatLocatorServlet class as a servlet that is loaded at startup.
  4. Modify web.xml to set heartbeatPath as an initialization parameter to the HeartbeatLocatorServlet. The value of the heartbeatPath parameter should be the context path of the web application plus the URL pattern of the HeartbeatLocatorServlet.
  5. Create new configuration files or modify the default ones (described in detail below).

The StopLightConfigurationManager provides client code with handles to all of the externally usable portions of the StopLight framework. Its initialize() method should be called before any attempt to retrieve references to these objects; if an object is requested before initialize() has been invoked, the StopLightConfigurationManager calls initialize() itself.
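The "initialize on first use" behavior is a common lazy-initialization pattern for a singleton. The sketch below illustrates that pattern only; LazyConfigurationManager is a hypothetical stand-in, not StopLight's StopLightConfigurationManager, and its initialize() merely creates a placeholder TaskManager instead of reading configuration files.

```java
public class LazyConfigSketch {

    interface TaskManager { } // placeholder for the real component

    // Hypothetical stand-in illustrating lazy, idempotent initialization.
    static final class LazyConfigurationManager {
        private static final LazyConfigurationManager INSTANCE =
                new LazyConfigurationManager();

        private TaskManager taskManager;
        private boolean initialized;

        private LazyConfigurationManager() { }

        static LazyConfigurationManager getInstance() { return INSTANCE; }

        // Would read the configuration and instantiate the configured
        // classes; here it creates a placeholder. Safe to call repeatedly.
        synchronized void initialize() {
            if (initialized) return;
            taskManager = new TaskManager() { };
            initialized = true;
        }

        // Accessors initialize on demand if the client did not.
        synchronized TaskManager getTaskManager() {
            if (!initialized) initialize();
            return taskManager;
        }

        synchronized boolean isInitialized() { return initialized; }
    }

    public static void main(String[] args) {
        LazyConfigurationManager m = LazyConfigurationManager.getInstance();
        System.out.println(m.isInitialized());        // false
        TaskManager tm = m.getTaskManager();          // triggers initialize()
        System.out.println(m.isInitialized());        // true
        System.out.println(tm == m.getTaskManager()); // true
    }
}
```

Making initialize() idempotent is what allows both styles of client code, explicit initialization and implicit initialization on first access, to coexist safely.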

Once the StopLightConfigurationManager has been initialized, Tasks may be registered with the TaskManager that is made available via the StopLightConfigurationManager. Once a Task is registered with the TaskManager, no further steps are typically necessary.

StopLight is packaged with two default configuration files. The first, StopLightConfiguration.xml, is located in the com/clarkrichey/stopLight directory within StopLight.jar. This file defines the implementation classes that the StopLightConfigurationManager will load for the TaskManager, TaskMonitor, HeartbeatLocator, and TaskMonitoringStrategy interfaces. The second, DatabaseMonitorConfiguration.xml, is located in the com/clarkrichey/stopLight/monitoring/database directory within StopLight.jar. This file defines the connection string, user name, password, and database driver class required by the DatabaseTaskMonitor.

To run StopLight with different implementation classes, either modify the StopLightConfiguration.xml file packaged in the .jar or create a new configuration file. A new file must still be named StopLightConfiguration.xml, but it may be placed in any location that is accessible to the application at run time. To tell the StopLight framework to use the alternate configuration file, set the StopLightConfigDir system property to the absolute path of the directory containing it. Note that every value present in the default StopLightConfiguration.xml file must also be present in the alternate file, even if only one value has changed.
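The lookup rule described above can be sketched as follows: prefer a file of the expected name in the directory named by StopLightConfigDir, otherwise fall back to the copy packaged in the jar. This is an illustration under assumptions, not StopLight's loader; ConfigLocatorSketch and its methods are hypothetical, and the sketch assumes the same property governs DatabaseMonitorConfiguration.xml, as the text's "same process" suggests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ConfigLocatorSketch {

    static final String DIR_PROPERTY = "StopLightConfigDir";

    // Path of the alternate copy of fileName, if StopLightConfigDir is set.
    static Optional<Path> alternateFile(String fileName) {
        String dir = System.getProperty(DIR_PROPERTY);
        return dir == null ? Optional.empty() : Optional.of(Paths.get(dir, fileName));
    }

    // Opens the alternate file when configured, else the copy in the jar.
    static InputStream open(String fileName, String packagedResource)
            throws IOException {
        Optional<Path> alternate = alternateFile(fileName);
        if (alternate.isPresent()) {
            // The alternate file replaces the packaged one outright rather than
            // being merged with it, which would explain why every value must
            // be repeated.
            return Files.newInputStream(alternate.get());
        }
        InputStream in = ConfigLocatorSketch.class.getResourceAsStream(packagedResource);
        if (in == null) {
            throw new IOException("Configuration not found: " + packagedResource);
        }
        return in;
    }

    public static void main(String[] args) {
        String file = "StopLightConfiguration.xml";
        System.out.println(alternateFile(file)
                .map(Path::toString)
                .orElse("packaged /com/clarkrichey/stopLight/" + file));
    }
}
```

Run with -DStopLightConfigDir=/some/dir, main prints the path of the alternate file; without the property, it reports the packaged location.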

If the DatabaseTaskMonitor class that is provided as the default TaskMonitor implementation is to be used, then the default DatabaseMonitorConfiguration.xml file must be modified or an alternate one created. The process for creating a new DatabaseMonitorConfiguration.xml file is the same as the process described for replacing the StopLightConfiguration.xml file.

The only development necessary for using the StopLight framework is the creation of the Task classes that will be managed by the framework. The creation of new tasks can be done through any of the means described previously in the "Tasks" section, such as by extending AbstractTask or creating a BasicTask.

Conclusion

While the StopLight framework provides a complete and extensible set of services for managing tasks within a cluster, there is still additional work that could be done. Development continues in the java.net StopLight project, which is released under the GNU Lesser General Public License. Additional documentation for the framework, particularly on extending it, may also be found at that site. The following list enumerates some of the planned enhancements to the framework.

  • Modification of the DatabaseTaskMonitor to support the use of a JNDI-based datasource for connection to the database.
  • Inclusion of a new implementation of the TaskManager interface to support a more robust task scheduling system, such as the Quartz project. Quartz provides a much more flexible mechanism for specifying when tasks should run, but it does not provide any mechanism for preventing a task in a cluster from running on multiple servers concurrently.

Resources

Clark D. Richey, Jr. is a Principal Consultant for Raba Technologies and the founder of JUGaccino, a Maryland-based Java Users Group.