The Source for Java Technology Collaboration
User: Password:
Register | Login help    

Search

Online Books:
java.net on MarkMail:


 E-mail  Print

Clustering with the Shoal Framework

Tue, 2007-12-11

{cs.r.title}



Shoal is an open source, Java-based generic clustering framework. It can be used in your applications to add clustering functionalities like load balancing, fault tolerance, or both. Applications using Shoal can share data, communicate via messages with other cluster nodes across the network, and notify of relevant events like the joining, shutdown, and failure of a node or group of nodes. You can take appropriate measures and perform monitoring tasks when these events occur; Shoal forwards a signal to your code to track these notifications.

Shoal is the clustering framework used by the Glassfish project to implement its application server clustering. One of the benefits to your application is that Shoal abstracts away network details and the network communication API. Under the hood, the default group communication provider uses JXTA for peer-to-peer reliability and scalability.

Shoal is a lightweight component; you can embed Shoal not only in Java EE applications, but in SE applications too.

In this article, we'll cover the Shoal architecture and its basic concepts. Then we'll illustrate how to integrate your own code into the clustering infrastructure.

Learning Shoal's Basic Concepts and Architecture

We will now discuss the architecture of a Shoal cluster solution and the internal design of Shoal. This section also describes main components, such as Signals and the Distributed State Cache (DSC).

Understanding Shoal's Architecture

Shoal's main concept is the group, a virtual union of cluster nodes by a single name. Each group is composed of members. A member is uniquely identified by its member token. A token is just a String, usually a Global Unique ID (GUID), to avoid name collision between the nodes. Also, a member can be a spectator or a core, the difference being that when a core member goes down, all other members are notified of the failure. On the contrary, an spectator's failure is not a significant event that Shoal notifies other members about. Spectators are commonly used for cluster administration and/or monitoring applications.

A network can host many groups, but be aware that each group consumes additional bandwidth and processing resources.

The second important concept is the Group Management Service (GMS). It manages the network's groups by keeping track of the members, mediating and facilitating member cooperation and communication.

Shoal divides groups into functionalities called components. This division is up to the developer; by default Shoal doesn't impose any structure on the cluster. A group, for example, can have a transaction component and a monitoring component, each processed by one or two physical machines. The API allows the sending of signals to specific nodes that implement a particular component.

The last significant piece is the signal. Signals are events in the group's lifecycle, such as a new node joining the cluster or another one leaving it. Signals are covered in greater depth below.

Figure 1 illustrates the physical distribution of a example Shoal group.

A Shoal system's architecture
Figure 1. A Shoal system's architecture

Figure 1 shows three physical machines, with two running a JVM and executing a single instance of the application, and the third machine running two JVMs, each one executing one instance. Each application instance has loaded Shoal's GMS, which discovered the other peers and joined the group.

Note that all the machines are connected in the same physical network. The physical location is not a concern as long as multicast traffic traverses the relevant nodes.

Understanding Shoal's Design

Shoal is comprised of two main parts: the GMS Client API and the GMS Service Provider Interface (SPI). Your application interacts with the API, which in turn uses the SPI to talk to the underlying group communication protocol. The default SPI uses JXTA to provide group services.

Figure 2 illustrates the dynamics between Shoal's layers, your application, and the network:

Shoal impact on your application architecture
Figure 2. Shoal's impact on your application architecture

Calling Shoal's API, you can:

  • Emit cluster signals: join, failure, and shutdown are some examples. The signals are discussed in depth in the next section.
  • Send messages to other nodes or the whole group.
  • Share data through a Distributed State Cache (DSC) (discussed later).

The framework calls your code when signals arrive from any node on the group. For this to happen, you must register callback objects as described in the "Listening to the Group's Messages" section.

Understanding Shoal Lifecycle Signals

Each node in the group has the following built-in cluster lifecycle signals to both send and receive from other nodes:

  • Join: Shoal tells all members that a new node is joining the group.
  • Joined and ready: The recently joined member is ready to process requests.
  • Failure suspected: Shoal doesn't receive a heartbeat response from a member and it suspects the node has failed. When the failure is confirmed, it sends a failure notify signal.
  • Failure notify: The group tells all the members about the confirmed failure of a node.
  • Failure recovery: The group tells a node to take the place of a failed node. This node must have a previously registered as a failure recovery node.
  • Planned shutdown: Shoal sends a signal to the group notifying an administrative shutdown of a member.

Shoal can send other signals related to the domain of the application. Those are called message signals. Both signal types can arrive at any time and can encapsulate arbitrary data.

Taking Advantage of Shoal's Failure Recovery

The most interesting signals are the failure family, as they enable Shoal's recovery features. When a node fails and then tries to recover, Shoal brings up something called recovery failure fencing. This protects the node from getting inconsistent states and fences its recovery process. A virtual fence is raised for a member's identity inside the group. This is done to protect cluster operations from race conditions. When the fence is lowered for the identity, any member can communicate with the previously fenced one.

Sometimes another node wants to take the failed node's place. This is achieved by a node getting the failed member's identity. Let's make it clearer with an example.

When node A goes down, the other nodes in the group get notified via a FailureSuspectedSignal. After a timeout period a FailureNotifySignal is fired. If any of the other members in the group is a FailureRecoveryAction listener, then these nodes are candidates to substitute the failed one. Let's call one such node "node B."

At this point, the GMS master node notifies node B that it has to become a recovery delegate for the failed node. It sends the node a FailureRecoverySignal. Finally, if node B acknowledges that signal, it raises the fence, does the recovery work (such as taking over resources or tasks of node A), and lowers the fence. This takes us back to normal operation.

If the failed node goes up again, it must check whether the fence is raised for its member identity. If it's not fenced, and it wants to recover, it must raise a fence. After the recovery is finished, it lowers the fence again.

Understanding Distributed State Cache

Shoal also provides a data sharing mechanism for the group called the Distributed State Cache. Though not a full-blown distributed cache like JBossCache, it allows the nodes to share data.

The DSC is an associative data structure like a java.util.Map. The default implementation uses a node-local HashMap, and replicates the changes to the other nodes for better performance. Beware that it doesn't implement a Least Recently Used (LRU) or any other algorithm to eliminate unused elements. You can only put a serializable object into the map as a key, a value, or both.

In the next section, we'll cover how to integrate Shoal into your application.

Integrating Shoal into Your Application

Embedding Shoal into your application is straightforward; the time-consuming part is thinking about what clustering features you really need: load balancing, fault tolerance, or both. If you add something you don't need, the complexity grows and can jeopardize your project.

Begin by downloading the Shoal distribution from Shoal's java.net site. Unpack and copy both shoal.jar and jxta.jar to your application classpath. That's all you need besides a working network connection.

It's very useful to test with two instances of the application, so you can see how it responds to the other instance joining and failing, along with message passing. So let's start by creating a group and joining it.

Starting and Joining a Group

The first thing we need to create or join a group is a group name. The first node to start the Shoal framework creates the group with this name. If no other nodes for the same group on the network have started, this node becomes the master node. From now on, each node joining the group will acknowledge this node as a master.

If there's already a group created with the group name, the new node joins as a vanilla member.

First you must start a new GMSModule by calling the GMSFactory.startGMSModule() method, like the following code fragment does:


GroupManagementService gms = (GroupManagementService)
    GMSFactory.startGMSModule(serverName, groupName, memberType, props);

The method receives four parameters:

  • groupName identifies the group name to create if it doesn't exist, or to join in the case the group is already created.
  • serverName uniquely identifies this instance in the cluster. You can't create a node with a duplicate name in the cluster.
  • The memberType parameter is an enum characterizing the member's role: CORE or SPECTATOR, as discussed in the previous section.
  • The properties you send to start the GMS module are mostly configuration parameters for the underlying SPI, such as JXTA network configuration parameters. These parameters range from the multicast address and port to the discovery and failure timeout and retries.

In the case where a Shoal creates a new group, it will display a message like the following in your console:

7:33:14 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens INFO: GMS View Change Received: Members in view (before change analysis) are : 1: MemberId: node1, MemberType: CORE, Address: urn:jxta:uuid-4879342BEE684EC598A9B8741505B700A62767B3102B4D258A48FDBF1961AC0D03

This message tells us that a new core member, called node1, is a member of the group, and shows its unique JXTA identifier. If the group already exists, the message includes the other member JXTA identifiers.

After the group is created, you must call the join() method, as the next code snippet shows:


try {
   gms.join();
  } 
  catch (GMSException e) { 
     e.printStackTrace();
  } 

After join()ing, Shoal will output a message like the following:

Nov 29, 2007 7:33:14 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT

In this case, the member becomes the master of the group, as it is the creator and only member at this moment.

Sending Messages

Once you are part of the group, you can send and receive messages to and from other members. Shoal exposes a GroupHandle object to achieve several tasks, one of which is to send message signals. Let's see how you can use the GroupHandle to send messages in the following fragment:


   gms.getGroupHandle().sendMessage(null, message.getBytes());
   

This method sends a byte array to all members. The first parameter is the component name; in this case the null means all components. The second is the byte array of a string. The GroupHandle can also send a message to all members, a specific subset, or just one member by using member tokens. See the GroupHandle Javadocs for more details.

Listening to the Group's Messages

To begin listening to the group member messages, you must register a callback with the framework. To do this, simply create a class implementing the CallBack interface. You have to implement just one method: processNotification(). The following code fragment shows an implementation:


  public void processNotification(Signal signal) {
   signal.acquire();
          if (signal instanceof MessageSignal) {
                          System.out.println(
                          ":Message Received from:" 
                          + signal.getMemberToken()
                          + ":[" + signal.toString() + "]");
           } else {
                          System.out.println(
                          ":Other Notification Received from:"
                          + signal.getMemberToken() + ":[" 
                          + signal.toString() + "]");
           }
    signal.release();
   }

The signal must be acquire()d and release()d to avoid race conditions. The Signal's getMemberToken() method returns the member token of the node that emitted the signal. It's very useful for responding only to the member who sent the signal.

Once you have the message handling code, you must register the CallBack with Shoal.You can register to receive custom messages directed to a particular component of the cluster by hooking up a MessageActionFactoryImpl and sending a componentName parameter.

 gms.addActionFactory(new MessageActionFactoryImpl(callBackClass), componentName);

This particular MessageActionFactory receives MessageSignals, the only non-lifecycle signals built into Shoal. The ActionFactory will call the instance of the CallBack object, every time a MessageSignal arrives directed to the componentName parameter.

Sharing Data using the DSC

To share data between nodes without using messages, you can use the DSC. Remember that no automatic reaping of the unused values occurs. The default implementation doesn't have any stale value reaping mechanism or a capacity limit, so you must be very cautious using the default DSC implementation. You can write another implementation to manage DSC according to the application requirements.

The DSC is accessed via the GroupHandle by invoking the getDistributedStateCache() method, as in the following code:

DistributedStateCache dsc = gms.getGroupHandle().getDistributedStateCache();

This gets a reference to the DSC. To add a value, use the add() method:

dsc.addToCache( componentName, memberTokenId,  key,  state);

There's an individual cache for each group component; you specify in which cache to store the state using the componentName parameter. The second parameter is the calling member's token. The third is the key you'll use to recover the state. Finally, the fourth parameter is the state to store in the cache.

To retrieve the stored value, use the get() method:

 Object o = dsc.getFromCache( componentName,  memberTokenId, key) ;

This retrieves the state, using the key to search it. Finally, to remove an object, use the remove() method:

 dsc.removeFromCache( componentName,  memberTokenId,  key) ;

This eliminates the state associated with the key from the DSC. .

Shutting Down

Hopefully, after doing some useful work, the node can shut down. The master node can shut down all members in the group. To achieve both you must call the shutdown() method of the gms object. Shutting down members or the whole group sends planned shutdown signals to their members.

Conclusion

Hopefully this article clarifies some of the less-documented aspects of Shoal and kickstarts new users of the framework.

Clustering gives you a solution to solve two important problems: fault tolerance and load balancing, a must for enterprise mission-critical applications. Load balancing offers much-needed scalability and fault tolerance means less downtime.

Shoal makes it easy for you to achieve both load balancing and fault tolerance characteristics in your application. However, the most commonly used scenario is group communication, including message passing and state caching. You can easily add cluster facilities to your application by integrating the Shoal framework into your code.

To see detailed characteristics and advanced uses of the framework, such as cross subnets groups, take a look at the examples and design documents included with Shoal.

Resources

Juan Pedro Danculovic lives in Buenos Aires, Argentina and works for the IT Architecture department of the country's biggest Health care service provider.
Related Topics >> Programming      
Comments
Comments are listed in date ascending order (oldest first)