Skip to main content

Clustering with the Shoal Framework

December 11, 2007


Shoal is an open
source, Java-based generic clustering framework. It can be used in
your applications to add clustering functionalities like load
balancing, fault tolerance, or both. Applications using Shoal can
share data, communicate via messages with other cluster nodes across the
network, and notify of relevant events like the joining,
shutdown, and failure of a node or group of nodes. You can take
appropriate measures and perform monitoring tasks when these events
occur; Shoal forwards a signal to your code to track these

Shoal is the clustering framework used by the "">Glassfish project to
implement its application server clustering. One of the benefits to
your application is that Shoal abstracts away network details and
the network communication API. Under the hood, the default group
communication provider uses "">JXTA for peer-to-peer reliability
and scalability.

Shoal is a lightweight component; you can embed Shoal not only
in Java EE applications, but in SE applications too.

In this article, we'll cover the Shoal architecture and its
basic concepts. Then we'll illustrate how to integrate your own
code into the clustering infrastructure.

Learning Shoal's Basic Concepts and Architecture

We will now discuss the architecture of a Shoal cluster solution
and the internal design of Shoal. This section also describes main
components, such as Signals and the Distributed State Cache

Understanding Shoal's Architecture

Shoal's main concept is the group, a virtual union of
cluster nodes by a single name. Each group is composed of
members. A member is uniquely identified by its member
. A token is just a String, usually a Global Unique ID
, to avoid name collision between the nodes. Also, a member
can be a spectator or a core, the difference being
that when a core member goes down, all other members are notified
of the failure. On the contrary, an spectator's failure is not a
significant event that Shoal notifies other members about.
Spectators are commonly used for cluster administration and/or
monitoring applications.

A network can host many groups, but be aware that each group
consumes additional bandwidth and processing resources.

The second important concept is the Group Management Service
. It manages the network's groups by keeping track of the
members, mediating and facilitating member cooperation and

Shoal divides groups into functionalities called
components. This division is up to the developer; by default
Shoal doesn't impose any structure on the cluster. A group, for
example, can have a transaction component and a monitoring
component, each processed by one or two physical machines. The API
allows the sending of signals to specific nodes that implement a
particular component.

The last significant piece is the signal. Signals are
events in the group's lifecycle, such as a new node joining the
cluster or another one leaving it. Signals are covered in greater depth below.

Figure 1 illustrates the physical distribution of a example
Shoal group.

<br "A Shoal system's architecture" />
Figure 1. A Shoal system's architecture

Figure 1 shows three physical machines, with two running a JVM
and executing a single instance of the application, and the third
machine running two JVMs, each one executing one instance. Each
application instance has loaded Shoal's GMS, which discovered the
other peers and joined the group.

Note that all the machines are connected in the same physical
network. The physical location is not a concern as long as
multicast traffic traverses the relevant nodes.

Understanding Shoal's Design

Shoal is comprised of two main parts: the GMS Client API and the
GMS Service Provider Interface (SPI). Your application interacts
with the API, which in turn uses the SPI to talk to the underlying
group communication protocol. The default SPI uses JXTA to provide
group services.

Figure 2 illustrates the dynamics between Shoal's layers, your
application, and the network:

<br "Shoal impact on your application architecture" />
Figure 2. Shoal's impact on your application architecture

Calling Shoal's API, you can:

  • Emit cluster signals: join, failure, and shutdown are some
    examples. The signals are discussed in depth in the next
  • Send messages to other nodes or the whole group.
  • Share data through a Distributed State Cache (DSC) (discussed

The framework calls your code when signals arrive from any node
on the group. For this to happen, you must register callback
objects as described in the "Listening to the Group's Messages"

Understanding Shoal Lifecycle Signals

Each node in the group has the following built-in cluster lifecycle signals to both send and receive from other nodes:

  • Join: Shoal tells all members that a new node is joining the
  • Joined and ready: The recently joined member is ready to
    process requests.
  • Failure suspected: Shoal doesn't receive a heartbeat response
    from a member and it suspects the node has failed. When the failure
    is confirmed, it sends a failure notify signal.
  • Failure notify: The group tells all the members about the
    confirmed failure of a node.
  • Failure recovery: The group tells a node to take the place of a
    failed node. This node must have a previously registered as a
    failure recovery node.
  • Planned shutdown: Shoal sends a signal to the group notifying
    an administrative shutdown of a member.

Shoal can send other signals related to the domain of the
application. Those are called message signals. Both signal
types can arrive at any time and can encapsulate arbitrary

Taking Advantage of Shoal's Failure Recovery

The most interesting signals are the failure family, as they
enable Shoal's recovery features. When a node fails and then tries
to recover, Shoal brings up something called recovery failure
. This protects the node from getting inconsistent
states and fences its recovery process. A virtual fence is raised
for a member's identity inside the group. This is done to protect
cluster operations from race conditions. When the fence is lowered
for the identity, any member can communicate with the previously
fenced one.

Sometimes another node wants to take the failed node's place.
This is achieved by a node getting the failed member's identity.
Let's make it clearer with an example.

When node A goes down, the other nodes in the group get notified
via a FailureSuspectedSignal. After a timeout period a
FailureNotifySignal is fired. If any of the other members in
the group is a FailureRecoveryAction listener, then these
nodes are candidates to substitute the failed one. Let's call one
such node "node B."

At this point, the GMS master node notifies node B that it has
to become a recovery delegate for the failed node. It sends the
node a FailureRecoverySignal. Finally, if node B acknowledges
that signal, it raises the fence, does the recovery work (such as
taking over resources or tasks of node A), and lowers the fence.
This takes us back to normal operation.

If the failed node goes up again, it must check whether the fence is
raised for its member identity. If it's not fenced, and it wants to
recover, it must raise a fence. After the recovery is finished, it
lowers the fence again.

Understanding Distributed State Cache

Shoal also provides a data sharing mechanism for the group
called the Distributed State Cache. Though not a full-blown
distributed cache like "">JBossCache, it
allows the nodes to share data.

The DSC is an associative data structure like a
java.util.Map. The default implementation uses a node-local
HashMap, and replicates the changes to the other nodes for
better performance. Beware that it doesn't implement a Least
Recently Used (LRU)
or any other algorithm to eliminate
unused elements. You can only put a serializable object into
the map as a key, a value, or both.

In the next section, we'll cover how to integrate Shoal into your

Integrating Shoal into Your Application

Embedding Shoal into your application is straightforward; the
time-consuming part is thinking about what clustering features you
really need: load balancing, fault tolerance, or both. If you add
something you don't need, the complexity grows and can jeopardize
your project.

Begin by downloading the Shoal distribution from "">Shoal's site. Unpack and
copy both shoal.jar and jxta.jar to your application
classpath. That's all you need besides a working network

It's very useful to test with two instances of the application,
so you can see how it responds to the other instance joining and
failing, along with message passing. So let's start by creating a
group and joining it.

Starting and Joining a Group

The first thing we need to create or join a group is a group
name. The first node to start the Shoal framework creates the group
with this name. If no other nodes for the same group on the network
have started, this node becomes the master node. From now on, each
node joining the group will acknowledge this node as a master.

If there's already a group created with the group name, the new
node joins as a vanilla member.

First you must start a new GMSModule by calling the
GMSFactory.startGMSModule() method, like the following
code fragment does:

GroupManagementService gms = (GroupManagementService)
    GMSFactory.startGMSModule(serverName, groupName, memberType, props);

The method receives four parameters:

  • groupName identifies the group name to create if it
    doesn't exist, or to join in the case the group is already
  • serverName uniquely identifies this instance in the
    cluster. You can't create a node with a duplicate name in the
  • The memberType parameter is an enum
    characterizing the member's role: CORE or
    SPECTATOR, as discussed in the previous section.
  • The properties you send to start the GMS module are mostly
    configuration parameters for the underlying SPI, such as JXTA
    network configuration parameters. These parameters range from the
    multicast address and port to the discovery and failure timeout and

In the case where a Shoal creates a new group, it will display
a message like the following in your console:

7:33:14 PM
getMemberTokens INFO: GMS View Change Received: Members in view
(before change analysis) are : 1: MemberId: node1, MemberType:
CORE, Address:

This message tells us that a new core member, called
node1, is a member of the group, and shows its unique JXTA
identifier. If the group already exists, the message includes the
other member JXTA identifiers.

After the group is created, you must call the
join() method, as the next code snippet shows:

try {
  catch (GMSException e) { 

After join()ing, Shoal will output a message like the following:

Nov 29, 2007 7:33:14 PM newViewObserved
INFO: Analyzing new membership snapshot received as part of event :

In this case, the member becomes the master of the group, as it
is the creator and only member at this moment.

Sending Messages

Once you are part of the group, you can send and receive
messages to and from other members. Shoal exposes a
GroupHandle object to achieve several tasks, one of
which is to send message signals. Let's see how you can use the
GroupHandle to send messages in the following

   gms.getGroupHandle().sendMessage(null, message.getBytes());

This method sends a byte array to all members. The first
parameter is the component name; in this case the null means
all components. The second is the byte array of a string.
The GroupHandle can also send a message to all
members, a specific subset, or just one member by using member
tokens. See the GroupHandle Javadocs for more

Listening to the Group's Messages

To begin listening to the group member messages, you must
register a callback with the framework. To do this, simply create a
class implementing the CallBack interface. You have to
implement just one method: processNotification(). The
following code fragment shows an implementation:

  public void processNotification(Signal signal) {
          if (signal instanceof MessageSignal) {
                          ":Message Received from:" 
                          + signal.getMemberToken()
                          + ":[" + signal.toString() + "]");
           } else {
                          ":Other Notification Received from:"
                          + signal.getMemberToken() + ":[" 
                          + signal.toString() + "]");

The signal must be acquire()d and
release()d to avoid race conditions. The
Signal's getMemberToken() method returns
the member token of the node that emitted the signal. It's very
useful for responding only to the member who sent the signal.

Once you have the message handling code, you must register the
CallBack with Shoal.You can register to receive custom
messages directed to a particular component of the cluster by
hooking up a MessageActionFactoryImpl and sending a
componentName parameter.

 gms.addActionFactory(new MessageActionFactoryImpl(callBackClass), componentName);

This particular MessageActionFactory receives
MessageSignals, the only non-lifecycle signals built
into Shoal. The ActionFactory will call the instance
of the CallBack object, every time a
MessageSignal arrives directed to the
componentName parameter.

Sharing Data using the DSC

To share data between nodes without using messages, you can use
the DSC. Remember that no automatic reaping of the unused values
occurs. The default implementation doesn't have any stale value
reaping mechanism or a capacity limit, so you must be very
cautious using the default DSC implementation
. You can write
another implementation to manage DSC according to the application

The DSC is accessed via the GroupHandle by invoking
the getDistributedStateCache() method, as in the
following code:

DistributedStateCache dsc = gms.getGroupHandle().getDistributedStateCache();

This gets a reference to the DSC. To add a value, use the
add() method:

dsc.addToCache( componentName, memberTokenId,  key,  state);

There's an individual cache for each group component; you
specify in which cache to store the state using the
componentName parameter. The second parameter is the
calling member's token. The third is the key you'll use to recover
the state. Finally, the fourth parameter is the state to store in
the cache.

To retrieve the stored value, use the get()

 Object o = dsc.getFromCache( componentName,  memberTokenId, key) ;

This retrieves the state, using the key to search it. Finally, to
remove an object, use the remove() method:

 dsc.removeFromCache( componentName,  memberTokenId,  key) ;

This eliminates the state associated with the key from the DSC.

Shutting Down

Hopefully, after doing some useful work, the node can shut down.
The master node can shut down all members in the group. To achieve
both you must call the shutdown() method of the
gms object. Shutting down members or the whole group
sends planned shutdown signals to their members.


Hopefully this article clarifies some of the less-documented
aspects of Shoal and kickstarts new users of the framework.

Clustering gives you a solution to solve two important problems:
fault tolerance and load balancing, a must for enterprise
mission-critical applications. Load balancing offers much-needed
scalability and fault tolerance means less downtime.

Shoal makes it easy for you to achieve both load balancing and
fault tolerance characteristics in your application. However, the
most commonly used scenario is group communication, including
message passing and state caching. You can easily add cluster
facilities to your application by integrating the Shoal framework
into your code.

To see detailed characteristics and advanced uses of the
framework, such as cross subnets groups, take a look at the
and "">design
included with Shoal.


width="1" height="1" border="0" alt=" " />
Juan Pedro Danculovic lives in Buenos Aires, Argentina and works for the IT Architecture department of the country's biggest Health care service provider.
Related Topics >> Programming   |