
A Method for Reducing Contention and Overhead in Worker Queues for Multithreaded Java Applications

June 14, 2011

[Editor's note: the following article was submitted by Sathiskumar Palaniappan, Kavitha Varadarajan, and Jayashree Viswanathan.]

Introduction

Many server applications, such as Web servers, application servers, database servers, file servers, and mail servers, maintain worker queues and thread pools to handle large numbers of short tasks that arrive from remote sources. In general, a "worker queue" holds all the short tasks that need to be executed, and the threads in the thread pool retrieve the tasks from the worker queue and complete the tasks.

Because multiple threads act on the worker queue, adding tasks to and removing tasks from it must be synchronized, which introduces contention on the queue. This article explains the contention involved in the traditional approach (a common queue shared by the thread pool) and shows how to reduce it by maintaining one queue per thread. It also explains the work-stealing technique, which is important for utilizing the CPUs effectively on multicore systems.

Note: The source code for the examples described in this article can be downloaded here: workerqueue.zip

Common Worker Queue: The Traditional Approach

Today, most server applications use a common worker queue and thread pool to exploit the concurrency provided by the underlying hardware. As shown in Figure 1, server applications use a common worker queue to hold short tasks that arrive from remote sources. A pool of threads acts on the worker queue by retrieving tasks from the worker queue and running the tasks to completion. Threads are blocked on the queue if there is no task in the worker queue.

Using a common worker queue resolves the problems of earlier approaches, such as creating one thread per task, which spawned an unbounded number of threads. However, the common worker queue itself becomes a bottleneck when the number of tasks is high and each task is very short. The single-background-thread approach likewise breaks down when an application has a huge number of short-lived, independent tasks.


Figure 1. Common Worker Queue.

Listing 1 shows how you can create a common worker queue with just a few lines of code.

Listing 1. Creating a Common Worker Queue

import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

/*
 * Defines a common worker queue and a pool of threads to execute
 * tasks arriving from remote sources.
 */
public class SimpleWorkQueue {
    private final PoolWorker[] threads;
    private final BlockingDeque<Runnable> queue;
    private volatile boolean stopNow = false;   // set to true to shut the pool down

    public SimpleWorkQueue(int nThreads)
    {
        queue = new LinkedBlockingDeque<Runnable>();
        threads = new PoolWorker[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new PoolWorker();
            threads[i].start();
        }
    }

    /*
     * Worker thread to execute remote tasks.
     */
    private class PoolWorker extends Thread {
        /*
         * Retrieves a task from the worker queue and executes it.
         * The thread waits on the queue when no task is available.
         */
        public void run() {
            while (!stopNow) {
                try {
                    Runnable r = queue.takeLast();
                    r.run();
                } catch (java.lang.Throwable e) {
                    // Ignore failures so one bad task does not kill the worker
                }
            }
        }
    }
}

As shown in Listing 1, the SimpleWorkQueue class initializes a deque and starts a fixed number of threads at startup. Each thread then calls queue.takeLast() in a loop, retrieving a task from the worker queue if one is available or waiting for a new task to arrive if the queue is empty. Once a task is retrieved, the thread invokes the task's run method, r.run().
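It is worth noting that the standard library already packages this pattern: Executors.newFixedThreadPool creates a fixed pool of threads draining a single shared queue, much like SimpleWorkQueue. A minimal sketch (the demo class name and the task counter are our own, not from the article's code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CommonQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A fixed thread pool backed by a single shared work queue,
        // structurally equivalent to SimpleWorkQueue above.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < 100; i++) {
            pool.execute(completed::incrementAndGet); // a short task
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(completed.get());
    }
}
```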

Worker Queues per Thread

This approach is very simple and performs better than creating a thread for each incoming task. However, as shown in Figure 2, it creates contention: multiple threads must compete for a single work queue to get their tasks, and the problem worsens as the number of threads (and cores) grows.


Figure 2. Contention in a Common Worker Queue.

Today, with multicore processors becoming the norm, it is a challenge for software applications to utilize the underlying cores effectively. (For example, IBM's POWER7, Oracle's UltraSPARC, and Intel's Nehalem are multicore processors capable of running multiple hardware threads.)

There are various solutions available for overcoming the contention in the common worker queue approach:

  • Using lock-free data structures
  • Using concurrent data structures with multiple locks
  • Maintaining multiple queues to isolate the contention

In this article, we explain how to maintain multiple queues—a queue-per-thread approach—to isolate the contention, as shown in Figure 3.


Figure 3. Queue-per-Thread Queue.

In this approach, each thread has its own worker queue and can retrieve tasks only from its own queue, never from another thread's queue. This removes contention on retrieval, because no other thread competes for the same queue. It also guarantees that a thread never sleeps while its own queue holds tasks, which keeps the cores busy.

Listing 2 shows how you can migrate from the common worker queue approach to the queue-per-thread approach with just a few modifications to the code from Listing 1. The constructor now initializes multiple queues (one per thread) at startup, and each thread maintains an ID, thread_id, which it uses to retrieve tasks from its own queue only, isolating the contention.

Listing 2. Creating a Queue-per-Thread Queue

/* Modification to the queue initialization */
for (int i = 0; i < nThreads; i++) {
    queue[i] = new LinkedBlockingDeque<Runnable>();
}
...
......
/* Modification in task retrieval */
r = (Runnable) queue[thread_id].takeLast();
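Listing 2 shows only the changed lines. For completeness, here is one way the whole class might look after the change. The class name MultiQueue, the addTask method, and the round-robin dispatch are our own assumptions, since the article does not show its submission code:

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiQueue {
    private final BlockingDeque<Runnable>[] queues;
    private volatile boolean stopNow = false;
    private final AtomicInteger next = new AtomicInteger(); // round-robin dispatch counter

    @SuppressWarnings("unchecked")
    public MultiQueue(int nThreads) {
        queues = new BlockingDeque[nThreads];
        for (int i = 0; i < nThreads; i++) {
            queues[i] = new LinkedBlockingDeque<Runnable>();
        }
        for (int i = 0; i < nThreads; i++) {
            final int thread_id = i;            // each worker knows its own queue
            Thread t = new Thread(() -> {
                while (!stopNow) {
                    try {
                        // Retrieve only from this thread's own queue: no contention
                        Runnable r = queues[thread_id].takeLast();
                        r.run();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    /* Round-robin submission so tasks spread evenly across the queues. */
    public void addTask(Runnable r) {
        queues[next.getAndIncrement() % queues.length].addFirst(r);
    }
}
```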

Queue-per-Thread Queue with Work Stealing

Although the queue-per-thread approach greatly reduces contention, it does not guarantee that the underlying cores are used effectively all the time. For example, what happens if a couple of queues empty long before the others? This is a common situation: only a few threads keep executing tasks while the rest (those whose queues have emptied) wait for new tasks to arrive. It can happen for the following reasons:

  • Unpredictable nature of the scheduling algorithm
  • Unpredictable nature of the incoming tasks (short versus long)

A solution to this problem is work stealing.

Work stealing lets a thread steal work from another thread's queue when it finds its own queue empty. This keeps all the threads (and, in turn, the cores) busy. Figure 4 shows a scenario in which Thread 2 steals work from Thread 1.
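The article does not include a listing for the stealing loop itself, so the following is only a sketch in the spirit of the fragments shown elsewhere on this page (queues, index, pollFirst, stealWork); the class name and the other method names are our own. Each worker takes from the tail of its own deque and steals from the head of another thread's deque, so owner and thief rarely clash over the same element:

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class StealingPool {
    private final BlockingDeque<Runnable>[] queues;
    private final int nThreads;
    private volatile boolean stopNow = false;

    @SuppressWarnings("unchecked")
    public StealingPool(int nThreads) {
        this.nThreads = nThreads;
        queues = new BlockingDeque[nThreads];
        for (int i = 0; i < nThreads; i++) {
            queues[i] = new LinkedBlockingDeque<Runnable>();
        }
        for (int i = 0; i < nThreads; i++) {
            final int index = i;
            Thread t = new Thread(() -> workerLoop(index));
            t.setDaemon(true);
            t.start();
        }
    }

    public void addTask(int queueIndex, Runnable r) {
        queues[queueIndex].addFirst(r);
    }

    private void workerLoop(int index) {
        while (!stopNow) {
            try {
                Runnable r = queues[index].pollLast();        // own queue first
                if (r == null) r = stealWork(index);          // empty: try the others
                if (r == null) {
                    // nothing anywhere: briefly block on our own queue
                    r = queues[index].pollLast(50, TimeUnit.MILLISECONDS);
                }
                if (r != null) r.run();
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    private Runnable stealWork(int index) {
        for (int i = 0; i < nThreads; i++) {
            if (i == index) continue;
            // Steal from the head while the owner takes from the tail
            Runnable r = queues[i].pollFirst();
            if (r != null) return r;
        }
        return null;
    }
}
```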

Kevin Farnham is the java.net managing editor.

Comments


Hi

I would be interested in how this compares to fork/join in JDK 7 and java.util.concurrent.Executor. I mean, who actually writes their own thread pool nowadays?


In the work stealing algorithm, any thread with an empty queue tries the queues in the same order, so there's going to be more contention on queue 0, which will empty faster, in turn putting more contention on queue 1, and so on.

A small refinement to alleviate contention a bit more would be to have a thread with an empty queue first try the queue of the next thread, then the next one if that thread has an empty queue, and so on. The loop for the stealWork method would now be like this:

// go through queues from index + 1 to (index + (nThreads - 1)) % nThreads
for (int i = 1; i < nThreads; i++) {
   Object o = queues[(index + i) % nThreads].pollFirst();
   ...

It would be interesting to see if that makes a difference in the benchmark.

Thank you for the article.


Though I liked this article, I found nothing new about creating a queue per worker thread. The new thing I find is the work stealing algorithm. What would be nice to add is how the synchronization code would look if each of those tasks had to read/write to the same database and access shared in-memory data structures.


In the multiple queue approach there is an overhead in replicating the data in all worker queues, which could result in a big memory footprint. Also, it was not clear to me how the coordination is done among multiple threads to ensure no two threads act on the same task in different worker queues.

You really explain it well. Great tutorial, thanks for sharing.