Vocal Java

by Jeff Friesen
04/13/2006

Several years back, I configured a blind gentleman's Microsoft-Windows-based computer to vocally identify the window under the mouse pointer. As he moved the pointer around the screen, the computer spoke the name of the underlying window. I have never forgotten how beneficial that speaking computer was to that gentleman's life.

My earlier work on configuring a Windows-based computer to speak inspired me to create an equivalent assistive technology for use in Java contexts. This technology transparently helps blind users interact with Swing-based GUIs. It can also be used with AWT-based GUIs, provided that those GUIs are made accessible to the technology.

This article introduces my Speaker assistive technology. Because Speaker depends on Sun's Java Speech API (Sun's preferred choice for supporting various speech technologies on any Java platform) and FreeTTS (my preferred Java Speech implementation for making Speaker speak), the article first reviews those technologies. I developed and tested this article's code with Sun's J2SE 5.0 SDK and FreeTTS 1.2.1. Windows 98 SE was the underlying platform.

Java Speech API Overview

The Java Speech API is a specification that describes a standard set of classes and interfaces for integrating speech technologies into Java software. Sun released version 1.0 (the only version to date) of this specification on October 26, 1998. It is important to keep in mind that Java Speech is only a specification--no implementation is included.

Java Speech supports two kinds of speech technologies: speech recognition and speech synthesis. Speech recognition converts speech to text. Special input devices called recognizers make speech recognition possible. In contrast, speech synthesis converts text to speech. Special output devices called synthesizers make speech synthesis possible.

The javax.speech package defines the common functionality of recognizers, synthesizers, and other speech engines. The package javax.speech.recognition extends this basic functionality for recognizers. Similarly, javax.speech.synthesis extends this basic functionality for synthesizers.

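Speaker focuses on speech synthesis. To understand the synthesizer part of its source code, you need to know a few Java Speech API items (learn more by reading "The Java Speech API, Part 1"), which I describe from Speaker's perspective:

The Central class's createSynthesizer() method creates a synthesizer. Passing null to this method requests a synthesizer for the default locale.

The Synthesizer interface's allocate() method allocates synthesizer resources, and its deallocate() method releases them.

The resume() method places the synthesizer in the RESUMED state so that it can produce speech as it receives text.

The speakPlainText() method speaks its text argument, ignoring JSML tags. Its second argument is a listener that receives events as text is spoken; pass null if you are not interested in those events.

The waitEngineState() method blocks the calling thread until the synthesizer reaches the specified state. Passing Synthesizer.QUEUE_EMPTY blocks until all queued text has been spoken.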

To reinforce your understanding of the Java Speech API items presented above, I've created a simple Swing application called TextSpeaker. This application lets you type some text into a text component, and then click a button to hear the synthesizer for the default locale speak that text. Its source code appears below.

// TextSpeaker.java

import java.awt.*;
import java.awt.event.*;

import javax.speech.*;
import javax.speech.synthesis.*;
import javax.swing.*;

public class TextSpeaker
{
   static Synthesizer synth;

   public static JFrame createGUI ()
   {
      JFrame frame = new JFrame ("Text Speaker");

      WindowListener wl;
      wl = new WindowAdapter ()
           {
               public void windowClosing
                                   (WindowEvent e)
               {
                  try
                  {
                      // Deallocate synthesizer
                      // resources.

                      synth.deallocate ();
                  }
                  catch (EngineException e2)
                  {
                  }

                  System.exit (0);
               }
           };
      frame.addWindowListener (wl);

      JPanel p = new JPanel ();

      p.add (new JLabel ("Specify text to " +
                         "speak:"));

      final JTextField text = new JTextField (20);
      p.add (text);

      frame.getContentPane ().add (p,
                              BorderLayout.NORTH);

      p = new JPanel ();
      p.setLayout (new FlowLayout
                              (FlowLayout.RIGHT));

      JButton btnSpeak = new JButton ("Speak");

      ActionListener al;
      al = new ActionListener ()
           {
               public void actionPerformed
                                   (ActionEvent e)
               {
                  // Speak the contents of the text
                  // field, ignoring JSML tags.
                  // Pass null as the second
                  // argument because I am not
                  // interested in attaching a
                  // listener that receives events
                  // as text is spoken.

                  synth.speakPlainText
                          (text.getText (), null);

                  try
                  {
                      // Block this thread until
                      // the synthesizer's queue
                      // is empty (all text has
                      // been spoken). Normally,
                      // blocking the
                      // event-dispatching thread
                      // is not a good idea.
                      // However, the amount of
                      // text to be spoken should
                      // not take more than a few
                      // seconds to speak, and the
                      // user probably would not
                      // need to do anything with
                      // the GUI until the text
                      // had been spoken.

                      synth.waitEngineState
                        (Synthesizer.QUEUE_EMPTY);
                  }
                  catch (InterruptedException e2)
                  {
                  }
               }
           };
      btnSpeak.addActionListener (al);

      p.add (btnSpeak);

      JButton btnClear = new JButton ("Clear");

      al = new ActionListener ()
           {
               public void actionPerformed
                                   (ActionEvent e)
               {
                  text.setText ("");
                  text.requestFocusInWindow ();
               }
           };
      btnClear.addActionListener (al);

      p.add (btnClear);

      frame.getContentPane ().add (p,
                              BorderLayout.SOUTH);

      frame.getRootPane ().setDefaultButton
                                       (btnSpeak);

      frame.pack ();

      return frame;
   }

   public static void main (String [] args)
   {
      try
      {
          // Create a synthesizer for the default
          // locale.

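          synth = Central.createSynthesizer
                                           (null);

          // createSynthesizer() returns null when
          // no suitable synthesizer is found.

          if (synth == null)
              throw new IllegalStateException
                   ("No synthesizer available");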

          // Allocate synthesizer resources.

          synth.allocate ();

          // Place synthesizer in the RESUMED
          // state so that it can produce speech
          // as it receives text.

          synth.resume ();
      }
      catch (Exception e)
      {                 
          JOptionPane.showMessageDialog (null,
                                 e.getMessage ());
          System.exit (0);
      }

      createGUI ().setVisible (true);
   }
}
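
The comments in actionPerformed() note that blocking the event-dispatching thread is normally not a good idea. One common alternative is to hand the speak-and-wait work to a single background thread, so that the GUI stays responsive while requests are still spoken in order. The sketch below illustrates that idea only. BackgroundSpeaker and its BlockingSpeech interface are hypothetical names that stand in for the synth.speakPlainText() and synth.waitEngineState() calls, so the example runs without a Java Speech implementation. It assumes Java 8 or later for the lambda syntax.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BackgroundSpeaker
{
   // Hypothetical stand-in for a blocking "speak this text, then
   // wait until the queue is empty" operation.
   public interface BlockingSpeech
   {
      void speakAndWait (String text) throws InterruptedException;
   }

   // A single worker thread runs requests in submission order.
   private final ExecutorService worker =
      Executors.newSingleThreadExecutor ();

   private final BlockingSpeech speech;

   public BackgroundSpeaker (BlockingSpeech speech)
   {
      this.speech = speech;
   }

   // Queue text for speaking and return immediately, so the
   // calling (GUI) thread is never blocked.
   public void speakLater (final String text)
   {
      worker.execute (() ->
      {
         try
         {
            speech.speakAndWait (text);
         }
         catch (InterruptedException e)
         {
            // Restore the interrupt flag for the worker thread.
            Thread.currentThread ().interrupt ();
         }
      });
   }

   // Stop accepting work and wait for queued text to finish.
   // Returns true if everything finished within the timeout.
   public boolean shutdownAndWait (long timeoutMillis)
   {
      worker.shutdown ();
      try
      {
         return worker.awaitTermination (timeoutMillis,
                                         TimeUnit.MILLISECONDS);
      }
      catch (InterruptedException e)
      {
         Thread.currentThread ().interrupt ();
         return false;
      }
   }

   // Demonstration: "speak" by recording each text in a list.
   public static List<String> demo ()
   {
      final List<String> spoken =
         Collections.synchronizedList (new ArrayList<String> ());
      BackgroundSpeaker speaker =
         new BackgroundSpeaker (text -> spoken.add (text));
      speaker.speakLater ("first");
      speaker.speakLater ("second");
      speaker.shutdownAndWait (5000);
      return spoken;
   }

   public static void main (String [] args)
   {
      System.out.println (demo ());
   }
}
```

With this approach, a button listener would call speakLater() and return at once, and a window-closing handler would call shutdownAndWait() before deallocating the synthesizer.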

Now that you've examined TextSpeaker.java, you'll want to compile this source code and run the application. Before you can do that, however, you must install a Java Speech implementation--like FreeTTS.

FreeTTS Overview

FreeTTS is a speech synthesizer written entirely in Java. It was created by Sun's Speech Integration Group and is based on Carnegie Mellon University's Flite run-time speech synthesis engine. Although FreeTTS does not support speech recognition, and although FreeTTS places some limits on speech synthesis, FreeTTS is free to download, install, modify, and use.

To download FreeTTS, point your web browser to the FreeTTS 1.2 home page. Select the "Downloading and Installing" link near the top of the page and follow the instructions to download the binary .zip file. You can also download the source and test .zip files, if you plan to make changes to FreeTTS.

Assuming that you download freetts-1.2.1-bin.zip, unzip that file and move the freetts-1.2.1-bin\freetts-1.2.1 directory to a location of your choice, such as the root directory on the C: drive on Windows. In this case, you should end up with c:\freetts-1.2.1 as the FreeTTS home directory, which I refer to as FREETTS_HOME.

You are almost ready to compile TextSpeaker.java and run the resulting application. But first you need to configure the FreeTTS environment:

Compile TextSpeaker.java. If there are any compilation errors, check your CLASSPATH setting--it must include at least jsapi.jar for compilation to succeed.
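
If the CLASSPATH is in doubt, one quick check is to ask the JVM whether it can find the Java Speech types that TextSpeaker.java imports. The following sketch is only a diagnostic aid and is not part of FreeTTS. The SpeechClasspathCheck class and its isClassAvailable() helper are hypothetical names; javax.speech.Central and javax.speech.synthesis.Synthesizer are the types that TextSpeaker.java uses.

```java
public class SpeechClasspathCheck
{
   // Return true if the named class can be located by this
   // class's class loader. The class is not initialized.
   public static boolean isClassAvailable (String className)
   {
      try
      {
         Class.forName (className, false,
                        SpeechClasspathCheck.class.getClassLoader ());
         return true;
      }
      catch (ClassNotFoundException | LinkageError e)
      {
         return false;
      }
   }

   public static void main (String [] args)
   {
      // Types that TextSpeaker.java needs from the Java Speech API.
      String [] required =
      {
         "javax.speech.Central",
         "javax.speech.synthesis.Synthesizer"
      };

      for (String name : required)
         System.out.println (name + ": " +
                             (isClassAvailable (name) ? "found"
                                                      : "missing"));
   }
}
```

Run the check with the same CLASSPATH that you use for TextSpeaker. A "missing" result suggests that jsapi.jar is not on the CLASSPATH.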

Invoke java TextSpeaker to run TextSpeaker. Enter some text in the resulting GUI (see Figure 1) and click the Speak button. You should hear that text being spoken.

Figure 1. Type some text to speak and click the Speak button

Congratulations on getting TextSpeaker to speak via FreeTTS. But we can do better: let's use our knowledge of Java Speech and FreeTTS to build Speaker, an assistive technology that lets the blind hear their GUIs.

Let Me Hear You Speak

Speaker depends on Java's Accessibility API and Sun's Java Accessibility Utilities. The Accessibility API lets you make your Java GUIs accessible to assistive technologies--specialized tools that help people interact with GUIs. Voice synthesizers and voice recognizers are perhaps the most common examples of assistive technologies.

Little (if anything) needs to be done to make GUIs based on standard Swing components accessible to Speaker. Because AWT-based GUIs are not automatically accessible to Speaker, I have prepared the code fragment below to show you how to make an AWT Checkbox component accessible:

CheckboxGroup cbg = new CheckboxGroup ();
Checkbox cb = new Checkbox ("Over 65", cbg, true);
cb.getAccessibleContext ().
  setAccessibleName ("Over 65");

The getAccessibleContext() method returns a javax.accessibility.AccessibleContext object. That object bundles component information that any assistive technology can query. Because AWT components do not necessarily provide accessible information on their own, setAccessibleName() provides a name for Speaker to access. Speaker retrieves this name by invoking the companion getAccessibleName() method.
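
The round trip between setAccessibleName() and getAccessibleName() can be sketched as follows. This is a minimal illustration, not code from Speaker: AccessibleNameDemo and its nameOf() helper are hypothetical names, and the sketch uses Swing's JButton and JPanel in place of the AWT Checkbox so that it can run without a display.

```java
import javax.accessibility.Accessible;
import javax.accessibility.AccessibleContext;
import javax.swing.JButton;
import javax.swing.JPanel;

public class AccessibleNameDemo
{
   // Return the accessible name that an assistive technology
   // would see, or null if none is available.
   public static String nameOf (Accessible a)
   {
      if (a == null)
          return null;

      AccessibleContext ac = a.getAccessibleContext ();
      return (ac == null) ? null : ac.getAccessibleName ();
   }

   public static void main (String [] args)
   {
      // A button's text serves as its accessible name when no
      // name has been set explicitly.
      JButton button = new JButton ("Speak");
      System.out.println ("Button: " + nameOf (button));

      // A plain panel has no text, so give it a name explicitly,
      // just as the Checkbox fragment does.
      JPanel panel = new JPanel ();
      panel.getAccessibleContext ().setAccessibleName ("Options");
      System.out.println ("Panel: " + nameOf (panel));
   }
}
```

The null checks in nameOf() mirror the checks that an assistive technology has to make, because a component may have no accessible context or no accessible name.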

Sun's Java Accessibility Utilities help you determine how accessible your GUIs are. The distribution file that contains those utilities also contains jaccess.jar, a package of classes and interfaces that are required by Java-based assistive technologies.

Download version 1.3 of these utilities from Sun's download page. You can choose to download a compressed .tar file, a gzip .tar file, or a .zip file. Regardless of which file you download, extract jaccess.jar, copy that file to FreeTTS's lib directory (a convenient location), and add jaccess.jar to your CLASSPATH. On a Windows platform, you can specify set classpath=%classpath%;c:\freetts-1.2.1\lib\jaccess.jar.

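The jaccess.jar file contains the com.sun.java.accessibility.util package. Speaker interacts with three of that package's classes and interfaces:

GUIInitializedListener is the interface whose guiInitialized() method is invoked when the JVM's GUI subsystem is ready to interface with an assistive technology.

EventQueueMonitor is the class whose isGUIInitialized() method reports whether the GUI subsystem is ready, and whose addGUIInitializedListener() method registers a GUIInitializedListener.

AWTEventMonitor is the class whose addWindowListener() and addMouseListener() methods register listeners that receive window and mouse events from the components in the JVM.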

Now that you have an idea of what GUIInitializedListener, EventQueueMonitor, and AWTEventMonitor accomplish, let's see how Speaker works with those types. Examine the source code below.

// Speaker.java

import java.awt.*;
import java.awt.event.*;

import javax.accessibility.*;
import javax.speech.*;
import javax.speech.synthesis.*;
import javax.swing.*;

import com.sun.java.accessibility.util.*;

public class Speaker implements
                            GUIInitializedListener
{
   Synthesizer synth;

   public Speaker ()
   {
      try
      {
          // Create a synthesizer for the default
          // locale.

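          synth = Central.createSynthesizer
                                           (null);

          // createSynthesizer() returns null when
          // no suitable synthesizer is found.

          if (synth == null)
              throw new IllegalStateException
                   ("No synthesizer available");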

          // Allocate synthesizer resources.

          synth.allocate ();

          // Place synthesizer in the RESUMED
          // state so that it can produce speech
          // as it receives text.

          synth.resume ();
      }
      catch (Exception e)
      {                 
          JOptionPane.showMessageDialog (null,
                                 e.getMessage ());
          return;
      }

      // If JVM GUI subsystem is ready to
      // interface with assistive technology,
      // invoke speakerInit(). Otherwise, register
      // current Speaker object as a listener,
      // whose guiInitialized() method will be
      // invoked when the GUI subsystem is ready.

      if (EventQueueMonitor.isGUIInitialized ())
          speakerInit ();
      else
          EventQueueMonitor.
                 addGUIInitializedListener (this);
   }

   public void guiInitialized ()
   {
      speakerInit ();
   }

   void speakerInit ()
   {
      // Register a window listener that
      // deallocates synthesizer resources when a
      // JFrame window with an EXIT_ON_CLOSE
      // operation is closing.

      WindowListener wl;
      wl = new WindowAdapter ()
           {
               public void windowClosing
                                   (WindowEvent e)
               {
                  Window w = e.getWindow ();
                  if (!(w instanceof JFrame))
                      return;

                  JFrame f = (JFrame) w;
                  if (f.getDefaultCloseOperation
                       () == JFrame.EXIT_ON_CLOSE)
                      try
                      {
                          // Deallocate
                          // synthesizer
                          // resources.

                          synth.deallocate ();
                      }
                      catch (Exception e2)
                      {
                      }
               }
           };
      AWTEventMonitor.addWindowListener (wl);

      // Register a mouse listener that speaks the
      // name of an accessible component when the
      // mouse pointer enters that component.

      MouseListener ml;
      ml = new MouseAdapter ()
           {
               public void mouseEntered
                                    (MouseEvent e)
               {
                   Component c = (Component)
                                   e.getSource ();

                   Accessible a;
                   a = SwingUtilities.
                               getAccessibleAt (c,
                                   e.getPoint ());
                   if (a == null)
                       return;

                   AccessibleContext ac = a.
                          getAccessibleContext ();
                   if (ac == null)
                       return;

                   String text = ac.
                             getAccessibleName ();
                   if (text == null)
                       return;

                   // Speak the component's name.

                   synth.speakPlainText (text,
                                         null);

                   try
                   {
                       // Wait for synthesizer to
                       // finish speaking.

                       synth.waitEngineState
                        (Synthesizer.QUEUE_EMPTY);
                   }
                   catch (InterruptedException e2)
                   {
                   }
               }
           };
      AWTEventMonitor.addMouseListener (ml);
   }
}

Speaker's source code should be fairly easy to understand. However, you might be wondering why I've specified a = SwingUtilities.getAccessibleAt (c, e.getPoint ()); instead of working with EventQueueMonitor's public static Point getCurrentMousePosition() and public static Accessible getAccessibleAt(Point pt) methods. I could not get those methods to work properly in the context of the mouseEntered() method: I discovered that the mouse pointer had to be moved off of a component before Speaker would speak that component's name. This behavior was unacceptable to me.

If you set up your environment as specified earlier, you should be able to compile Speaker.java. Compilation results in three class files: Speaker.class, Speaker$1.class, and Speaker$2.class. I found it convenient to archive these class files into a Speaker.jar file (with the command jar cf Speaker.jar *.class), copy Speaker.jar to c:\freetts-1.2.1\lib, and add Speaker.jar to the CLASSPATH. On my Windows platform, I specified set classpath=%classpath%;c:\freetts-1.2.1\lib\Speaker.jar.

One last item has to be taken care of before you can use Speaker. You have to tell the JVM to automatically load the Speaker class (found via Speaker.jar on the CLASSPATH) at startup. Accomplish that task by placing the following line in your accessibility.properties file (in your JAVA_HOME\jre\lib or JAVA_HOME/jre/lib directory): assistive_technologies=Speaker. If that file does not exist, create an accessibility.properties file with that single line.
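
To make the file's format concrete, the sketch below parses the text of an accessibility.properties file with java.util.Properties and lists the class names assigned to assistive_technologies. AssistiveTechConfig is a hypothetical helper, not something that the JVM or Speaker provides. Its handling of several comma-separated names is an assumption about how more than one assistive technology would be listed; the setup described here needs only the single name Speaker.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AssistiveTechConfig
{
   // Return the class names listed under assistive_technologies
   // in the given properties text (empty list if none).
   public static List<String> assistiveTechnologies
                                       (String propertiesText)
   {
      Properties props = new Properties ();
      try
      {
         props.load (new StringReader (propertiesText));
      }
      catch (IOException e)
      {
         // Not expected when reading from an in-memory string.
         throw new IllegalStateException (e);
      }

      String value = props.getProperty ("assistive_technologies",
                                        "");

      // Assumption: multiple names are separated by commas
      // and/or spaces.
      List<String> names = new ArrayList<String> ();
      for (String name : value.split ("[ ,]+"))
         if (!name.isEmpty ())
             names.add (name);

      return names;
   }

   public static void main (String [] args)
   {
      String fileText = "assistive_technologies=Speaker\n";
      System.out.println (assistiveTechnologies (fileText));
   }
}
```

Each listed name is a class name rather than a .jar file name, which is why Speaker.jar must also be on the CLASSPATH.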

To put Speaker through its paces, start TextSpeaker and move the mouse pointer around that application's GUI. You should hear your computer speak as you move the pointer over the label and either button. However, it does not speak when the mouse pointer moves over the text component--I leave figuring out why that is the case as an exercise.

Tip: To use Speaker with an applet that you start via appletviewer, specify both the CLASSPATH environment variable value and a policy file that grants appropriate permissions. For example: appletviewer -J-classpath -J%CLASSPATH% -J-Djava.security.policy=my.policy applet.html. In this example, my.policy contains grant { permission java.security.AllPermission; };.

Conclusion

Blind users need help to interact with GUI environments. My earlier experience in helping a blind gentleman use his Windows-based computer, by getting that computer to speak, inspired me to create an equivalent assistive technology for Java. My Speaker assistive technology vocalizes GUI component names as the user moves the mouse pointer over those components, which helps blind users navigate their way around Java GUIs.

To speak Java GUIs, Speaker depends on Sun's Java Speech API, via the FreeTTS implementation of that API. I've purposely minimized Speaker's interaction with Java Speech/FreeTTS to keep this technology simple. Therefore, you might want to extend Speaker; one idea is to let Speaker's constructor access a property file and choose a voice based on that file's contents (you'll need to add voices to FreeTTS--learn how in its documentation).

Jeff Friesen is a freelance software developer and educator specializing in Java technology. Check out his site at javajeff.mb.ca.

