
Vocal Java

April 13, 2006

Several years back, I configured a blind gentleman's Microsoft-Windows-based computer to vocally identify the window under the mouse pointer. As he moved the pointer around the screen, the computer spoke the name of the underlying window. I have never forgotten how beneficial that speaking computer was to that gentleman's life.

My earlier work on configuring a Windows-based computer to speak inspired me to create an equivalent assistive technology for Java. This technology transparently helps blind users interact with Swing-based GUIs. It can also be used with AWT-based GUIs, provided that those GUIs are made accessible to the technology.

This article introduces my Speaker assistive technology. Because Speaker depends on Sun's Java Speech API (Sun's preferred choice for supporting various speech technologies on any Java platform) and FreeTTS (my preferred Java Speech implementation for making Speaker speak), the article first reviews those technologies. I developed and tested this article's code with Sun's J2SE 5.0 SDK and FreeTTS 1.2.1. Windows 98 SE was the underlying platform.

Java Speech API Overview

The Java Speech API is a specification that describes a standard set of classes and interfaces for integrating speech technologies into Java software. Sun released version 1.0 (the only version to date) of this specification on October 26, 1998. It is important to keep in mind that Java Speech is only a specification--no implementation is included.

Java Speech supports two kinds of speech technologies: speech recognition and speech synthesis. Speech recognition converts speech to text. Special input devices called recognizers make speech recognition possible. In contrast, speech synthesis converts text to speech. Special output devices called synthesizers make speech synthesis possible.

The javax.speech package defines the common functionality of recognizers, synthesizers, and other speech engines. The package javax.speech.recognition extends this basic functionality for recognizers. Similarly, javax.speech.synthesis extends this basic functionality for synthesizers.

Speaker focuses on speech synthesis. To understand the synthesizer part of its source code, you need to know a few Java Speech API items (learn more by reading "The Java Speech API, Part 1"), which I describe from Speaker's perspective:

  • Before anything else, Speaker attempts to create an appropriate synthesizer. Speaker handles this task by invoking the javax.speech.Central method public static final Synthesizer createSynthesizer (EngineModeDesc require). The require parameter identifies either a javax.speech.EngineModeDesc object, which defines properties common to all speech engines (engine name, mode name, locale, and whether or not a speech engine is already running), or a javax.speech.synthesis.SynthesizerModeDesc object, which introduces two synthesizer-specific properties: a list of voices (older male voice, younger female voice, for example) and the voice that loads when the synthesizer starts. This method returns a javax.speech.synthesis.Synthesizer matching the required engine properties, or null if none is available. If those properties don't refer to a known engine or engine mode, an IllegalArgumentException is thrown. A javax.speech.EngineException is thrown if the synthesizer engine cannot be properly created.

    To put this discussion into perspective, the following code fragment shows you how to create a synthesizer for the Spanish language, and give that synthesizer a female voice:

    // Create a mode descriptor with all required
    // features: "es" is the ISO 639 language code for
    // "Spanish". (Pass "" -- not null -- as the country
    // code; java.util.Locale rejects null arguments.)

    SynthesizerModeDesc required =
      new SynthesizerModeDesc ();
    required.setLocale (new Locale ("es", ""));
    required.addVoice (new Voice (null,
                                  Voice.GENDER_FEMALE,
                                  Voice.AGE_DONT_CARE,
                                  null));
    Synthesizer synth =
      Central.createSynthesizer (required);

    Because Speaker isn't language-specific, and because it doesn't care about the gender of the voice, it passes null to createSynthesizer(). This causes a synthesizer to be created that supports the language of the default locale: Synthesizer synth = Central.createSynthesizer (null);.

  • Assuming that a synthesizer is successfully created, Speaker next attempts to allocate resources such as the sound card. It accomplishes this task by invoking the javax.speech.Engine method public void allocate(). During allocation, the synthesizer typically transitions from the deallocated state (the initial state after creating the synthesizer) to the allocating resources state. If allocation succeeds, the synthesizer enters the allocated state (along with the paused and queue empty sub-states of the allocated state). If an allocation error occurs, allocate() throws an EngineException; a javax.speech.EngineStateError is thrown if the synthesizer is in the deallocating resources state when allocate() is called (Speaker avoids this problem). For example: synth.allocate ();.

  • Assuming that the synthesizer's resources were successfully allocated, Speaker invokes Engine's public void resume() method to resume audio streaming to a paused synthesizer. If this method succeeds, the synthesizer enters the resumed sub-state of the allocated state. This method throws a javax.speech.AudioException if it cannot gain access to the audio channel; an EngineStateError is thrown if the synthesizer is in the deallocated or deallocating resources states when resume() is called (Speaker avoids this problem). For example: synth.resume ();

  • Now comes the good stuff: when the mouse pointer moves over a GUI component, Speaker invokes Synthesizer's public void speakPlainText(String text, SpeakableListener listener) method. The parameter text contains a plain text string (as opposed to a Java Speech Markup Language (JSML) string that annotates text with structural and presentation information to improve the speech output quality), identifying the text to speak. The parameter listener identifies a javax.speech.synthesis.SpeakableListener that receives notification of events during spoken output--Speaker does not use a SpeakableListener so it passes null. This method places the text at the end of the synthesizer's speaking queue; that text is spoken once it reaches the front of that queue. An EngineStateError is thrown if the synthesizer is in the deallocated or deallocating resources states when speakPlainText() is called (Speaker avoids this problem). For example: synth.speakPlainText ("Hello", null);.
  • After successfully invoking speakPlainText(), Speaker invokes Engine's public void waitEngineState(long state) method to wait until the synthesizer's queue is empty before continuing. The parameter state is a bitwise OR'ed set of constants--Speaker only specifies the Synthesizer.QUEUE_EMPTY constant. This method throws an InterruptedException if another thread interrupts the waiting thread; an IllegalArgumentException is thrown if the specified state is unreachable (Speaker avoids this problem). For example: synth.waitEngineState (Synthesizer.QUEUE_EMPTY);.

  • Speaker must deallocate synthesizer resources prior to termination. It accomplishes this task by invoking Engine's public void deallocate() method. If a deallocation error occurs, deallocate() throws an EngineException; an EngineStateError is thrown if the synthesizer is in the allocating-resources state when deallocate() is called (Speaker avoids this problem). For example: synth.deallocate ();.
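Taken together, those method calls form a simple life cycle: create, allocate, resume, speak, wait, and deallocate. The following minimal console application is my own sketch of that life cycle, not part of Speaker; it assumes a Java Speech implementation such as FreeTTS has been installed and configured, a task covered later in this article:

// SpeakDemo.java (a minimal life-cycle sketch)

import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;

public class SpeakDemo
{
   public static void main (String [] args) throws Exception
   {
      // Create a synthesizer for the default locale; null
      // is returned if no matching synthesizer is
      // available.

      Synthesizer synth = Central.createSynthesizer (null);
      if (synth == null)
      {
          System.err.println ("No synthesizer available");
          return;
      }

      // Allocate resources (such as the sound card) and
      // resume audio streaming.

      synth.allocate ();
      synth.resume ();

      // Queue plain text; it is spoken when it reaches the
      // front of the speaking queue. Block until the queue
      // is empty.

      synth.speakPlainText ("Hello, world", null);
      synth.waitEngineState (Synthesizer.QUEUE_EMPTY);

      // Release resources before terminating.

      synth.deallocate ();
   }
}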

To reinforce your understanding of the Java Speech API items presented above, I've created a simple Swing application called TextSpeaker. This application lets you type some text into a text component, and then click a button to hear the synthesizer for the default locale speak that text. Its source code appears below.

// TextSpeaker.java

import java.awt.*;
import java.awt.event.*;

import javax.speech.*;
import javax.speech.synthesis.*;
import javax.swing.*;

public class TextSpeaker
{
   static Synthesizer synth;

   public static JFrame createGUI ()
   {
      JFrame frame = new JFrame ("Text Speaker");

      WindowListener wl;
      wl = new WindowAdapter ()
           {
               public void windowClosing
                                   (WindowEvent e)
               {
                  try
                  {
                      // Deallocate synthesizer
                      // resources.

                      synth.deallocate ();
                  }
                  catch (EngineException e2)
                  {
                  }

                  System.exit (0);
               }
           };
      frame.addWindowListener (wl);

      JPanel p = new JPanel ();

      p.add (new JLabel ("Specify text to " +
                         "speak:"));

      final JTextField text = new JTextField (20);
      p.add (text);

      frame.getContentPane ().add (p,
                              BorderLayout.NORTH);

      p = new JPanel ();
      p.setLayout (new FlowLayout
                              (FlowLayout.RIGHT));

      JButton btnSpeak = new JButton ("Speak");

      ActionListener al;
      al = new ActionListener ()
           {
               public void actionPerformed
                                   (ActionEvent e)
               {
                   // Speak the content of the text
                  // field, ignoring JSML tags.
                  // Pass null as the second
                  // argument because I am not
                  // interested in attaching a
                  // listener that receives events
                  // as text is spoken.

                  synth.speakPlainText
                          (text.getText (), null);

                  try
                  {
                      // Block this thread until
                      // the synthesizer's queue
                      // is empty (all text has
                      // been spoken). Normally,
                      // blocking the
                      // event-dispatching thread
                      // is not a good idea.
                      // However, the amount of
                      // text to be spoken should
                      // not take more than a few
                      // seconds to speak, and the
                      // user probably would not
                      // need to do anything with
                      // the GUI until the text
                      // had been spoken.

                      synth.waitEngineState
                        (Synthesizer.QUEUE_EMPTY);
                  }
                  catch (InterruptedException e2)
                  {
                  }
               }
           };
      btnSpeak.addActionListener (al);

      p.add (btnSpeak);

      JButton btnClear = new JButton ("Clear");

      al = new ActionListener ()
           {
               public void actionPerformed
                                   (ActionEvent e)
               {
                  text.setText ("");
                  text.requestFocusInWindow ();
               }
           };
      btnClear.addActionListener (al);

      p.add (btnClear);

      frame.getContentPane ().add (p,
                              BorderLayout.SOUTH);

      frame.getRootPane ().setDefaultButton
                                       (btnSpeak);

      frame.pack ();

      return frame;
   }

   public static void main (String [] args)
   {
      try
      {
          // Create a synthesizer for the default
          // locale.

          synth = Central.createSynthesizer
                                           (null);

          // Allocate synthesizer resources.

          synth.allocate ();

          // Place synthesizer in the RESUMED
          // state so that it can produce speech
          // as it receives text.

          synth.resume ();
      }
      catch (Exception e)
      {                
          JOptionPane.showMessageDialog (null,
                                 e.getMessage ());
          System.exit (0);
      }

      createGUI ().setVisible (true);
   }
}

Now that you've examined TextSpeaker.java, you'll want to compile this source code and run the application. Before you can do that, however, you must install a Java Speech implementation--like FreeTTS.

FreeTTS Overview

FreeTTS is a speech synthesizer written entirely in Java. It was created by Sun's Speech Integration Group and is based on Carnegie Mellon University's Flite run-time speech synthesis engine. Although FreeTTS does not support speech recognition and places some limits on speech synthesis, it is free to download, install, modify, and use.

To download FreeTTS, point your web browser to the FreeTTS 1.2 home page. Select the "Downloading and Installing" link near the top of the page and follow the instructions to download the binary .zip file. You can also download the source and test .zip files, if you plan to make changes to FreeTTS.

Assuming that you download freetts-1.2.1-bin.zip, unzip that file and move the freetts-1.2.1-bin\freetts-1.2.1 directory to a location of your choice, such as the root directory on the C: drive on Windows. In this case, you should end up with c:\freetts-1.2.1 as the FreeTTS home directory, which I refer to as FREETTS_HOME.

You are almost ready to compile TextSpeaker.java and run the resulting application. But first you need to configure the FreeTTS environment:

  • Extract JSAPI: FreeTTS stores the class files for Java Speech API classes and interfaces in a file named jsapi.jar. For licensing reasons, jsapi.jar is distributed in the file jsapi.sh (UNIX) or as jsapi.exe (Windows). Both self-extracting archives are located in the FREETTS_HOME/lib directory (UNIX) or FREETTS_HOME\lib directory (Windows). Change to that directory and invoke jsapi. You will be prompted to accept Sun's Binary Code license agreement. Accept that agreement and jsapi.jar will be extracted into lib.
  • Copy speech.properties: The FREETTS_HOME directory contains a file named speech.properties. That file is used by jsapi.jar's Central class to determine which speech engine to use. Copy this file to the directory identified by the user.home system property, or to the lib subdirectory of the directory identified by java.home. You can obtain the user.home and java.home values by executing the following Java code: System.out.println (System.getProperty ("user.home")); System.out.println (System.getProperty ("java.home"));.
  • Modify voices.txt: The lib directory contains a voices.txt file that specifies the voice directories (lists of voices) available to FreeTTS. These voice directories and voices are stored in .jar files--cmu_time_awb.jar stores the Alan voice directory for clock-specific speech, and cmu_us_kal.jar stores the Kevin voice directory for generic speech. The voice directories are listed as com.sun.speech.freetts.en.us.cmu_time_awb.AlanVoiceDirectory followed by com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory. Because TextSpeaker and Speaker request the default voice, FreeTTS will return the voice associated with the first voices.txt entry--Alan's voice. Because that voice is restricted to clock-specific speech, these applications will most likely produce no speech or send error messages to the console. To solve this problem, either comment out the Alan voice directory line by placing a # character at the head of that line, or move the Kevin voice directory line before the Alan voice directory line (see the example after this list).
  • Modify the CLASSPATH: Add jsapi.jar and freetts.jar to your CLASSPATH. Assuming Windows and the previous home directory, use the following command: set classpath=%classpath%;c:\freetts-1.2.1\lib\jsapi.jar;c:\freetts-1.2.1\lib\freetts.jar;. (the trailing ;. keeps the current directory on the CLASSPATH).
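As promised above, here is what the relevant lines of a fixed voices.txt look like, with the Alan voice directory line commented out via a # character so that the Kevin voice directory supplies the default voice:

# com.sun.speech.freetts.en.us.cmu_time_awb.AlanVoiceDirectory
com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory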

Compile TextSpeaker.java. If there are any compilation errors, check your CLASSPATH setting--it must include at least jsapi.jar for compilation to succeed.

Invoke java TextSpeaker to run TextSpeaker. Enter some text in the resulting GUI (see Figure 1) and click the Speak button. You should hear that text being spoken.

Figure 1. Type some text to speak and click the Speak button

Congratulations on getting TextSpeaker to speak via FreeTTS. But we can do better: let's use our knowledge of Java Speech and FreeTTS to build Speaker, an assistive technology that lets the blind hear their GUIs.

Let Me Hear You Speak

Speaker depends on Java's Accessibility API and Sun's Java Accessibility Utilities. The Accessibility API lets you make your Java GUIs accessible to assistive technologies--specialized tools that help people interact with GUIs. Voice synthesizers and voice recognizers are perhaps the most common examples of assistive technologies.

Little (if anything) needs to be done to make GUIs based on standard Swing components accessible to Speaker. Because AWT-based GUIs are not accessible to Speaker, I have prepared the code fragment below to show you how to make an AWT Checkbox component accessible:

CheckboxGroup cbg = new CheckboxGroup ();
Checkbox cb = new Checkbox ("Over 65", cbg, true);
cb.getAccessibleContext ().
  setAccessibleName ("Over 65");

The getAccessibleContext() method returns a javax.swing.AccessibleContext object. That object bundles component information that any assistive technology can query. Because AWT components do not provide accessible information, setAccessibleName() provides a name for Speaker to access. This name can be accessed from Speaker by invoking the companion getAccessibleName() method.

Sun's Java Accessibility Utilities help you determine how accessible your GUIs are. The distribution file that contains those utilities also contains jaccess.jar, a package of classes and interfaces that are required by Java-based assistive technologies.

Download version 1.3 of these utilities from Sun's download page. You can choose to download a compressed .tar file, a gzip .tar file, or a .zip file. Regardless of which file you download, extract jaccess.jar, copy that file to FreeTTS's lib directory (a convenient location), and add jaccess.jar to your CLASSPATH. On a Windows platform, you can specify set classpath=%classpath%;c:\freetts-1.2.1\lib\jaccess.jar.

The jaccess.jar file contains the classes and interfaces of the com.sun.java.accessibility.util package. Speaker interacts with three of that package's classes and interfaces:

  • The GUIInitializedListener interface must be implemented by an assistive technology's main class. That interface provides a public void guiInitialized() method that is invoked by the JVM when its GUI subsystem is ready to interface with an assistive technology.

  • The EventQueueMonitor class provides a public static boolean isGUIInitialized() method that returns true if the GUI subsystem is ready to interface with assistive technologies. If false is returned, Speaker invokes that class's public static void addGUIInitializedListener(GUIInitializedListener l) method to register Speaker's main class as a listener. When the GUI subsystem is ready to interface, the JVM invokes the guiInitialized() method. If isGUIInitialized() returns true, or when guiInitialized() is called, Speaker registers listeners that will receive events as the user interacts with a GUI.

  • The AWTEventMonitor class provides a variety of static methods for adding and removing various listeners. For example, Speaker invokes public static void addWindowListener(WindowListener l) to register a window listener (to deallocate the synthesizer's resources when the user closes the GUI being monitored), and invokes public static void addMouseListener(MouseListener l) to register a mouse listener (to obtain and speak a component's name). If you plan to customize Speaker with Swing-specific listeners, you can work with the SwingEventMonitor class, which subclasses AWTEventMonitor.

Now that you have an idea of what GUIInitializedListener, EventQueueMonitor, and AWTEventMonitor accomplish, let's see how Speaker works with those types. Examine the source code below.

// Speaker.java

import java.awt.*;
import java.awt.event.*;

import javax.accessibility.*;
import javax.speech.*;
import javax.speech.synthesis.*;
import javax.swing.*;

import com.sun.java.accessibility.util.*;

public class Speaker implements
                            GUIInitializedListener
{
   Synthesizer synth;

   public Speaker ()
   {
      try
      {
          // Create a synthesizer for the default
          // locale.

          synth = Central.createSynthesizer
                                           (null);

          // Allocate synthesizer resources.

          synth.allocate ();

          // Place synthesizer in the RESUMED
          // state so that it can produce speech
          // as it receives text.

          synth.resume ();
      }
      catch (Exception e)
      {                
          JOptionPane.showMessageDialog (null,
                                 e.getMessage ());
          return;
      }

      // If JVM GUI subsystem is ready to
      // interface with assistive technology,
      // invoke speakerInit(). Otherwise, register
      // current Speaker object as a listener,
      // whose guiInitialized() method will be
      // invoked when the GUI subsystem is ready.

      if (EventQueueMonitor.isGUIInitialized ())
          speakerInit ();
      else
          EventQueueMonitor.
                 addGUIInitializedListener (this);
   }

   public void guiInitialized ()
   {
      speakerInit ();
   }

   void speakerInit ()
   {
      // Register a window listener that
      // deallocates synthesizer resources when a
      // JFrame window with an EXIT_ON_CLOSE
      // operation is closing.

      WindowListener wl;
      wl = new WindowAdapter ()
           {
               public void windowClosing
                                   (WindowEvent e)
               {
                  Window w = e.getWindow ();
                  if (!(w instanceof JFrame))
                      return;

                  JFrame f = (JFrame) w;
                  if (f.getDefaultCloseOperation
                       () == JFrame.EXIT_ON_CLOSE)
                      try
                      {
                          // Deallocate
                          // synthesizer
                          // resources.

                          synth.deallocate ();
                      }
                      catch (Exception e2)
                      {
                      }
               }
           };
      AWTEventMonitor.addWindowListener (wl);

      // Register a mouse listener that speaks the
      // name of an accessible component when the
      // mouse pointer enters that component.

      MouseListener ml;
      ml = new MouseAdapter ()
           {
               public void mouseEntered
                                    (MouseEvent e)
               {
                   Component c = (Component)
                                   e.getSource ();

                   Accessible a;
                   a = SwingUtilities.
                               getAccessibleAt (c,
                                   e.getPoint ());
                   if (a == null)
                       return;

                   AccessibleContext ac = a.
                          getAccessibleContext ();
                   if (ac == null)
                       return;

                   String text = ac.
                             getAccessibleName ();
                   if (text == null)
                       return;

                   // Speak the component's name.

                   synth.speakPlainText (text,
                                         null);

                   try
                   {
                       // Wait for synthesizer to
                       // finish speaking.

                       synth.waitEngineState
                        (Synthesizer.QUEUE_EMPTY);
                   }
                   catch (InterruptedException e2)
                   {
                   }
               }
           };
      AWTEventMonitor.addMouseListener (ml);
   }
}

Speaker's source code should be fairly easy to understand. However, you might be wondering why I've specified a = SwingUtilities.getAccessibleAt (c, e.getPoint ()); instead of working with EventQueueMonitor's public static Point getCurrentMousePosition() and public static Accessible getAccessibleAt(Point pt) methods. I could not get those methods to work properly in the context of the mouseEntered() method: the mouse pointer had to be moved off a component before Speaker would speak that component's name. This behavior was unacceptable to me.

If you set up your environment as specified earlier, you should be able to compile Speaker.java. Compilation results in three class files: Speaker.class, Speaker$1.class, and Speaker$2.class. I found it convenient to archive these class files into a Speaker.jar file (with the command jar cf Speaker.jar *.class), copy Speaker.jar to c:\freetts-1.2.1\lib, and add Speaker.jar to the CLASSPATH. On my Windows platform, I specified set classpath=%classpath%;c:\freetts-1.2.1\lib\Speaker.jar.

One last item has to be taken care of before you can use Speaker: you must tell the JVM to load the Speaker class automatically at startup. Accomplish that task by placing the line shown below in your accessibility.properties file (in your JAVA_HOME\jre\lib or JAVA_HOME/jre/lib directory). If that file does not exist, create an accessibility.properties file consisting of that single line.
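assistive_technologies=Speaker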

To put Speaker through its paces, start TextSpeaker and move the mouse pointer around that application's GUI. You should hear your computer speak as you move the pointer over the label and either button. However, it does not speak when the mouse pointer moves over the text component--I leave figuring out why that is the case as an exercise.

Tip: To use Speaker with an applet that you start via appletviewer, specify both the CLASSPATH environment variable value and a policy file that grants appropriate permissions. For example: appletviewer -J-classpath -J%CLASSPATH% -J-Djava.security.policy=my.policy applet.html, where my.policy contains grant { permission java.security.AllPermission; };.

Conclusion

Blind users need help to interact with GUI environments. My earlier experience helping a blind gentleman use his Windows-based computer by getting that computer to speak inspired me to create an equivalent assistive technology for Java. My Speaker assistive technology vocalizes GUI component names as the user moves the mouse pointer over those components, helping blind users navigate their way around Java GUIs.

To speak Java GUIs, Speaker depends on Sun's Java Speech API, via the FreeTTS implementation of that API. I've purposely minimized Speaker's interaction with Java Speech/FreeTTS to keep this technology simple. Therefore, you might want to extend Speaker; one idea is to let Speaker's constructor access a property file and choose a voice based on that file's contents (you'll need to add voices to FreeTTS--learn how in its documentation).
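To illustrate that last idea, here is a sketch of how the synthesizer-creation code inside Speaker's constructor might change. Two assumptions of mine: speaker.properties is a made-up file name, and kevin16 names one of FreeTTS's stock voices. The fragment requires java.io.FileInputStream and java.util.Properties imports, and it belongs inside the constructor's existing try block, which already catches the checked exceptions involved:

// A sketch of the suggested extension; speaker.properties
// and its voice key are assumptions made for this example.
// The file contains a line such as: voice=kevin16

Properties props = new Properties ();
props.load (new FileInputStream ("speaker.properties"));
String voiceName = props.getProperty ("voice");

// Require a synthesizer that offers the named voice
// instead of accepting the default locale's default voice.

SynthesizerModeDesc required = new SynthesizerModeDesc ();
required.addVoice (new Voice (voiceName,
                              Voice.GENDER_DONT_CARE,
                              Voice.AGE_DONT_CARE,
                              null));
synth = Central.createSynthesizer (required);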

Jeff Friesen is a freelance software developer and educator specializing in Java technology. Check out his site at javajeff.mb.ca.