Building a Better Brain, Part 2: A Great Thick Client

March 26, 2004

Contents
The Base Program
Real-Time Incremental Searching
Creating an Index with Lucene
Implementing a Real-Time Search
Searching on Each Keystroke
Syncing the BrainFeeds
Creating an HTML View
Adding Style to the Layout
The Future

In the previous article, we designed a web server protocol for searching and
updating small chunks of information, called Brain Entries, that are stored in
BrainFeeds. The sample client is a JSP program that displays the entries in a
web browser. Now it would be nice to have a really good thick client that would
let us do real-time searches, local data caching, and properly render the
entries in the client itself instead of in a web browser.

In this article, we are going to build a desktop application to read and
post to BrainFeeds. Since it's a real application, we will also be able to cache
the feeds and do incremental updates to disk. This lets us do fast real-time
searching through the local cache. Also, since we won't have access to a browser
anymore, we will customize the HTMLEditorKit to render each entry as
HTML directly in our application.

The Base Program

As with most desktop applications, we will start with a simple base. Our
application has one frame with three sections (as seen in Figure 1). The top is
a search box, the middle displays the results in a list, and the bottom renders
the selected entry as HTML. The two buttons at the bottom are for editing the
currently selected entry and for adding a new one. You can download the source
code for this application here: brainfeed.zip.

Figure 1. The base application

Real-Time Incremental Searching

The first feature we'll add to make our application really nice to use is real-time
incremental searching. This is a method of searching most prominently featured
in iTunes, though you can also find
it in text editors (like the venerable XEmacs),
file managers, and even the combo boxes of some applications. The two key
points of real-time incremental searching are that the search is run over again
on each keystroke, and that the user can search for substrings. This means that
a search for "ten" would match "ten," "tent," and "forgotten." These two
techniques combine to create a great user experience, but at the cost of
processor speed and disk space for an index. Fortunately, we live in the age of
cheap and powerful computers that waste most of their resources waiting in a
loop for a mouse click. Incremental searching can be slow, but for the datasets we
will be dealing with (say, less than 20MB of pure text), on modern
computers it should be nearly instantaneous.

So how do we do it? First we need a powerful database with support for
wildcard searching. Lucene is a 100% Java, open source
search engine that supports almost everything we need. It was written by the
author of Apple's VTwin search engine, and supports both full-text and wildcard
searching. Now adopted by the Apache Jakarta project, it provides top-notch
searching for any Java application. We just need to hook it up.

Creating an Index with Lucene

First we need to create an index on the client side to store all of our
entries. The index contains all of the words that we can search on, presorted to
make searching faster. It also lets us set some options about how to deal with
spaces, plural words, and other language issues.

File indexDir = new File("braindir");

// the stop analyzer breaks the text on word boundaries
// converting it all to lower case and stripping out the stop
// words (like "the", and "a")
Analyzer analyzer = new StopAnalyzer();

if(writer == null) {
    try {
        // create a new indexwriter.
        // the false means it won't overwrite the old index
        writer = new IndexWriter(indexDir, analyzer, false);
    } catch (IOException ex) {
        // create a new index writer and overwrite the old index
        writer = new IndexWriter(indexDir, analyzer, true);
    }

    writer.close();
}

The code above will create an index in the braindir directory.
The first call to new IndexWriter() will open the index without
creating it. If the call fails because the index doesn't already exist, then it
will make the call again with true for the last argument to create a new index.
The Analyzer is a set of rules about how to preprocess the data
before putting it into the database. The StopAnalyzer, one of the
default Analyzers that comes with Lucene, will convert all text to
lowercase and remove stop words. Stop words are short words like "the" and "a"
that convey little or no meaning and are not useful for searching. We can leave
them out to speed up processing and make the search more targeted.
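
To make the preprocessing concrete, here is a minimal sketch of the same idea in plain Java. It is not Lucene's StopAnalyzer, and the class name, method name, and short stop list are made up for illustration; it only shows what "lowercase and drop stop words" does to a piece of text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: this is NOT Lucene's StopAnalyzer, just the same idea
// in plain Java. The stop list is a small, hypothetical sample.
final class StopWordSketch {
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Split on anything that is not a letter, lowercase each piece,
    // and drop the pieces that appear in the stop list.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}]+")) {
            String word = piece.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

With this sketch, the text "The Robot and a Screen Capture" is reduced to the three tokens robot, screen, and capture, which is roughly what ends up searchable in the index.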

Now that we have an index, we need to put the entries into it. Each entry has
already been parsed into a BrainEntry object (reused from the JSP
version), which has accessors for each field we will need. Lucene stores text in
Document objects, so we will create one Document for
each BrainEntry.

private static void addToIndex(File indexDir,
                               BrainEntry be,
                               boolean create)
    throws Exception {
    IndexWriter writer = getWriter();
    // create a new document for the brain entry
    Document doc = new Document();

    // pull out all of the fields and put them
    // in the document
    String id = be.getId();
    doc.add(Field.Keyword("id",id));
    doc.add(Field.Keyword("uri",be.getURI()));
    doc.add(Field.Keyword("iduri",be.getId() +
                          ":"+be.getURI()));
    doc.add(Field.Text("title", be.getTitle()));
    doc.add(Field.UnIndexed("content",
                            be.getContentString()));

    // add each keyword
    Iterator it = be.getKeywordList().iterator();
    while(it.hasNext()) {
        String keyword = (String)it.next();
        doc.add(Field.Text("keyword",keyword));
    }

    // add the document and close
    writer.addDocument(doc);
    writer.close();
}

First we add searchable fields to the Document and then we add the
content. Lucene has different types of fields depending on how they should be
included in the index. We want the id and source uri
to be keywords, and the title to be text. A keyword field
is a string that will be stored and indexed but not tokenized, meaning it won't
be modified in any way. Since we need the id and uri external to the program, we
don't want them to be changed at all. A Text field is also stored and
indexed, but it will also be tokenized, which in our case will make it lowercase
and remove the stop words. All of the fields that we would like our users to
search on will be stored as text. For the content (the body text of the entry),
we don't actually want to index it for searching, since that would make queries
slower. Instead, we just want to use the database as a convenient storage
mechanism, so it gets stuffed into an UnIndexed field. Once our
Document is set up, we add it to the index.

As we saw above, we write to the index with an IndexWriter. To
search through the index, we will use, not surprisingly, an
IndexSearcher. The query itself is derived from the
QueryParser, which takes our query string, the name of the field we
want to search, and the analyzer. We will use the same Analyzer
that we used when we originally put the entry into the index: the StopAnalyzer.
Finally, we execute the search and loop through the results.

private static List luceneSearch(String q,
                                 File indexDir)
    throws Exception {
    init();
    List list = new ArrayList();

    // create an index search
    Directory fsDir =
        FSDirectory.getDirectory(indexDir, false);
    IndexSearcher is = new IndexSearcher(fsDir);

    // create a new query based on the
    // query string passed in
    Query query =
        QueryParser.parse(q, "keyword",
                          new StopAnalyzer());

    // do the search
    Hits hits = is.search(query);

    for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);
        BrainEntry be = new BrainEntry();
        be.setId(doc.get("id"));
        be.setURI(doc.get("uri"));
        be.setTitle(doc.get("title"));
        be.setContentString(doc.get("content"));
        Field[] keywords = doc.getFields("keyword");
        for(int j=0; j<keywords.length; j++) {
            //u.p("keyword: " + keywords[j]);
            be.addKeyword(keywords[j].stringValue());
        }
        list.add(be);
    }
    // release the index files now that every hit has been read
    is.close();
    return list;
}

To create an incremental search, we need to modify the query. Lucene doesn't
support complete substring search (where a search for "oo" would return "noon"),
but it does support prefix substrings, meaning a search for "jav" will return
both "java" and "javascript." This is done by adding a wildcard ("*") to each
term. Years of Googling have conditioned people to continue typing words to
narrow down a search, so we will just AND the search terms together into our
final query string.

public List search(String[] terms) throws Exception {
    // return an empty list if the query is empty
    if(terms.length == 0) return new ArrayList();

    StringBuffer query = new StringBuffer();
    // add the first term with a wildcard (*)
    query.append(terms[0]+"*");

    // AND all of the additional terms
    // with *'s after them
    for(int i=1; i<terms.length; i++) {
        query.append(" AND " + terms[i] +"*");
    }
    return luceneSearch(query.toString(),
                        this.indexdir);
}
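
The search box hands us one raw string rather than an array of terms, so something has to split it first. The sketch below is one way that step might look; the class and method names are hypothetical and not part of the downloadable source. It lowercases each term on the assumption that the index was lowercased by the StopAnalyzer, and it strips punctuation so that characters such as "+" or "!" are not read as query syntax.

```java
import java.util.Locale;

final class PrefixQuerySketch {
    // Turn raw search-box text into a query string such as "jav* AND swi*".
    // Returns an empty string when there is nothing to search for.
    static String toQuery(String rawText) {
        StringBuilder query = new StringBuilder();
        for (String term : rawText.trim().split("\\s+")) {
            // keep only letters and digits so stray punctuation is not
            // interpreted as query syntax (a simplification for this sketch)
            String cleaned = term.replaceAll("[^\\p{L}\\p{Nd}]", "")
                                 .toLowerCase(Locale.ROOT);
            if (cleaned.isEmpty()) {
                continue;
            }
            if (query.length() > 0) {
                query.append(" AND ");
            }
            query.append(cleaned).append('*');
        }
        return query.toString();
    }
}
```

A caller would skip the search entirely when the result is empty, which mirrors the empty-terms check in the search method above.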
Searching on Each Keystroke

Now that we have the ability to search, we need to hook it up to the
keystrokes on the search field. Dealing with the keystroke events on the
text field can get problematic, since we want to capture the backspace but not the
arrow keys. Really, we just want to know when the text itself has changed.
Instead of listening to keystroke events on the component, we will listen for
document events on the underlying text field document.

query.getDocument().addDocumentListener(new DocumentListener() {
        public void changedUpdate(DocumentEvent evt) {
            try {
                localSearch(query.getText());
            } catch (Exception ex) { u.p(ex); }
        }
        public void insertUpdate(DocumentEvent evt) {
            try {
                localSearch(query.getText());
            } catch (Exception ex) { u.p(ex); }
        }
        public void removeUpdate(DocumentEvent evt) {
            try {
                localSearch(query.getText());
            } catch (Exception ex) { u.p(ex); }
        }
    });
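
All three callbacks do the same thing, so the repetition can be folded into a small helper. The following is a sketch under that assumption; AnyChangeListener and AnyChangeDemo are names invented here, and the demo uses a bare PlainDocument instead of the real search field so it can run without a window.

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;

// Forwards every kind of document change to one callback.
final class AnyChangeListener implements DocumentListener {
    private final Runnable onChange;

    AnyChangeListener(Runnable onChange) {
        this.onChange = onChange;
    }

    public void insertUpdate(DocumentEvent evt)  { onChange.run(); }
    public void removeUpdate(DocumentEvent evt)  { onChange.run(); }
    public void changedUpdate(DocumentEvent evt) { onChange.run(); }
}

final class AnyChangeDemo {
    // Inserts "jav" and then deletes one character; returns how many
    // times the callback fired.
    static int countChanges() {
        final int[] fired = {0};
        PlainDocument doc = new PlainDocument();
        doc.addDocumentListener(new AnyChangeListener(new Runnable() {
            public void run() { fired[0]++; }
        }));
        try {
            doc.insertString(0, "jav", null);
            doc.remove(2, 1);
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
        return fired[0];
    }
}
```

In the application, the callback would wrap the call to localSearch(query.getText()) together with its exception handling, and the helper would be registered on query.getDocument() exactly as before.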

Syncing the BrainFeeds

Now our application does real-time searching through our local database, but
how do we get the data into our database to begin with? The first time we
connect to a feed, we will want to download the whole thing, but thereafter we
only want the updates. This is called syncing, and to do it we need a way of
storing not only the downloaded entries, but also the timestamp of the last
download.

First, we need a list of previous access times. Since this is external to the
dataset, we can just store it in an XML file. Below is the feeds.xml
file that the program uses to store a list of URIs and when they were last
accessed.

<?xml version="1.0" encoding="UTF-8"?>
<uris>
    <uri last-read="19/01/2004-09:47:27-EST">
        file:/C:/brain/testdata/acidtest.xml
    </uri>
    <uri last-read="19/01/2004-09:47:27-EST">
        http://mybrain.com/feed.xml
    </uri>
</uris>
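
Reading this file back takes only the XML parser and date formatter that ship with the JDK. The sketch below is not the code from the download; the class name is hypothetical, and the date pattern is inferred from the sample value shown above rather than taken from the program.

```java
import java.io.StringReader;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FeedListSketch {
    // Pattern inferred from the sample value "19/01/2004-09:47:27-EST".
    static final String LAST_READ_PATTERN = "dd/MM/yyyy-HH:mm:ss-z";

    // Returns feed URI -> last-read time, in document order.
    static Map<String, Date> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // SimpleDateFormat is not thread-safe, so create one per call
            SimpleDateFormat format =
                    new SimpleDateFormat(LAST_READ_PATTERN, Locale.US);

            Map<String, Date> feeds = new LinkedHashMap<String, Date>();
            NodeList uris = doc.getElementsByTagName("uri");
            for (int i = 0; i < uris.getLength(); i++) {
                Element uri = (Element) uris.item(i);
                // the URI text is surrounded by whitespace in the file
                String address = uri.getTextContent().trim();
                Date lastRead = format.parse(uri.getAttribute("last-read"));
                feeds.put(address, lastRead);
            }
            return feeds;
        } catch (Exception ex) {
            throw new IllegalArgumentException("unreadable feed list", ex);
        }
    }
}
```

Note the trim() call: because each URI sits on its own indented line, the element text includes the surrounding newlines and spaces.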

Once we have the date of the last sync (or a really old date, if we've never
synced before), we need to actually make the query. The BrainSearch
utility class implements the actual search (the HTTP GET request and parsing
into BrainEntry objects) that we will use here. First, we set the
URL to search, and then the time of the last sync. Next, we execute the search and
read the entries back. After dumping each entry into the repository, we finally
set the last modified date to the current time.

private void syncFeed(Feed feed) throws Exception {
    try {
        // init search
        BrainSearch search = new BrainSearch();

        search.setURL(feed.url);
        search.setLastModifiedTimestampAfter(feed.getLastRead());

        // execute the search
        search.search();

        // loop through the results
        BrainEntry[] entries = search.getEntryArray();
        for(int i=0; i<entries.length; i++) {
            brain.add(entries[i]);
        }

        brain.setLastModified(feed.url,new Date());
    } catch (Exception ex) {
        System.out.println(ex.toString());
    }
}
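
The bookkeeping behind a sync can be sketched without any networking. Everything in the block below is a hypothetical stand-in: EntrySource plays the role of the HTTP request that BrainSearch performs, entries are plain strings, and the store is an in-memory list. It only illustrates the two rules described above: ask for everything when there is no previous sync, and move the timestamp forward only after the entries have been stored.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SyncSketch {
    // Hypothetical stand-in for the request that fetches changed entries.
    interface EntrySource {
        List<String> fetchModifiedAfter(Date since);
    }

    final List<String> localEntries = new ArrayList<String>();
    final Map<String, Date> lastRead = new HashMap<String, Date>();

    // Returns the number of entries added to the local store.
    int sync(String feedUrl, EntrySource source, Date now) {
        Date since = lastRead.get(feedUrl);
        if (since == null) {
            since = new Date(0L); // never synced: a really old date
        }
        // if this throws, lastRead is left untouched, so the next
        // sync asks for the same range again
        List<String> fresh = source.fetchModifiedAfter(since);
        localEntries.addAll(fresh);
        lastRead.put(feedUrl, now);
        return fresh.size();
    }
}
```

The first call for a feed therefore downloads the whole thing, and every later call only asks for entries changed since the previous successful sync.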

Creating an HTML View

Now that we can sync and search through our database, it would be nice to
actually see each entry once it is selected. The content of BrainFeed entries is
in strict XHTML, so we will need an HTML renderer to view them. Fortunately, we have
one: Swing's text package (javax.swing.text) can render styled text, and it includes an HTML viewer/editor. All we have to do is initialize it properly and then load in our HTML.

Swing's text package includes a series of EditorKits along
with an actual Swing component, the JEditorPane (and its subclass
the JTextPane). To create a specific type of viewer, we have to
initialize a JEditorPane with the right EditorKit. The
code below creates an editor pane with some placeholder content. Since HTML is
one of the built-in kits, the easiest way to create one is by just telling the
editor pane we want to support the "text/html" mime type. No further
configuration is required. The JEditorPane is scrolling-aware, so we can just
drop it into a scroll pane. Notice the scrolling constants. Since this is sort of
a web browser with small pages, we want the text to only scroll vertically.

JEditorPane view =
    new JEditorPane("text/html","<p>empty</p>");
JScrollPane view_scroll = new JScrollPane(view,
        JScrollPane.VERTICAL_SCROLLBAR_ALWAYS,
        JScrollPane.HORIZONTAL_SCROLLBAR_NEVER);

To load new content into the view, we take the content from the entry and
wrap it in html and body tags. Since the title is
separate from the content but would be useful to see, it's added, as well.
Finally, the text is added to the view.

BrainEntry be = (BrainEntry)results.getSelectedValue();
if(be!=null) {
    StringBuffer sb = new StringBuffer();
    view.setContentType("text/html");
    HTMLDocument d = (HTMLDocument)view.getDocument();
    d.setBase(new File(".").toURL());
    sb.append("<html>");
    sb.append("<body>");
    sb.append("<h1>"+be.getTitle()+"</h1>");
    sb.append(be.getContentString());
    sb.append("</body>");
    sb.append("</html>");
    view.setText(sb.toString());
}
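
One detail worth guarding against: the content is already XHTML, but the title is dropped into the h1 as-is. Assuming getTitle() returns plain text rather than markup, a title containing "<" or "&" could confuse the renderer. The helper below is a sketch of one way to handle that; the class and method names are invented for this example.

```java
final class HtmlEscapeSketch {
    // Replace the characters that have special meaning in HTML text.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;");  break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Wrap an already-XHTML body and a plain-text title in a page.
    static String wrap(String title, String xhtmlBody) {
        return "<html><body><h1>" + escape(title) + "</h1>"
                + xhtmlBody + "</body></html>";
    }
}
```

The body is deliberately left untouched, since escaping it would turn the entry's own markup into visible text.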

Now, with an HTML pane in our program, we can turn this:

<entry id="3">
    <keyword>java</keyword>
    <keyword>awt</keyword>
    <keyword>swing</keyword>
    <title>How can I make a screen capture?</title>
    <content>

<p>Java has a method in the <i>java.awt</i> package that will
capture the screen into a buffered image</p>

<blockquote><pre>
Robot robot = new java.awt.Robot();
BufferedImage img =
  robot.createScreenCapture(
    new Rectangle(0,0,100,100)
  );
</pre></blockquote>

    </content>
</entry>

into this:

Figure 2. HTML rendering

Adding Style to the Layout

Well, it works, but it's not the prettiest screen we've ever seen. In fact, it
looks like Netscape circa 1995. When it was originally released, the HTML kit was
very slow and buggy, but recent versions of the JDK have improved it
considerably. It's still not up to a modern browser level, but it can handle a fair
amount of CSS Level 1. It's a bit finicky, though, and we'll have to work around
the bugs.

Below is some simple CSS that will set a nice background color and a border
around blockquoted code samples, and will colorize the header.

body {
    background-color: #f0fff0;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 10pt;
}

blockquote {
    border: 1px solid #008800;
    padding: 5px;
    background-color: #b0ffb0;
}

h1 {
    border: 1px solid #008800;
    background-color: #88ff88;
    padding: 3px 3px 5px 3px;
    font-size: 120%;
    font-weight: bold;
    color: #005500;
}

Now we just need to apply the CSS to the HTML. The HTMLEditorKit can load
CSS via LINKs, so we can just add a reference to it at the top of the HTML when
we stuff it into the editor. The code above is now modified to add a head
element with a stylesheet reference.

sb.append("<html>");
sb.append("<head>" +
    "<link rel=stylesheet href='src/css/style.css'>" +
    "</head>");
sb.append("<body>");
sb.append("<h1>"+be.getTitle()+"</h1>");

Now our HTML renders like this:

Figure 3. HTML with CSS

It looks better, but our borders are missing. Maybe the
HTMLEditorKit doesn't support borders? Research on the Web doesn't
turn up much, but browsing through the Swing source code, we discover that it
does support borders, just not the border shorthand. It also
doesn't support borders with different widths for each side, or any width other
than one pixel! But still, with a quick CSS change we can get something
that looks pretty good.

We change the border line for the blockquote and h1
to this:

border-width: 1px;
border-style: solid;
border-color: #008800;

and now we have an attractive display for each Brain Entry, as seen in Figure 4.

Figure 4. HTML with correct CSS
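
If a stylesheet already uses the shorthand in many places, the rewrite can be done mechanically. The helper below is a rough sketch with an invented name, and it makes a strong assumption: the shorthand value has exactly three space-separated parts in width, style, color order. Real CSS allows other orders and omitted parts, which this sketch simply rejects.

```java
final class BorderShorthandSketch {
    // Expands the value of a "border" declaration, e.g. "1px solid #008800",
    // into the three longhand declarations. Assumes exactly three
    // space-separated parts in width/style/color order.
    static String expand(String shorthandValue) {
        String[] parts = shorthandValue.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "expected 'width style color': " + shorthandValue);
        }
        return "border-width: " + parts[0] + ";\n"
             + "border-style: " + parts[1] + ";\n"
             + "border-color: " + parts[2] + ";";
    }
}
```

Calling expand("1px solid #008800") produces the same three lines we wrote by hand for the blockquote and h1 rules.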

The nice thing about using CSS is that the style is determined by the viewer
instead of the content author. This means you can fit the display to your own
personal preferences or repurpose it to use in another web site or application.

The Future

The application we built in this article (brainfeed.zip) could be
embedded into an IDE, searching through Javadocs and developer forums in real
time for whatever code is selected. Or it could be integrated into an email
program like Outlook's daily summary. Or we could write a chatter bot that
answers questions by doing brain searches. I am sure that others will come
up with even stranger uses for this technology.

With the BrainFeed system, we can subscribe to and search through multiple
feeds across a network. Its simple protocol allows us to create a wide variety
of clients to make targeted searches and distribute lightweight information. I
hope this will be a launching pad for others to create their own clients and
servers, and more importantly, create their own BrainFeeds to share with others.

Josh Marinacci first tried Java in 1995 at the request of his favorite TA and has never looked back.