Articles
Introduction to Nutch, Part 2: Searching
In the second part of this look at the Nutch web indexing and search engine, Tom White looks at how to perform searches on the index generated in part one's crawl, and shows how to integrate Nutch's search capabilities with your applications through direct Java calls to its API or via the OpenSearch API. Feb. 16, 2006
Introduction to Nutch, Part 1: Crawling
Do you need your own search engine, when the world already has Google? Quite possibly so: you may belong to an organization with enough content of its own that you want to manage and run your own search engine--and know how it works. Nutch is an open source search engine written in Java. In this article, Tom White shows how it crawls pages to build its index. Jan. 10, 2006
Did You Mean: Lucene?
All modern search engines attempt to detect and correct spelling errors in users' search queries. This article shows you one way of adding a "did you mean" suggestion facility to your own search applications using the Lucene Spell Checker. Aug. 9, 2005
How To Build a ComputeFarm
Parallel computing allows some programs to run faster by dividing them up into smaller pieces and running these pieces on multiple processors. ComputeFarm is an open source Java framework for developing and running parallel programs. Apr. 21, 2005
Weblogs
"Disks have become tapes": What trends in disk drive technology mean for data processing. Posted by tomwhite on March 18, 2008 at 06:07 PST | Permalink
| Discuss (3)
Consistent Hashing: I've bumped into consistent hashing a couple of times lately. But what is it and why should you care? This post has a look. Posted by tomwhite on November 27, 2007 at 09:56 PST | Permalink
| Discuss (2)
Hadoop + EC2 + S3: How to run data processing applications on a rented grid. Posted by tomwhite on July 20, 2007 at 01:10 PST | Permalink
| Discuss (7)
Wanted: A Public Amazon EC2 AMI for Java EE: Ruby on Rails has got one - is there one for the Java EE stack? Posted by tomwhite on June 27, 2007 at 13:00 PST | Permalink
| Discuss (0)
jMock 2 and my Java Unit Testing Toolkit: The long-awaited final version of jMock 2 was released today. Another useful tool for my unit testing toolkit. Posted by tomwhite on April 11, 2007 at 06:16 PST | Permalink
| Discuss (2)
Testing for errant network connections: Or, "Why's my application connecting to that site?!" Posted by tomwhite on February 08, 2007 at 01:37 PST | Permalink
| Discuss (2)
Hamcrest: Hamcrest release 1.0 is now available. It allows you to write flexible assertions in your unit testing framework of choice. Posted by tomwhite on December 22, 2006 at 12:27 PST | Permalink
| Discuss (5)
Lift Off: Introducing LiFT - a Literate Functional Testing framework for making your web application tests more readable. Posted by tomwhite on October 30, 2006 at 03:50 PST | Permalink
| Discuss (3)
Are your beans thread-safe?: Why it's worth being a little paranoid about what your IoC container does in a multi-threaded environment. Posted by tomwhite on September 21, 2006 at 13:55 PST | Permalink
| Discuss (27)
Affordable Web-Scale Computing Redux: Amazon's new Elastic Compute Cloud should be a perfect fit for running Hadoop jobs. Posted by tomwhite on August 24, 2006 at 14:01 PST | Permalink
| Discuss (1)
S3Map: Implementing a distributed java.util.Map using Amazon S3. Posted by tomwhite on August 13, 2006 at 12:30 PST | Permalink
| Discuss (1)
Pluralization: Tool builders can now easily add pluralization to their applications using Inflector, a new Java library hosted on java.net. Posted by tomwhite on July 26, 2006 at 13:14 PST | Permalink
| Discuss (1)
More Literate Programming: Language-Level Anaphora: Following on from a previous post about using anaphora (a word like "it" that refers back to something already mentioned) to make jMock tests more readable, I ask "Can we have language-level anaphora?" Posted by tomwhite on June 29, 2006 at 13:21 PST | Permalink
| Discuss (5)
More Literate Programming with jMock: Anaphora: How to reduce repetition in jMock tests using an idea from natural languages. Posted by tomwhite on May 14, 2006 at 02:30 PST | Permalink
| Discuss (5)
Literate Programming with jMock: jMock is not just about mock objects; its support for constraints makes it a great example of literate programming. Posted by tomwhite on May 11, 2006 at 11:10 PST | Permalink
| Discuss (7)
A Faster Java Regex Package: dk.brics.automaton is a Java regex package whose main claim to fame is that it is significantly faster than all other Java regex libraries, including the one in the JDK. How can this be? Posted by tomwhite on March 27, 2006 at 12:02 PST | Permalink
| Discuss (4)
Affordable Web-Scale Computing: With the launch of Amazon S3 (Simple Storage Service) we are seeing a continuation of the trend for the big web companies to monetize their computing infrastructure by opening it up to developers. Posted by tomwhite on March 17, 2006 at 06:10 PST | Permalink
| Discuss (0)
Hadoop!: The MapReduce and distributed filesystem parts of Nutch (inspired by projects from Google) have been split into a new project, called Hadoop. Posted by tomwhite on February 08, 2006 at 01:49 PST | Permalink
| Discuss (4)
User-Friendly XML Config: Using the xml-stylesheet processing instruction in XML config files makes them much easier on the eye. Posted by tomwhite on October 27, 2005 at 14:49 PST | Permalink
| Discuss (2)
MapReduce: MapReduce is an amazing distributed system for massive data processing from Google Labs. There's now a Java implementation. Posted by tomwhite on September 25, 2005 at 22:36 PST | Permalink
| Discuss (3)
Good Behaviour: JavaScript isn't hot just because of AJAX. There's a flurry of activity in the web design community describing how to use modern JavaScript to enhance the behaviour of web applications. Posted by tomwhite on July 19, 2005 at 14:00 PST | Permalink
| Discuss (5)
Modularize Early: It pays to think about how you can break up a system into modular units at all levels of the hierarchy. Posted by tomwhite on May 30, 2005 at 15:57 PST | Permalink
| Discuss (2)
groovy -pi -e: Bringing Perl-style one-liners to Java. Posted by tomwhite on April 18, 2005 at 12:55 PST | Permalink
| Discuss (1)
Counting Characters: What is a character? With the introduction of supplementary characters in Java 1.5 things aren't quite as simple as they once were. Posted by tomwhite on March 22, 2005 at 14:10 PST | Permalink
| Discuss (4)
The Java Language Specification, 3rd Edition: Four years since it was last published, the JLS 3rd edition (which rolls up all the language changes from Java 1.4 and 1.5) is available for "maintenance review". Posted by tomwhite on February 28, 2005 at 14:16 PST | Permalink
| Discuss (6)