
Article:  Introduction to Nutch, Part 2: Searching
Subject:  Several questions...
Date:  2006-06-02 09:02:52
From:  ealex


Hello,

I have several questions about how to use Nutch:
1) Each time we modify Nutch's .conf files, do we have to restart Tomcat (catalina start), or can we modify the Nutch code to avoid the restart?
2) The Nutch crawler/searcher seems to be tied to a single user agent. Suppose we want to crawl with several user agents. With the latest version, this does not seem possible without changing a .conf file each time. Is there another way?
3) We want to search by user agent. Is there really no way to do that?
4) When we get a search result from the crawl (for example, "animal" found in the page "http://toto..."), can we also get the first URL that contains a link to http://toto?
5) In the file crawl-urlfilter.txt, can we specify something other than a domain name, for example a complete URL?

Thanks in advance.
