[Yanel-commits] rev 26462 -
public/yanel/trunk/src/realms/yanel-website/content
michi at wyona.com
Thu Aug 2 10:49:40 CEST 2007
Author: michi
Date: 2007-08-02 10:49:39 +0200 (Thu, 02 Aug 2007)
New Revision: 26462
Modified:
public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467
Log:
crawler docu updated
Modified: public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467
===================================================================
--- public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467 2007-08-02 08:26:23 UTC (rev 26461)
+++ public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467 2007-08-02 08:49:39 UTC (rev 26462)
@@ -1,6 +1,31 @@
-!Nutch Resource
+<?xml version="1.0"?>
-__Configuration__:
-* URLs to start with: nutch-0.8.x/url/...
-* URLs to be parsed and followed: nutch-0.8.x/conf/crawl-urlfilter.txt (Intranet), nutch-0.8.x/conf/regex-urlfilter.txt (Internet)
-* Depth of Crawling: crawl.sh (e.g. -depth 10)
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+ <title>Nutch Resource</title>
+ <link rel="neutron-introspection" type="application/neutron+xml" href="?yanel.resource.usecase=introspection"/></head>
+
+ <body>
+ <h1>Nutch Resource</h1>
+ <h2>Crawling</h2>
+ <h3>Configuration of Crawler</h3>
+ <p>See also <a href="http://lucene.apache.org/nutch/tutorial8.html">http://lucene.apache.org/nutch/tutorial8.html</a> for more information.</p>
+ <ul>
+ <li>URLs to start with:<ul>
+ <li>e.g. nutch-0.8.x/url/yanel-website.txt (http://yanel.wyona.org/)</li>
+ <li>e.g. nutch-0.8.x/url/yulup-website.txt (http://www.yulup.org/)</li>
+ </ul>
+ </li>
+ <li>Crawl scope, i.e. the URLs to be parsed and followed (IMPORTANT: both files below need an "accept hosts" entry):<ul>
+ <li>nutch-0.8.x/conf/crawl-urlfilter.txt (+^http://yanel.wyona.org/)</li>
+ <li>nutch-0.8.x/conf/regex-urlfilter.txt (+^http://yanel.wyona.org/)</li>
+ </ul>
+ </li>
+ <li>Depth of Crawling: crawl.sh (e.g. DEPTH=5)</li>
+ </ul>
+ <h3>Running Crawler</h3>
+ <ul>
+ <li>sh crawl.sh</li>
+ </ul>
+ </body>
+</html>
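
The configuration steps documented in the page above can be sketched as a small shell script. This is only an illustration under stated assumptions, not the project's actual setup: it works in a throwaway temporary directory instead of a real nutch-0.8.x checkout, and it writes each filter file from scratch. The real crawl-urlfilter.txt and regex-urlfilter.txt contain further rules; since Nutch applies the first matching rule, the accept entry there would have to appear before any catch-all reject line.

```shell
#!/bin/sh
# Assumption: a sandbox directory stands in for the real nutch-0.8.x tree.
NUTCH_DIR="$(mktemp -d)/nutch-0.8.x"
mkdir -p "$NUTCH_DIR/url" "$NUTCH_DIR/conf"

# Seed list: one start URL per line.
printf '%s\n' 'http://yanel.wyona.org/' > "$NUTCH_DIR/url/yanel-website.txt"

# Both filter files get the same "accept hosts" entry.
# Note: '>' overwrites; in a real checkout, edit the existing files instead.
for f in crawl-urlfilter.txt regex-urlfilter.txt; do
  printf '%s\n' '+^http://yanel.wyona.org/' > "$NUTCH_DIR/conf/$f"
done

# Count the filter files that contain the accept entry as a whole line.
ACCEPT_COUNT=$(grep -l -F -x -e '+^http://yanel.wyona.org/' \
  "$NUTCH_DIR"/conf/*-urlfilter.txt | wc -l | tr -d ' ')
echo "filter files with accept entry: $ACCEPT_COUNT"
```

If the count is lower than two, one of the filter files is missing the entry, which is the case the IMPORTANT note in the page warns about.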