[Yanel-commits] rev 26462 - public/yanel/trunk/src/realms/yanel-website/content

michi at wyona.com
Thu Aug 2 10:49:40 CEST 2007


Author: michi
Date: 2007-08-02 10:49:39 +0200 (Thu, 02 Aug 2007)
New Revision: 26462

Modified:
   public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467
Log:
crawler docu updated

Modified: public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467
===================================================================
--- public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467	2007-08-02 08:26:23 UTC (rev 26461)
+++ public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467	2007-08-02 08:49:39 UTC (rev 26462)
@@ -1,6 +1,31 @@
-!Nutch Resource
+<?xml version="1.0"?>
 
-__Configuration__:
-* URLs to start with: nutch-0.8.x/url/...
-* URLs to be parsed and followed: nutch-0.8.x/conf/crawl-urlfilter.txt (Intranet), nutch-0.8.x/conf/regex-urlfilter.txt (Internet)
-* Depth of Crawling: crawl.sh (e.g. -depth 10)
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+  <title>Nutch Resource</title>
+  <link rel="neutron-introspection" type="application/neutron+xml" href="?yanel.resource.usecase=introspection"/></head>
+  
+  <body>
+    <h1>Nutch Resource</h1>
+      <h2>Crawling</h2>
+      <h3>Configuration of Crawler</h3>
+       <p>See also <a href="http://lucene.apache.org/nutch/tutorial8.html">http://lucene.apache.org/nutch/tutorial8.html</a> for more information.</p>
+       <ul>
+        <li>URLs to start with:<ul>
+	  <li>e.g. nutch-0.8.x/url/yanel-website.txt (http://yanel.wyona.org/)</li>
+	  <li>e.g. nutch-0.8.x/url/yulup-website.txt (http://www.yulup.org/)</li>
+	  </ul>
+	   </li>
+        <li>Crawl scope, i.e. the URLs to be parsed and followed (IMPORTANT: both files below need an "accept hosts" entry):<ul>
+	  <li>nutch-0.8.x/conf/crawl-urlfilter.txt (+^http://yanel.wyona.org/)</li>
+	  <li>nutch-0.8.x/conf/regex-urlfilter.txt (+^http://yanel.wyona.org/)</li>
+	  </ul>
+	  </li>
+        <li>Depth of Crawling: crawl.sh (e.g. DEPTH=5)</li>
+       </ul>
+      <h3>Running Crawler</h3>
+        <ul>
+	<li>sh crawl.sh</li>
+	</ul>
+      </body>
+</html>
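
Note on the "accept hosts" entries in the page above: the following is a minimal sketch, not part of the commit, of how a Nutch-style URL filter list is evaluated. It assumes the usual behaviour of such filter files: rules are read top to bottom, the first matching pattern decides, a leading "+" accepts and a leading "-" rejects, and a URL that matches no rule is ignored. Only the "+^http://yanel.wyona.org/" entry is taken from the commit; the other rules and the name url_accepted are hypothetical and serve only as illustration.

```python
import re

# Hypothetical rule list in the style of nutch-0.8.x/conf/crawl-urlfilter.txt.
# Only the "+^http://yanel.wyona.org/" entry comes from the commit above;
# the other two rules are illustrative assumptions.
RULES = [
    "-^(file|ftp|mailto):",       # reject non-HTTP schemes
    "+^http://yanel.wyona.org/",  # accept the host that should be crawled
    "-.",                         # reject everything else
]


def url_accepted(url, rules=RULES):
    """Return True if the first rule whose pattern matches starts with '+'."""
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is ignored


if __name__ == "__main__":
    for candidate in ("http://yanel.wyona.org/en/index.html",
                      "http://www.yulup.org/"):
        print(candidate, url_accepted(candidate))
```

Under these assumptions, a seed URL such as http://www.yulup.org/ would be dropped unless a matching "+" entry is added to the filter files, which is presumably why the page stresses the "accept hosts" entry.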
