[Yanel-commits] rev 26848 - public/yanel/trunk/src/realms/yanel-website/content
michi at wyona.com
Tue Aug 21 17:28:36 CEST 2007
Author: michi
Date: 2007-08-21 17:28:35 +0200 (Tue, 21 Aug 2007)
New Revision: 26848
Modified:
public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467
Log:
more stuff added re Nutch
Modified: public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467
===================================================================
--- public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467 2007-08-21 15:28:05 UTC (rev 26847)
+++ public/yanel/trunk/src/realms/yanel-website/content/db7296de-85c2-4723-96e0-01d94c464467 2007-08-21 15:28:35 UTC (rev 26848)
@@ -8,7 +8,7 @@
<body>
<h1>Nutch Resource</h1>
<h2>Crawling</h2>
- <h3>Configuration of Crawler</h3>
+ <h3>Configuration of Nutch Crawler</h3>
<p>See also <a href="http://lucene.apache.org/nutch/tutorial8.html">http://lucene.apache.org/nutch/tutorial8.html</a> for more information.</p>
<ul>
<li>URLs to start with:<ul>
@@ -16,16 +16,16 @@
<li>e.g. nutch-0.8.x/url/yulup-website.txt (http://www.yulup.org/)</li>
</ul>
</li>
- <li>The range of crawling resp. URLs to be parsed and followed (IMPORTANT: Both files below need to have an "accept hosts" entry):<ul>
+ <li>The range of crawling resp. URLs to be parsed and followed (IMPORTANT: Both files below need to have an "accept hosts" entry):<ul>
<li>nutch-0.8.x/conf/crawl-urlfilter.txt (+^http://yanel.wyona.org/)</li>
<li>nutch-0.8.x/conf/regex-urlfilter.txt (+^http://yanel.wyona.org/)</li>
</ul>
</li>
<li>Depth of Crawling: crawl.sh (e.g. DEPTH=5)</li>
</ul>
- <h3>Running Crawler</h3>
+ <h3>Running Nutch Crawler</h3>
<ul>
<li>sh crawl.sh</li>
- </ul>
+ </ul><h2>Searching</h2><h3>Configuration of Yanel Nutch Resource</h3>...
</body>
-</html>
+</html>
\ No newline at end of file
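
Note on the "accept hosts" entries in the page above: the sketch below illustrates how a line such as +^http://yanel.wyona.org/ in crawl-urlfilter.txt or regex-urlfilter.txt decides whether a URL is followed. It is not Nutch code. It assumes the usual rule-file semantics (a leading + accepts, a leading - rejects, the first matching rule wins, and a URL matching no rule is dropped). The names load_rules and accepts are hypothetical, and the trailing "-." rule is added only for illustration.

```python
import re


def load_rules(lines):
    """Parse filter lines of the form '+regex' or '-regex' (assumed format)."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first rule matching the URL is an accept rule."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption: a URL that matches no rule is not crawled


# Hypothetical rule file content; only the '+' entry appears in the page above.
rules = load_rules([
    "# accept hosts",
    "+^http://yanel.wyona.org/",
    "-.",
])
```

With these rules, accepts(rules, "http://yanel.wyona.org/en/index.html") is true, while a URL on another host such as http://www.yulup.org/ is rejected. This is why a start URL whose host has no accept entry would not be followed.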