[Yanel-dev] Problem with importing www.apache.org
Josias Thöny
josias.thoeny at wyona.com
Fri Apr 13 23:50:57 CEST 2007
Michael Wechner wrote:
> Hi
>
> I am trying to import www.apache.org, but receive the following errors:
>
> 649934 2007-04-12 23:42:59,499 [Thread-62] ERROR
> org.wyona.yanel.impl.resources.ImportSiteThread.run():83 -
> java.lang.RuntimeException: Could not save page:
> url=http://www.apache.org/: java.io.FileNotFoundException:
> /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461
> (Is a directory)
> java.lang.RuntimeException: Could not save page:
> url=http://www.apache.org/: java.io.FileNotFoundException:
> /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461
> (Is a directory)
> at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:148)
> at websphinx.Crawler.process(Crawler.java:1209)
> at websphinx.Crawler.run(Crawler.java:342)
> at org.wyona.yanel.impl.resources.ImportSiteThread.run(ImportSiteThread.java:65)
> Caused by: java.io.FileNotFoundException:
> /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461
> (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
> at websphinx.Access.writeFile(Access.java:142)
> at websphinx.HTMLTransformer.openFile(HTMLTransformer.java:137)
> at websphinx.HTMLTransformer.<init>(HTMLTransformer.java:116)
> at websphinx.LinkTransformer.<init>(LinkTransformer.java:82)
> at websphinx.RewritableLinkTransformer.<init>(RewritableLinkTransformer.java:59)
> at websphinx.MirrorTransformer.<init>(Mirror.java:354)
> at websphinx.Mirror.writePage(Mirror.java:147)
> at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:129)
> ... 3 more
>
> And it's funny that the status screen tells me that 21 pages have been
> downloaded, but within
>
> ls local/apache-tomcat-5.5.20/temp/import_1176414178461/
> index.html style
>
> one can only find two files.
>
> Any idea what might be wrong?
The problem seems to be that the page
http://www.apache.org
contains a link to
http://www.apache.org/
The crawler treats these as two different pages, although they refer to
the same resource, and gets confused when it tries to save the second one.
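
One possible workaround would be to normalize URLs before the crawler
compares or saves them, so that both forms map to the same key. The
following is only a rough sketch using java.net.URI; the class
UrlNormalizer and its method normalize are hypothetical names and not
part of websphinx, Lenya or Yanel:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of websphinx, Lenya or Yanel.
final class UrlNormalizer {

    private UrlNormalizer() {
    }

    /**
     * Returns a canonical form of the given URL so that
     * "http://host" and "http://host/" compare as equal.
     */
    static String normalize(String url) {
        try {
            // Resolves "." and ".." path segments.
            URI uri = new URI(url).normalize();

            // An empty path and "/" denote the same resource for http URLs.
            String path = uri.getPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            }

            // Scheme and host are case-insensitive.
            String scheme = uri.getScheme();
            if (scheme != null) {
                scheme = scheme.toLowerCase(Locale.ROOT);
            }
            String host = uri.getHost();
            if (host != null) {
                host = host.toLowerCase(Locale.ROOT);
            }

            // The fragment is dropped because it does not identify
            // a separate page on the server.
            return new URI(scheme, uri.getUserInfo(), host, uri.getPort(),
                    path, uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL: " + url, e);
        }
    }
}
```

With this, UrlNormalizer.normalize("http://www.apache.org") and
UrlNormalizer.normalize("http://www.apache.org/") both return
"http://www.apache.org/", so a set of visited pages keyed by the
normalized string would contain the start page only once. Whether this
actually fixes the "Is a directory" error would have to be verified.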
Do you want to open a bug?
Josias
>
> Thanks
>
> Michi
>