[Yanel-dev] Problem with importing www.apache.org

Josias Thöny josias.thoeny at wyona.com
Fri Apr 13 23:50:57 CEST 2007


Michael Wechner wrote:
> Hi
> 
> I am trying to import www.apache.org, but receive the following errors:
> 
> 649934 2007-04-12 23:42:59,499 [Thread-62] ERROR org.wyona.yanel.impl.resources.ImportSiteThread.run():83  - java.lang.RuntimeException: Could not save page: url=http://www.apache.org/: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
> java.lang.RuntimeException: Could not save page: url=http://www.apache.org/: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>        at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:148)
>        at websphinx.Crawler.process(Crawler.java:1209)
>        at websphinx.Crawler.run(Crawler.java:342)
>        at org.wyona.yanel.impl.resources.ImportSiteThread.run(ImportSiteThread.java:65)
> Caused by: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>        at java.io.FileOutputStream.open(Native Method)
>        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>        at websphinx.Access.writeFile(Access.java:142)
>        at websphinx.HTMLTransformer.openFile(HTMLTransformer.java:137)
>        at websphinx.HTMLTransformer.<init>(HTMLTransformer.java:116)
>        at websphinx.LinkTransformer.<init>(LinkTransformer.java:82)
>        at websphinx.RewritableLinkTransformer.<init>(RewritableLinkTransformer.java:59)
>        at websphinx.MirrorTransformer.<init>(Mirror.java:354)
>        at websphinx.Mirror.writePage(Mirror.java:147)
>        at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:129)
>        ... 3 more
> 
> And it's funny that the status screen tells me that 21 pages have been 
> downloaded, but within
> 
> ls local/apache-tomcat-5.5.20/temp/import_1176414178461/
> index.html  style
> 
> one can only find two files.
> 
> Any idea what might be wrong?

The problem seems to be that the page
     http://www.apache.org
contains a link to
     http://www.apache.org/
The crawler treats those as two different pages, and for one of them it 
apparently tries to write to the import directory itself, which would 
explain the "(Is a directory)" error.
Do you want to open a bug?
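
One possible fix would be to normalize URLs before the crawler compares 
them, so that an empty path and "/" end up as the same page. The following 
is only a sketch of that idea: UrlNormalizer is a hypothetical helper, not 
an existing class in websphinx, Lenya or Yanel, and where it would have to 
be hooked into the crawler is an open question.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper (not part of websphinx, Lenya or Yanel).
final class UrlNormalizer {

    private UrlNormalizer() {
    }

    /**
     * Returns a canonical form of the given absolute URL so that
     * "http://host" and "http://host/" compare equal.
     */
    static String normalize(String url) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(url).normalize();

        String scheme = uri.getScheme();
        if (scheme != null) {
            scheme = scheme.toLowerCase(Locale.ROOT);
        }
        String host = uri.getHost();
        if (host != null) {
            host = host.toLowerCase(Locale.ROOT);
        }

        // An empty path and "/" refer to the same resource.
        String path = uri.getPath();
        if (path == null || path.length() == 0) {
            path = "/";
        }

        try {
            // Passing null as the last argument drops the fragment,
            // which never identifies a separate page.
            return new URI(scheme, uri.getUserInfo(), host, uri.getPort(),
                    path, uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot normalize: " + url, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://www.apache.org"));
        System.out.println(normalize("http://www.apache.org/"));
    }
}
```

With something like this, both URLs above would map to the same key, and 
the crawler could skip the second one instead of trying to save it again.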

Josias


> 
> Thanks
> 
> Michi
> 



