[Yanel-dev] Problem with importing www.apache.org
Michael Wechner
michael.wechner at wyona.com
Sun Apr 15 23:41:09 CEST 2007
Josias Thöny wrote:
> Michael Wechner wrote:
>
>> Hi
>>
>> I am trying to import www.apache.org, but receive the following errors:
>>
>> 649934 2007-04-12 23:42:59,499 [Thread-62] ERROR org.wyona.yanel.impl.resources.ImportSiteThread.run():83 - java.lang.RuntimeException: Could not save page: url=http://www.apache.org/: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>> java.lang.RuntimeException: Could not save page: url=http://www.apache.org/: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>>     at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:148)
>>     at websphinx.Crawler.process(Crawler.java:1209)
>>     at websphinx.Crawler.run(Crawler.java:342)
>>     at org.wyona.yanel.impl.resources.ImportSiteThread.run(ImportSiteThread.java:65)
>> Caused by: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>>     at java.io.FileOutputStream.open(Native Method)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>>     at websphinx.Access.writeFile(Access.java:142)
>>     at websphinx.HTMLTransformer.openFile(HTMLTransformer.java:137)
>>     at websphinx.HTMLTransformer.<init>(HTMLTransformer.java:116)
>>     at websphinx.LinkTransformer.<init>(LinkTransformer.java:82)
>>     at websphinx.RewritableLinkTransformer.<init>(RewritableLinkTransformer.java:59)
>>     at websphinx.MirrorTransformer.<init>(Mirror.java:354)
>>     at websphinx.Mirror.writePage(Mirror.java:147)
>>     at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:129)
>>     ... 3 more
>>
>> And it's strange that the status screen tells me that 21 pages have
>> been downloaded, but within
>>
>> ls local/apache-tomcat-5.5.20/temp/import_1176414178461/
>> index.html style
>>
>> one can only find two files.
>>
>> Any idea what might be wrong?
>
>
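> The problem seems to be that the page
> http://www.apache.org
> contains a link to
> http://www.apache.org/
> The crawler treats those as two different pages and gets confused:
> when saving http://www.apache.org/ it tries to write the page to the
> import directory itself, hence the "(Is a directory)" error.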
So I guess we need to patch websphinx, right?
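For illustration only, a patch along these lines could canonicalize URLs before the crawler decides whether it has already seen a page, so that "http://www.apache.org" and "http://www.apache.org/" count as one page, and could map directory URLs to index.html instead of to the import directory itself. This is a hypothetical sketch, not websphinx's actual API: the class `UrlNormalizer` and the methods `normalize` and `toLocalPath` are made-up names, and it assumes absolute http/https URLs.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical helper, NOT part of websphinx or Yanel: a sketch of how a
 * crawler could canonicalize URLs before deciding whether a page was seen.
 */
public class UrlNormalizer {

    /** Canonical form: lower-case scheme/host, no default port, no fragment, "/" for an empty path. */
    public static String normalize(String url) {
        URI uri = URI.create(url).normalize(); // resolves "." and ".." path segments
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL with a host: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        boolean defaultPort = (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);

        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port != -1 && !defaultPort) {
            sb.append(':').append(port);
        }
        String path = uri.getRawPath();
        // "http://www.apache.org" and "http://www.apache.org/" must map to the same page
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString(); // the fragment (#...) is intentionally dropped
    }

    /** Maps a URL to a file path relative to the import directory, never to the directory itself. */
    public static String toLocalPath(String url) {
        String path = URI.create(normalize(url)).getPath(); // always starts with "/"
        if (path.endsWith("/")) {
            path = path + "index.html"; // directory URLs are stored as index.html
        }
        return path.substring(1); // strip the leading "/"; query strings are ignored in this sketch
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://www.apache.org"));            // http://www.apache.org/
        System.out.println(normalize("http://www.apache.org/"));           // http://www.apache.org/
        System.out.println(toLocalPath("http://www.apache.org"));          // index.html
        System.out.println(toLocalPath("http://www.apache.org/foundation/")); // foundation/index.html
    }
}
```

The crawler would then key its "already visited" set on normalize(url) and write each page to toLocalPath(url) below the import directory, so the root page can never collide with the directory.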
> Do you want to open a bug?
I have opened one
http://bugzilla.wyona.com/cgi-bin/bugzilla/show_bug.cgi?id=5277
Cheers
Michael
>
> Josias
>
>
>>
>> Thanks
>>
>> Michi
>>
>
>
> _______________________________________________
> Yanel-development mailing list
> Yanel-development at wyona.com
> http://wyona.com/cgi-bin/mailman/listinfo/yanel-development
>
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
michael.wechner at wyona.com michi at apache.org
+41 44 272 91 61