[Yanel-dev] Problem with importing www.apache.org

Michael Wechner michael.wechner at wyona.com
Sun Apr 15 23:41:09 CEST 2007


Josias Thöny wrote:

> Michael Wechner wrote:
>
>> Hi
>>
>> I am trying to import www.apache.org, but receive the following errors:
>>
>> 649934 2007-04-12 23:42:59,499 [Thread-62] ERROR org.wyona.yanel.impl.resources.ImportSiteThread.run():83  - java.lang.RuntimeException: Could not save page: url=http://www.apache.org/: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>> java.lang.RuntimeException: Could not save page: url=http://www.apache.org/: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>>        at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:148)
>>        at websphinx.Crawler.process(Crawler.java:1209)
>>        at websphinx.Crawler.run(Crawler.java:342)
>>        at org.wyona.yanel.impl.resources.ImportSiteThread.run(ImportSiteThread.java:65)
>> Caused by: java.io.FileNotFoundException: /home/michi/src/wyona-svn/public/yanel/trunk/local/apache-tomcat-5.5.20/temp/import_1176414178461 (Is a directory)
>>        at java.io.FileOutputStream.open(Native Method)
>>        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>>        at websphinx.Access.writeFile(Access.java:142)
>>        at websphinx.HTMLTransformer.openFile(HTMLTransformer.java:137)
>>        at websphinx.HTMLTransformer.<init>(HTMLTransformer.java:116)
>>        at websphinx.LinkTransformer.<init>(LinkTransformer.java:82)
>>        at websphinx.RewritableLinkTransformer.<init>(RewritableLinkTransformer.java:59)
>>        at websphinx.MirrorTransformer.<init>(Mirror.java:354)
>>        at websphinx.Mirror.writePage(Mirror.java:147)
>>        at org.apache.lenya.search.crawler.DumpingCrawler.visit(DumpingCrawler.java:129)
>>        ... 3 more
>>
>> And it's funny that the status screen tells me that 21 pages have 
>> been downloaded, but within
>>
>> ls local/apache-tomcat-5.5.20/temp/import_1176414178461/
>> index.html  style
>>
>> one can only find two files.
>>
>> Any idea what might be wrong?
>
>
> The problem seems to be that the page
>     http://www.apache.org
> contains a link to
>     http://www.apache.org/
> The crawler thinks that those are two different pages and gets 
> confused somehow.


So I guess we need to patch websphinx, right?
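
Just to illustrate the kind of fix I have in mind: a rough sketch, not actual websphinx code. The class name UrlNormalizer and the method normalize are made up for this example, and I am assuming it would be enough to map both URLs to one canonical form before the crawler compares them or derives a file name from them. It only relies on java.net.URI:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of websphinx or Yanel.
class UrlNormalizer {

    /**
     * Maps a URL with an empty path to the same URL with path "/",
     * so that http://www.apache.org and http://www.apache.org/
     * end up as the same string. The fragment is dropped, because
     * it does not identify a different page.
     */
    static String normalize(String url) {
        // URI.normalize() also removes "." and ".." path segments.
        URI uri = URI.create(url).normalize();
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        try {
            return new URI(uri.getScheme(), uri.getAuthority(),
                           path, uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With something like this, normalize("http://www.apache.org") and normalize("http://www.apache.org/") both give http://www.apache.org/, so the crawler would see a single page. Whether websphinx has a suitable place to hook this in is something I have not checked.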

> Do you want to open a bug?


I have opened one

http://bugzilla.wyona.com/cgi-bin/bugzilla/show_bug.cgi?id=5277

Cheers

Michael

>
> Josias
>
>
>>
>> Thanks
>>
>> Michi
>>
>
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner at wyona.com                        michi at apache.org
+41 44 272 91 61
