[Yanel-dev] crawler

Michael Wechner michael.wechner at wyona.com
Tue Feb 27 14:14:08 CET 2007


Josias Thöny wrote:

> Hi,
>
> I've had a look at the crawler of Lenya 1.2, and it seems that a few
> features are missing:
>
> basic missing features:
> - download of images
> - download of CSS
> - download of scripts
> - link rewriting
> - limits for max level / max documents
>
> advanced missing features:
> - handling of frames / iframes
> - tidy HTML -> XHTML
> - extraction of body content
> - resolving of links in CSS (background images etc.)
>
> Or am I misunderstanding something...?


No, you're not ;-)
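
To make two of the basic items concrete (link rewriting and the limits for
max level / max documents), here is a minimal sketch. It is not the Lenya 1.2
crawler code: the class name CrawlSketch, the methods crawl and rewriteLinks,
and the in-memory "site" map standing in for HTTP fetches are all assumptions
made for illustration. Links are found with a naive regular expression, where
a real crawler would more likely use an HTML parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not taken from Lenya or Yanel.
final class CrawlSketch {
    // Naive match for href="..." and src="..." attributes (covers pages, images, css, scripts).
    private static final Pattern LINK = Pattern.compile("(href|src)=\"([^\"]*)\"");

    /** Breadth-first crawl over an in-memory site, bounded by link depth and document count. */
    static Set<String> crawl(Map<String, String> site, String start, int maxLevel, int maxDocuments) {
        Set<String> fetched = new LinkedHashSet<>();
        Map<String, Integer> level = new LinkedHashMap<>(); // URL -> depth at which it was found
        Deque<String> queue = new ArrayDeque<>();
        level.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty() && fetched.size() < maxDocuments) {
            String url = queue.remove();
            String body = site.get(url);
            if (body == null) {
                continue; // dead link: nothing to fetch
            }
            fetched.add(url);
            int next = level.get(url) + 1;
            if (next > maxLevel) {
                continue; // do not follow links beyond the depth limit
            }
            Matcher m = LINK.matcher(body);
            while (m.find()) {
                String target = m.group(2);
                if (!level.containsKey(target)) { // queue every URL at most once
                    level.put(target, next);
                    queue.add(target);
                }
            }
        }
        return fetched;
    }

    /** Rewrites link targets that start with oldPrefix so that they start with newPrefix. */
    static String rewriteLinks(String html, String oldPrefix, String newPrefix) {
        Matcher m = LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(2);
            if (target.startsWith(oldPrefix)) {
                target = newPrefix + target.substring(oldPrefix.length());
            }
            String attr = m.group(1) + "=\"" + target + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(attr));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("/index.html", "<a href=\"/a.html\">a</a><img src=\"/logo.png\">");
        site.put("/a.html", "<a href=\"/b.html\">b</a>");
        site.put("/b.html", "<a href=\"/index.html\">home</a>");
        site.put("/logo.png", "");
        System.out.println(crawl(site, "/index.html", 1, 10));
        System.out.println(rewriteLinks(site.get("/index.html"), "/", "/imported/"));
    }
}
```

With maxLevel set to 1 the crawl stops at the pages and images linked directly
from the start page; maxDocuments caps the total regardless of depth.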

>
> IMHO some of these features are quite essential, because we want to
> use the crawler in Yanel to import complete pages with images and
> everything, not only text content.
>
> The question now is: does it make sense to add the missing
> features to that crawler, or should we look for an alternative?


Sure, if there is an alternative :-) Is there one?

Thanks

Michi

>
> Josias
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner at wyona.com                        michi at apache.org
+41 44 272 91 61



