[Yanel-dev] API for accessing large data sets

Balz Schreier balz.schreier at gmail.com
Tue Mar 29 11:06:31 CEST 2011


Hi Michael,

My observation with large data sets (zwischengas.com) is the following:
- usually you only want to retrieve a subset of all matching documents
- therefore it is ok to include the "max documents" parameter in the API
  as well
- additionally you could also provide a method search(from, max), which
  internally calls Lucene's search with n = from + max and then just skips
  the documents before "from".
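The search(from, max) idea above can be sketched as follows. This is a minimal sketch, assuming a hypothetical TopNSearcher interface that stands in for Lucene's Searcher.search(Query, Filter, int n), so the example runs without a Lucene dependency; PagedSearch is likewise an illustrative name, not an existing Yanel or Yarep class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for a top-n search such as Lucene's
 * Searcher.search(Query, Filter, int n): returns at most n hits, best first.
 */
interface TopNSearcher {
    List<String> searchTop(String query, int n);
}

final class PagedSearch {
    private PagedSearch() {}

    /**
     * Returns up to max hits starting at offset from by requesting the
     * top (from + max) hits and skipping the first from of them.
     */
    static List<String> search(TopNSearcher searcher, String query, int from, int max) {
        if (from < 0 || max < 0) {
            throw new IllegalArgumentException("from and max must be non-negative");
        }
        // addExact throws ArithmeticException instead of silently overflowing.
        int n = Math.addExact(from, max);
        List<String> top = searcher.searchTop(query, n);
        if (from >= top.size()) {
            return Collections.emptyList();
        }
        // Keep only the requested window; the first from hits are skipped.
        return new ArrayList<>(top.subList(from, Math.min(top.size(), n)));
    }
}
```

The trade-off of this approach is that a deep page still costs as much as collecting the top from + max hits; only the list handed back to the caller is smaller.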

I don't know where you want to provide this method, but be careful with
creating YarepNodes for the results. I would deal with just the Yarep paths
as long as you can; otherwise performance goes down dramatically.
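The advice to stay with paths can be illustrated with a small lazy-loading wrapper. It assumes that a result set is a list of path strings and that turning a path into a node object is the expensive step; LazyResults and its loader function are hypothetical stand-ins, not the actual Yarep API for creating a YarepNode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical result set that keeps only paths and creates node objects
 * on demand, for the window that is actually needed.
 */
final class LazyResults<N> {
    private final List<String> paths;         // cheap: just the matching paths
    private final Function<String, N> loader; // expensive: stand-in for creating a node from a path

    LazyResults(List<String> paths, Function<String, N> loader) {
        this.paths = paths;
        this.loader = loader;
    }

    int size() {
        return paths.size();
    }

    String path(int index) {
        return paths.get(index);
    }

    /** Creates nodes only for the paths in [from, from + max). */
    List<N> loadNodes(int from, int max) {
        if (from < 0 || max < 0) {
            throw new IllegalArgumentException("from and max must be non-negative");
        }
        int end = (int) Math.min((long) paths.size(), (long) from + max);
        List<N> nodes = new ArrayList<>();
        for (int i = from; i < end; i++) {
            nodes.add(loader.apply(paths.get(i)));
        }
        return nodes;
    }
}
```

With, say, 50K matching revisions, only the few paths on the page being displayed ever become node objects; counting and paging work on the plain path list.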

Cheers
Balz

On Tue, Mar 29, 2011 at 10:50 AM, Michael Wechner <michael.wechner at wyona.com> wrote:

>  Hi
>
> I am currently thinking about introducing a new VersionableV3 interface so
> that access to large sets of revisions (e.g. 50K) scales better. It would
> also be nice to be able to search revisions for particular tags.
> Hence I was looking at the search API of Lucene, because it has similar
> scalability issues:
>
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Searcher.html#search%28org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter,%20int%29
>
> public TopDocs search(Query query,
>                       Filter filter,
>                       int n)
>                throws IOException
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/TopDocs.html
>
>
>   ScoreDoc[] scoreDocs
>           The top hits for the query.
>   int totalHits
>           The total number of hits for the query.
>
> But see also, for example:
>
> http://docs.codehaus.org/display/GEOTOOLS/Random+Data+Access
>
> I am currently playing with the various APIs, but any suggestions are very
> welcome.
>
> Cheers
>
> Michael
>
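One possible shape for the VersionableV3 interface discussed above, modeled on Lucene's search(Query, Filter, int n) / TopDocs pair: return at most n revisions together with the total number of matches, optionally filtered by a tag. Everything below (the getRevisions method, RevisionHits, one tag per revision, the in-memory implementation) is a hypothetical sketch, not an existing Yanel or Yarep API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** TopDocs-like result: a bounded list of revisions plus the total match count. */
final class RevisionHits {
    final List<String> revisionNames; // at most n entries, newest first
    final int totalHits;              // number of all matching revisions

    RevisionHits(List<String> revisionNames, int totalHits) {
        this.revisionNames = Collections.unmodifiableList(new ArrayList<>(revisionNames));
        this.totalHits = totalHits;
    }
}

/** Hypothetical VersionableV3 shape; not an existing Yanel/Yarep interface. */
interface VersionableV3 {
    /**
     * Returns at most n revisions, newest first, that carry the given tag
     * (null matches every revision), plus the total number of matches.
     */
    RevisionHits getRevisions(String tag, int n);
}

/** Toy in-memory implementation for illustration only. */
final class InMemoryVersionable implements VersionableV3 {
    private final List<String> names = new ArrayList<>(); // oldest first
    private final List<String> tags = new ArrayList<>();  // one (nullable) tag per revision

    void addRevision(String name, String tag) {
        names.add(name);
        tags.add(tag);
    }

    @Override
    public RevisionHits getRevisions(String tag, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        List<String> top = new ArrayList<>();
        int total = 0;
        // Walk from newest to oldest; count every match but keep only the first n.
        for (int i = names.size() - 1; i >= 0; i--) {
            if (tag == null || tag.equals(tags.get(i))) {
                total++;
                if (top.size() < n) {
                    top.add(names.get(i));
                }
            }
        }
        return new RevisionHits(top, total);
    }
}
```

The in-memory version scans every revision; for something like 50K revisions a real implementation would presumably keep an index (for example a Lucene index over the revision tags) to avoid that scan.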

