[Yanel-dev] API for accessing large data sets
Michael Wechner
michael.wechner at wyona.com
Tue Mar 29 22:35:28 CEST 2011
Hi Balz
Thanks very much for your feedback. Please find some comments inline below.
On 3/29/11 11:06 AM, Balz Schreier wrote:
> Hi Michael,
>
> my observation with large data sets (zwischengas.com) is the following:
> - usually you only want to retrieve a subset of all matching documents
> - therefore it is ok to include the "max documents" parameter into the
> API too
> - additionally you could provide a method search(from, max),
> which internally uses the lucene method search(from+max) and then just
> skips the documents before "from".
>
> I don't know where you want to provide this method
For example, for retrieving the revisions of a node. We have some real-world
situations with more than 30K revisions per node.
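A minimal sketch of the search(from, max) idea applied to revision paging, assuming the backend can only return the "top n" results in a fixed order (newest first), the way Lucene's search(query, filter, n) does. The class PagedRevisions and the helper topN are hypothetical and use an in-memory list instead of a real index; the point is only the arithmetic: request from + max hits, then skip the first "from".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PagedRevisions {

    // Hypothetical stand-in for a Lucene-style "top n" call: returns at most n
    // results from a list that is already ordered (here: newest revision first).
    static List<String> topN(List<String> orderedHits, int n) {
        return orderedHits.subList(0, Math.min(n, orderedHits.size()));
    }

    // search(from, max): fetch the top (from + max) hits, then drop everything before "from".
    static List<String> search(List<String> orderedHits, int from, int max) {
        if (from < 0 || max < 0) {
            throw new IllegalArgumentException("from and max must be >= 0");
        }
        // Guard against int overflow when from + max is very large.
        int n = (int) Math.min((long) from + max, Integer.MAX_VALUE);
        List<String> top = topN(orderedHits, n);
        if (from >= top.size()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(top.subList(from, top.size()));
    }

    public static void main(String[] args) {
        List<String> revisions = new ArrayList<>();
        for (int i = 30000; i > 0; i--) {
            revisions.add("rev-" + i);
        }
        // Second page of 20 revisions, i.e. rev-29980 down to rev-29961.
        System.out.println(search(revisions, 20, 20));
    }
}
```

Note that deep pages still cost from + max hits on the backend side, so this keeps the API simple but does not make page 1000 as cheap as page 1.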
> but be careful with creating YarepNodes for the results, I would deal
> with just the Yarep Paths as long as you can, otherwise performance
> goes down dramatically.
I think it depends on the implementation. For example, some
implementations read the properties during node init, which I consider
bad and which I think we should change.
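A minimal sketch of what "not reading the properties during node init" could look like, assuming a node is created from nothing but its path and properties are fetched on first access. LazyNode and the propertyLoader function are hypothetical stand-ins, not the Yarep API. With this pattern, wrapping 30K result paths in node objects costs little until someone actually asks for a property.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical node that is cheap to construct: the constructor only stores the path,
// and properties are read from the (hypothetical) storage backend on first access.
public class LazyNode {
    private final String path;
    private final Function<String, Map<String, String>> propertyLoader;
    private Map<String, String> properties; // null until the first getProperty() call
    private int loadCount = 0;              // how often the backend was hit (for illustration)

    public LazyNode(String path, Function<String, Map<String, String>> propertyLoader) {
        this.path = path;
        this.propertyLoader = propertyLoader;
    }

    public String getPath() {
        return path;
    }

    public String getProperty(String name) {
        if (properties == null) {
            properties = propertyLoader.apply(path);
            loadCount++;
        }
        return properties.get(name);
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        Function<String, Map<String, String>> loader = p -> {
            Map<String, String> m = new HashMap<>();
            m.put("title", "Title of " + p);
            return m;
        };
        LazyNode node = new LazyNode("/revisions/rev-1", loader);
        System.out.println(node.getPath());            // no properties read yet
        System.out.println(node.getProperty("title")); // first access triggers the load
    }
}
```

This sketch is not thread-safe; a shared node would need synchronization or an immutable, eagerly built snapshot.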
Thanks
Michael
>
> Cheers
> Balz
>
> On Tue, Mar 29, 2011 at 10:50 AM, Michael Wechner
> <michael.wechner at wyona.com> wrote:
>
> Hi
>
> I am currently thinking about introducing a new VersionableV3
> interface to access large sets of revisions
> (e.g. 50K) and make it scale better. Also it would be nice to
> search revisions for particular tags.
> Hence I was looking at the search API of lucene, because it has
> similar scalability issues:
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Searcher.html#search%28org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter,%20int%29
>
> public TopDocs search(Query query, Filter filter, int n)
>     throws IOException
>
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/TopDocs.html
>
>
>   ScoreDoc[]  scoreDocs   The top hits for the query.
>   int         totalHits   The total number of hits for the query.
>
> but also see for example
>
> http://docs.codehaus.org/display/GEOTOOLS/Random+Data+Access
>
> I am currently playing with the various APIs, but any suggestions
> are very welcome.
>
> Cheers
>
> Michael
>
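A minimal sketch of how a TopDocs-like result could carry over to a revisions API such as the proposed VersionableV3, assuming the caller wants the first n matching revisions plus the total count, optionally filtered by tag. RevisionSearchSketch, RevisionHits, Revision and getRevisions are hypothetical names, not an existing Yanel or Yarep interface. The loop still visits every revision to compute totalHits, so it only bounds memory and object creation to n; a real backend would need an index (for example Lucene itself) to avoid the full scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RevisionSearchSketch {

    // Modeled on Lucene's TopDocs: the top n hits plus the total number of matches.
    public static final class RevisionHits {
        public final List<String> revisionNames; // at most n names, newest first
        public final int totalHits;              // all matching revisions, may exceed n

        RevisionHits(List<String> revisionNames, int totalHits) {
            this.revisionNames = Collections.unmodifiableList(revisionNames);
            this.totalHits = totalHits;
        }
    }

    // Hypothetical revision record: a name and an optional tag (null if untagged).
    public static final class Revision {
        final String name;
        final String tag;

        public Revision(String name, String tag) {
            this.name = name;
            this.tag = tag;
        }
    }

    // Like search(query, filter, n): keep only the first n matches but count all of them.
    // tag == null means "no filter".
    public static RevisionHits getRevisions(List<Revision> newestFirst, String tag, int n) {
        List<String> top = new ArrayList<>();
        int total = 0;
        for (Revision r : newestFirst) {
            if (tag != null && !tag.equals(r.tag)) {
                continue;
            }
            total++;
            if (top.size() < n) {
                top.add(r.name);
            }
        }
        return new RevisionHits(top, total);
    }

    public static void main(String[] args) {
        List<Revision> revisions = new ArrayList<>();
        for (int i = 50; i > 0; i--) {
            revisions.add(new Revision("rev-" + i, i % 10 == 0 ? "release" : null));
        }
        RevisionHits hits = getRevisions(revisions, "release", 3);
        System.out.println(hits.revisionNames + " of " + hits.totalHits);
    }
}
```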