[Yanel-dev] Enhancing Yarep Indexer/Searcher interface

Thu Aug 6 14:02:36 CEST 2009

Guillaume Déflache wrote:
> Hi!
>
> Michael Wechner wrote:
>> Hi
>>
>> At the moment one has the following searcher interface
>>
>> Node[] Repository.getSearcher().search(QUERY);
>
> Does QUERY include pagination ATM?

no
>
>
>> and with the index the PATH/URL is saved and the FULLTEXT is indexed.
>>
>> This is all nice and simple, BUT  .... ;-)
>>
>> In most common search engines one receives the following search 
>> result structure:
>>
>> Title of Document
>> Excerpt of Document
>> Path/URL of Document
>> Mime-Type of Document
>> Last Modified of Document
>
> What would be the types there? String for most of them, probably, and
> maybe a java.lang.Long timestamp for the last-modified date?

yes
>
>
>> which means that if we also want to provide this, we need to reparse
>> each Node which has been found, which is not so nice (performance-wise
>> and also code-wise).
>>
>> Hence I would suggest that we enhance the Indexer/Searcher interface 
>> by adding the fields above and introducing methods like
>>
>> Result[] Repository.getSearcher().search(QUERY)
>
> If QUERY does not include paging there, we'd better return a 
> java.lang.Iterable<Result> to make the API easier to use (e.g. with a 
> Java 5 for loop) but mostly to be able to load the results lazily if 
> needed. We would also need a startIndex and maxCount, even if we do 
> not implement them at once.
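
A minimal sketch of how the lazy, paged variant suggested above might look. All names here (PagedSearchSketch, HitSource, the search signature with startIndex and maxCount) are assumptions made for illustration and are not part of the existing Yarep API; Result is reduced to a path to keep the example short.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch only; none of these types exist in Yarep. */
public class PagedSearchSketch {

    /** Stand-in for the proposed Result type, reduced to a path. */
    public static final class Result {
        private final String path;
        public Result(String path) { this.path = path; }
        public String getPath() { return path; }
    }

    /** Assumed stand-in for the index: resolves one hit on demand. */
    public interface HitSource {
        int size();
        Result load(int index);
    }

    /**
     * Returns at most maxCount results beginning at startIndex.
     * Nothing is loaded until the caller iterates.
     */
    public static Iterable<Result> search(final HitSource hits,
                                          final int startIndex,
                                          final int maxCount) {
        if (startIndex < 0 || maxCount < 0) {
            throw new IllegalArgumentException("startIndex and maxCount must be >= 0");
        }
        // Clamp the end of the page to the number of available hits.
        final int end = (int) Math.min((long) startIndex + maxCount, hits.size());
        return new Iterable<Result>() {
            public Iterator<Result> iterator() {
                return new Iterator<Result>() {
                    private int position = startIndex;
                    public boolean hasNext() { return position < end; }
                    public Result next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        return hits.load(position++); // loaded lazily, one hit at a time
                    }
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }
}
```

With this shape a caller can write `for (Result r : search(hits, 0, 10)) { ... }`, and a hit that is never reached is never loaded.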
>
>> whereas Result has methods like
>>
>> Result.getTitle()
>> Result.getExcerpt()
>> etc.
>
> We could also use:
> String Result.getMetadata(String aDublinCoreOrWhateverRDFpropertyURI)
> ...or maybe both: the hard-coded ones because API users will most 
> probably need them, and the generic one for extensibility?

makes sense
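
For illustration, a rough sketch of a Result offering both the hard-coded getters and the generic lookup. The class layout, the constructor, and the choice to back getTitle() and getMimeType() with Dublin Core property URIs are assumptions for this example only; nothing here is existing Yarep code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not part of the current Yarep API. */
public class ResultSketch {

    // Dublin Core element URIs, used here as example metadata keys.
    public static final String DC_TITLE = "http://purl.org/dc/elements/1.1/title";
    public static final String DC_FORMAT = "http://purl.org/dc/elements/1.1/format";

    public static final class Result {
        private final String path;
        private final String excerpt;
        private final Long lastModified; // timestamp, as discussed above
        private final Map<String, String> metadata;

        public Result(String path, String excerpt, Long lastModified,
                      Map<String, String> metadata) {
            this.path = path;
            this.excerpt = excerpt;
            this.lastModified = lastModified;
            // Copy defensively so later changes by the caller do not leak in.
            this.metadata = new HashMap<String, String>(metadata);
        }

        // Hard-coded accessors for the fields most API users will need.
        public String getPath() { return path; }
        public String getExcerpt() { return excerpt; }
        public Long getLastModified() { return lastModified; }
        public String getTitle() { return metadata.get(DC_TITLE); }
        public String getMimeType() { return metadata.get(DC_FORMAT); }

        /** Generic accessor for extensibility; returns null if the property is absent. */
        public String getMetadata(String propertyUri) {
            return metadata.get(propertyUri);
        }
    }
}
```

Backing the fixed getters with the same map keeps the two access paths consistent: getTitle() and getMetadata(DC_TITLE) always return the same value.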
>
>
>> WDYT?
>>
>> Thanks
>>
>> Michi
>
> HTH,

thanks very much for the feedback

Michael
>
>    Guillaume