[Yanel-dev] New XMLDB repository

Michael Wechner michael.wechner at wyona.com
Mon Feb 12 21:02:41 CET 2007


Josias Thöny wrote:

> Andreas Wuest wrote:
>
>> Hi Josias
>>
>> On 12.2.2007 15:55 Uhr, Josias Thöny wrote:
>>
>>> Andreas Wuest wrote:
>>>
>>>> Hi
>>>>
>>>> I've finished and checked in a basic implementation of the XMLDB 
>>>> repository, based on the XML:DB API.
>>>
>>>
>>> Cool :)
>>>
>>>>
>>>> Unfortunately, Yarep is documented really badly, so I couldn't find 
>>>> out what the exact contracts for the various methods are. For 
>>>> example, should getSize() or delete() throw a repository exception 
>>>> if the resource does not exist, or return 0 or false, etc.
>>>>
>>>> I've extensively documented the XMLDBStorage class, so you can see 
>>>> what it does at first glance.
>>>>
>>>> The Reader/Writer and InputStream/OutputStream are implemented 
>>>> using aggregation. Don't know if it would be more desirable to 
>>>> e.g. subclass StringReader and override the close() method instead.
>>>>
>>>> Also, there are some other API related problems: Yanel always seems 
>>>> to call getInputStream to directly read from the repo. Now, this is 
>>>> all fine and dandy on a file based repo, but the XML database 
>>>> stores XML documents as character data, and returns them as 
>>>> strings. In other words, in order for the InputStream to work, 
>>>> we have to convert the string to bytes, which, of course, involves 
>>>> character encoding. I just use UTF-8 to de- and encode, but if you 
>>>> really want to read an XML resource, the getReader method should be 
>>>> used.
>>>>
>>>> The same goes for writing, but with some additional complication. 
>>>> You should NEVER use getOutputStream to write an XML document. 
>>>> getOutputStream creates a binary resource in the database. Use 
>>>> getWriter instead to write character data, which creates an XML 
>>>> resource.
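
To make the encoding step concrete, here is a minimal sketch. It is
not the checked-in XMLDBStorage code; the class name
CharacterResourceSketch and the helper readFully are made up for
illustration, and the only thing taken from the post is that UTF-8 is
used when a character resource is read as bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical wrapper around XML content that the database hands back as a String. */
public class CharacterResourceSketch {
    private final String content;

    public CharacterResourceSketch(String content) {
        this.content = content;
    }

    /** Character access: the String is passed through, no encoding involved. */
    public Reader getReader() {
        return new StringReader(content);
    }

    /** Byte access: the String has to be encoded first (UTF-8 here, as in the post). */
    public InputStream getInputStream() {
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    /** Helper for the example: drains a stream into a byte array. */
    public static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 12 characters, one of them (o-umlaut) outside ASCII
        CharacterResourceSketch res = new CharacterResourceSketch("<p>Th\u00f6ny</p>");
        byte[] bytes = readFully(res.getInputStream());
        System.out.println(bytes.length); // 13: the umlaut takes two bytes in UTF-8
    }
}
```

A caller that decodes those bytes with anything other than UTF-8 gets
a different string back, which is the reason getReader is the safer
choice for XML resources.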
>>>
>>>
>>> Well, I didn't realize that some repository implementations might 
>>> handle binary data differently than text data. But I guess it makes 
>>> sense.
>>> So probably we should change yanel to use the reader/writer methods 
>>> for text data, and add reader/writer methods to the node-based api, 
>>> too.
>>> Would that help?
>>
>>
>> That would help for sure. Although I don't know how Yanel can find 
>> out which method to call for reading, because it does not know in the 
>> first place if a requested resource is character-based or binary.
>
>
> Yeah, I had some doubts about that also.
> Maybe we could simply say that a FileResource is always treated as 
> binary, and an XMLResource is always text. Would that be too simple? 


I am not sure, because XML can also contain binary data (using CDATA). 
This is also why one should use application/xml and not text/xml.

>
>
>>
>> One possible way would be for the repository implementation to guide 
>> Yanel, because the repository should generally know what type of 
>> resource is being requested (at least XMLDB knows, though we may see 
>> other back-ends in the future which do not even know this). If 
>> Yanel uses getInputStream(), and the repo decides that this is not a 
>> binary resource, it could throw an exception, and Yanel would then 
>> try getReader(), or vice versa. We could also introduce a flag on 
>> those two methods, e.g. forceRead, which would prevent the repo impl 
>> from throwing if the resource to be read is of the wrong type, but 
>> read anyway.
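
A rough sketch of that "try one accessor, fall back to the other"
idea, just to see what the calling code would look like. Everything
here is hypothetical: ToyRepository, WrongResourceTypeException and
describe() are invented for the example and are not part of Yarep, and
the forceRead flag is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class FallbackReadSketch {

    /** Hypothetical exception signalling "wrong accessor for this resource". */
    public static class WrongResourceTypeException extends RuntimeException {
        public WrongResourceTypeException(String message) {
            super(message);
        }
    }

    /** Toy in-memory repository that knows the type of each stored resource. */
    public static class ToyRepository {
        private final Map<String, String> text = new HashMap<>();
        private final Map<String, byte[]> binary = new HashMap<>();

        public void putText(String path, String content) {
            text.put(path, content);
        }

        public void putBinary(String path, byte[] content) {
            binary.put(path, content);
        }

        public InputStream getInputStream(String path) {
            byte[] data = binary.get(path);
            if (data == null) {
                // The toy does not distinguish "missing" from "wrong type".
                throw new WrongResourceTypeException(path + " is not a binary resource");
            }
            return new ByteArrayInputStream(data);
        }

        public Reader getReader(String path) {
            String data = text.get(path);
            if (data == null) {
                throw new WrongResourceTypeException(path + " is not a character resource");
            }
            return new StringReader(data);
        }
    }

    /** Caller side: try the byte accessor first, fall back to the character one. */
    public static String describe(ToyRepository repo, String path) {
        try {
            repo.getInputStream(path);
            return "binary";
        } catch (WrongResourceTypeException e) {
            repo.getReader(path); // throws again if the path is unknown
            return "text";
        }
    }
}
```

With this shape every read of a character resource pays for one
thrown exception before it reaches the right accessor.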
>>
>
> If we say that the repo "knows" about the type of a resource, it could 
> provide a method isBinary() or something like that, so yanel could 
> know which method to call (getReader/getInputStream). I normally 
> prefer to "ask first" instead of handling an error.
> When someone calls a reading method which does not match the type, a 
> best-effort conversion could be applied.
> I'm not entirely sure though how the repo would know the type 
> (text/binary). Should it assume that it's binary when it was written 
> by getOutputStream, and text otherwise?
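
For comparison, a minimal sketch of the "ask first" variant, assuming
the rule suggested above: a resource counts as binary if it was
written through getOutputStream and as text if it was written through
getWriter. The class AskFirstSketch and the isBinary() method are
hypothetical and not part of Yarep. The streams subclass the JDK
classes and override close(), which is the alternative to aggregation
mentioned earlier in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory repository that records how each resource was written. */
public class AskFirstSketch {
    private final Map<String, byte[]> binary = new HashMap<>();
    private final Map<String, String> text = new HashMap<>();

    /** Writing bytes marks the resource as binary once the stream is closed. */
    public OutputStream getOutputStream(final String path) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                binary.put(path, toByteArray());
                text.remove(path);
            }
        };
    }

    /** Writing characters marks the resource as text once the writer is closed. */
    public Writer getWriter(final String path) {
        return new StringWriter() {
            @Override
            public void close() {
                text.put(path, toString());
                binary.remove(path);
            }
        };
    }

    /** Hypothetical query method: lets the caller pick the matching read accessor. */
    public boolean isBinary(String path) {
        if (binary.containsKey(path)) {
            return true;
        }
        if (text.containsKey(path)) {
            return false;
        }
        throw new IllegalArgumentException("No such resource: " + path);
    }

    /** Caller side: decide which accessor to use without provoking an error. */
    public static String chooseAccessor(AskFirstSketch repo, String path) {
        return repo.isBinary(path) ? "getInputStream" : "getReader";
    }
}
```

This only works for resources written through the repository itself;
data that was put into the back-end by other means would need some
other way of determining its type.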


My gut feeling is that Yanel should not have to care what kind of data 
it's piping through, but maybe my gut is telling me something wrong ;-)

How is JCR handling this?

Cheers

Michi

> WDYT?
>
> josias
>
>> For writing, there should basically be no problem, since Yanel can 
>> decide based on the MIME-type if it is going to write a 
>> character-based or a binary resource.
>>
>
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner at wyona.com                        michi at apache.org
+41 44 272 91 61
