[Yanel-dev] New XMLDB repository

Andreas Wuest awuest at student.ethz.ch
Tue Feb 13 01:34:00 CET 2007


Hi

On 12.2.2007 22:52 Uhr, Michael Wechner wrote:

> Andreas Wuest wrote:
> 
>> Hi
>>
>> On 12.2.2007 21:02 Uhr, Michael Wechner wrote:
>>
>>> I am not sure, because XML can also contain binary data (using
>>> CDATA). This is also because one should use application/xml and not
>>> text/xml.
>>
>>
>> Just for the record: CDATA cannot contain binary data.
> 
> 
> What I mean by binary data is images, etc. Sorry if I am mixing up the
> technical terms here, but such data often contains reserved XML
> characters, and hence it makes sense to embed it within CDATA.

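Yes, embedded data may contain a "<", for example, which would only go
unpunished inside a CDATA section. (Base64 output, incidentally, never
does: its alphabet is limited to A-Z, a-z, 0-9, "+", "/" and the "="
padding.) Nevertheless, the actual byte values of the "binary" data must
still form valid characters (single- or multi-byte, depending on whether
UTF-8, UTF-16 etc. is used) in the document's character encoding, and
XML forbids some characters outright: a NUL byte, for instance, is not
allowed even inside CDATA.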
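To make the distinction concrete, here is a minimal Python sketch (the
helper names and the <image>/<data> elements are made up for
illustration). It embeds every possible byte value once as Base64 and
once naively as Latin-1 characters inside CDATA, then checks
well-formedness with the standard library parser:

```python
import base64
import xml.etree.ElementTree as ET


def embed_base64(payload: bytes) -> str:
    """Hypothetical helper: wrap a binary payload as Base64 text."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"<image><data>{encoded}</data></image>"


def embed_raw_cdata(payload: bytes) -> str:
    """Hypothetical helper: map each byte to one Latin-1 character in CDATA."""
    return "<image><data><![CDATA[" + payload.decode("latin-1") + "]]></data></image>"


def is_well_formed(doc: str) -> bool:
    """Return True if the standard library parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


payload = bytes(range(256))  # every possible byte value, standing in for an image

print(is_well_formed(embed_base64(payload)))     # True
print(is_well_formed(embed_raw_cdata(payload)))  # False
```

The Base64 variant parses and round-trips; the CDATA variant is rejected
because of the NUL and other control characters, not because of "<" or
"&".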

So, technically, such a document is still a text document and should be
treated as such: when you persist it to a file and re-read it, the
charset has to be taken into account, otherwise not only the text but
also your Base64-encoded image will look different after decoding.
Furthermore, there is usually a reason why you would wrap an image in an
XML document in the first place. One of them is to provide metadata
alongside the image that you'd like to query, and as far as XML
databases are concerned, querying requires a text document.
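A minimal sketch of the charset point, assuming a document that declares
and is stored as UTF-16 (the PNG signature bytes merely stand in for an
image, and load_payload is a hypothetical helper): re-reading the stored
bytes with the wrong charset garbles the whole document, so the Base64
payload can no longer be recovered.

```python
import base64
from typing import Optional

# First eight bytes of every PNG file, standing in for a real image.
image_bytes = b"\x89PNG\r\n\x1a\n"

doc = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    "<image><data>" + base64.b64encode(image_bytes).decode("ascii") + "</data></image>"
)

# Persist the document in the charset it declares (with a byte order mark).
stored = doc.encode("utf-16")


def load_payload(stored: bytes, charset: str) -> Optional[bytes]:
    """Hypothetical helper: re-read the bytes with `charset`, decode the Base64."""
    try:
        text = stored.decode(charset)
        start = text.index("<data>") + len("<data>")
        end = text.index("</data>", start)
        return base64.b64decode(text[start:end], validate=True)
    except ValueError:  # also covers UnicodeDecodeError and binascii.Error
        return None


print(load_payload(stored, "utf-16") == image_bytes)  # True
print(load_payload(stored, "latin-1"))                # None
```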

Anyway, the point of all this is that if you have an XML document, you 
can and should treat it as character data. It simply can't be binary data.

So when Josias proposed to simply treat every XML document as text, he
was indeed correct.
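In repository terms, treating XML as text means decoding the stored
bytes with the right encoding before handing out characters. The
following is a simplified, hypothetical sketch (read_xml_as_text is not
an existing Yanel API) of the encoding detection described in Appendix F
of the XML 1.0 specification. It only handles the UTF-8/UTF-16 byte
order marks and an encoding declaration in an ASCII-compatible encoding,
and falls back to UTF-8 otherwise:

```python
import codecs
import re

# Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?> at the very start.
_ENCODING_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def read_xml_as_text(data: bytes) -> str:
    """Hypothetical helper: decode stored XML bytes into characters.

    Simplified from XML 1.0 Appendix F: UTF-32, EBCDIC and BOM-less
    UTF-16 are not handled.
    """
    # A byte order mark identifies UTF-8 or UTF-16 unambiguously.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the codec consumes the BOM
    # Otherwise assume an ASCII-compatible encoding, so the declaration
    # can be read as bytes; without one, XML defaults to UTF-8.
    match = _ENCODING_DECL.match(data)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return data.decode(encoding)


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><title>Zürich</title>'
print(read_xml_as_text(doc.encode("iso-8859-1")))
```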

-- 
Kind regards,
Andi


