[Yanel-development] Yanel does not specify the charset in the Content-Type header

Andreas Wuest awuest at student.ethz.ch
Mon Dec 4 19:09:29 CET 2006


Hi

On 4.12.2006 18:32, Andreas Wuest wrote:

> I just realised that Yanel does not set the charset in the Content-Type 
> header.
> 
> This is crucial for editing though, or even web browsing, if the 
> character set is not specified in the document itself.

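On a side note, the HTTP/1.1 specification (RFC 2616, section 3.7.1) 
defines ISO-8859-1 as the default charset for subtypes of type "text". 
A text body in any other encoding must therefore be labelled with a 
charset parameter in the Content-Type header, otherwise recipients are 
entitled to assume ISO-8859-1.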
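To illustrate why this matters, here is a minimal sketch in Python (not 
Yanel or Yulup code) of what happens when a UTF-8 body is read by a 
recipient that falls back to the ISO-8859-1 default. The sample string is 
just an assumption for the demonstration.

```python
# Assumption for illustration: the server stored the document as UTF-8
# but sent no charset parameter, so the recipient applies ISO-8859-1.
body = "Zürich".encode("utf-8")

# What the recipient sees when it applies the HTTP/1.1 default.
misread = body.decode("iso-8859-1")

# What it would see if the charset had been declared correctly.
correct = body.decode("utf-8")

print(misread)
print(correct)
```

Every non-ASCII character is split into two wrong characters, and an 
editor that saves the misread text back makes the damage permanent.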

Furthermore, the W3C recommends the following procedure for user agents 
to detect the character encoding used (from 
http://www.w3.org/TR/REC-html40/charset.html):

"To sum up, conforming user agents must observe the following priorities 
when determining a document's character encoding (from highest priority 
to lowest):

    1. An HTTP "charset" parameter in a "Content-Type" field.
    2. A META declaration with "http-equiv" set to "Content-Type" and a 
value set for "charset".
    3. The charset attribute set on an element that designates an 
external resource.

In addition to this list of priorities, the user agent may use 
heuristics and user settings. For example, many user agents use a 
heuristic to distinguish the various encodings used for Japanese text. 
Also, user agents typically have a user-definable, local default 
character encoding which they apply in the absence of other indicators."
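The priority list quoted above can be sketched as a small function. This 
is only an illustration in Python, not Yulup's or Yanel's code: the names 
detect_charset, META_RE, link_charset and default are made up for this 
example, and the regular expression is a rough stand-in for a real HTML 
parser (it only matches when http-equiv appears before content).

```python
import re
from email.message import Message

# Rough pattern for <meta http-equiv="Content-Type" content="...; charset=X">.
# Hypothetical helper; a real user agent would use an HTML parser instead.
META_RE = re.compile(
    rb"<meta[^>]+http-equiv=[\"']?content-type[\"']?[^>]*charset=([\w.:-]+)",
    re.IGNORECASE,
)


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def detect_charset(content_type, body, link_charset=None, default="iso-8859-1"):
    """Apply the three priorities from the HTML 4 spec, then a local default."""
    # 1. HTTP "charset" parameter in the Content-Type field.
    if content_type:
        charset = charset_from_content_type(content_type)
        if charset:
            return charset
    # 2. META declaration with http-equiv set to Content-Type.
    match = META_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. charset attribute on the element that referenced the resource.
    if link_charset:
        return link_charset.lower()
    # Fallback: the user agent's local default (an assumption of this sketch).
    return default
```

A caller would pass the raw header value and the first bytes of the 
body, for example detect_charset("text/html", body). The fallback value 
is a policy decision of the user agent, which is why it is a parameter 
here.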

This means that Yanel should produce the charset parameter. Of course, 
this requires Yanel to know which encoding was used for a certain 
document in the first place. I see no other way than to define meta-data 
containing a charset property (which of course must be checked and 
enforced by any code which allows the creation of new and the 
modification of existing documents).
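As a rough sketch of what I mean (in Python for brevity, and not an 
actual Yanel API): the meta-data is modelled as a plain dict with a 
hypothetical "charset" key, content_type_header builds the header value 
from it, and validate_document is the kind of check the creating or 
modifying code would have to run before storing a document.

```python
import codecs


def content_type_header(mime_type, meta):
    """Build a Content-Type value, adding charset when the meta-data has one."""
    charset = meta.get("charset")  # hypothetical meta-data property
    if charset is None:
        return mime_type
    return f"{mime_type}; charset={charset}"


def validate_document(data, charset):
    """Reject documents whose bytes do not match the declared charset.

    Raises LookupError for an unknown charset name and
    UnicodeDecodeError for bytes that are invalid in that charset.
    """
    codecs.lookup(charset)
    data.decode(charset)
```

Note that such a check can only catch some mismatches: every byte 
sequence is valid ISO-8859-1, so a UTF-8 document declared as 
ISO-8859-1 would pass. The charset property still has to be set 
correctly by whoever creates the document.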

Note that for now, Yulup is not conformant and simply uses UTF-8 in case 
no charset parameter is defined (Yulup does not look at META 
declarations or the "encoding" declaration in the XML declaration).

This was done because most documents served by Yanel are in fact UTF-8, 
but as described above, lack the charset parameter.

-- 
Kind regards,
Andi


