Julian Reschke wrote:
> Andrew Welch wrote:
>> Hi all,
>>
>> There's a very good article here about the problem of reading feeds
>> from all over the world in different encodings:
>>
>> http://www.xml.com/pub/a/2004/07/21/dive.html
>>
>> It describes how you (sometimes) have the encoding in the HTTP
>> content type but also the encoding in the XML prolog, and the
>> problems of choosing which to use.
>>
>> It also talks of RFC 3023, which sounds like it was an attempt to
>> sort it out. The article is dated July 2004 and I'm wondering if
>> there's any more recent information? Is there any support in modern
>> parsers - for example, can I give the parser a URL and it takes care
>> of the rest?
>
> I think many parsers can read from a web resource, but few use the
> encoding information from the content type.

The thing is that XML documents are designed to be read where there is
no external content-type information (such as from a filesystem) as
well as where there is. The spec says you can leave out the encoding
declaration even where the document is not UTF-8 or UTF-16, provided
the encoding can be determined from an external content-type. But then
the encoding has to be kept in metadata somewhere, which is very
unlikely unless you have a full-blown content management system and all
the processes to ensure that documents and their metadata are kept in
sync.

It's generally much easier for people to put the encoding directly in
the document (or entity), in which case any external content-type can
and should be ignored.

>> At the moment it all seems pretty complicated... especially
>> considering XML was designed for the web. The problem of parsing
>> feeds from all over the world must have been tackled a few times
>> over by now?
>
> There's a related HTTPbis issue -- HTTP/1.1 (RFC 2616) defines a
> default encoding for text/* -- in retrospect a bad idea, at least
> for XML -- see <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20>.
>
> Of course the simple workaround is not to use a text/* content type
> (so this is one of the many problems you don't have with Atom).

Indeed.

-- 
Chris Burdess
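
To make the choice discussed above concrete, here is a minimal Python sketch of the precedence as I understand it from RFC 3023: an explicit charset parameter on the Content-Type wins; a text/* type with no charset falls back to us-ascii (the default the HTTPbis ticket complains about); otherwise the document itself decides, via byte order mark or encoding declaration. The function names (choose_encoding and its helpers) are made up for illustration, and the in-document sniffing is deliberately simplified: it only handles ASCII-compatible declarations and UTF-8/UTF-16 byte order marks.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document. Only works for ASCII-compatible encodings.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_from_content_type(content_type):
    """Return (media_type, charset or None) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_param("charset")


def encoding_from_document(body):
    """Simplified in-document detection: BOM, then declaration, then UTF-8."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when nothing else is stated


def choose_encoding(body, content_type=None):
    """Pick an encoding for an XML entity fetched with optional HTTP metadata."""
    if content_type:
        media_type, charset = charset_from_content_type(content_type)
        if charset:
            return charset.lower()   # explicit external charset wins
        if media_type.startswith("text/"):
            return "us-ascii"        # text/* default when charset is absent
    return encoding_from_document(body)


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><feed/>'
print(choose_encoding(doc, "application/xml"))
print(choose_encoding(doc, "text/xml; charset=utf-8"))
print(choose_encoding(doc, "text/xml"))
```

The third call shows why a text/* type is awkward for XML: the document declares ISO-8859-1, yet the transport default overrides it. A reader who prefers the view that an in-document declaration should take priority would instead call encoding_from_document directly and ignore the header.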



