application/xhtml+xml

October 10, 2002

What began as a simple comment has turned into an adventure.

It started the other day with a thread on opera.wishlist about a validation mode in which the browser would stop and inform the user when there was a problem with their webpage.

The other day I was reading diveintomark (as I do most days). On this particular day, Mark was talking about the tilde. I can distinctly remember thinking to myself: a) Mark had a little too much time on his hands today OR Mark had very little time on his hands today and instead of doing what he was supposed to do, he ended up chasing a rabbit, b) only on the web could you find a brief history of the tilde along with your morning coffee (and people wonder why I don’t get the local paper… why would I, since I can get the highlights online, and I have a much more interesting morning read for my morning routine?) and c) I remember thinking Dang… Mark just managed to make the history of the tilde an interesting read… I could learn to hate him for that. (Jealousy is an ugly thing, but there it is anyway.)

Today, however, I am chasing my own rabbit, blissfully ignorant of the late hour of the day and the fact that I should be sleeping (which I’m not anyway, so why stare at the TV?).

So, it got really interesting when Rijk van Geijtenbeek of Opera Software suggested that I use xhtml+xml (note that he mistakenly said “text/xhtml+xml” but meant “application/xhtml+xml”).

Why would he suggest this? Well, because of the limitations of using text/html for XHTML documents, which are explained in the W3C’s note about XHTML Media Types:

XHTML documents served as ‘text/html’ will not be processed as XML, e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets.

Clearly I have been spending too much time at W3C, because these publications are starting to make more and more sense to me as I read them, and deep down in a place that I’m a little afraid to talk about, I’m starting to like them.

Of course I would point out here that all the specification says is that user agents may not detect well-formedness errors. That does not seem to be prescriptive as much as descriptive; that is to say, they are describing how it is, not necessarily how it should be. User Agents (known most often to us as browsers) could detect well-formedness errors. Which would mean, getting back to the original suggestion, the idea of having a ‘validation mode’ would be an appropriate option. With regard to Opera, there is already an option to display JavaScript error messages, and the existence of such errors might cause there to be problems on the page. Why not follow that example and have an option to display XHTML and XML well-formedness errors? It seems likely to me that eventually this would be a good option to have when dealing with XML.

So I had to check it out. Sure enough, I made a poorly formed XHTML 1.1 document (specifically, I omitted an opening <p> tag).

Opera printed out as much as it could and then gave me the line number and column of the error.

Mozilla did mostly the same, except that it showed me the line in question, and did not show any of the first part of the document which was well formed.

Internet Explorer? Well, would you believe it prompted me to download the page? Yup, that’s right, the world’s largest company can’t even get a browser together that understands application/xhtml+xml.
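The kind of check Opera and Mozilla performed here can be sketched in a few lines of Python. This is only an illustration using the standard library's expat bindings, not what either browser actually runs; the function name `check_well_formed` and the sample page `BROKEN_PAGE` are made up for this example.

```python
import xml.parsers.expat

# A small page with the same mistake as the test document:
# the paragraph has a closing </p> but no opening <p>.
BROKEN_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Test</title></head>\n'
    '<body>\n'
    'This paragraph lost its opening tag.</p>\n'
    '</body>\n'
    '</html>\n'
)


def check_well_formed(document):
    """Return None if the document is well-formed XML.

    Otherwise return a (line, column, message) tuple describing the
    first error, much like the browsers' error pages did.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None
```

Calling `check_well_formed(BROKEN_PAGE)` reports an error on line 4, where the stray `</p>` sits. Note that this only tests well-formedness; it does not validate the page against the XHTML 1.1 DTD.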

So I decided to find out what my other options were.

For (geeky) reasons of my own, I am planning a section of this website which will be done to strict standards, meaning 100% compliant markup, including CSS and DOM. I planned (even before the thread about ‘validation mode’) to use XHTML 1.1.

So when I saw the Media types summary for serving XHTML documents, I was surprised by what it reminded me:

XHTML 1.1 documents should not be served with the ‘text/html’ MIME type.

Whoa.

Here I was planning a site to be the pinnacle of cutting edge standards compliance, and I nearly used the wrong MIME type for the whole bloody thing.

Now some would say that the W3C left wiggle room by using “SHOULD NOT” as opposed to “MUST NOT” which would prohibit it altogether… but come on, we’re talking about moving towards the levels of advanced über-geekdom.

The preferred media type is application/xhtml+xml. Apache predefines that MIME type to correspond to documents that end with either .xhtml or .xht. So what happens if I try to use that?

In Opera: displays as web page
In Mozilla: displays as web page
In IE6: prompts to Open, Save, or Cancel

Hrm. The next choice is application/xml. Apache has no predefined file extension, so I added my own (.axml) using .htaccess and tried again:

In Opera: displays as web page
In Mozilla: displays as web page
In IE6: displays source to document (no, I don’t know why either)

Well, what about the last choice: text/xml (Apache: .xml or .xsl)?

In Opera: displays as web page
In Mozilla: displays as web page
In IE6: same as application/xml, shows source
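To keep the three experiments straight, here is the extension-to-type mapping written out as a small Python lookup. It is a sketch of the idea only, not Apache's configuration: the table simply restates the extensions named above (including my own .axml), plus .html as the familiar text/html case, and the function name `media_type_for` is invented for this example.

```python
import os.path

# Extension-to-type table restating the experiments above.
# ".axml" is the custom extension added via .htaccess; the others
# are the mappings this post attributes to Apache's defaults.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
    ".axml": "application/xml",
    ".xml": "text/xml",
    ".xsl": "text/xml",
    ".html": "text/html",
}


def media_type_for(filename, default="application/octet-stream"):
    """Pick the media type to send for a file, based on its extension."""
    _, extension = os.path.splitext(filename)
    return MEDIA_TYPES.get(extension.lower(), default)
```

For example, `media_type_for("index.xhtml")` gives `application/xhtml+xml`, while an unknown extension falls back to the generic default.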

So my next thought was to try Apache’s Content Negotiation, which lets the server send different formats to different browsers based on what each browser says it is capable of handling.

IE6 correctly indicates that it can only handle text/html, and Mozilla correctly indicates that it can handle text/xml,application/xml,application/xhtml+xml,text/html.

Unfortunately, Opera only advertises that it can handle text/html (…time passes as Tim shuffles off to file bug report…).

Update: Opera 7.2 fixes this bug
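The decision the server has to make from those Accept headers can be sketched in Python. This is a simplified illustration, not how Apache's negotiation is implemented: the names `PREFERRED`, `parse_accept` and `choose_content_type` are made up here, the server's own preference order decides among acceptable types, and wildcards such as `*/*` are deliberately ignored so that a browser which does not name an XML type explicitly gets text/html.

```python
# Types we would like to send, best first. text/html is the fallback.
PREFERRED = ("application/xhtml+xml", "application/xml", "text/xml")


def parse_accept(header):
    """Turn an Accept header value into a {media_type: q} dictionary."""
    preferences = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when the browser gives none
        for parameter in parts[1:]:
            name, _, value = parameter.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a garbled q as "not acceptable"
        preferences[media_type] = quality
    return preferences


def choose_content_type(accept_header):
    """Return the media type to serve for a given Accept header."""
    if not accept_header:
        return "text/html"
    preferences = parse_accept(accept_header)
    for candidate in PREFERRED:
        # Only an explicit listing with q > 0 counts as support.
        if preferences.get(candidate, 0.0) > 0.0:
            return candidate
    return "text/html"
```

With the header Mozilla sends, `choose_content_type` picks `application/xhtml+xml`; with a header that lists only `text/html`, as IE6 and the buggy Opera do, it falls back to `text/html`.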

So I’m left with another WDD™ — Web Designer Dilemma — Do I follow the standards and make my life simpler, or do I worry about browser-specific hackery that will take more time and energy?

Well, fortunately for me, the site I was planning was primarily to be a sandbox to learn in, so I don’t need to worry about IE6 if I don’t want to… and guess what? I don’t want to.

So the site will be XHTML 1.1 and will be sent, according to the standards, as application/xhtml+xml.

Which means that it will work in Opera and Mozilla, and not IE. Sorry, Charlie.
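For anyone who wants to try the same thing without Apache, here is a minimal sketch of a Python WSGI application that always answers with application/xhtml+xml. It is an assumption-laden toy, not how this site is served: the `PAGE` content is a placeholder rather than a complete XHTML 1.1 document, and the name `app` is arbitrary.

```python
# Placeholder page body; a real XHTML 1.1 page would also carry a DOCTYPE.
PAGE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Sandbox</title></head>'
    '<body><p>Hello, XHTML.</p></body></html>'
).encode("utf-8")


def app(environ, start_response):
    """WSGI application that serves PAGE as application/xhtml+xml."""
    headers = [
        ("Content-Type", "application/xhtml+xml; charset=utf-8"),
        ("Content-Length", str(len(PAGE))),
    ]
    start_response("200 OK", headers)
    return [PAGE]
```

The standard library's `wsgiref.simple_server.make_server` can run `app` locally for testing. Browsers that do not understand the media type will offer to download the page, just as IE6 did above.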

  • Jor

    Very nice find! This is extremely useful, as I am learning XHTML.


    As for why IE displays the content as source: IE only pays respect to the file extension, and it ignores the MIME Type. If you serve foo.html as text/xml or application/xml, MSIE usually will display it as a normal webpage.

  • Can you explain what the specific disadvantage to using text/html is? The big advantage is obviously that it works properly in IE. What do I gain by using application/xhtml+xml?


    (what can I say, I don't understand basically everything on the W3C site)

  • TjL

    The disadvantage is that text/html will not show any parsing errors, so if you have an opening tag without a closing one, the browser doesn't say anything.


    If you use application/xhtml+xml then the browser will stop and report errors.


    It's like instant validation of a sort.


    Check out the link for the poorly formed XHTML 1.1 document in Opera or Mozilla and see what it says.

  • Here's how I do it in one line straight in PHP. Can be done in other languages, too, of course:


    $contentType = empty($_SERVER['HTTP_ACCEPT']) ? 'text/html' : (preg_match('#\b(application/xhtml\+xml|application/xml|text/xml|text/html)\b#', $_SERVER['HTTP_ACCEPT'], $matches) ? $matches[1] : 'text/html');


    Read: if there is an "Accept:" header AND there is a match for either one of the things between (), set it to the match; otherwise, default it to text/html


    Then it's just a matter of passing the Content-Type header on to the client (OK, that makes for a second line):


    header("Content-Type: $contentType; charset=utf-8");


    PS: Tim, how about adding the "pre" tag to the allowed list?

  • Jor

    Just in case you are also a Proxomitron user, I have completed a filter which will attempt to fix this MSIE bug.


    More details are at http://members.outpost10f.com/~jor/prox/msie.html


    Of course the real solution is for Microsoft to finally fix their browser...


    Interestingly enough, I have just converted parts of my personal home page to compliant XHTML 1.1 (including the content-type), and I'm seeing a very strange MSIE behaviour: it appears to be trying to interpret the doctype using my stylesheet!

  • Jor

    Actually, disregard the above about a working Proxomitron filter. It is far too unreliable at the moment to work everywhere :(

  • I have detailed at the URI below how to work around broken browsers and still serve your xhtml as application/xhtml+xml for the good ones in Apache:
    http://lists.w3.org/Archives/Public/www-archive/2002Dec/0005.html


    Hope this helps
