June 23, 2000
Rated XHTML
by Peter-Paul Koch
Published in: HTML and XHTML, Industry
Translation: pinkhare (분홍토끼) in GTport
I will complete the translation soon.
Being a web developer is a tough job. Not only do you have to steer clear of the traps and pitfalls the popular browsers think up for you on a daily basis, you also have to keep at least half an eye on all kinds of developments that may (or may not) have an impact on your job. You have hardly mastered style sheets and DHTML before new techniques clamor for attention. Which ones are important right away? Which ones can you dismiss for now?
This article gives my view on the language the W3C has developed to succeed HTML: XHTML. Agree or disagree with me, at least you’ll have something to think about and to help you decide.
First I’ll explain what XHTML is, then I’ll give the four rules of writing correct XHTML and finally I’ll add some words about why you should use XHTML.
What is XHTML, Anyway?
XHTML is HTML written according to the XML rules of well-formedness. To understand XHTML, we therefore have to understand XML. Many articles have already been written on this subject, so a short summary should be enough:
XML is a general markup language. Unlike HTML, XML allows you to make up your own tags and thus impose your own structure on a document. Do you need a tag <colour-of-hat>? Add it to your document, make sure some program knows what to do when it encounters this tag, and you’re ready.
There are a few simple rules for XML documents (see below). As long as your tags are correctly formed, XML doesn’t care what the actual tags are. So XML is a generalized markup language that you can use in any way you like.
In contrast, HTML is a much more rigidly defined markup language where your tags have to adhere to a syntax to make sure browsers understand you. Nonetheless, the open character of XML allows us to treat HTML documents as XML documents with the specific purpose of being shown by a web browser. However, the old standards of HTML are not completely XML compatible. For instance, a </P> at the end of each paragraph is optional in HTML, not required. Web browsers don’t care whether it’s there because they’re programmed not to, but XML parsers will be stricter and will tell you that your HTML document is not well-formed XML.
To bridge the gap between the two, XHTML was developed. In essence it is simply HTML, but the XML rules of well-formedness have been added to the normal HTML syntax. Thus web pages would become XML-conforming and web developers would become acquainted with the rules and restrictions of XML.
Rules of the Game
In practice, the following rules have been added to HTML for writing XHTML:
1. Make sure all your tags are lower case.
2. Close all your tags. In the case of tags that don’t have a closing tag, like <IMG> or <BR>, add a slash to the end of the tag: <img />, <br />.
3. Nest tags correctly. No more <B><P>text</B></P>, but <p><b>text</b></p>.
4. Put quotes around all attribute values. No more <P ALIGN=center> but <p align="center">.
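Three of these rules (closing, nesting and quoting) are plain XML well-formedness rules, so any XML parser can check them. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample list are illustrative assumptions, not part of any XHTML tool. Note that the lower-case rule is not caught this way: XML is case-sensitive, so <P>text</P> is well-formed XML, it simply is not the XHTML p element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each pair is (markup, what it illustrates).
samples = [
    ('<p><b>text</b></p>', "correct nesting"),
    ('<B><P>text</B></P>', "overlapping tags"),
    ('<p>text<br /></p>', "empty tag closed with a slash"),
    ('<p>text<br></p>', "<br> never closed"),
    ('<p align="center">text</p>', "quoted attribute value"),
    ('<p align=center>text</p>', "unquoted attribute value"),
]

for markup, label in samples:
    # Prints True or False followed by the markup that was checked.
    print(f"{is_well_formed(markup)!s:5}  {markup}  ({label})")
```

A check like this only proves well-formedness; validating against the XHTML DTD is a separate step that a validator performs.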
The good news is that current browsers don’t have any problems with XHTML. After all, rules 1, 2 and 4 are already optional in HTML, while rule 3 is required (even though in most cases browsers ignore nesting errors). The only really new one is rule 2a, the trailing slash on tags that have no closing tag. However, this rule only leads to problems when you write <br/> without the space: the browser then sees a br/ tag that it doesn’t know, so it doesn’t do anything. Inserting a space solves this problem. If you write <br /> the browser sees a br tag with an unknown attribute /. The br is executed, the unknown attribute is ignored.
The bad news is that you have to change your coding practices. Personally I dislike rule 1. First of all I’ve never understood why XML tags can only be lower case, secondly I always make my HTML tags upper case to make them stand out from the surrounding text. All of a sudden I can’t do this any more, while I think it’s useful. Nonetheless, I don’t mind changing my coding practices, but only if there are good reasons to.
Why Use XHTML
So why use XHTML instead of good old HTML? W3C gives the following reasons:
Document developers and user agent designers are constantly discovering new ways to express their ideas through new markup. In XML, it is relatively easy to introduce new elements or additional element attributes. The XHTML family is designed to accommodate these extensions through XHTML modules and techniques for developing new XHTML-conforming modules (described in the forthcoming XHTML Modularization specification). These modules will permit the combination of existing and new feature sets when developing content and when designing new user agents.
Alternate ways of accessing the Internet are constantly being introduced. [...] The XHTML family is designed with general user agent interoperability in mind. Through a new user agent and document profiling mechanism, servers, proxies, and user agents will be able to perform best effort content transformation. Ultimately, it will be possible to develop XHTML-conforming content that is usable by any XHTML-conforming user agent.
So future, as yet unspecified, enhancements of XHTML will allow developers to use novel, as yet unwritten, modules to extend XHTML to include new, as yet undefined, things in their web pages. In addition, W3C expects new user agents to require XHTML instead of HTML in the future.
X It Off Your List
Frankly, I don’t think these two reasons are enough for us web developers to switch from HTML to XHTML.
The first reason is unimportant at the moment. Maybe the XHTML modules will dazzle our socks off, maybe they’ll never be good for anything. In any case it’ll take at least two or three years before the modules will appear on the scene. Since we don’t yet know how they will work or exactly what they will do or even if they will be worth the trouble, we cannot do anything with them or prepare for them.
The second reason is also unimportant at the moment. There are no pure XHTML-conforming user agents, no browsers that require XHTML. Besides, it’s uncertain whether they’ll ever appear. After all, if you write a browser that only works with XHTML, it will give errors when you try to view simple HTML pages. That’s not really what browser vendors want.
Suppose Ed End-User goes to his favourite web page with the newest, XHTML-requiring, Ultra Browser X7 only to see lots of incomprehensible error messages that complain about the lack of valid XHTML. Will he think "Naughty web developers, you should have used XHTML!" or will he think "Bloody browser’s buggy!" ?
So when a new browser is released, the manufacturer will include support for good old HTML because end users will (rightly) demand it. New browsers on as yet unreleased platforms may require valid XHTML (though I don’t think so, see below), but Netscape and Explorer on personal computers won’t because they have to be conservative in their choice of languages.
Staying Power
I think that many people underestimate the staying power of HTML. It’s the standard at the moment, without it you can’t make a web page. Because of that all web developers use HTML. Because of that, all future browsers that are intended to show traditional web pages must continue to support HTML as we now know it. Because of that all web developers will continue to use HTML, so WWW pages will continue to be written in HTML, so browsers will have to continue to support it, etc.
But what about new browsers? What about new sections of the Internet, like WAP? What about learning XML by way of XHTML? Read on...
Just Say No
Of course, new browsers on new platforms may require XHTML. But then they’ll run into the same problem as the old browsers on the old platforms: they won’t correctly show existing web sites with HTML pages, which means that the end users will feel cheated. To avoid this, new browsers will also have to support HTML.
Of course, XHTML may become the standard language for a new section of the Internet, as WML has become the standard language for WAP pages. This is one of W3C’s reasons for developing it (see above). But frankly I don’t believe that. New sections of the Internet require truly new languages because they will be different from the WWW, while XHTML is only good to write traditional WWW pages in.
Of course XHTML can form a bridge between HTML and XML and make web developers acquainted with XML rules. But I wonder if XML is that important for pure web developers. I’m not convinced that every web developer should know XML, because I don’t think client side XML will be widely used. Server side XML is another case, of course.
Finally, to repeat the key phrase from the W3C quote on the previous page:
Ultimately, it will be possible to develop XHTML-conforming content that is usable by any XHTML-conforming user agent.
Doesn’t this sound familiar? Wasn’t HTML, too, supposed to be working on any user agent? We all know what happened to that plan...
So if HTML is here to stay, why bother to switch to a more difficult language that goes against your coding practices when the switching is not necessary? I don’t see any reason to start using XHTML. I’ll happily continue to write my tags uppercase to separate them from content and I’ll leave out the occasional </P> when I feel like it.
As are all of W3C’s specifications, XHTML is a theoretical construct that is interesting in its implications and may still grow to play an important role on the WWW, but right now it is worthless in practice. Software vendors should make the first move. They should start using (and requiring) XHTML in constructive ways without alienating the users of their products. Only then will the rest of the Web follow.
Those fanatics who think that everything W3C says has the power of a Divine Commandment and therefore treat anyone who doesn’t use XHTML as a heretic to be burned at the stake at the earliest opportunity, are simply wrong. XHTML isn’t about the present, it’s about the future.
About the Author
Peter-Paul Koch is a freelance web developer in Amsterdam, the Netherlands. He writes and maintains www.quirksmode.org, a compendium of about 180 CSS and JavaScript articles, tips, and tricks, plus a bug reporting system and a blog.
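Below is the original text. As time allows, I will update this post bit by bit, replacing the parts I have translated with Korean.
Sending XHTML as text/html Considered Harmful
Author: Ian Hickson <ian@hixie.ch> (comments welcome)
Source: reached via the link to the original author's (Ian Hickson's) article from the wiki dictionary entry.
Translation: pinkhare (분홍토끼) in GTport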
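Abstract
--------
A number of problems resulting from the use of the text/html MIME type
in conjunction with XHTML content are discussed. It is suggested that
XHTML sent as text/html is broken and risky, so authors planning work
for public consumption should stick to HTML 4.01, and authors who wish
to use XHTML should send their markup as application/xhtml+xml.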
Translations
------------
A French translation is available:
http://www.hixie.ch/advocacy/xhtml.fr
Background
----------
This document was originally written as part of a web log entry in
September 2002:
http://ln.hixie.ch/?start=1031465247&count=1
It has since been updated periodically to correct errors raised on
various mailing lists and other discussion forums. As of late 2004, it
is still as relevant as when it was first written.
Note that this document compares XHTML 1.0 that follows Appendix C
with HTML 4.01, since that is the only variant of XHTML that may be
sent as text/html.
Summary
-------
If you use XHTML, you must deliver it with the application/xhtml+xml
MIME type. If you are not willing to do so, you should use HTML4
instead of XHTML.
The alternative, using XHTML but sending it as text/html, causes the
numerous problems outlined below.
Unfortunately, IE6 does not support application/xhtml+xml (in fact, it
does not support XHTML at all).
Why XHTML should not be sent as text/html
-----------------------------------------
Authors who decide to send XHTML as text/html typically run into the
following problems:
1. Authors write XHTML that makes assumptions that are only valid for
tag soup or HTML4 UAs, and not XHTML UAs, and send it as
text/html. (The common assumptions are listed below.)
2. Authors find everything works fine.
3. Time passes.
4. Author decides to send the same content as application/xhtml+xml,
because it is, after all, XHTML.
5. Author finds site breaks horribly. (See below for a list of
reasons why.)
6. Author blames XHTML.
Steps 1 to 5 have been seen by every single person I have spoken to
who has switched to using the XHTML MIME type. The only reason step 6
didn't happen in those cases is that they were advanced authors who
understood how to fix their content.
SPECIFIC PROBLEMS
These are the issues that affect documents when they are switched from
text/html to application/xhtml+xml:
* <script> and <style> elements in XHTML sent as text/html have to be
escaped using ridiculously complicated strings.
This is because in XHTML, <script> and <style> elements are #PCDATA
blocks, not #CDATA blocks, and therefore <!-- and --> really _are_
comments tags, and are not ignored by the XHTML parser. To escape
script in an XHTML document which may be handled as either HTML4 or
XHTML, you have to use:
<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>
To embed CSS in an XHTML document which may be handled as either
HTML4 or XHTML, you have to use:
<style type="text/css"><!--/*--><![CDATA[/*><!--*/
...
/*]]>*/--></style>
Yes, it's pretty ridiculous. If documents _aren't_ escaped like
this, then the contents of <script> and <style> elements get
dropped on the floor when parsed as true XHTML.
(This is all assuming you want your pages to work with older
browsers as well as XHTML browsers. If you only care about XHTML
and HTML4 browsers, you can make it a bit simpler.)
* A CSS stylesheet written for an HTML4 document is interpreted
slightly differently in an XHTML context (e.g. the <body> element
is not magical in XHTML, tag names must be written in lowercase in
XHTML). Thus documents change rendering when parsed as XHTML.
* A DOM-based script written for an HTML4 document has subtly
different semantics in an XHTML context (e.g. element names are
case insensitive and returned in uppercase in HTML4, case sensitive
and always lowercase in XHTML; you have to use the namespace-aware
methods in XHTML, but not in HTML4). BUT, if you send your
documents as text/html, then they will use the HTML4 semantics
DESPITE being XHTML! Thus, scripts are highly likely to break when
the document is parsed as XHTML.
* Scripts that use document.write() will not work in XHTML contexts.
(You have to use DOM Core methods.)
* Current UAs are, for text/html content, HTML4 user agents (at best)
and certainly not XHTML user agents. Therefore if you send them
XHTML you are sending them content in a language which is not
native to them, and instead relying on their error handling. Since
this is not defined in any specification, it may vary from one user
agent to the other.
* XHTML documents that use the "/>" notation, as in "<link />" have
very different semantics when parsed as HTML4. So if there was to
be a fully compliant HTML4 UA, it would be quite correct to show
">" characters all over the page.
For more details on this see the third bullet point in the section
entitled "The Myth of 'HTML-compatible XHTML 1.0 documents'".
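The case-sensitivity point in the list above can be illustrated with two parsers from Python's standard library. This is only a sketch under stated assumptions: html.parser stands in for a lenient HTML parser and folds names to lower case, whereas a browser's HTML DOM reports them in upper case; the folding direction differs but the effect on scripts is the same. The helper names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Collect start-tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tag_names(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tag_names(markup: str) -> list:
    # iter() walks the tree in document order, starting at the root.
    return [element.tag for element in ET.fromstring(markup).iter()]


markup = "<DIV><P>hello</P></DIV>"
print(html_tag_names(markup))  # names folded to a single case
print(xml_tag_names(markup))   # names kept exactly as written
```

A script that compares element names against a fixed-case string therefore behaves differently depending on which parser produced the tree, which is the breakage described above.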
COPY AND PASTE
The worst problem, and the main reason (I suspect) for most of the
REALLY invalid XHTML pages out there, is that authors who have no clue
about XHTML simply copied and pasted their DOCTYPE from another
document. So even if you write valid XHTML, by using XHTML, you are
likely to encourage authors who do not know enough to write valid
XHTML to claim to do so.
Why trying to use XHTML and then sending it as text/html is bad
---------------------------------------------------------------
These are not likely to be problems for authors who regularly validate
their pages, but other authors will run into these problems.
* Documents sent as text/html are handled as tag soup [1] by most UAs.
This is the key. If you send XHTML as text/html, as far as browsers
are concerned, you are just sending them Tag Soup. It doesn't
matter if it validates, they are just going to be treating it the
same way as plain old HTML 3.2 or random HTML garbage.
Since most authors only check their documents using one or two UAs,
rather than using a validator, this means that authors are not
checking for validity, and thus most documents that claim to be
XHTML on the web now are invalid.
See, for example, this study:
http://www.goer.org/Journal/2003/Apr/index.html#results
...but if you don't believe it, feel free to do your own. In any
random sample of documents that appear to claim to be XHTML, the
overwhelming majority of documents are invalid.
Therefore the main advantage of using XHTML, that errors are caught
early because it _has_ to be valid, is lost if the document is then
sent as text/html. (Yes, I said _most_ authors. If you are one of
the few authors who understands how to avoid the issues raised in
this document and does validate all their markup, then this
document probably does not apply to you -- see Appendix B.)
* If you ever switch your documents that claim to be XHTML from
text/html to application/xhtml+xml, then you will in all likelihood
end up with a considerable number of XML errors, meaning your
content won't be readable by users. (See above: most of these
documents do not validate.)
* If a user saves such a text/html document to disk and later
reopens it locally, triggering the content type sniffing code since
filesystems typically do not include file type information, the
document could be reopened as XML, potentially resulting in
validation errors, parsing differences, or styling differences.
(The same differences as if you start sending the file with an XML
MIME type.)
* The only real advantage to using XHTML rather than HTML4 is that it
is then possible to use XML tools with it. However, if tools are
being used, then the same tools might as well produce HTML4 for you.
Alternatively, the tools could take SGML as input instead of XML.
(SGML is over a decade older than XML and the tools have existed
for years.)
* HTML 4.01 contains everything that XHTML 1.0 contains, so there is
little reason to use XHTML in the real world. It appears the main
reason is simply "jumping on the bandwagon" of using the latest and
(perceived) greatest thing.
The Myth of "HTML-compatible XHTML 1.0 documents"
-------------------------------------------------
RFC 2854 spec refers to "a profile of use of XHTML which is compatible
with HTML 4.01". There is no such thing. Documents that follow the
guidelines in appendix C are not valid HTML 4.01 documents. They just
happen to be close enough that tag soup parsers are able to handle
them just like most of the other pages on the Web.
The simplest examples of this are:
* The "/>" empty tag syntax actually has totally different meaning in
HTML4. (It's the SHORTTAG minimisation feature known as NET, if I
recall the name correctly.) Specifically, the XHTML
<p> Hello <br /> World </p>
...is, if interpreted as HTML4, exactly equivalent to:
<p> Hello <br>> World </p>
...and should really be rendered as:
Hello
> World
* Script and style elements cannot have their contents hidden from
legacy UAs. The following XHTML:
<style type="text/css">
<!-- /* hide from old browsers */
p { color: red; }
-->
</style>
...is exactly equivalent to the following HTML4:
<style type="text/css">
</style>
...because comments are not ignored in XHTML <style> blocks.
* The "xmlns" attribute is invalid HTML4.
* The XHTML DOCTYPEs are not valid HTML4 DOCTYPEs.
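The second example in the list above (style rules hidden inside a comment) can be checked with any XML parser. The sketch below uses Python's xml.etree.ElementTree, which discards comments by default; the function name style_text is an illustrative assumption. It also feeds the parser the CDATA-based wrapper quoted earlier in this document, to show that the rule survives in that form.

```python
import xml.etree.ElementTree as ET

# The "hide from old browsers" pattern: the rule sits inside a comment.
hidden = """<style type="text/css">
<!-- /* hide from old browsers */
p { color: red; }
-->
</style>"""

# The wrapper quoted earlier for CSS that must survive both kinds of parser.
wrapped = """<style type="text/css"><!--/*--><![CDATA[/*><!--*/
p { color: red; }
/*]]>*/--></style>"""


def style_text(markup: str) -> str:
    """Return the character data an XML parser sees inside the element."""
    return ET.fromstring(markup).text or ""


# Only whitespace is left once the comment is dropped.
print(repr(style_text(hidden).strip()))
# The CDATA section is delivered as ordinary text, rule included.
print("p { color: red; }" in style_text(wrapped))
```

The same reasoning applies to script elements: whatever is wrapped only in <!-- and --> is comment content to an XML parser and never reaches the script engine.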
Using XHTML and sending it as text/html is effectively the same, from
an HTML4 point of view, as writing tag soup (see "Why UAs can't handle
XHTML sent as text/html as XML" below).
Note: This is covered by HTMLWG issue XHTML-1.0/6232:
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-1.0?id=6232;expression=appendix%20c;user=guest
Why UAs can't handle XHTML sent as text/html as XML
---------------------------------------------------
* Documents sent as text/html are handled as tag soup by most UAs.
This means that authors are not checking for validity, and thus
most XHTML documents on the web now are invalid. A conforming XML
UA would thus be unable to show as many documents as current UAs,
and would therefore never get enough marketshare to be relevant.
* It is impossible to reliably autodetect XHTML when sent as
text/html. This is why UAs could not ever treat text/html documents
as XML, even if they did not care about not being usable (see the
first point in this section).
+ You can't sniff for the five characters "<?xml" because:
- The <?xml ... ?> header is optional per Appendix C, and it is
recommended not to include it as it causes IE6 to trigger
quirks mode.
- SGML can also contain PIs (see the example below).
+ You can't trigger from the DOCTYPE since the W3C might introduce
new XHTML DOCTYPEs in future, so you don't know which DOCTYPEs
to look for. (Not to mention that DOCTYPEs are optional for
well-formed XHTML documents, DOCTYPE parsing is hard, DOCTYPEs
may be hidden in comments, and DOCTYPE sniffing has been called
harmful by many leading figures at the W3C and elsewhere.)
+ You can't trigger off the "<html xmlns" string because it might
be there but hidden in a comment (you'd need a complete XML
parser to step past comments, PIs, internal subsets, etc).
e.g. what language is this text/html document in?:
<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
[ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
This is a comment. This document is not XHTML.
<html xmlns="http://www.w3.org/1999/xhtml"/>
Ok, I'm done now. -->
<html>
<title> Need a title in HTML4! </title>
<p> This is a valid HTML4 document.
</html>
* Even if you could detect XHTML, what do you do with a document that
is not well formed (such as the example above)? If you fall back on
HTML4, then there is no advantage to using an XML processor, and you
might as well always treat it as HTML4.
* The HTML working group said that UAs should not do this:
http://lists.w3.org/Archives/Public/www-html/2000Sep/0024.html
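To make the sniffing problem concrete, here is a small sketch in Python. The function naive_sniff is a hypothetical detector of the kind the list above argues against; it is not taken from any real user agent. The sample is the text/html document quoted above, with indentation as an assumption since the layout there is flattened.

```python
import xml.etree.ElementTree as ET

# The text/html document quoted above: valid HTML4, not XHTML.
DOCUMENT = """<?xml this is not?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"
 [ <!-- SYSTEM "not XHTML" --> ]>
<!-- -- -->
 This is a comment. This document is not XHTML.
 <html xmlns="http://www.w3.org/1999/xhtml"/>
 Ok, I'm done now. -->
<html>
 <title> Need a title in HTML4! </title>
 <p> This is a valid HTML4 document.
</html>"""


def naive_sniff(text: str) -> bool:
    """Hypothetical sniffer: guess 'XHTML' from telltale substrings."""
    return text.startswith("<?xml") or "<html xmlns" in text


def parses_as_xml(text: str) -> bool:
    """Return True only if a real XML parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(naive_sniff(DOCUMENT))    # the substring test claims XHTML
print(parses_as_xml(DOCUMENT))  # an XML parser rejects the document
```

The substring test says "XHTML" while the XML parser refuses the input, so a UA that trusted the sniffer would show an error page for a document that is legitimate HTML4.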
The advantages of XHTML
-----------------------
When sent as application/xhtml+xml, XHTML has several advantages:
1. XHTML content will be able to be mixed-and-matched with content
from other well-known namespaces (in particular, MathML). This
is the main advantage for content authors.
2. UAs will immediately catch well-formedness errors.
3. Tools interacting with XHTML documents are guaranteed a
well-formed document.
4. XHTML content can be parsed with a simpler parser than tag soup
can, and a _much_ simpler parser than SGML can.
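The first advantage, mixing namespaces, is easy to see with a namespace-aware XML parser. The following sketch uses Python's xml.etree.ElementTree; the fragment is a made-up example, and the helper names are illustrative assumptions. ElementTree reports each element name in "{namespace-uri}local-name" form, which is what lets a tool tell XHTML elements from MathML ones.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A made-up XHTML fragment with an inline MathML expression (x squared).
document = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>The area is
      <math xmlns="{MATHML_NS}"><msup><mi>x</mi><mn>2</mn></msup></math>
    </p>
  </body>
</html>"""

root = ET.fromstring(document)


def namespace_of(element) -> str:
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return ""


def local_names_in(tree_root, namespace: str) -> list:
    """List local element names, in document order, for one namespace."""
    return [
        el.tag.split("}", 1)[1]
        for el in tree_root.iter()
        if namespace_of(el) == namespace
    ]


print(local_names_in(root, XHTML_NS))
print(local_names_in(root, MATHML_NS))
```

None of this is available to a tag soup parser, which has no notion of namespaces; that is why the advantage only exists when the document is actually processed as XML.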
However, none of these apply when an XHTML document is sent as
text/html, and since authors feel their pages should be readable on
the most popular Web browser, which does not support
application/xhtml+xml, there is basically no point in using XHTML at
the moment.
Conclusion
----------
There are few advantages to using XHTML if you are sending the content
as text/html, and many disadvantages.
In addition, currently, the majority (over 90% by most counts) of the
UA market is unable to correctly render real XHTML content sent as
text/xml (or other XML MIME types). For example, point IE at:
http://www.mozillaquestquest.com/
Only Mozilla, Mozilla-based browsers such as Netscape 6 and 7, recent
versions of Opera, and Safari, are able to correctly render that site.
(IE6 shows a DOM tree!)
Authors who are not willing to use one of the XML MIME types should
stick to writing valid HTML 4.01 for the time being. Once user agents
that support XML and XHTML sent as one of the XML MIME types are
widespread, then authors may reconsider learning and using XHTML.
(Advanced authors should also see appendix B.)
Further Reading
---------------
I wrote another document on a related matter: people wanting UAs to
treat XHTML documents sent as text/html as XML and not tag soup.
http://www.damowmow.com/playground/xhtml-in-uas.xhtml
Henri Sivonen wrote a similar document asking what is the point of
XHTML:
http://www.hut.fi/u/hsivonen/xhtml-the-point
There are also many mailing list posts on this matter, e.g. on
www-talk. The following post summarises some issues relating to using
text/html for XHTML content containing XML extensions:
http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0046.html
Some people have run into the problems this document mentions, for
example:
http://flrant.com/index.php?id=P21
There are also some interesting points made in other posts, for
example:
| > But does Mozilla call its xml parser for http://www.w3.org/ ?
|
| Nope. If it did, it would render the page without any expanded
| character entity references, since Mozilla is not a validating
| parser and thus skips parsing the DTD and thus doesn't know what
| &nbsp;, &middot; and &copy; are. Not to mention that it would end up
| ignoring the print-media specific section of the stylesheet, which
| uses uppercase element names and thus wouldn't match any of the
| lower case elements (line 138 of the first stylesheet), and it would
| use an unexpected background colour for the page because the
| stylesheet sets the background on <body> and not <html>, which in
| XHTML will result in a different rendering to the equivalent in
| HTML4 (same sheet, line 5).
-- http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0004.html
Or this post, near the end of the thread:
| I'm still looking for a good reason to write websites in XHTML _at
| the moment_, given that the majority of web browsers don't grok
| XHTML. The only reason I was given (by Dan Connolly [1]) is that it
| makes managing the content using XML tools easier... but it would be
| just as easy to convert the XML to tag soup or HTML before
| publishing it, so I'm not sure I understand that. And even then,
| having the content as XML for content management is one thing, but
| why does that require a minority of web browsers to have to treat
| the document as XML instead of tag soup? What's the advantage of
| doing that? And even _then_, if the person in control of the content
| is using XML tools and so on, they are almost certainly in control
| of the website as well, so why not do the content type munging on
| the server side instead of campaigning for UA authors to spend their
| already restricted resources on implementing content type sniffing?
|
| [1] http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0031.html
-- http://lists.w3.org/Archives/Public/www-talk/2001JulAug/0005.html
Appendix A: application/xhtml+xml
---------------------------------
See: http://ln.hixie.ch/?start=1036767231&count=1
Appendix B: Advanced Authors
----------------------------
Some advanced authors are able to send back XHTML as
application/xhtml+xml to UAs that support it, and as text/html to
legacy UAs.
Assuming you are using XHTML 1.0 compliant to Appendix C (or have
otherwise checked that the XHTML 1.0 you send is compatible with Tag
Soup processors), then that's fine. All I am saying in this document
is that sending XHTML as text/html ONLY is harmful.
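One way to do this kind of negotiation is to inspect the HTTP Accept request header. The sketch below is an assumption about how a server-side script might decide, not a description of any particular server; the function names are hypothetical. It deliberately does not treat a bare */* as support, on the assumption that a legacy UA may send it without understanding the XHTML MIME type.

```python
def accepted_types(accept_header: str) -> dict:
    """Parse an HTTP Accept header into {media_type: q_value}."""
    result = {}
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not accepted"
        result[media_type] = quality
    return result


def choose_content_type(accept_header: str) -> str:
    """Pick the XHTML MIME type only when the client lists it explicitly."""
    types = accepted_types(accept_header)
    if types.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"


print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_content_type("*/*"))
```

The markup sent under either type still has to satisfy the compatibility conditions described above; the negotiation only selects the label.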
Note: Sending XHTML 1.1 as text/html is NEVER fine. There is no spec
that allows this. Sending XHTML 2.0 as anything in a production
(non-testing) context is NEVER fine either, since that spec has not
reached CR yet.
Also note that I would personally suggest that even advanced authors
not use XHTML sent as text/html, since many authors copy and paste
markup from others and thus may easily end up copying the valid XHTML
markup but using it as HTML4.
Appendix C: Acknowledgements
----------------------------
Thanks to Nick Boalch for the abstract. Thanks to Dan Connolly for
pedantry that has improved the quality of this document. Thanks to Ted
Shaneyfelt and many others for suggesting improvements to the text.
Appendix D: Footnotes
---------------------
[1] The term "handled as tag soup" refers to the fact that UAs
typically are very lenient in their error handling, and do not support
any of the "advanced" SGML features. For example, browsers treat the
string "<br/>" as "<br>" and not "<br>>", the latter being what
HTML4/SGML says they should do. Similarly, real world UAs have no
problem dealing with content such as "<b> foo <i> bar </b> baz </i>"
even though according to the HTML4 spec that is meaningless.
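- Definition -
XHTML (Extensible Hypertext Markup Language) is a markup language with the same expressive power as HTML but a stricter syntax. Whereas HTML is an application of SGML, a very flexible markup language, XHTML is an application of XML, a restricted subset of SGML. Because an XHTML document must be syntactically correct as an XML document, it can, unlike HTML, be processed automatically with standard XML libraries. XHTML 1.0 became a W3C Recommendation on January 26, 2000.
(* Lines marked with an asterisk are where I, pinkhare, plan to add supplementary explanations or analysis of basic terms.)
* Extensible Hypertext Markup Language: in Korean, 확장 가능 하이퍼텍스트 마크업 언어 ("extensible hypertext markup language")
- The word is often rendered in Korean as 확장성 ("extensibility"), but that is somewhat awkward. 확장성 does not mean that something can be extended; it refers to the question of whether or not it can be extended. "Having extensibility" means extensible and "having no extensibility" means not extensible, so writing only 확장성 merely to shorten the term remains awkward, even if some readers understand it easily.
GTport will therefore write the term precisely as 확장 가능 하이퍼텍스트 마크업 언어 from here on.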
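- Overview -
XHTML is the successor to HTML. HTML is no longer being developed, whereas XHTML continues to be improved, so XHTML can reasonably be regarded as the "latest version" of HTML. HTML and XHTML are nevertheless separate standards. The W3C continues to recommend XHTML 1.1, XHTML 1.0, and HTML 4.01 for web publishing.
The main reason a stricter version of HTML was felt to be necessary is that web content began to be used on a variety of devices (mobile devices and so on) beyond the traditional computer, which produced environments that lack the resources needed to support incorrect HTML. XHTML documents are specified with a document type definition (DTD) so that they can be validated.
* What a document type definition (DTD) is:
- Document Type Definition, used in SGML and XML programming (closely related to the DOCTYPE declaration, which names the DTD a document uses).
- One of the three parts that make up an electronic document under the Standard Generalized Markup Language (SGML) specification, namely the second. It defines, in SGML, the markup used in the document instance: tag names, hierarchical structure, attributes, and so on.
Modern web browsers render XHTML correctly, and because XHTML is almost entirely contained in HTML, older browsers handle it without much trouble as well. Likewise, nearly every browser that supports XHTML also renders HTML correctly. Some say this is exactly why the transition from HTML to XHTML has been slow.
A particularly useful feature of XHTML is that it can be combined with other XML namespaces such as MathML and SVG.
The changes from HTML to transitional XHTML are minor, but they achieve the main goal of producing a complete XML document. The most important change is that the document must be well formed and all HTML elements must be closed. In addition, all tags in XHTML must be written in lowercase. This is in complete contrast to the convention of the HTML 2.0 era, when most authors used uppercase. In XHTML every attribute value, including numeric ones, must be quoted. (This was not mandatory in SGML, so it was optional in HTML too.) All elements, including empty tags such as img and br, must be closed. An empty tag is closed by adding a '/' to the start tag (e.g. <img … />, <br />). Attribute minimization is also prohibited (e.g. <option selected="selected"> instead of <option selected>). For further differences, see the W3C XHTML specification.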
- Versions -
XHTML 1.0
XHTML 1.0, the W3C's first XHTML Recommendation, simply reformulates HTML 4.01 in XML. XHTML 1.0 has three document types, each corresponding to one of the HTML 4.01 variants.
XHTML 1.0 Strict: the XML form of HTML 4.01 Strict. The document must be well formed.
XHTML 1.0 Transitional: allows the <center>, <u>, <strike>, and <applet> elements, which are not available in XHTML 1.0 Strict.
XHTML 1.0 Frameset: allows the use of HTML framesets.
XHTML 1.1
The most recent W3C XHTML Recommendation is XHTML 1.1: Module-based XHTML. Authors can bring new features (such as framesets) into their markup. This version also includes support for ruby markup, which is needed for East Asian (especially CJK) languages.
This is the specification the W3C recommends for all new web pages.
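XHTML 2.0 Working Draft
Work on XHTML 2.0 is still in progress as of 2006. In fact, not even a DTD has been written yet. The XHTML 2.0 working draft is controversial because of backward-compatibility problems: rather than simply producing a new version, it in effect creates a new markup language that breaks free of the constraints of (X)HTML.
The new features XHTML 2.0 will bring to the HTML family of markup languages are as follows:
- HTML forms are replaced by XForms.
- HTML frames are replaced by XFrames.
- DOM events are replaced by XML Events, which use the XML DOM.
- A new list element, <nl>, designed specifically for navigation lists, is added. It will be useful for building nested menus, which are currently produced in a variety of ways.
- Any element can be a hyperlink. E.g.: <li href="articles.html">Articles</li>
- Alternative media can be specified for any element with the src attribute. E.g.: <p src="lbridge.jpg" type="image/jpeg">London Bridge</p> replaces <img src="lbridge.jpg" alt="London Bridge" />.
- The <img src="" alt="" /> element is removed and replaced by the form <object type="MIME/ContentType" src="">Alt</object>.
- The heading elements (i.e. <h1>, <h2>, <h3>, etc.) are replaced by the single element <h>. The level of a heading is indicated by the nesting of the <section> elements that contain the <h>.
The presentational elements <i>, <b>, and <tt>, which are still allowed in XHTML 1.x (including Strict), are dropped in XHTML 2.0. The only presentational elements that remain are <sup> and <sub>, for superscripts and subscripts.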
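Other members of the XHTML family
XHTML Basic: a "light" version of XHTML for devices that cannot readily use the full XHTML set, used mainly on handheld devices such as mobile phones. It was defined to replace WML and C-HTML.
XHTML Mobile Profile: based on XHTML Basic, with some elements specific to mobile phones added. It was defined by the OMA (Open Mobile Alliance) for use on mobile phones.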
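Writing valid XHTML documents
An XHTML document that conforms to the XHTML specification is called a valid document. Ideally, all browsers would follow web standards and valid documents would display in every browser and on every platform. In practice, however, a valid XHTML document does not always mean cross-browser compatibility; validity is only a recommendation. To check a document's validity, use the W3C Markup Validation Service.
Document type declaration (DOCTYPE)
For validation, the document must contain a document type declaration (DOCTYPE). The DOCTYPE tells the browser which document type definition (DTD) applies. The document type declaration must appear at the very beginning of the XHTML document. The declarations for XHTML documents are as follows: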
- XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
- XHTML 1.1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
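The four declarations follow one pattern: a public identifier naming the version, followed by the URL of its DTD. A minimal Python sketch of that pattern is shown here; the names DTD_URLS and doctype are made up for illustration, and the identifiers and URLs are copied from the list above.

```python
# Version name -> DTD URL, taken from the declarations listed above.
DTD_URLS = {
    "XHTML 1.0 Strict": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "XHTML 1.0 Transitional": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    "XHTML 1.0 Frameset": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    "XHTML 1.1": "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}

def doctype(version: str) -> str:
    """Build the DOCTYPE declaration for one of the versions above."""
    url = DTD_URLS[version]  # raises KeyError for an unknown version
    return f'<!DOCTYPE html PUBLIC "-//W3C//DTD {version}//EN"\n"{url}">'

print(doctype("XHTML 1.0 Strict"))
```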
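The character encoding of an XHTML document should be specified in the XML declaration or in a meta http-equiv element. (If an XML document does not declare an encoding, the XML parser assumes UTF-8 or UTF-16 unless a higher-level protocol has already specified one.)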
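The default-encoding behavior can be observed with Python's standard-library XML parser. This is a sketch only: the helper try_parse is a name invented here, and other parsers may report the error differently.

```python
import xml.etree.ElementTree as ET

def try_parse(data: bytes):
    """Return the root element's text, or None if the bytes are not well-formed XML."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

text = "<p>caf\u00e9</p>"

# No declaration, UTF-8 bytes: the parser's default assumption is correct.
utf8_no_decl = text.encode("utf-8")
# No declaration, ISO-8859-1 bytes: the parser assumes UTF-8 and rejects the input.
latin1_no_decl = text.encode("iso-8859-1")
# The same bytes with an XML declaration that names the encoding.
latin1_declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_no_decl

for data in (utf8_no_decl, latin1_no_decl, latin1_declared):
    print(try_parse(data))
```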
Common errors
The following are common errors in XHTML:
- Not closing empty elements (elements without closing tags)
  - Incorrect: <br>
  - Correct: <br />
- Not closing non-empty elements
  - Incorrect: <p>This is a paragraph.<p>This is another paragraph.
  - Correct: <p>This is a paragraph.</p><p>This is another paragraph.</p>
- Improperly nesting elements (elements must be closed in reverse order)
  - Incorrect: <em><strong>This is some text.</em></strong>
  - Correct: <em><strong>This is some text.</strong></em>
- Not specifying alternate text for images (using the alt attribute, which helps make pages accessible for devices that don't load images or screen-readers for the blind)
  - Incorrect: <img src="/skins/common/images/poweredby_mediawiki_88x31.png" />
  - Correct: <img src="/skins/common/images/poweredby_mediawiki_88x31.png" alt="MediaWiki" />
- Putting text directly in the body of the document
  - Incorrect: <body>Welcome to my page.</body>
  - Correct: <body><p>Welcome to my page.</p></body>
- Nesting block-level elements within inline elements
  - Incorrect: <em><h2>Introduction</h2></em>
  - Correct: <h2><em>Introduction</em></h2>
- Not putting quotation marks around attribute values
  - Incorrect: <td rowspan=3>
  - Correct: <td rowspan="3">
- Using the ampersand outside of entities (use &amp; to display the ampersand character)
  - Incorrect: <title>Cars & Trucks</title>
  - Correct: <title>Cars &amp; Trucks</title>
- Using uppercase tag names and/or tag attributes
  - Incorrect: <BODY><P>The Best Page Ever</P></BODY>
  - Correct: <body><p>The Best Page Ever</p></body>
- Attribute minimization
  - Incorrect: <textarea readonly>READ-ONLY</textarea>
  - Correct: <textarea readonly="readonly">READ-ONLY</textarea>
This list is not exhaustive, but it covers the errors most commonly made when writing XHTML code.
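Several of these errors are violations of XML well-formedness, so any XML parser will catch them. The Python sketch below assumes the standard-library parser and a helper, is_well_formed, invented here; it wraps each fragment in a <div> so that it has a single root element. Errors such as a missing alt attribute, text placed directly in <body>, or uppercase names are validity problems rather than well-formedness problems, and need a DTD-aware validator such as the W3C service.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML once wrapped in a single root element."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False

# (incorrect, correct) pairs based on the list above.
PAIRS = [
    ("<p>one<br>two</p>", "<p>one<br />two</p>"),
    ("<p>This is a paragraph.<p>This is another paragraph.",
     "<p>This is a paragraph.</p><p>This is another paragraph.</p>"),
    ("<em><strong>This is some text.</em></strong>",
     "<em><strong>This is some text.</strong></em>"),
    ("<td rowspan=3>x</td>", '<td rowspan="3">x</td>'),
    ("<title>Cars & Trucks</title>", "<title>Cars &amp; Trucks</title>"),
    ("<textarea readonly>READ-ONLY</textarea>",
     '<textarea readonly="readonly">READ-ONLY</textarea>'),
]

results = [(is_well_formed(bad), is_well_formed(good)) for bad, good in PAIRS]

# Uppercase names are still well-formed XML; only XHTML validation rejects them.
uppercase_ok = is_well_formed("<BODY><P>The Best Page Ever</P></BODY>")
```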
Example of an XHTML 1.0 Strict document
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XHTML Example</title>
</head>
<body>
<p>This is a tiny example of XHTML usage.</p>
</body>
</html>
30
Cascading Style Sheet - A Detailed Definition
Written by: pinkhare(분홍토끼) in GTport
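With HTML alone, there are limits to how varied the design of a web document can be, and frequent revisions cost unnecessary time and effort. Style sheets were devised to make up for this. CSS, which we are about to study, is the standard for style sheets. It is often simply called a style sheet.
When you build web pages with HTML alone, you have to specify everything, from the overall layout down to each individual font, in every place it is used. If you store the style (formatting) of the pages in advance instead, changing a single rule updates every related page at once, so you can keep the whole site consistent and save time and effort.
As a result, web developers can design with richer visuals, freely choose or change font size, typeface, line spacing, background color, placement and so on, and maintain their pages more easily.
CSS also aims to present a document in the same form across different user environments. A page styled with CSS is meant to look as its author intended whatever browser the visitor uses, although in practice browsers differ in how completely they support CSS.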
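1. Cascading: to cascade means "to fall like a waterfall." In a waterfall, water pours forcefully from top to bottom. The word is attached to style sheets because CSS has a notion of 'priority.'
The way this priority is resolved in CSS resembles a cascade, which is why the word cascading was added to style sheet.
2. Style: suppose one document has font size 10, yellow text, and a Gothic (sans-serif) typeface, while another has font size 12, blue text, and a Myeongjo (serif) typeface. The two documents may have the same content, but they look different. We then say that the two documents differ in style.
3. Sheet: taken literally, a sheet is the piece of paper on which the style is written down. In other words, it is the place that holds the instructions 'make it look like this.'
- Consider several documents whose content differs but whose appearance is identical. Writing both the content and the appearance for each document separately duplicates work. If you define the style once and apply it to documents that contain only content, things become much more convenient. This is exactly the role a Style Sheet plays.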
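The 'priority' mentioned under Cascading can be sketched in Python. This is a deliberately simplified model, not the full CSS cascade: it assumes every rule comes from the same style sheet, ignores !important and rule origin, and takes the specificity triple (ids, classes, elements) as given instead of computing it from the selector. The names Declaration and winning are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Declaration:
    selector: str
    specificity: tuple  # (ids, classes, elements), supplied by hand in this sketch
    value: str

def winning(declarations):
    """Pick the declaration that applies: highest specificity, later rule wins a tie."""
    index, best = max(
        enumerate(declarations),
        key=lambda pair: (pair[1].specificity, pair[0]),
    )
    return best

# Three hypothetical rules that all set the color of the same paragraph.
rules = [
    Declaration("p", (0, 0, 1), "black"),
    Declaration(".note", (0, 1, 0), "blue"),
    Declaration("p", (0, 0, 1), "gray"),
]

winner = winning(rules)                       # the class selector outranks both "p" rules
tie_break = winning([rules[0], rules[2]])     # equal specificity: the later rule wins
print(winner.value, tie_break.value)
```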






