HyperText Markup Language
Home Page

This is W3C's home page for HTML. Here you will find pointers to our specifications for HTML, guidelines on how to use HTML to the best effect, and pointers to related work at W3C. When W3C decides to become involved in an area of Web technology or policy, it initiates an activity in that area. HTML is one of many Activities currently being pursued. You can learn more about the HTML activity from the HTML Activity Statement.

NEWS

The second draft for the XHTML Events is now available. This specification defines the XHTML Event Module, a module that provides XHTML host languages with the ability to uniformly integrate behaviors with Document Object Model (DOM) Level 2 event interfaces. This specification also defines the XHTML Basic Event Module, a module which subsets the XHTML Event Module for simpler applications and simpler client devices, and the XHTML Event Types Module, a module defining XHTML language event types.

The second draft for the XForms Data Model is now available. XForms is W3C's name for next generation Web forms. The key idea is to separate the user interface and presentation from the data model and logic. XForms brings XML to Web forms, transferring form data as XML. See also the Press Release and Testimonials for the first public draft.

An updated HTML Working Group Roadmap is now available as a W3C Note. It sets out the timeline and deliverables for the HTML Working Group.

XHTML™ 1.0 became a W3C Recommendation on 26 January 2000. It is a reformulation of HTML 4.01 in XML, bringing the rigor of XML to HTML, and can be put to immediate use with existing browsers by following a few simple guidelines. Check out the Press Release and Testimonials.

HTML 4.01, released on 24th December 1999, fixes bugs in the HTML 4.0 specification, which, for instance, omitted the name attribute on the img and form elements. HTML 4.01 defines the semantics and datatypes for HTML.

What is HTML?

HTML is the lingua franca for publishing hypertext on the World Wide Web. It is a non-proprietary format based upon SGML, and can be created and processed by a wide range of tools, from simple plain text editors - you type it in from scratch - to sophisticated WYSIWYG authoring tools. HTML uses tags such as <h1> and </h1> to structure text into headings, paragraphs, lists, hypertext links etc. Here is a 10-minute guide for newcomers to HTML. W3C's statement of direction for HTML is given on the HTML Activity Page. See also the page on our work on the next generation of Web forms, and the section on Web history.

Mission of the HTML Working Group

To develop the next generation of HTML as a suite of XML tag sets with a clean migration path from HTML 4.0. Some of the expected benefits include: reduced authoring costs, an improved match to database & workflow applications, a modular solution to the increasingly disparate capabilities of browsers, and the ability to cleanly integrate HTML with other XML applications. For further information, see the Charter for the HTML Working Group.

XHTML 1.0 is the current W3C Recommendation

W3C produces what are known as "Recommendations" for HTML. These are specifications, developed by W3C working groups, and then voted in by Members of the Consortium. A W3C Recommendation indicates that consensus has been reached among the Consortium Members that a specification is appropriate for widespread use.

XHTML 1.0 is W3C's recommendation for the latest version of HTML, following on from earlier work on HTML 4.01, HTML 4.0, HTML 3.2 and HTML 2.0. With a wealth of features, XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML4 with the power of XML.

Three "flavors" of XHTML:

XHTML 1.0 is specified in three "flavors". You specify which of these variants you are using by inserting a line at the beginning of the document. For example, the HTML for this document starts with a line which says that it is using XHTML Transitional. Thus, if you want to validate the document, the tool used knows which variant you are using. Each variant has its own DTD - Document Type Definition - which sets out the rules and regulations for using HTML in a succinct and definitive manner. The complete XHTML 1.0 specification is available in English in several formats, including HTML, plain text, Postscript, and PDF. See also the list of translations produced by volunteers.
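
As a rough illustration of how a tool might tell the variants apart, the Python sketch below lists the document type declarations for the three XHTML 1.0 flavors and looks for the public identifier near the start of a document. The names `doctype_line` and `detect_flavor` and the sample document are invented for this example; a real validator does considerably more than this.

```python
# Public identifiers and DTD locations for the three XHTML 1.0 variants.
XHTML_DOCTYPES = {
    "Strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "Transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "Frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
}


def doctype_line(flavor):
    """Build the DOCTYPE declaration for one of the three flavors."""
    public_id, system_id = XHTML_DOCTYPES[flavor]
    return '<!DOCTYPE html PUBLIC "%s" "%s">' % (public_id, system_id)


def detect_flavor(document):
    """Return the flavor named in the document's DOCTYPE, or None."""
    head = document[:1024]  # the declaration sits near the top of the file
    for flavor, (public_id, _system_id) in XHTML_DOCTYPES.items():
        if public_id in head:
            return flavor
    return None


# Hypothetical sample document, used only for illustration.
SAMPLE = doctype_line("Transitional") + """
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""
```

Calling `detect_flavor(SAMPLE)` reports the Transitional variant, while a document without one of the three declarations yields `None`.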

XHTML 1.0 and HTML 4.01

XHTML 1.0 is the first major change to HTML since HTML 4.0 was released in 1997. It brings the rigor of XML to Web pages and is the keystone in W3C's work to create standards that provide richer Web pages on an ever increasing range of browser platforms including cell phones, televisions, cars, wallet sized wireless communicators, kiosks, and desktops.

XHTML is modular, making it easy to combine with markup tags for things like vector graphics, multimedia, math, electronic commerce and more. Content providers will find it easier to produce content for a wide range of platforms, with better assurances as to how the content is rendered.

The modular design reflects the realization that a one-size-fits-all approach will no longer work in a world where browsers vary enormously in their capabilities. A browser in a cellphone can't offer the same experience as a top of the range multimedia desktop machine. The cellphone doesn't even have the memory to load the page designed for the desktop browser.

XHTML 1.0 is the first step and the HTML working group is busy on the next. XHTML 1.0 reformulates HTML as an XML application. This makes it easier to process and easier to maintain. XHTML 1.0 borrows the tags from W3C's earlier work on HTML 4, and can be interpreted by existing browsers, by following a few simple guidelines. This allows you to start using XHTML now!
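
Because XHTML is XML, a generic XML parser can read it. The Python sketch below uses the XML parser from the standard library to check well-formedness only; it does not validate against a DTD. The function name `is_well_formed` and the two sample fragments are invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise.

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML-style markup: the <br> element is never closed.
HTML_STYLE = "<p>first line<br>second line</p>"

# XHTML-style markup: the empty element is closed with " />".
XHTML_STYLE = "<p>first line<br />second line</p>"
```

The first fragment is rejected by the XML parser because of the unclosed element, while the second parses cleanly.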

You can roll over your old HTML documents into XHTML using W3C's Open Source HTML Tidy utility. This tool also cleans up markup errors, removes clutter and prettifies the markup, making it easier to maintain.

HTML 4.01 is a revision of the HTML 4.0 Recommendation first released on 18th December 1997. The revision fixes minor errors that have been found since then. The XHTML 1.0 spec relies on HTML 4.01 for the meanings of HTML tags. This allowed us to reduce the size of the XHTML 1.0 spec very considerably.

What is the difference between XHTML 1.0, XHTML Basic and XHTML 1.1?

The first step was to reformulate HTML4 in XML, resulting in XHTML 1.0. The next step modularized the elements and attributes into convenient collections for use in documents that combine HTML with other tag sets. The modules are defined in XHTML Modularization. XHTML Basic is an example of a fairly minimal build of these modules and is targeted at mobile applications. XHTML 1.1 is an example of a larger build of the modules, avoiding many of the presentation features.

Here is a link to further information on the modularization of XHTML.

Other Public Drafts

HTML Working Group Roadmap

This is a W3C NOTE that describes the timeline for deliverables of the HTML working group.

Modularization of XHTML™

This working draft specifies a modularization of XHTML 1.0. There are two aspects to the proposed modularization: modularization into semantic modules, and implementation of these semantic modules through a document type definition (DTD). Semantic modules provide a means for subsetting and extending XHTML, a feature desired for extending XHTML's reach onto emerging platforms. Modularization at the DTD level improves the ability to create new complete DTDs from XHTML and other DTD modules.

Building XHTML™ Modules

This working draft defines the mechanism for defining markup language modules that are compatible with the modularization framework used by XHTML. This includes a definition of the way in which an abstract module is specified, the way in which this abstraction is mapped into an XML DTD, and the way in which the resulting DTD module can be combined with other XHTML DTD modules to create new markup languages. In the future, it is expected that instructions will also be provided for mapping the abstract specifications into an XML Schema. Note that the materials in this document were formerly part of the Modularization of XHTML document, but have been separated out for editorial purposes.

XHTML™ 1.1 - Module-based XHTML

This working draft defines a new XHTML document type that is based solely upon the module framework defined in Building XHTML Modules and the modules defined in Modularization of XHTML. The purpose of this document type is to serve as the basis for future extended XHTML family document types, and to provide a consistent, forward looking document type cleanly separated from the deprecated, legacy function of HTML 4.0 that was brought forward into XHTML 1.0 document types. Note that the materials in this document were formerly part of the Modularization of XHTML document, but have been separated out for editorial purposes.

XForms Requirements

Forms were introduced into HTML in 1993 and have proven to be a valuable part of many Web pages. The experience of the last few years has led to demands for improvements to HTML forms. XHTML Extended Forms is a major revision of HTML Forms. Key goals for the next generation of web forms include improved interoperability and accessibility, enhanced client/server interaction, advanced forms logic, support for internationalization and greater flexibility in presentation.
Work is now starting on defining specifications meeting these requirements, and the XForms Working Group has been formed. Check the XForms home page for more information.

XHTML Document Profile Requirements

The increasing disparities between the capabilities of different kinds of Web browsers present challenges to Web content developers wishing to reach a wide audience. A promising approach is to formally describe profiles for documents intended for broad groups of browsers, for instance, separate document profiles for browsers running on desktops, television, handhelds, cellphones and voice browsers. Document profiles provide a basis for interoperability guarantees. If an author develops content for a given profile and a browser supports the profile then the author may be confident that the document will be rendered as expected. The requirements for document profiles are analyzed.

XHTML Basic

The XHTML Basic document type is a subset of XHTML 1.1. It contains the basic XHTML features including text structure, images, basic forms, and basic tables. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and settop boxes. The document type definition is implemented using XHTML modules as defined in "Modularization of XHTML".

XHTML Events

This specification defines the XHTML Event Module, an XHTML module that provides XML languages with the ability to represent in syntax the semantics of the Document Object Model (DOM) Level 2 event interfaces.

Slides on XHTML

You may also be interested in a recent presentation on XHTML at XML'99, 6th December 1999. The presentation describes the work being done by W3C on XHTML.

We would like to hear from you via email. Please send your comments to: www-html@w3.org (archive). Don't forget to include XHTML in the subject line.

Useful information for HTML authors

Guidelines for authoring

Here are some rough guidelines for HTML authors. If you use these, you are more likely to end up with pages that are easy to maintain, look acceptable to users regardless of the browser they are using, and can be accessed by the many Web users with disabilities. Meanwhile W3C have produced some more formal guidelines for authors. Have a look at the detailed accessibility guidelines.
    A question of style sheets. For most people the look of a document - the color, the font, the margins - is as important as the textual content of the document itself. But make no mistake! HTML is not designed to be used to control these aspects of document layout. What you should do is to use HTML to mark up headings, paragraphs, lists, hypertext links, and other structural parts of your document, and then add a style sheet to specify layout separately, just as you might do in a conventional Desk Top Publishing Package. That way, not only is there a better chance of all browsers displaying your document properly, but also, if you want to change such things as the font or color, it's really simple to do so. See the Touch of style.

    FONT tag considered harmful! Many filters from word-processing packages, and also some HTML authoring tools, generate HTML code which is completely contrary to the design goals of the language. What they do is to look at a document almost purely from the point of view of layout, and then mimic that layout in HTML by doing tricks with FONT, BR and &nbsp; (non-breaking spaces). HTML documents are supposed to be structured around items such as paragraphs, headings and lists. Yet some of these documents barely have a paragraph tag in sight!

    The problem comes when the content of pages needs to be updated, or given a new layout, or re-cast in XML (which is now to be the new mark-up language). With proper use of HTML, such operations are not difficult, but with a muddle of non-structural tags it's quite a different matter; maintenance tasks become impractical. To correct pages suffering from injudicious use of FONT, try the HTML Tidy program, which will do its best to put things right and generate better and more manageable HTML.

    Make your pages readable by those with disabilities. The Web is a tremendously useful tool for the visually impaired or blind user, but bear in mind that these users rely on speech synthesizers or Braille readers to render the text. Sloppy mark-up, or mark-up which doesn't have the layout defined in a separate style sheet, is hard for such software to deal with. Wherever possible, use a style sheet for the presentational aspects of your pages, using HTML purely for structural mark-up.

    Also, remember to include descriptions with each image, and try to avoid server-side image maps. For tables, you should include a summary of the table's structure, and remember to associate table data with relevant headers. This will give non-visual browsers a chance to help orientate people as they move from one cell to the next. For forms, remember to include labels for form fields.

Do look at the accessibility guidelines for a more detailed account of how to make your Web pages really accessible.
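
A few of the points above can be checked mechanically. The Python sketch below uses the standard library's HTML parser to flag images without an alt attribute, images that use a server-side image map (the ismap attribute), and tables without a summary attribute. The class name `AccessibilityHints`, the function `find_problems` and the message strings are invented for this example, and the check is far from a complete accessibility audit; form labels and header associations are not covered.

```python
from html.parser import HTMLParser


class AccessibilityHints(HTMLParser):
    """Collect a few easy-to-detect accessibility problems."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        names = {name for name, _value in attrs}
        if tag == "img":
            if "alt" not in names:
                self.problems.append("img without alt text")
            if "ismap" in names:
                self.problems.append("server-side image map")
        elif tag == "table" and "summary" not in names:
            self.problems.append("table without summary")


def find_problems(markup):
    """Return the list of problems found, in document order."""
    checker = AccessibilityHints()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

For instance, `find_problems('<img src="logo.gif" />')` reports the missing alt text, while an image with an alt attribute and no ismap attribute produces no report.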

W3C HTML Validation Service

To further promote the reliability and fidelity of communications on the Web, W3C has introduced the W3C HTML Validation Service at http://validator.w3.org/.

Content providers can use this service to validate their Web pages against the HTML 4.0 Recommendation, thereby ensuring the maximum possible audience for their Web pages. In addition, it can be used to check conformance against previous versions of HTML, including the W3C Recommendation for HTML 3.2 and the IETF HTML 2.0 standard.

To allow authors to broaden their audience even further to those with disabilities, the service will be updated according to the guidelines produced by W3C's Web Accessibility Initiative (WAI). You can also test your pages for accessibility using the Web-based Bobby service.

Software developers who write HTML editing tools can ensure interoperability with other Web software by verifying that the output of their tool complies with the W3C Recommendations for HTML.

HTML Tidy

W3C also provides a stand-alone tool for checking and pretty-printing HTML that is in many cases able to fix up mark-up errors. HTML Tidy is available as W3C open source software, and also offers a means to convert existing HTML content into well-formed XML, for delivery as XHTML.

Discussion Forums

Changes to HTML necessitate obtaining a consensus from a broad range of organizations. If you have a great idea, it will take time to convince others! Here are some of the places where discussion on HTML takes place:
comp.infosystems.www.authoring.html
A USENET newsgroup where HTML authoring issues are discussed. "How To" questions should be addressed here. Note that many issues related to forms and CGI, image maps, transparent gifs, etc. are covered in the WWW FAQ.
www-html@w3.org
A technical discussion list. If you have a proposal for a change to HTML, you might start a discussion here to see what other developers think of it.
W3C HTML Working Group (members only)
The Group's mission is to develop the next generation of HTML as a suite of XML tag sets with a clean migration path from HTML 4.0. Some of the expected benefits include: reduced authoring costs, an improved match to database & workflow applications, a modular solution to the increasingly disparate capabilities of browsers, and the ability to cleanly integrate HTML with other XML applications. The Group is chaired by Steven Pemberton.
w3c-translators@w3.org
This is a mailing list for people working on translations of W3C specifications such as the HTML 4.0 Recommendation. To subscribe, send an email to w3c-translators-request@w3.org with the word "subscribe" in the subject line (use the word "unsubscribe" if you want to unsubscribe). The archive for the list is accessible online.
IETF MHTML WG
Standards for packaging compound documents (e.g. HTML+gifs) in MIME multipart messages.
IETF HTML Working Group (closed)
The HTML working group of the IETF, closed in 1996.
Web Conferences
The next international conference dedicated to the Web is WWW10, to be held in Hong Kong, 1st-5th May 2001. The last was WWW9, which was held in Amsterdam, 15th-19th May 2000.

Related W3C Work

XML
XML is a cousin of HTML. It allows you to define your own mark-up formats when HTML is not a good fit. XML is being used increasingly for data; for instance, W3C's metadata format RDF.
Style Sheets
W3C's Cascading Style Sheets language (CSS) provides a simple means to style HTML pages, allowing you to control visual and aural characteristics; for instance, fonts, margins, line-spacing, borders, colors, layers and more. W3C is also working on a new style sheet language written in XML called XSL, which provides a means to transform XML documents into HTML.
Document Object Model
Provides ways for scripts to manipulate HTML using a set of methods and data types defined independently of particular programming languages or computer platforms. It forms the basis for dynamic effects in Web pages, but can also be exploited in HTML editors and other tools by extensions for manipulating HTML content.
Internationalization
HTML 4.0 provides a number of features for use with a wide variety of languages and writing systems, for instance mixed language text, and right-to-left and mixed direction text. HTML 4.0 is formally based upon Unicode, but allows you to store and transmit documents in a variety of character encodings. Further work is envisaged for handling vertical text and phonetic annotations for Kanji (Ruby).
Access for People with Disabilities
HTML 4.0 includes many features for improved access by people with disabilities. W3C's Web Accessibility Initiative is working on providing effective guidelines for making your pages accessible to all, not just those using graphical browsers.
XForms
Forms are a very widely used feature in web pages. W3C is working on the design of the next generation of web forms with a view to separating the presentation, data and logic, as a means to allowing the same forms to be used with widely differing presentations.
Mathematics
Work on representing mathematics on the Web has focused on ways to handle the presentation of mathematical expressions and also the intended meaning. The MathML language is an application of XML, which, while not suited to hand-editing, is easy to process by machine.

Previous Versions of HTML

HTML 4.01
The HTML 4.01 Recommendation, released on 24th December 1999, fixes a number of bugs in the HTML 4.0 specification. The changes are detailed in appendix A.
HTML 4.0
First released as a W3C Recommendation on 18 December 1997. A second release was issued on 24 April 1998 with changes limited to editorial corrections. This specification has now been superseded by HTML 4.01.
HTML 3.2
W3C's recommendation for HTML which represented the consensus on HTML features for 1996. HTML 3.2 added widely-deployed features such as tables, applets, text-flow around images, superscripts and subscripts, while providing backwards compatibility with the existing HTML 2.0 Standard.
HTML 2.0
HTML 2.0 (RFC 1866) was developed by the IETF's HTML Working Group, which closed in 1996. It set the standard for core HTML features based upon current practice in 1994.

ISO HTML

ISO/IEC 15445 is a subset of HTML 4.0. It takes a more rigorous stance; for instance, an h3 element can't occur after an h1 element unless there is an intervening h2 element. Roger Price and David Abrahamson have written a user guide for ISO HTML.
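
The heading example can be approximated in a few lines of Python. The sketch below simplifies the rule to "a heading may not be more than one level deeper than the heading before it", which is an assumption made for illustration and not the wording of ISO/IEC 15445. The function name `skipped_headings` is invented, and the regular expression does not account for comments or script content.

```python
import re

# Matches opening heading tags such as <h1> or <h2 class="x">.
HEADING = re.compile(r"<h([1-6])\b", re.IGNORECASE)


def skipped_headings(markup):
    """Return (previous, current) level pairs where a level was skipped."""
    skipped = []
    previous = None  # no heading seen yet
    for match in HEADING.finditer(markup):
        level = int(match.group(1))
        if previous is not None and level > previous + 1:
            skipped.append((previous, level))
        previous = level
    return skipped
```

A document that goes straight from h1 to h3 is reported as the pair (1, 3); moving back up from h3 to h2 is not flagged.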

Some early ideas for HTML

The Web owes its origins to many people, starting back in medieval times with the development of a rich system of cross references and marginalia. The basic document model for the Web was set: things in the page such as the text and graphics, and cross references to other works. These early hypertext links were able to target documents to a fine level thanks to conventions for numbering lines or verses.

Vannevar Bush in the 1940's, in his article As we may think, describes his vision for a computer aided hypertext system he named the memex. His vivid description of browsing the Web of linked information includes the ability to easily insert new information of your own, to add to the growing web. Dr. Bush was the Director of the US Office of Scientific Research and Development, and coordinated wartime research in the application of science to war.

Other visionaries include Douglas Engelbart, who founded the Augmentation Research Center at the Stanford Research Institute (SRI) in 1963. He is widely credited with helping to develop the computer mouse, hypertext, groupware and many other seminal technologies. He now directs the Bootstrap Institute, which is dedicated to the development of collective IQ in networked communities.

Ted Nelson has spent his life promoting a global hypertext system called Xanadu. He coined the term hypertext, and is well known for his books Literary Machines and Dream Machines, which describe hypermedia including branching movies, such as the film at the Czechoslovakian Pavilion at Expo '67.

The ACM SIGWEB, formerly SIGLINK, has for many years been the center for academic research into hypertext systems, sponsoring a series of annual conferences. SIGLINK was formed in 1989 following a workshop on hypertext, held in 1987 in Chapel Hill, North Carolina.

Bill Atkinson, best known for MacPaint, an easy to use bitmap painting program, gave the world its first popular hypertext system, HyperCard. Released in 1987, HyperCard made it easy for anyone to create graphical hypertext applications. It features bitmapped graphics, form fields, scripting and fast full text search. HyperCard is based on a stack of cards metaphor with shared backgrounds. It spawned imitators such as Asymmetrix Toolbook, which used drawn graphics and ran on the PC. The OWL Guide was the first professional hypertext system for large scale applications; it predates HyperCard by one year and followed in the footsteps of Xerox NoteCards, a Lisp-based hypertext system released in 1985.

Tim Berners-Lee and Robert Cailliau both worked at CERN, an international high energy physics research center near Geneva. In 1989 they collaborated on ideas for a linked information system that would be accessible across the wide range of different computer systems in use at CERN. At that time many people were using TeX and Postscript for their documents. A few were using SGML. Tim realized that something simpler was needed that would cope with everything from dumb terminals through to high end graphical X Windows workstations. HTML was conceived as a very simple solution, and matched with a very simple network protocol, HTTP.

CERN launched the Web in 1991 along with a mailing list called www-talk. Other people thinking along the same lines soon joined and helped to grow the web by setting up Web sites and implementing browsers such as Cello, Viola, and MidasWWW. The breakthrough came when the National Center for Supercomputing Applications (NCSA) at Urbana-Champaign encouraged Marc Andreessen and Eric Bina to develop the X Windows Mosaic browser. It was later ported to PCs and Macs and became a runaway success story. The Web grew exponentially, eclipsing other Internet based information systems such as WAIS, Hytelnet, Gopher, and UseNet.

We hope to extend this summary and are interested in getting hold of screen shots and feature lists for early browsers. This is your chance to help! You may also be interested in Marc Weber and Kevin Hughes' Web history site, and Shahrooz Feizabadi's short history of the Web and the Internet. We would like to add links to other sites dealing with the history of the web, so please let us know.

The WWW Project Proposal (1989)
This document was an attempt to persuade CERN management that a global hypertext system was in CERN's interests. Here is a description of the Web in 1992.
The first version of HTML
This is the description of a very early version of HTML. This text dates from 1992.
Screen shot of Tim Berners-Lee's browser editor as developed in 1991-92.
This was a true browser editor for the first version of HTML and ran on a NeXT workstation. Implemented in Objective-C, it made it easy to create, view and edit web documents. Adding a new hypertext link was a breeze!
HTML+, HTML+ Reference or as Postscript (222417 bytes)
This was a proposal by Dave Raggett for extending HTML, first published as an Internet Draft in 1993, and in summary form at the WWW'1 Web Conference in 1994.
HTML 3.0 or as plain text (381229 bytes)
An extended version of HTML+, this was submitted as an Internet Draft in 1994. Like HTML+, it was never standardized, but helped to stimulate further work on features such as tables and math.
Dave Raggett, Ian Jacobs, Masayasu Ishikawa, Takuya Asada, contact persons for HTML. $Date: 2000/10/04 19:56:22 $