Web Authoring Boot Camp by L.J. Bothell - HTML preview

HTML/XHTML Basics

HTML (Hypertext Markup Language) is the starting code language for building even the most basic web pages. You need to know and understand it well before you can understand and apply the more complex languages and scripts that later make web pages dynamic. The current W3C Recommendation is HTML 4.01.

XML (Extensible Markup Language) is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications. XML's design goals emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, such as in web services.

XHTML (Extensible Hypertext Markup Language) moves past HTML to the more precise, manageable current standards for basic web coding set by the World Wide Web Consortium (W3C), and its code can be checked with the W3C Validator. It is a blend of HTML and XML that results in organized and consistent HTML code, allows clean validation, and keeps you away from deprecated tags and quirky practices that give browsers (and you) huge headaches. The most widely used version is XHTML 1.0.

HTML5 is already in partial use, and designers and developers are pushing the boundaries of browser interpretation as far as they can. HTML5 goes beyond HTML4 and XHTML 1.0 by building in functionality that websites currently get from Flash, JavaScript, and plug-ins. It also adds more markup syntax for documents, especially for defining sections of a web page. It includes a number of new tags that this book will not cover for beginners. As hot as HTML5 looks for web “designer” websites and mobile platforms, current browsers do not yet support it at a consistent and reliable level. However, we will look at the non-developer aspects of HTML5, like the doctype and charset. These are supported by the W3C Validator and by Google and other search engines when indexing. We will also look at what can be reliably added to your web work at this point, and offer resources for you to keep your eye on: tutorials, examples, standardization reports, etc.

Our aim in this book is to focus on XHTML 1.0, because it meets current standards for web browsers and validation, and also requires web authors to be professional and accurate with their code. We will also add in some of the markup-oriented HTML5 that is reliable, in order to get a leg up on the upcoming changes that are being adopted by browsers.

Therefore, the coding practices we cover will focus on XHTML 1.0, plus HTML5 markup basics. We will not integrate the HTML5 section elements other than as a segment of advanced HTML (there are excellent ongoing tutorials for you to explore on your own as HTML5 evolves). We will not cover the coding of deprecated (discarded) HTML4 and earlier formats.

Note: Even though we will focus on XHTML Transitional, this book will refer to tags as HTML tags. This is because XHTML is simply stricter HTML and uses the current HTML tags. Should you move to XHTML Strict, your code will need to be even tighter, or it will produce more validation errors.
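Here is a small sketch of what "stricter" means in practice. The first fragment is acceptable in an HTML 4.01 Transitional page but breaks XHTML's rules. The second fragment is the same content rewritten to meet XHTML's requirements. The image file name logo.gif is a hypothetical placeholder.

```html
<!-- Accepted by HTML 4.01: uppercase names, an unquoted attribute
     value, paragraphs left unclosed, and empty elements (BR, IMG)
     with no closing slash -->
<P CLASS=intro>First paragraph
<P>Second line<BR>
<IMG SRC="logo.gif" ALT="Logo">

<!-- The same content as XHTML: lowercase names, quoted attribute
     values, every element closed, and empty elements self-closed -->
<p class="intro">First paragraph</p>
<p>Second line<br />
<img src="logo.gif" alt="Logo" /></p>
```

The space before "/>" follows the XHTML 1.0 compatibility guidelines. It helps older HTML browsers cope with self-closed empty elements.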

Doctype

The doctype is critical to the build of a web page because it tells the browser and the validator which version of HTML the page uses. Browsers use it to decide whether to render the page in standards mode or in the older, inconsistent “quirks” mode. The validator uses it to choose the set of rules to check the page against.
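The following minimal sketch shows where the doctype sits: first in the file, before the opening html tag. It assumes the XHTML 1.0 Transitional doctype that this book focuses on. The title and paragraph text are placeholders, and a charset declaration (see Head Section Charset) would normally be added to the head as well.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- XHTML 1.0 requires a title element inside head -->
    <title>My First Page</title>
  </head>
  <body>
    <!-- Transitional allows bare text in body, but wrapping it in a
         block element such as p keeps the page valid under Strict too -->
    <p>Hello, web!</p>
  </body>
</html>
```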

There are several current doctypes:

HTML 4.01 Strict

This DTD contains all HTML elements and attributes, but does not include presentational or deprecated elements (like font). Framesets are not allowed.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

HTML 4.01 Transitional

This DTD contains all HTML elements and attributes, including presentational and deprecated elements (like font). Framesets are not allowed.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

HTML 4.01 Frameset

This DTD is equal to HTML 4.01 Transitional, but allows the use of frameset content.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

XHTML 1.0 Strict

This DTD contains all HTML elements and attributes, but does not include presentational or deprecated elements (like font). Framesets are not allowed. The markup must also be written as well-formed XML.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional

This DTD contains all HTML elements and attributes, including presentational and deprecated elements (like font). Framesets are not allowed. The markup must also be written as well-formed XML. We will focus on this doctype in Section 2 because its use seems more balanced for new web authors.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset

This DTD is equal to XHTML 1.0 Transitional, but allows the use of frameset content.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

XHTML 1.1

This DTD is equal to XHTML 1.0 Strict, but allows you to add modules (for example to provide ruby support for East-Asian languages).

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

HTML5

HTML 4.01 and XHTML 1.0 each have three doctypes. HTML5 has only one:

<!DOCTYPE html>

Currently, the usual doctype for new and developing web authors to start with is XHTML 1.0 Transitional. It tells browsers to render the page in a standards-based mode rather than quirks mode. It is also more forgiving than XHTML 1.0 Strict and tolerates the occasional deprecated tag that creeps in. Designers often move on to XHTML 1.0 Strict, but you may find it easiest to switch to the simple HTML5 doctype as soon as possible.
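For comparison, here is a minimal sketch of a page using the HTML5 doctype. The lang="en" attribute, the title, and the paragraph text are illustrative assumptions. The doctype shrinks to one short line, and the opening html tag carries no namespace.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- HTML5's short form of the character encoding declaration -->
    <meta charset="utf-8" />
    <title>My First Page</title>
  </head>
  <body>
    <p>Hello, web!</p>
  </body>
</html>
```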

HTML Tag Namespace

XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in Namespaces in XML, a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace then the ambiguity between identically named elements or attributes can be resolved.
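The following sketch shows the kind of ambiguity namespaces resolve. XHTML and SVG (Scalable Vector Graphics, another XML vocabulary) both define an element called title. The example assumes the page is served as XML (with the application/xhtml+xml media type). Inline SVG is not part of the XHTML 1.0 doctypes, so this illustrates namespaces rather than markup that validates as XHTML 1.0 Transitional.

```html
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- This title belongs to the XHTML namespace (the page title) -->
    <title>Page title</title>
  </head>
  <body>
    <!-- The xmlns attribute here switches this element and its
         children into the SVG namespace -->
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <!-- Same element name, different vocabulary: an SVG title -->
      <title>A teal circle</title>
      <circle cx="50" cy="50" r="40" fill="teal" />
    </svg>
  </body>
</html>
```

A namespace-aware parser treats the two title elements as different elements because each carries a different namespace.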

Because XHTML is a blend of XML and HTML, it also uses a namespace. HTML4 and earlier pages can use just the plain <html></html> tags. In every XHTML document, however, the opening html tag must declare the XHTML namespace with an xmlns attribute. Important validation note: the opening html tag should come directly after the doctype. Starting it on the next line is fine, but any stray text or other tags placed between the doctype and the html tag will cause W3C Validator errors.

<html xmlns="http://www.w3.org/1999/xhtml">

HTML5, however, does not require a stated namespace.

Head Section Charset

A character encoding (charset) is a method of converting bytes into characters, which addresses the issue of what abstract characters may be part of an HTML document. To validate or display an HTML/XHTML document, a program must choose a character encoding.

The recommended charset is UTF-8. The ISO character sets are limited in size and are not compatible with one another in multilingual environments. To solve this, the Unicode Consortium developed the Unicode Standard, which aims to cover all the characters, punctuation marks, and symbols used in the world's writing systems. UTF-8 is an encoding of Unicode that is backward-compatible with ASCII: plain ASCII text is already valid UTF-8. We will focus on UTF-8, especially since it is also the charset recommended for HTML5.
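The following sketch shows how the UTF-8 charset is declared in the head section. The first line is the XHTML 1.0 form, which uses http-equiv (in an HTML 4.01 page, drop the closing slash). The second line is the shorter HTML5 form. Use one or the other, not both. Place the declaration early in the head, before the title. The file must also actually be saved as UTF-8 by your editor, or characters will display incorrectly.

```html
<!-- XHTML 1.0: the http-equiv form -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<!-- HTML5: the short charset form -->
<meta charset="utf-8" />
```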

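ISO-8859-1 (Latin-1) is also commonly used, since it covers the characters needed for languages of North America, Western Europe, Latin America, the Caribbean, and parts of Africa. However, it is limited to 256 characters. Unlike UTF-8, it cannot represent scripts such as Greek, Cyrillic, Arabic, Chinese, or Japanese.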
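The following sketch shows that limit in practice. It assumes an XHTML 1.0 page served as text/html and saved in ISO-8859-1. Characters inside Latin-1 can be typed directly. Anything outside it must be written as a numeric character reference.

```html
<!-- Declaring ISO-8859-1 in XHTML 1.0 syntax -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

<!-- "é" is part of ISO-8859-1, so it can be typed directly -->
<p>Café</p>

<!-- The euro sign and Japanese characters are outside ISO-8859-1,
     so they are written as numeric character references:
     &#8364; is the euro sign; &#26481;&#20140; spells Tokyo in kanji -->
<p>&#8364;20 in &#26481;&#20140;</p>
```

In a UTF-8 page, all of these characters could simply be typed directly.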

Head Section Metadata

Metadata is loosely defined as information about data. Meta elements are HTML or XHTML elements used to provide structured metadata about a web page to browsers and search engines. These elements must be placed in the head section of an HTML or XHTML document. Each one is written as a name/value pair: a name or http-equiv attribute paired with a content attribute. They may be used to specify the page description, keywords, and any other metadata, and most search engines use this data when adding pages to their search index.

Meta elements have significantly less effect on search engine results pages today than they did in the past. Their usefulness has decreased dramatically as search engine robots have become more sophisticated. This is due in part to keyword stuffing (repeating keywords excessively in meta elements). It is also due to attempts by unscrupulous website placement consultants to manipulate search engine ranking algorithms (spamdexing) or otherwise circumvent them.

HTML/XHTML documents still commonly include several of these meta tags, whose information is usually hidden from visitors to the web page. Meta tags can include:

• Content type: This declares the page's media type, “text/html”, and usually its character encoding, acting as the equivalent of an HTTP header called “Content-Type”.

• Content-Style-Type: This acts as the equivalent of an HTTP header called “Content-Style-Type” with the value “text/css”. It declares that the page's style information is written in Cascading Style Sheets (CSS).

• Author: Informs search engines of the web page author's name.

• Copyright: Informs search engines of the web page's copyright date.

• Description: Specifies a description of a web page, and usually shows up in search engine results as the summary after the web page title.

• Keywords: This meta element lists the keywords relevant to the document. It has little effect on ranking at any of the major search engines today. However, if the keywords used in the meta element can also be found in the page copy itself, Yahoo still refers to this tag.

• Language: Tells search engines what natural language the website is written in (such as English, Japanese, or French), as opposed to the coding language (HTML).

• Robots: Supported by several major search engines, robot metadata controls whether search engine spiders are allowed to index a page, or not, and whether they should follow links from a page, or not.
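Putting the list above together, here is a sketch of an XHTML 1.0 head section that uses each of these meta tags. All the content values are hypothetical placeholders: the author name, copyright line, description, and keywords. The Language entry uses the http-equiv="Content-Language" form, which is one common way to express it.

```html
<head>
  <!-- Content type and character encoding -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <!-- Style information on this page is written in CSS -->
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="author" content="Jane Example" />
  <meta name="copyright" content="Copyright 2012 Jane Example" />
  <!-- Often shown as the summary under the title in search results -->
  <meta name="description" content="Step-by-step lessons for building standards-based web pages." />
  <meta name="keywords" content="HTML, XHTML, doctype, web authoring" />
  <!-- Natural language of the page: English -->
  <meta http-equiv="Content-Language" content="en" />
  <!-- Let spiders index this page and follow its links;
       "noindex, nofollow" asks them to do neither -->
  <meta name="robots" content="index, follow" />
  <title>Web Authoring Basics</title>
</head>
```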