LIS901N: HTML Thomas Krichel 2003-01-05
Structure of talk • HTML • HTML standards and standard adherence, much of which is based on a paper by Brian Kelly, at http://www.ariadne.ac.uk/issue33/web-focus/
HTML and XHTML • HTML is the hypertext markup language • HTML is a markup language that is widely used on the World Wide Web (WWW) • The latest, and probably last, version of HTML is at http://www.w3.org/TR/html4/ • The W3C, the standard-making body for the WWW, has issued XHTML, a replacement for HTML that is compatible with XML. • We will ignore XHTML for the rest of the course.
What is Markup? • Everything in a document that is not content. It can be given in two ways • 1: Procedural • Codes identify point size, style, font, etc. • Usually understood only by the defining tool • Example: M$ Word • 2: Descriptive • Describes purpose of text within the document • Chapter head, Paragraph, Section Head, TOC • Structure and Style are kept separate • Example: LaTeX, SGML
SGML • Standard Generalized Markup Language • Descriptive approach with three separate layers • structure: types of information in document • content: the information itself • style: matches typesetting with structure • Document Type Definition (DTD) • Defines the structure • Developed for the publishing industry by a group around Goldfarb. • So complicated that no software implements it fully
SGML Document Type Definition • Describes information the document handles • e.g. Title, TOC, Chapter, Section • Relationships between fields • e.g. a Chapter contains Sections • Consistency • Logical structure • Information defined by tags
HTML • HyperText Markup Language • Defined by an SGML DTD • Head, Title, Body, Paragraph, etc. • Headings, Bold, Italic, etc. • Table, List, Image, etc. • Links to other documents • Forms • Style applied by Web Browser • User has some control
HTML Tags • HTML markup is written as tags. Tags are written as pairs (typically) • begin with <atag> • end with </atag> • atag is the tag name • Can be nested • Can contain non-markup data • Tag names are case-insensitive, but it is best to use the same case, consistently, for human readability.
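A small sketch of paired, nested tags containing non-markup data. It is a fragment meant to sit inside the BODY of a page; the sentence and the link target are chosen only for illustration.

```html
<!-- Each element opens with <name> and closes with </name>. -->
<!-- The A element is nested inside EM, which is nested inside P. -->
<P>This paragraph contains <EM>emphasised text with a
<A HREF="http://www.w3.org/TR/html4/">link</A> nested inside</EM>.</P>
```

Note that the inner element is closed before the outer one.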
attributes to tags • <atag attribute_name_one="value_one" attribute_name_two="value_two"> • Here attribute_name_one and attribute_name_two are attribute names • and value_one and value_two are attribute values.
Common Tags • Always include the <HTML>…</HTML> tags • Comments start with <!-- and end with --> • HTML documents • <HEAD> section • Info about the document • Info in header not generally rendered in display window • TITLE element names your Web page • <BODY> section • Page content • Includes text, images, links, forms, etc. • Elements include backgrounds, link colors and font faces • P element forms a paragraph, blank line before and after
common frame for pages • Put the following in your pages:
<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML><HEAD><TITLE></TITLE></HEAD><BODY></BODY></HTML>
• The first three lines are the SGML document type declaration, which says which kind of HTML the page uses. We use version 4.0. • Close nested tags properly.
Headers • Headers <H1> to <H6> • Simple form of text formatting • Vary text size based on the header’s “level” • Actual size of text of header element is selected by browser • Can vary significantly between browsers • <CENTER> element • Centers material horizontally • Most elements are left adjusted by default
Text Styling • Underline style • <U>…</U> (deprecated in HTML 4) • Emphasis (italics) style • <EM>…</EM> • Strong (bold) style • <STRONG>…</STRONG> • <B> and <I> tags discouraged (though not formally deprecated in HTML 4) • Overstep boundary between content and presentation • Deleted text with <DEL>, usually rendered as strikethrough • Superscript: <SUP> element • Subscript: <SUB> element
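The styling elements above might be combined as in the following fragment, intended for the BODY of a page. The sentences are made up for illustration, and the exact rendering is up to the browser.

```html
<!-- SUB lowers text, SUP raises it. -->
<P>Water is H<SUB>2</SUB>O, and E = mc<SUP>2</SUP>.</P>

<!-- EM is usually shown in italics, STRONG in bold, DEL struck through. -->
<P>This is <EM>emphasised</EM>, this is <STRONG>strong</STRONG>,
and the price is <DEL>10</DEL> 8 dollars.</P>
```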
Line break • Use <p> to create a new paragraph • Use <br> to create a line break! • To have several <P> use • Align elements with ALIGN attribute • right, left or center • Example <p align="center"> </p> • You do not need to close <p> and <br>
Linking • Links inserted using the A (anchor) element • Requires HREF attribute which specifies the URL you would like to link to • <A HREF="address">…</A> • Can link to email addresses, using • <A HREF="mailto:emailaddress">…</A> • Note quotation mark placement • Example: <a href="http://openlib.org/home/krichel/">Thomas Krichel</a>
Uniform Resource Locator (URL) • Example: http://arcano.openlib.org/~krichel/sae.html • Parts of a URL: scheme, server name, path, file name • A URL can be • Absolute – contains all parts of the URL; • Relative – gives path and file name relative to the current file.
Scheme • http – Hypertext Transfer Protocol to access Web pages • ftp – File Transfer Protocol to download a file from the net • mailto – to send electronic mail • file – to access a file on a local hard disk (the file scheme uses ///). • and others…
Relative URL (examples) • A file from the same folder as current file: “file.html” • A file from a subfolder of current folder: “images/picture.gif” • A file from another folder at the same hierarchical level: “../info/data.html” • same conventions as in UNIX!
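As a sketch, the three relative forms above could appear in links like these. The fragment belongs in the BODY of a page, and it assumes the listed files actually exist next to the current file.

```html
<!-- Same folder as the current file -->
<P><A HREF="file.html">A page in the same folder</A></P>

<!-- Subfolder of the current folder -->
<P><A HREF="images/picture.gif">A picture in a subfolder</A></P>

<!-- Up one level with "..", then down into a sibling folder -->
<P><A HREF="../info/data.html">A page in a sibling folder</A></P>
```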
Links inside document: anchors • Find the desired part of the page, where the link should bring visitors • Create an anchor there: <A NAME="anchor_name">Label text</A> • Label text is a text or image that should be referenced, i.e. where the link should bring the visitor to. • To link to the anchor, use • <A HREF="#anchor_name">Link text</A> or • <A HREF="URL#anchor_name">Link text</A>
Images • Insert image into page with the <IMG> tag, attributes: • SRC = location • BORDER (in pixels, black by default) • ALT (text description for browsers that have images turned off or cannot view images, required) • location can be any URL, or a file name on the server machine • Pixel • Stands for “picture element” • Each pixel represents one addressable dot of color on the screen
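A minimal sketch of an anchor and a link to it within the same page. The anchor name "contact" and the address someone@example.org are hypothetical placeholders.

```html
<!-- Near the top of the page: a link that jumps to the anchor -->
<P><A HREF="#contact">Jump to the contact section</A></P>

<!-- Further down: the anchor itself, wrapped around the heading text -->
<H2><A NAME="contact">Contact</A></H2>
<P><A HREF="mailto:someone@example.org">Send mail</A></P>
```

From another page, the same spot would be reached by appending #contact to that page's URL.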
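The attributes above might be used as follows. The file name images/picture.gif is an assumed example, and the ALT text is what a browser shows when it cannot display the image.

```html
<!-- SRC is a relative URL, BORDER is in pixels, ALT is required -->
<P><IMG SRC="images/picture.gif" ALT="A sample picture" BORDER="2"></P>
```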
Color • Preset colors (white, black, blue, red, etc.) • Hexadecimal code • First two characters for amount of red • Second two characters for amount of green • Last two characters for amount of blue • 00 is the weakest a color can get • FF is the strongest a color can get • Ex. black = #000000
background • Image background • <BODY BACKGROUND="background_image_file"> • Image does not need to be large because the browser tiles the image across and down the screen • Color background • <body bgcolor="color"> • color is an indication of color as previously explained.
Formatting Text With <FONT> • <FONT> lets you change the font if the browser allows it. <FONT> attributes: • COLOR="color" • SIZE • To make text larger, set SIZE="+x" • To make text smaller, set SIZE="-x" • x is the number of font point sizes • x is between 1 and 3 • FACE • Font of the text you are formatting • Be careful to use common fonts like Times, Arial, Courier and Helvetica • Browser will display default if unable to display specified font
Special Characters • Inserted as an entity reference • Format can be &code; • Ex. &amp; • Inserts an ampersand • Codes are often abbreviated forms of the character name • Codes can also be numeric, in decimal or hex form • Ex. &#38; or &#x26; to insert an ampersand • http://www.w3.org/TR/REC-html40/sgml/entities.html has the list
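As a sketch, a color background with a hexadecimal code could look like this. The value #FFFFCC (full red, full green, most of the blue) is an arbitrary choice that gives a pale yellow.

```html
<!-- BGCOLOR sets the page background, TEXT the default text color -->
<BODY BGCOLOR="#FFFFCC" TEXT="#000000">
<P>Black text on a pale yellow background.</P>
</BODY>
```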
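A short sketch of the FONT attributes in use, as a fragment for the BODY of a page. The sizes, color and font names are arbitrary, and the browser falls back to its default font if none of the listed faces is available.

```html
<!-- Two steps larger than the base size, in a common sans-serif face -->
<P><FONT SIZE="+2" FACE="Arial, Helvetica">Larger text</FONT></P>

<!-- One step smaller, in grey -->
<P><FONT SIZE="-1" COLOR="#808080">Smaller grey text</FONT></P>
```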
Horizontal Rules • <HR> tag Inserts a line break directly below it • HR attributes: • WIDTH • Adjusts the width of the rule • Either a number (in pixels) or a percentage • SIZE • Determines the height of the horizontal rule • In pixels • ALIGN • Either left, right or center • NOSHADE • Eliminates default shading effect and displays horizontal rule as a solid-color bar
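The HR attributes above could be combined as in this fragment; the values are arbitrary examples.

```html
<P>Text above the rule.</P>
<!-- Half the window width, 4 pixels high, centered, no shading -->
<HR WIDTH="50%" SIZE="4" ALIGN="center" NOSHADE>
<P>Text below the rule.</P>
```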
Tables • A table is a matrix formed by the intersection of a number of horizontal rows and vertical columns. • [Diagram: a grid with columns labelled Column 1 to Column 3 and rows labelled Row 1 to Row 3] Slides prepared by K.Clarck
Tables (continued) • The intersection of a column and row is called a cell. • Cells in the same row or column are usually logically related in some way. • [Diagram: the same grid of three columns and three rows, with each intersection marked Cell]
Tables (continued) • Container <TABLE> … </TABLE> • Attributes: • BORDER=n – the border thickness in pixels • WIDTH=x – width of the table, or of a cell within the table, in pixels or as a percentage of the available width (0% to 100%)
Tables (continued) • A table is formed row by row • To define a row <TR>…</TR> is used • Within a row, cells with data are defined by <TD>…</TD> and header cells by <TH>…</TH>
Simple Table (example)
<TABLE>
  <TR> <TH>Month</TH> <TH>Quantity</TH> </TR>
  <TR> <TD>January</TD> <TD>130</TD> </TR>
  <TR> <TD>February</TD> <TD>125</TD> </TR>
  <TR> <TD>March</TD> <TD>135</TD> </TR>
</TABLE>
Tables (more complicated) • To span a cell across several columns, use the attribute COLSPAN=n, where n is the number of columns • To span a cell across several rows, use the attribute ROWSPAN=n, where n is the number of rows
Cell Attributes • FONT – not an attribute but an element: wrap the cell content in <FONT> to set the font of a cell • ALIGN – determines horizontal alignment of cell content, accepts values: “left”, “center”, or “right” • VALIGN – determines vertical alignment of cell content, accepts values: “top”, “middle”, “bottom”, or “baseline”
Reasons to use tables • To present tabular data • To create multicolumn text • To create captions for images • To create side bars • Cells may contain various HTML containers: Images, Hyperlinks, Text, Objects, even Tables
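A sketch of both attributes in one small table. The labels and figures are illustrative; every row must still add up to the same number of columns, here three.

```html
<TABLE BORDER="1">
  <!-- One header cell stretched across all three columns -->
  <TR> <TH COLSPAN="3">Sales</TH> </TR>
  <!-- "Q1" occupies the first column of this row and the next -->
  <TR> <TH ROWSPAN="2">Q1</TH> <TD>January</TD> <TD>130</TD> </TR>
  <!-- Only two cells here: the first column is taken by "Q1" -->
  <TR> <TD>February</TD> <TD>125</TD> </TR>
</TABLE>
```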
why standards matter • Avoiding Browser Lock-in • Maximize Access To Browsers • Maximize Accessibility • Enhance Interoperability • Enhance Performance • Facilitate Debugging • Facilitate Migration “Arguing that a resource is almost compliant is like describing someone as almost a virgin!”
Components to add
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
</head>
<body>
</body>
</html>
use validator service • http://validator.w3.org/ • or do it on wotan, Thomas has a validator installed there.
problem: Ampersand in URL • <!-- This is invalid! --> <a href="foo.cgi?chapter=1&section=2">...</a> • This example generates an error for "unknown entity section" because the "&" is assumed to begin an entity. • To avoid problems with both validators and browsers, always use &amp; in place of & when writing URLs in HTML: • <a href="foo.cgi?chapter=1&amp;section=2">...</a>
problem: incorrect nesting • Elements in HTML cannot overlap each other. The following is invalid: • <B><I>Incorrect nesting</B></I> • The following is valid: • <B><I>Correct nesting</I></B>
problem: casing in doctype • In a doctype, the formal public identifier (the quoted string that appears after the PUBLIC keyword) is case sensitive. A common error is to use the following: • <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> • The correct identifier uses mixed case: • <!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
problem: name attribute • The HTML 4.0 Specification did not allow a NAME attribute for a FORM or IMG element. However, the HTML 4.01 Specification allows them. Thus, you can now use the following document type declaration if you use those attributes: • <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> • Using a href as src for an image is a stupid idea.
Using special characters • Compose the document entirely with US-ASCII characters. • Represent non-ASCII characters using character references of the form &#number; where number is the code number of the character in ISO 10646 (Unicode) in decimal notation. • Configure things so that the Web server sends the document with the HTTP header Content-Type: text/html;charset=utf-8
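As a sketch of this approach, the following fragment is written entirely in US-ASCII yet displays an accented letter and a euro sign. The sentence is made up; 233 and 8364 are the decimal Unicode code numbers of é and €.

```html
<!-- &#233; is e with acute accent, &#8364; is the euro sign, &amp; is an ampersand -->
<P>Caf&#233; au lait costs 3 &#8364; &amp; up.</P>
```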
http://openlib.org/home/krichel Thank you for your attention!