2.3 Syntax

Having selected a text editor for entering code, we are ready to begin writing. But we still have to learn what to write; we need to learn a computer language.

The first thing we need to learn about a computer language is the syntax. Consider the following sentence in the human language called English:

The chicken iz to hot too eat!

This sentence has some technical problems. First of all, it contains a spelling mistake (“iz” should be “is”), but even if we fix that, there are grammatical errors because the words are not in a valid order (“to” and “too” are swapped). The sentence still does not make sense:

The chicken is to hot too eat!

The sentence is only a valid English sentence once both the spelling and grammar are correct:

The chicken is too hot to eat!

When we write code in a computer language, we call these spelling and grammatical rules the syntax of the language.

It is important to note that computers tend to be far less flexible than humans when it comes to comprehending language. A human can easily understand the original English sentence, even with its spelling and grammatical errors. Computers are much fussier, and we need to get the language syntax exactly right before the computer will understand any code that we write.


2.3.1 HTML syntax

This section describes the syntax rules for the HTML language. This information will allow us to write HTML code that is correct in terms of spelling and grammar.

HTML is nice because it is defined by a standard, so there is a single, public specification of HTML syntax. Unfortunately, as is often the case, it is actually defined by several standards. Put another way, there are several different versions of HTML, each with its own standard. We will focus on HTML 4.01 in this book.

HTML has a very simple syntax. HTML code consists of two basic components: elements, which are special HTML keywords, and content, which is just normal everyday text.

2.3.1.1 HTML elements

An element consists of a start tag, an end tag and some content in between. For example, the title element from Figure 2.2 is shown below:

<title>
    Poles of Inaccessibility
</title>

There is a start tag <title>, an end tag </title>, and plain text content.

The title element in a web page is usually displayed in the title bar of the web browser, as can be seen in Figure 2.1.

Some HTML elements may be “empty”, which means that they consist only of a start tag (no end tag and no content). An example is the img (short for “image”) element from Figure 2.2, which inserts the plot in the web page.

<img src="poleplot.png">

The entire img element consists of this single tag.

There is a fixed set of valid HTML elements and only those elements can be used within HTML code. We will encounter several important elements in this chapter and a more comprehensive list is provided in Chapter 3.

2.3.1.2 HTML attributes

HTML elements can have one or more attributes, which provide more information about the element. An attribute consists of the attribute name, an equals sign, and the attribute value, which is surrounded by quote marks. We have just seen an example in the img element above. The img element has a src attribute that describes the location of a file containing the picture to be drawn on the web page. In the example above, the attribute is src="poleplot.png". Many attributes are optional; if one is not specified, a default value is used.
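
As a hedged illustration of an element with more than one attribute, the img element could also carry alt and width attributes, both of which exist in HTML 4.01. The alt text and the width value below are invented for this sketch and do not come from Figure 2.2.

```html
<!-- The alt text and width value are made up for illustration. -->
<img src="poleplot.png" alt="Plot of the poles of inaccessibility" width="400">
```

Each attribute follows the same pattern of name, equals sign, and quoted value, and the attributes are separated by spaces inside the start tag.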

HTML tags must be ordered properly. All elements must nest cleanly and some elements are only allowed inside specific other elements. For example, a title element can only be used inside a head element, and the title element must start and end within the head element. The following HTML code is invalid because the title element does not finish within the head element:

<head>
    <title>
    Poles of Inaccessibility
</head>
    </title>
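
For comparison, a correctly nested version of the same code closes the title element before the head element ends:

```html
<head>
    <title>
    Poles of Inaccessibility
    </title>
</head>
```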

Finally, there are a few elements that must occur in an HTML document: there must be a DOCTYPE declaration, which states what computer language we are using; there must be a single html element, with a single head element and a single body element inside; and the head element must contain a single title element. Figure 2.3 shows a minimal HTML document.

Figure 2.3: A minimal HTML document.
 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title>A Minimal HTML Document</title>
    </head>
    <body>
        Your content goes here!
    </body>
</html>
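
As a rough sketch of how the earlier pieces fit into this skeleton, the title and img elements shown in this section could be placed as follows. The arrangement is an assumption made for illustration and is not a copy of Figure 2.2; the surrounding p (paragraph) element and the alt text are also additions for this sketch.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
    <head>
        <title>Poles of Inaccessibility</title>
    </head>
    <body>
        <!-- Illustrative arrangement only; the alt text is invented. -->
        <p>
            <img src="poleplot.png" alt="Plot of the poles">
        </p>
    </body>
</html>
```

The title element sits inside the head element, while the img element, which produces visible content, sits inside the body element.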

2.3.2 Escape sequences

In all computer languages, certain words or characters have a special meaning within the language. These are sometimes called reserved words to indicate that they are reserved by the language for special use and cannot be used for their normal human-language purpose. This means that some words can never be used when writing in a computer language or, in other cases, a special code must be used instead. We will see reserved words and characters in all of the computer languages that we meet. This section describes some examples for HTML.

The content of an HTML element (whatever is written between the start and end tags) is mostly up to the author of the web page, but some characters have a special meaning in the HTML language and so cannot be typed directly. For example, the < character marks the start of an HTML tag, so it cannot be used for its normal meaning of “less than”.

If we need to have a less-than sign within the content of an HTML element, we have to type &lt; instead. This is an example of what is called an escape sequence.

Another special character in HTML is the greater-than sign, >. To produce one of these in the content of an HTML element, we must type &gt;.

All HTML escape sequences have the same form: they start with an ampersand, &, and end with a semicolon. This means, of course, that the ampersand is itself a special character, with its own escape sequence, &amp;. A larger list of special characters and escape sequences in HTML is given in Chapter 3.
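
A small sketch of these three escape sequences in use is shown below. The content is invented for illustration, and the p (paragraph) element is used only as a convenient container.

```html
<!-- Made-up content for illustration. -->
<p>
    if x &lt; 10 &amp; y &gt; 20
</p>
```

A web browser would display this content as: if x < 10 & y > 20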

Paul Murrell

Creative Commons License
This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.