Understanding HTML Syntax

Once you understand this language, you can create any static web page with a plain text editor. After that, you need to learn CSS to style the page and JavaScript to make it dynamic. Let’s start with the basics:

Introduction to HTML

HTML = HyperText Markup Language

From the name you know that HTML is a language. More precisely, it is a markup language similar to XML. If you learn the basic syntax of XML first, you will understand HTML better.

Understanding XML

XML stands for Extensible Markup Language.

It is a data-oriented language. It is not Turing complete, so you cannot write programs in it to solve computational problems. Some tools do, however, use XML syntax to describe tasks; for example, the build files of Apache Ant are written in XML.

In the next example you can examine a message file stored as XML:

XML Structure

The core idea behind XML: the language uses two markup symbols, “<” and “>”, to delimit its tags. Unlike most languages, however, XML has no predefined keywords (element names). Instead, every company or developer can define the element names needed for a specific domain. That is why the language is called extensible.
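To illustrate extensibility, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. It builds a small XML document from an invented vocabulary; the element names `recipe`, `ingredient` and `step` and the `name` attribute are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# XML predefines no element names: "recipe", "ingredient" and "step"
# are a hypothetical vocabulary invented for this example.
recipe = ET.Element("recipe", {"name": "pancakes"})
ET.SubElement(recipe, "ingredient").text = "flour"
ET.SubElement(recipe, "ingredient").text = "milk"
ET.SubElement(recipe, "step").text = "Mix and fry."

# Serialize the tree back to XML text.
xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)
```

The program prints a single line: `<recipe name="pancakes"><ingredient>flour</ingredient><ingredient>milk</ingredient><step>Mix and fry.</step></recipe>`. Any other set of names would be just as valid XML.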

XML Elements

In the message example you can see that XML is hierarchical. The first line is a comment, which you can ignore. <note> is the start tag of an element, and the matching end tag </note> marks the end of that element. Everything between the start tag and the end tag is the “content” of the element, explained below:

Element Content

The content of an element can be plain text or other elements. Elements can be placed side by side (siblings) or nested inside one another to any depth. For example, <to>, <from> and <body> are child elements of <note>, which is the root element. The root element contains all the other elements; an XML document has exactly one root element.
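A short Python sketch can make the hierarchy visible. The message below is hypothetical (names and text are invented), shaped like the one described above: a comment, a `<note>` root element with one attribute, and three child elements. It is parsed with the standard `xml.etree.ElementTree` module, which skips comments by default.

```python
import xml.etree.ElementTree as ET

# A hypothetical message file: a comment, the root element <note>
# with one attribute, and three child elements.
MESSAGE = """<!-- A short private message -->
<note category="secret">
  <to>Alice</to>
  <from>Bob</from>
  <body>Meet me at noon.</body>
</note>"""

root = ET.fromstring(MESSAGE)   # returns the root element; the comment is skipped
print(root.tag)                 # note
for child in root:              # direct children, in document order
    print(" ", child.tag, "->", child.text)
```

The loop prints `to -> Alice`, `from -> Bob` and `body -> Meet me at noon.`, one child per line: the three siblings nested inside the root.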

Element Attributes

The attributes of an element are listed after the element name and before the closing “>” of the start tag, separated by spaces. For example, category="secret" is an attribute named category with the value secret. In XML, attribute values must always be quoted: double quotes are the usual choice, but single quotes are also allowed. An element can have no attributes, one attribute or several. XML requires every attribute to have a value, but HTML also allows Boolean attributes written without a value, such as checked or disabled. If a Boolean attribute is present, its value is true; if it is absent, its value is false.
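The difference between XML and HTML attributes can be checked with two parsers from Python's standard library: `xml.etree.ElementTree` enforces XML rules, while `html.parser.HTMLParser` accepts HTML syntax. A minimal sketch; the `AttributeCollector` class is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML: attribute values must be quoted; double and single quotes both work.
double = ET.fromstring('<note category="secret"/>')
single = ET.fromstring("<note category='secret'/>")
print(double.attrib, single.attrib)   # {'category': 'secret'} {'category': 'secret'}

# XML: an attribute without a value is a syntax error.
try:
    ET.fromstring("<input disabled/>")
    xml_allows_bare_attribute = True
except ET.ParseError:
    xml_allows_bare_attribute = False
print(xml_allows_bare_attribute)      # False

class AttributeCollector(HTMLParser):
    """Stores the attributes of each start tag, keyed by tag name."""
    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for a bare attribute
        self.attrs[tag] = dict(attrs)

# HTML: a Boolean attribute may be written without a value.
collector = AttributeCollector()
collector.feed('<input type="checkbox" checked>')
print(collector.attrs["input"])       # {'type': 'checkbox', 'checked': None}
```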

Back to HTML

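This is the big secret you need to know to understand HTML: it looks like XML with a predefined set of elements and attributes for describing a web document. Strictly speaking, though, HTML is not XML. Browsers parse HTML more leniently: some end tags may be omitted, attributes can appear without values, and element names are not case-sensitive. The reformulation of HTML that follows the strict XML rules is called XHTML.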
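One way to see that HTML is not XML is to feed the same fragment to both kinds of parser. A minimal sketch in Python, assuming the fragment below (hypothetical content): it is valid HTML because <br> is an empty element and the end tags of the paragraphs may be omitted, but it is not well-formed XML. Note that `html.parser` only tokenizes the input; it does not build the document tree a browser would.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid HTML, but not well-formed XML: <br> and both <p> elements are never closed.
FRAGMENT = "<div><p>First line<br>Second line<p>Next paragraph</div>"

class TagLogger(HTMLParser):
    """Records the start tags in the order the HTML parser reports them."""
    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

logger = TagLogger()
logger.feed(FRAGMENT)              # the HTML parser accepts the fragment
print(logger.start_tags)           # ['div', 'p', 'br', 'p']

try:
    ET.fromstring(FRAGMENT)        # an XML parser rejects it
    is_well_formed_xml = True
except ET.ParseError:
    is_well_formed_xml = False
print(is_well_formed_xml)          # False
```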

Let’s read an HTML example:

Note: In this example we used some of the HTML “elements”, also called “tags”. The element names in this example are written in uppercase, which works because HTML element and attribute names are not case-sensitive. Modern HTML uses lowercase element and attribute names by convention. This document contains one image, one link and several lines of text.
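Python's `html.parser` module lowercases tag and attribute names before reporting them, which mirrors HTML's case-insensitivity. A minimal sketch; the `TagNames` class and the link markup are hypothetical, written only for this example.

```python
from html.parser import HTMLParser

class TagNames(HTMLParser):
    """Collects tag and attribute names as the parser reports them."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)                       # already lowercased
        self.names.extend(name for name, _ in attrs)  # also lowercased

upper, lower = TagNames(), TagNames()
upper.feed('<A HREF="page.html">Link</A>')
lower.feed('<a href="page.html">Link</a>')
print(upper.names, lower.names)   # ['a', 'href'] ['a', 'href']
```

Both spellings produce the same names, so for an HTML parser they describe the same element.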

As you can see, HTML is cluttered with symbols that can be hard to read, especially when the code is not indented carefully by hand. We can use a few tricks to make HTML more readable. For example, we can split a paragraph across several lines.

The browser ignores these line breaks inside a paragraph: when rendering text, it collapses any sequence of spaces, tabs and line breaks into a single space. So we can indent the HTML code freely, and the indentation has no effect on how the document looks.
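The collapsing rule can be modeled with one regular expression. This is a simplified sketch, not how a browser is implemented: the hypothetical `collapse_whitespace` function ignores <pre>, the CSS `white-space` property and the trimming of spaces at the start and end of lines. It treats only space, tab, line feed, form feed and carriage return as whitespace, so a non-breaking space (U+00A0) is left alone.

```python
import re

# HTML whitespace: space, tab, line feed, form feed, carriage return.
# The non-breaking space (U+00A0) is deliberately not included.
HTML_WHITESPACE = re.compile(r"[ \t\n\f\r]+")

def collapse_whitespace(text: str) -> str:
    """Simplified model of normal text rendering: every run of
    whitespace is displayed as a single space."""
    return HTML_WHITESPACE.sub(" ", text)

source = """This paragraph
    is split over
    several indented lines."""
print(collapse_whitespace(source))
# This paragraph is split over several indented lines.
```

The pattern is written out explicitly instead of using `\s`, because in Python `\s` also matches the non-breaking space.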

Elements used:

  • <HTML> is the root element,
  • <HEAD> is the document header section (title and other information not displayed in the page),
  • <BODY> contains the main content of the page,
  • <BR> is a line break, also known as a new line,
  • <HR> inserts a horizontal line,
  • <IMG> inserts an image,
  • <a href=""> creates a hyperlink,
  • <H1>, <H2>, <H3> are headings of different levels,
  • <P> is a paragraph of text; it can span multiple lines.

There are three kinds of elements in HTML:

  • empty elements,
  • block elements,
  • in-line elements.

Empty elements

These are the simplest elements. They have only a start tag and no end tag, because they have no content. We use them to produce effects that would otherwise be difficult to represent in HTML:

  • <br>: creates a line break,
  • <hr>: inserts a horizontal line separator,
  • <img>: inserts an image into the HTML page.

Notes:

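  1. In XHTML an empty element must be written with the self-closing notation “/>”, for example <br />; in HTML the trailing slash is optional and has no effect,
  2. An end tag for an empty element is invalid HTML: <img src=""></img>.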
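The second note can be turned into a small check. A minimal sketch, assuming Python's `html.parser`; the `VoidEndTagChecker` class is hypothetical, and the set below lists the void (empty) elements of current HTML. The self-closing form `<br/>` is routed to `handle_startendtag`, which is overridden so that it is not mistaken for an explicit end tag.

```python
from html.parser import HTMLParser

# Void (empty) elements of current HTML: they never have an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

class VoidEndTagChecker(HTMLParser):
    """Hypothetical helper that reports end tags written for void elements."""
    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_startendtag(self, tag, attrs):
        # "<br/>" arrives here; the default implementation would also call
        # handle_endtag, which must not count as an explicit end tag.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            self.errors.append(f"</{tag}> is invalid: <{tag}> is an empty element")

checker = VoidEndTagChecker()
checker.feed('<p>Logo: <img src="logo.png"><br/><img src="x.png"></img></p>')
print(checker.errors)   # ['</img> is invalid: <img> is an empty element']
```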

Block elements

The <HTML> element is the root element of an HTML file. It has two children: <HEAD> and <BODY>. The spaces used for indentation have no visible effect: when rendering an HTML file, the browser collapses every run of whitespace into a single space, so multiple spaces between words are displayed as one.

Block elements start on a new line and can display text on one or more lines. A block element cannot be applied to just part of a line of text. A block element can contain several other elements. The most common block elements are the paragraph <p> and the division <div> (a paragraph, however, may contain only in-line content).

A division block <div> is very useful for organizing a web page into nested blocks, like panels. Each <div> can contain other <div> blocks, paragraphs, HTML tables, input forms or images.
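Nested <div> panels form a tree, and printing that tree with indentation makes the structure easy to see. A minimal sketch using Python's `html.parser`; the `Outline` class and the page markup are hypothetical. It assumes every non-empty element is closed explicitly, because the parser does not infer omitted end tags.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not increase the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

class Outline(HTMLParser):
    """Records each element, indented by its nesting depth."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        label = f"{tag}.{cls}" if cls else tag
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1

page = """
<div class="page">
  <div class="header"><h1>My site</h1></div>
  <div class="content">
    <p>Welcome!</p>
    <img src="photo.jpg">
  </div>
</div>"""

outline = Outline()
outline.feed(page)
print("\n".join(outline.lines))
```

The output lists `div.page`, then `div.header` and `div.content` one level deeper, with `h1`, `p` and `img` nested under their panels.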

In-line elements

In-line tags apply to a small portion of text. These elements usually contain only a short piece of text, sometimes with other in-line elements inside. In the first HTML example we used several in-line tags: <B> for bold text and <I> for italic text. In-line tags can be combined (nested), and their effects add up when they apply to the same text fragment.
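The cumulative effect of nested in-line tags can be sketched by tracking which tags are open around each piece of text. A minimal example with Python's `html.parser`; the `InlineFormats` class is hypothetical and assumes properly nested tags.

```python
from html.parser import HTMLParser

class InlineFormats(HTMLParser):
    """Records, for each piece of text, the in-line tags open around it."""
    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tags
        self.runs = []        # (text, formats in effect)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # assumes properly nested tags: the end tag closes the innermost open tag
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        self.runs.append((data, tuple(self.open_tags)))

parser = InlineFormats()
parser.feed("Plain <b>bold <i>bold and italic</i></b> <i>italic</i>")
for text, formats in parser.runs:
    print(repr(text), formats)
```

The text `bold and italic` is reported with both `b` and `i` in effect, which is exactly where a browser would show bold italic text.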

WYSIWYG Editors

To improve productivity, you can use a WYSIWYG (“what you see is what you get”) tool to create HTML. Such a tool can be an editor embedded in a website. For example, a blogger has access to buttons for text formatting: WordPress has an embedded editor, written in JavaScript, that helps authors create HTML content much faster.

Plain text editors are used mostly by programmers to write HTML fragments. If you use a WYSIWYG editor, as I do, you can learn HTML by looking at the source code it produces. Sometimes you must fix the syntax in text mode. I usually add the class attribute and other styling attributes to <div> or <table> elements in text mode.

Special characters

Some characters cannot be used directly in HTML text content. For example, the characters “<” and “>” are the tag markers of XML and HTML, so the HTML parser (that is, the browser) interprets them as the start or end of a tag. If they appear in text content, they can confuse the parser. Therefore, you must replace “<” with &lt; and “>” with &gt;. For the same reason, a literal “&” should be written as &amp;.
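Python's standard `html` module performs this replacement. A short sketch: `html.escape` replaces `&`, `<` and `>` (and, by default, both quote characters), and `html.unescape` reverses it. The sample text is invented for the example.

```python
import html

text = 'if a < b && b > c then print("ok")'
escaped = html.escape(text)        # & is replaced first, then <, > and quotes
print(escaped)
# if a &lt; b &amp;&amp; b &gt; c then print(&quot;ok&quot;)

# Decoding restores the original text.
print(html.unescape(escaped) == text)   # True
```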

In XML and HTML, special symbols are represented by a code that starts with “&” and ends with “;”. Instead of a name, we can also use the numeric code of the character with the notation &#num; (for example, &#60; for “<”). Here are other special characters that can be encoded in HTML text content:

Symbol       Description                          Named code   Numeric code
(invisible)  non-breaking space                   &nbsp;       &#160;
<            less than                            &lt;         &#60;
>            greater than                         &gt;         &#62;
&            ampersand                            &amp;        &#38;
"            double quotation mark                &quot;       &#34;
'            single quotation mark (apostrophe)   &apos;       &#39;
¢            cent                                 &cent;       &#162;
£            pound                                &pound;      &#163;
¥            yen                                  &yen;        &#165;
€            euro                                 &euro;       &#8364;
©            copyright                            &copy;       &#169;
®            registered trademark                 &reg;        &#174;
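The rows of the table can be cross-checked with Python's `html.unescape`, which decodes both named and numeric references. A minimal sketch: a row counts as consistent when both codes decode to the character and the numeric code equals the character's Unicode code point.

```python
import html

# Rows of the table above: (character, named code, numeric code).
ENTITIES = [
    ("\u00a0", "&nbsp;", "&#160;"),
    ("<", "&lt;", "&#60;"),
    (">", "&gt;", "&#62;"),
    ("&", "&amp;", "&#38;"),
    ('"', "&quot;", "&#34;"),
    ("'", "&apos;", "&#39;"),
    ("¢", "&cent;", "&#162;"),
    ("£", "&pound;", "&#163;"),
    ("¥", "&yen;", "&#165;"),
    ("€", "&euro;", "&#8364;"),
    ("©", "&copy;", "&#169;"),
    ("®", "&reg;", "&#174;"),
]

mismatches = []
for char, named, numeric in ENTITIES:
    if (html.unescape(named) != char
            or html.unescape(numeric) != char
            or numeric != f"&#{ord(char)};"):
        mismatches.append((char, named, numeric))

print(mismatches)   # [] when every row is consistent
```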

Now you should be more familiar with HTML notation.
