Tips for Technologists #4: Understanding XML/HTML/CSS

In Tech Digest by Nick Ruffilo

Tips for Technologists is a series aimed at teaching you to engage with technology in the best way possible. You can see all the Tips for Technologists articles here.

Tip Level of Difficulty: Basic

This article is targeted at beginners, but anyone who uses HTML can benefit from the concepts outlined.

Defining our terms

XML – eXtensible Markup Language. XML is a structured way of storing data by using tags. A tag is simply a name wrapped in < >. <book> would be a book tag, <title> would be a title tag.

HTML – HyperText Markup Language. HTML is a specific, well-defined set of tags written in the same angle-bracket style as XML (strictly speaking, only XHTML is true XML). To summarize, it is a list of specific tags and their expected meaning.

CSS – Cascading Style Sheets. CSS is a specific and well-defined set of display properties. These properties are applied to HTML markup to define how it is displayed to the user.

XHTML – A strict version of HTML that must conform to all XML standards. Essentially the same as HTML except that, if you fail to close a tag, the content will fail to render, whereas in HTML, the browser will often just make an assumption and close the tag for you.
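
As a rough illustration of that difference, the sketch below writes the same two paragraphs twice. The paragraph text is made up for the example.

```html
<!-- Loose HTML: the <p> tags are never closed, and most browsers
     will guess where each paragraph ends. -->
<p>First paragraph
<p>Second paragraph

<!-- The same content written to XML rules, as XHTML expects:
     every tag that opens is explicitly closed. -->
<p>First paragraph</p>
<p>Second paragraph</p>
```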

The Basics of HTML and why it's important for all to know

HTML is the language of the web (at least when it comes to display of content). Every website, ebook, and now many apps are driven in some way by HTML. While you may never have to write HTML by hand, knowing its limitations and potential is extremely important. Most HTML tags have a start and an end. The start is <TAG> and the end is </TAG>. Anything within is considered the value of that tag. A few tags, such as <img> and <br>, hold no content and so have no end tag.

I break HTML tags into the following groups (this is by NO MEANS a complete list. Below are just common tags):

  • Display tags – These tags change the way content is displayed: <b> or <strong> make content bold. <i> or <em> make content italic. The <img> tag inserts an image.
  • Grouping Tags – These are tags that logically and semantically group content. <p> groups content in a paragraph, and <h1> <h2> <h3> … <h6> denote different levels of headers.
  • Structure Tags – These tags are not displayed to the user, but contain metadata about the document or define the structure of the document. <head> tag defines the header, <title> defines the title of the document (must be located in the <head>), <body> defines body content (what is displayed in the view).
  • Control Tags – These tags define interaction of the document. The <a> tag can define a link or a destination (enabling linking within a specific part of a document).
  • Comments – Comments are code that is ignored by the rendering system but is there to help people reading the code know what’s going on. Comments have no closing tag; they start with <!-- and end with -->
  • Other – Other tags such as <script> and <style> allow you to define JavaScript, CSS, and other non-HTML content that will control how the content is displayed and interacted with.
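
To see the groups side by side, here is a minimal sketch of a page that uses at least one tag from each. The file names cover.jpg and chapter2.html are hypothetical, as is all of the text.

```html
<html>
<head>
  <!-- Structure tags: not shown on the page itself -->
  <title>Tag groups example</title>
  <!-- Other: a style block holding CSS rather than HTML -->
  <style>
    h1 { font-size: 48px; }
  </style>
</head>
<body>
  <!-- Grouping tags: a header and a paragraph -->
  <h1>Chapter One</h1>
  <p>
    <!-- Display tags: bold, italic, and an image -->
    This word is <b>bold</b> and this one is <i>italic</i>.
    <img src="cover.jpg" alt="Book cover">
  </p>
  <!-- Control tag: a link to another (hypothetical) page -->
  <p><a href="chapter2.html">Go to chapter two</a></p>
</body>
</html>
```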

Attributes and Values
I previously discussed how the content between a tag’s start and end is its value. A tag can also have attributes. Attributes are separate from the value and exist in the opening tag. A tag with an attribute follows this form: <tag attribute="attribute value">tag value</tag>. A common attribute is id, which gives a unique name to your tag. Example: <p id="first_paragraph">My first paragraph</p>
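
A few more attributes in action, as a small sketch. The file name cover.jpg and the link text are assumptions made up for illustration.

```html
<!-- id gives this paragraph a unique name -->
<p id="first_paragraph">My first paragraph</p>

<!-- href is the attribute; "Jump to the first paragraph" is the value of the tag.
     The # prefix points at the id defined above. -->
<a href="#first_paragraph">Jump to the first paragraph</a>

<!-- A tag can carry several attributes at once. cover.jpg is a hypothetical file name. -->
<img src="cover.jpg" alt="Book cover" width="200">
```

Note that attribute values are wrapped in plain straight quotes. Curly “smart” quotes pasted in from a word processor will not be read as quotes.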

Structure of the Document
Every HTML document will have the following structure. I will use <!-- --> comments to make notes.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <!-- This tells the web browser or interpreting program that the document is an HTML document and that it is using the XML structure xhtml11.dtd -->
<html>

<head>
<title>My HTML Document</title>
<!-- JavaScript and CSS files would be included to the document here -->
</head>
<body>
<!-- Body content would go here. Anything here will be displayed -->
<p>This is my first and only paragraph.</p>
</body>

</html>
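
The note inside <head> about including JavaScript and CSS files could look something like the sketch below. The file names styles.css and script.js are hypothetical; substitute whatever your own files are called.

```html
<head>
<title>My HTML Document</title>
<!-- styles.css is a hypothetical stylesheet kept next to this HTML file -->
<link rel="stylesheet" type="text/css" href="styles.css" />
<!-- script.js is a hypothetical JavaScript file; note the explicit closing tag -->
<script type="text/javascript" src="script.js"></script>
</head>
```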

Simple enough, no? A wonderful feature of every web browser is the ability to right-click and “view page source.” This will show you the HTML code for the page you are currently looking at. Since the code for most pages today is generated by a server-side script (such as the HTML you are viewing now, which was generated by WordPress), it is often very complex, but it still follows the above form. It may have been 20 years ago, but I actually learned HTML (and JavaScript) using the “view source” button.

CSS – Making your content beautiful
HTML is very limiting when it comes to controlling the display of content. While there used to be a <FONT> tag, it has been removed from the HTML spec (it is still supported by most browsers though). Almost all display of your HTML content is controlled by CSS. CSS can be declared in the following ways.

  1. Inline using the “style” attribute – You can put your CSS directly in a tag using the style attribute. Example: <p style="text-indent:32px;">This text will be indented 32 pixels.</p> While this makes the document easy to quickly read, the styles defined work only on the one tag. Inline styling is rarely recommended.
  2. Through IDs in your CSS file – While IDs must be unique in each HTML file, you can reuse the same ID over multiple files, and have the same CSS file for each of those files. The CSS declaration would be “#first_paragraph { text-indent: 32px; }” and the HTML would be: <p id="first_paragraph">This text will be indented 32 pixels.</p> The use of # in CSS denotes an ID as a selector.
  3. Through Tag selection in your CSS file – HTML offers a few logical semantic tags, such as <h1> <h2> <h3>… which define different levels of headers. By simply using these tags, in your CSS you can define how your headers display. Your CSS would look like this:
    h1 { font-size:48px; }
    h2 { font-size: 42px; }
    h3 { font-size: 36px; }
    And your HTML would look like this:
    <h1>Large heading (48 pixels)</h1>
    <h2>Smaller sub-heading (42 pixels)</h2>
    <h3>Still big, but smallest sub-sub heading (36 pixels)</h3>
    Now, every h1 tag will share one look, as will every h2 and every h3. Additionally, if you want to change the style, it is defined once, in a common CSS file, so you can quickly and easily change it. This is a highly preferred usage.
  4. Through Class selection in your CSS file – While IDs are unique, elements can also have classes, which may be shared. In HTML we define our list of classes (separated by spaces) in the “class” attribute. Ex: <p class="big_text_indent font_green">This text will be indented and green</p>. Our CSS declaration would be:
    .big_text_indent { text-indent: 36px; }
    .font_green { color: rgb(0, 255, 0); }
    As you see, two classes were defined and applied to that paragraph. Through classes, we can easily define another paragraph later in the HTML document using: <p class="big_text_indent">I will be indented but NOT green</p>
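
The four approaches can be tried together in one small page, sketched below. To keep the example in a single file, the CSS sits in a <style> block in the <head> rather than in a separate CSS file; that placement is a choice made for this sketch only. The rules use color, which is the standard CSS property for text color.

```html
<html>
<head>
<title>Four ways to apply CSS</title>
<style type="text/css">
  /* Tag selector: applies to every h1 */
  h1 { font-size: 48px; }
  /* ID selector: applies to the one element with this id */
  #first_paragraph { text-indent: 32px; }
  /* Class selectors: apply to any element that lists the class */
  .big_text_indent { text-indent: 36px; }
  .font_green { color: rgb(0, 255, 0); }
</style>
</head>
<body>
<h1>Large heading (48 pixels)</h1>
<p id="first_paragraph">Indented 32 pixels through its ID.</p>
<p class="big_text_indent font_green">Indented and green through two classes.</p>
<p class="big_text_indent">Indented but NOT green.</p>
<!-- Inline style: affects only this one tag -->
<p style="text-indent:32px;">Indented 32 pixels through an inline style.</p>
</body>
</html>
```

Saving this as a file ending in .html and opening it in a browser should show each paragraph picking up its own rule.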

You can actually combine selectors to make them more specific, such as “.big_text_indent b { color: rgb(255, 0, 0); }” which will take all <b> tags that are contained within something with the class “big_text_indent” and make the text red. I will post later on using advanced CSS selectors, but know that there are some really complex selection options.
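
A minimal sketch of that combined selector follows, with the CSS placed in a <style> block so the fragment can be pasted into the <head> and <body> of a test page. The paragraph text is invented for the example.

```html
<style type="text/css">
  .big_text_indent { text-indent: 36px; }
  /* Combined selector: only <b> tags inside .big_text_indent turn red */
  .big_text_indent b { color: rgb(255, 0, 0); }
</style>

<p class="big_text_indent">This paragraph is indented and <b>this bold text is red</b>.</p>
<p>This paragraph is not indented and <b>this bold text keeps the default color</b>.</p>
```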

How to read API Documentation

I linked above to the HTML and CSS documents, and both can seem QUITE daunting due to their length and form. The truth is, they are a lot to swallow and are geared towards very technical people (because they are meant to be used by people writing HTML readers to ensure that everything works correctly). If the terms you see in these documentation pages make your head spin, don’t be afraid to close the tab, open Google, and search for “HTML quick reference” or “CSS quick reference.” There are a lot of tutorials geared towards different audiences and different levels that are freely available.

About the Author

Nick Ruffilo is currently the CIO/CTO of Aerbook.com. He was previously Product Manager at Vook and CTO of BookSwim.