
Tips for Technologists #5: Using HTML Well

Tips for Technologists is a series aimed at teaching you to engage with technology in the best way possible. You can see all the Tips for Technologists articles here.

By Nick Ruffilo

Tip Level of Difficulty: Basic/Intermediate

As I noted in Tip #4: Understanding XML/HTML/CSS, HTML is at the core of the web and will continue to be for a long time, thanks to its simplicity combined with its ability to represent so many things. This post is geared towards Intermediate/Advanced level HTML users, but I will try to explain things in a way that lets Beginners understand the concepts.

HTML was originally built as a standalone markup language, so it had both structural and display-based tags. A structural tag is a <p>, <div>, <span>, etc., whereas a display tag is a <b>, <i>, <center>, or even <font> tag. Some of the display tags, such as <font>, are deprecated, which means they may be removed at some point in the future and are really only supported for backwards compatibility (allowing things written a long time ago to still work).
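To get a feel for how much display markup a piece of content carries, you can count the display tags in it. This is only a minimal sketch using Python's standard html.parser module; the names DISPLAY_TAGS, DisplayTagCounter, and count_display_tags are made up for illustration, and the tag list is just the four display tags named above.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: only the four display tags named in the article are tracked.
DISPLAY_TAGS = {"b", "i", "center", "font"}


class DisplayTagCounter(HTMLParser):
    """Counts opening display tags; HTMLParser lowercases tag names for us."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in DISPLAY_TAGS:
            self.counts[tag] += 1


def count_display_tags(html):
    """Return a dict mapping each display tag found to its number of uses."""
    parser = DisplayTagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


sample = '<p>The <b>mitochondria</b> is <FONT color="red">the</FONT> powerhouse.</p>'
print(count_display_tags(sample))
```

A count like this does not tell you what each tag was meant to convey. It only shows where the markup describes appearance rather than meaning.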

IMPORTANT CONCEPT: Just because you can, doesn’t mean you should.

I can’t stress this enough, and it doesn’t just apply to HTML. There is a lot of bad code, and many deprecated functions, still around only so that old code doesn’t break. In some cases it may be “shorter” to use them (or more comfortable if you’re simply used to them), but avoid them at all costs.

I’m going to reiterate, because it is important:

Just because you can, doesn’t mean you should.

In HTML, this applies specifically to the use of display tags. The problem with a display tag is that it provides no context. Context is important now (it improves SEO, searchability, and the value of your content), but in the very near future context will be KING. Ponder the following two HTML snippets:

<p>The <b>mitochondria</b> is the powerhouse of the cell.</p>

VERSUS:

<p>The <span class="key_term" id="mitochondria">Mitochondria</span> is the powerhouse of the cell.</p>

While the second bit of code won’t actually bold the word “Mitochondria”, a simple CSS declaration, “.key_term { font-weight: bold; }”, solves that problem, and we now have a HUGE amount of additional context. Why is this important? Let me list the reasons:

  1. Because we are using contextual classes, we can change ALL of the “key_term” elements in your content at once. Want them to be italic instead of bold? Do it with one line of CSS instead of replacing every <b> tag with an <i> tag.
  2. Since we used a contextual class, we can create an index with a very simple script. We can programmatically extract a list of all key_terms in your content and, with an HTML link, link directly back to each term (because we have an ID on it).
  3. We can create a summary. We can use XPath/DOM to get the entire paragraph in which the key_term appears, showing the full definition/context of the term.
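Points 2 and 3 can be sketched in a few lines of Python. This is an illustration under assumptions, not a finished tool: it uses the standard html.parser module instead of XPath, the names KeyTermIndexer, build_index, and index_as_html are hypothetical, and it assumes key_term spans do not contain other spans.

```python
from html import escape
from html.parser import HTMLParser


class KeyTermIndexer(HTMLParser):
    """Collects each <span class="key_term"> and the paragraph around it."""

    def __init__(self):
        super().__init__()
        self.entries = []       # dicts with "id", "term", "paragraph"
        self._para = None       # text pieces of the current <p>, or None
        self._para_terms = []   # key terms found inside the current <p>
        self._term = None       # key term currently being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            self._para, self._para_terms = [], []
        elif tag == "span" and "key_term" in (attrs.get("class") or "").split():
            self._term = {"id": attrs.get("id"), "term": "", "paragraph": ""}

    def handle_data(self, data):
        if self._para is not None:
            self._para.append(data)
        if self._term is not None:
            self._term["term"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._term is not None:
            if self._para is None:
                # Key term outside any paragraph: keep it with no summary.
                self.entries.append(self._term)
            else:
                self._para_terms.append(self._term)
            self._term = None
        elif tag == "p" and self._para is not None:
            text = "".join(self._para).strip()
            for entry in self._para_terms:
                entry["paragraph"] = text   # the "summary" from point 3
                self.entries.append(entry)
            self._para, self._para_terms = None, []


def build_index(html):
    """Return one entry per key term, in document order."""
    parser = KeyTermIndexer()
    parser.feed(html)
    parser.close()
    return parser.entries


def index_as_html(entries):
    """Render the index as a list of links back to each term's ID (point 2)."""
    items = [
        f'<li><a href="#{escape(e["id"])}">{escape(e["term"])}</a></li>'
        for e in entries
        if e["id"]
    ]
    return "<ul>" + "".join(items) + "</ul>"


sample = (
    '<p>The <span class="key_term" id="mitochondria">Mitochondria</span> '
    "is the powerhouse of the cell.</p>"
)
for entry in build_index(sample):
    print(entry["term"], "->", entry["paragraph"])
print(index_as_html(build_index(sample)))
```

None of this is possible with a bare <b> tag, because nothing distinguishes a bolded key term from any other bolded text.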

The list could go on, but those are some HUGE reasons. Simply using a <b> tag doesn’t cut it.

But what if I add a class and ID to the <b> tag?

You could do that, but the <b> tag implies BOLD. If you later decide to override that and style it as italic, you now have HTML that doesn’t match the display, and that is bad form and hard to read. Again, just because you can, doesn’t mean you should.

A note about auto-formatting engines:

I’m currently writing this in WordPress, which means that all the formatting above (the bold/italics) uses the <strong> and <em> tags (alternatives for <b> and <i>). WordPress does give me the ability to click “HTML” and modify the HTML. When I’m done writing this (as well as other articles), I will go back, clean up the code, and add classes and spans to the content. Because I don’t have access to the global CSS files, I’ll actually include the CSS inline. Also, I would normally use H1, H2, H3… tags but, to avoid any conflict with the global CSS styles, I’ll be using classes and <span>/<div> tags.

Understand the Semantics of HTML5:

All modern browsers support all of the semantic tags of HTML5. EPUB 2 supports most and gracefully degrades (treats unknown tags as either DIV or SPAN tags).  EPUB 3 supports all HTML5. So, there is little/no reason NOT to use these tags/structures. Mozilla (Firefox) has done a WONDERFUL job summarizing the tags, so check out this link.

I previously mentioned how the document should be structured, so I will now cover how the article (the content within the <body> tag) should look:

<section>
  <article>
    <hgroup>
      <h1>…</h1> [...]
    </hgroup>
    <p>[your actual text here]</p>
  </article>
</section>

While the structure above is a suggestion, the best guideline I can give you is to think about your content not as a page, but as a compartmentalized piece of information. Your website may have a header, a footer, a sidebar with links, a banner ad, etc. There was a time when you developed all of your content for 800×600 screens. Then 1024×768. Now, with the massive adoption of the web on phones, tablets, and desktops, there is no longer a one-size-fits-all solution for laying out your content. There are adaptive design techniques (and layouts), but they can only be used if your content is formatted properly.
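One practical payoff of treating content as a compartmentalized piece of information is that a tool can lift it out of the page and reuse it elsewhere, leaving the header, footer, and ads behind. The Python sketch below shows the idea under some assumptions: each <article> holds one <h1> and plain <p> paragraphs, articles are not nested, and the names ArticleExtractor and extract_articles are hypothetical.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Pulls the heading and paragraphs out of every <article> element."""

    def __init__(self):
        super().__init__()
        self.articles = []     # one dict per <article>
        self._current = None   # article being read, or None
        self._field = None     # "h1" or "p" while inside that tag
        self._buffer = []      # text pieces of the current h1 or p

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._current = {"heading": "", "paragraphs": []}
        elif tag in ("h1", "p") and self._current is not None:
            self._field, self._buffer = tag, []

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._field and self._current is not None:
            text = "".join(self._buffer).strip()
            if tag == "h1":
                self._current["heading"] = text
            else:
                self._current["paragraphs"].append(text)
            self._field = None
        elif tag == "article" and self._current is not None:
            self.articles.append(self._current)
            self._current = None


def extract_articles(html):
    """Return the articles found in a page, ignoring everything around them."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.articles


page = (
    "<header>Site banner</header>"
    "<section><article><hgroup><h1>Using HTML Well</h1></hgroup>"
    "<p>First paragraph.</p><p>Second paragraph.</p></article></section>"
    "<footer>Site footer</footer>"
)
for article in extract_articles(page):
    print(article["heading"], len(article["paragraphs"]))
```

The script never needs to know what the rest of the page looks like, which is exactly what lets the same content be laid out differently for a phone, a tablet, or a desktop.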

A note about future-proofing your content:

The world is going to change and adapt. While the search engines may not recognize certain identifiers, tags, or concepts NOW, that doesn’t mean that they won’t in the future. And when they do, don’t you want your content to be at the top of the list? Also, since HTML is here to stay, newer and better tools will be coming out on a weekly basis. If you stick to a solid, logical structure, you’ll be able to utilize all of these new tools and their power. If you use things like the <font> tag, however, expect newer tools to choke on your content or understand it poorly.

Another reason for doing things right:

If you are writing something, it is usually because you believe it has value, not only to yourself but to others. To not follow a structured and semantic model is to reduce the value you are providing and to hinder people from extracting its full value. Have you ever opened an article on your phone only to find the text really small and, when you try to zoom in, been forced to scroll left/right to read the content? Maybe you are more patient than I am, but I will not read that article. Someone (or many people) put effort into making and curating that content, and its value is lost to me due to bad UI/UX, which can easily be fixed with proper tagging and CSS.

 

Want more Tips for Technologists? Join the daily mailing list and see additional content at ZenOfTechnology.com. You can also see other Tips for Technologists articles here on Publishing Perspectives.


One Comment

  1. NickBangO
    Posted January 23, 2013 at 3:30 pm

    “the <strong> and <em> tags (alternatives for <b> and <i>).”

    No, they aren’t. And it is quite clearly stated in W3C doc.

    Bold and italics are presentational. They could have been deprecated, they were not because…
    Emphasis and strong, though styled like italics and bold (default values picked by browsers’ devs), don’t replace them in any way, and most importantly convey meaning: to emphasize, and to emphasize more strongly.
    And we must not forget that cite is to be used for titles.

    In other words, each has its own functions. And you can rightfully use all five on the same page. So just because you can span anything doesn’t mean you should span anything. If tags were properly used, most shouldn’t have to use span at all, which is just an empty thing used a lot to duplicate or hack in order to achieve something other tags were made for.

    As for EPUB, since a hell of a lot of apps don’t support CSS styles properly (some even don’t support CSS at all), advising span instead of b, i, em or strong is like advising them to screw their book in a major way…
