hand on keyboard

Tips for Technologists #9: SQL – Unleash the Power of Your Data

In Tech Digest by Nick Ruffilo

Tips for Technologists is a series aimed at teaching you to engage with technology in best way possible. You can see all the Tips for Technologists articles here.

By Nick Ruffilo

Tip Level of Difficulty: Beginner/Intermediate

hand on keyboardSQL stands for Structured Query Langauge, and is grammar used to fetch data from databases.  Beyond insanely large-data database (such as google) and very old databases, nearly every database system supports SQL (although there are different variants and levels of support, but, key concepts hold true).  The first part of this tutorial will be an introduction to SQL and the later parts will be much more advanced topics, tips, and tricks.

SQL in your daily life

SQL is a Query language, which basically means that you ask a question, and get a response in the form of a set of data. It is not a programming language in the sense that a programming language is a set of instructions to tell a computer what to do. SQL is a way to ask a question and get an answer. When it comes to basic SQL there are three types of commands: SELECT, INSERT and UPDATE. Can you guess what they do? If you’re doing reporting, you’re rarely need to UPDATE/INSERT so I’ll talk about using the SELECT command.

Your data, give it to me! Only if you ask nicely…

SQL relies heavily on a strict grammar.  If you follow that grammar, it will give you what you asked for. That grammar looks like this:

SELECT [something] FROM [somewhere] WHERE [these conditions are met]

So, if you wanted to get a list of your users’ first & last names where their birthday is 1980-1-29 it would be:

SELECT first_name, last_name FROM users WHERE birthday = ‘1980-01-29’ (this assumes you have a table named “users” and the first and last names fields are “first_name” and “last_name” but you’ll have to adjust to your data set.)

While this may not be the case – every organization should give their employees access to their data.  With that said, there are a few caveats:

  1. Keep sensitive data sensitive — If your organization stores credit card, social security, or any other personal information, this should not be available to all employees
  2. Unless it is necessary, keep it anonymous — Rarely do you need to know the first name, last name, or email or a customer to work on aggregate stats, so to protect your customer’s identity, this data should not be shared with everyone.
  3. Have multiple, isolated, data servers for employees — Never give employees access (even read only) to production or development servers. All it takes is one or two poorly written SQL statements to lock the server up for a few seconds (or even minutes). Never risk the customer experience. Secondarily, other employees need access to data, so having secondary servers is always a good idea.
  4. Read Only with access to create temp tables — If you have a dedicated DBA, you may want to set up a unique user for each employee and give them their own set of temp tables (or semi-permanent tables) but if you’re doing data analysis, you really only need access to SELECT and possibly create temp tables.

Keys and Indexes are the Keys and Indexes to success

This is a tip for people working on database structures, but the concepts are useful to understand for anyone working with databases (getting data). Databases are really smart, but they need a bit of guidance from you. Every table should have it’s own unique identifier and every table should have the same naming convention. DO NOT USE THE WORD “ID” FOR YOUR UNIQUE IDENTIFIER FOR EACH TABLE. I’ve seen this, it’s terrible, and it makes queries impossible to understand. In the future, I’ll post about building a proper database structure, but this is a high-level discussion.

Your unique ID will be your primary key. Until recently, mySQL did not support foreign keys, but more recent versions do. A foreign key/foreign key constraint is a way to link data in two different tables. A nice “how to” on foreign keys can be found here. A constraint will prevent you from deleting data that is necessary, and will make doing join queries much simpler.

Beyond keys, the tool that will speed up your searches the most is proper indexing. Indexes allow you to tell your database engine what you will be searching on, so that it can store the data in the most logical/searchable way. There are a few types of indexes (in mySQL, may vary for other systems):

  • Primary — Your primary key is and always should be your unique identifier. You can only have one primary key for each table. It is also unique (meaning you can’t have two rows with the same value)
  • Index — This just tells the database engine to store the values in the most easily searchable way.
  • Unique — Unique is an index that requires that no two rows have the same value. If you try to insert a value into a field that already exists in the table, you will get an error.
  • Fulltext — Fulltext is a way to index text in a way that can be searched with greater options than a standard text field. There are restrictions when using fulltext though (I believe you cannot use mySQL server replication on tables with fulltext index). Basically, research fulltext before you decide to use it for a professional app.

Using proper indexing can reduce a query time from seconds to milliseconds. But, you need to make sure that you are using the indexes properly. For example, if you put an index on a text field (a varchar, or any field holding letters), it will index it alphabetically. So, doing a search: “WHERE first_name LIKE ‘nic%'” will use the indexing, whereas doing a search: “WHERE first_name LIKE ‘%abby%’ ” will result in a full table scan. A full table scan is when every row in the table is scanned to see if it matches your criteria.  This sounds like a bad thing, but it is not, although it should be avoided. For 10,000 records, a full table scan could take 0.04 seconds whereas an indexed search can take 0.0008 seconds (times based off a test query with table ~9,400 rows). Clearly .04 seconds is acceptable, but 0.0008 is orders of magnitude better, and, as your data gets larger, and your application makes more and more queries, all performance gains are welcomed.EXPLAIN it to me

The most eye opening moment for me with SQL was the EXPLAIN command. It gives you metadata on what your command actually does. It tells you what (if any) keys are utilized, how many rows were read to complete the query, and a few other useful bits of information. To use it, simply add EXPLAIN to the front of any query. This is invaluable in figuring out why a query is taking a long time. (I’ve used it to take queries that were taking 19 seconds long, and by adding proper indexing got them down to .002 seconds!) It also gives you a glimpse into how the database is thinking, and when you understand how a tool works, you can best utilize its capabilities.Note to developers: Know your database engines

Unless you are a large organization, you’re most likely using mySQL as your database engine. In fact, some large organizations use mySQL, although it has its limitations. Since you don’t really need to deal with it until your data or usage gets really high, most people don’t really pay attention to what database engine to use. Here’s a quick, handy guide. It is much easier to think about what you will use your database for when creating it than it is to try to change it later. While choosing a less-efficient engine for your purpose will not destroy performance, there can be 10-25% reduction in query times/insert times by choosing the right engine.

About the Author

Nick Ruffilo

Nick Ruffilo is currently the CIO/CTO of Aerbook.com. He was previously Product Manager at Vook and CTO of BookSwim.