Excerpt: Self-Guided SQL (Upcoming Book)

If you work with big data, SQL is an essential skill. It’s also a great introductory language for those wanting to learn programming. The good news is that it’s easy to learn and you don’t have to spend big money to do it.

My upcoming book, Self-Guided SQL: Build Your SQL Skills with SQLite, will help you to build your own skills through the most effective learning method that exists: practice. This book provides hands-on experience in writing SQL statements against three different sample databases including the Chinook music database and the Sakila movie database. It also includes enough explanation so that you’ll understand what you’re doing every step of the way.

Many books focus on theory while presenting the exercises almost as an afterthought. Self-Guided SQL provides active learning for those wanting to jump in and use their new skills.

Subscribe to our newsletter to get updates about this and other new projects!

If you’d like to find out more about this upcoming book, check out the official page here on ComeauSoftware.com. You can download a free sample of the first two chapters along with the sample files and sign up to be notified when the book is released.

Meanwhile, enjoy this bonus excerpt from the book in which I explain database normalization.

Joining Tables

Earlier in the book, I told you about how SQLite works with relational databases in which data is grouped into tables by subject. You’ve seen this already in the sample databases where you have a table for customer data, another for employee data and separate tables for orders, films, invoices, etc..

The rules by which data is organized into this table is called database normalization. It’s a system that was developed starting in 1970 by computer scientist Edgar F. Codd. The main purpose is to reduce or eliminate repeated data and unnecessary dependencies between data fields.

An example of normalization would be the albums and tracks tables in the Chinook database. Every album is going to have multiple tracks on it so we need some way to indicate which album a track is on. This is called a one-to-many relationship. In a spreadsheet like Excel, you might list the album name for each track but this leads to possible errors in different records if the album name is misspelled and that leads to problems searching the data.  It also takes up more space in the file than is necessary.

The relationship between the albums and tracks tables enables many tracks to be listed for each album.

A relational database handles this differently. You’ll notice that the album table has an AlbumID with a unique value for each album. The Tracks table also has an AlbumID field with values that match the values in the Albums table. This kind of cross-reference allows for a SQL query that will read the album name from the albums table where it’s recorded only once and reference it for each track on the album.

SELECT         t.Name, t.Composer,
               Milliseconds / 1000 AS 'Time (seconds)'
FROM           tracks t
INNER JOIN     albums a
ON             a.AlbumId = t.AlbumId
WHERE          a.title = 'Jagged Little Pill'
The results of a query join on the albums and tracks tables.

In the query above, the INNER JOIN … ON syntax joins the two tables on their mutual AlbumID field and selects three fields from tracks that match the ID held by a specific album title. So, it’s possible to search the tracks table by the album title even though the tracks table only holds the ID number, saving space and avoiding possible typos and other errors.

Also, notice that the tables are assigned aliases, t for tracks and a for albums. They’re used throughout the query to refer to the tables. Joined tables can often have field names in common and it’s necessary to specify which table a field is being referenced in.  The short aliases make this easy and keep the query readable.

It’s not necessary for you to know all the rules of database normalization at this point but you do need to be aware of how it is used in databases like the ones we’ve been using. There can be different degrees of normalization in different databases. The Sakila database goes so far as to create a separate table for all the cities used in the database and assign them a reference ID in the address table which is then referenced by the customer table. It’s also possible for a database to be over-normalized but that depends on the application it’s designed for.

In the sample files for this book, you’ll see a diagram for each of the sample databases. These diagrams show the tables and the links between them. In the clip below, you’ll see the relationship between the customers and invoices tables in Chinook where the Invoices table has a CustomerId field so the customer’s name, address and other info can be referenced instead of repeated.

A portion of the Chinook diagram showing the relationship between the Customer and Invoices tables.

These diagrams make it easier to determine the relationships at a glance but, with some experience, you will be able to look at the tables in a database and determine the likely relationships that exist even without documentation. Table relationships are often enforced and maintained by the database software but they can also exist informally in some database systems, working because whatever program is providing the data ensures that the data is consistent.

In this chapter, you’ll get a lot of practice with the different types of joins that you can use in different situations to get the data that you need. You’ll also pick up some more of the terminology used to describe relationships and the fields involved.

Sign up for our newsletter to receive updates about new projects, including the upcoming book "Self-Guided SQL"!

We respect your privacy and will never share your information with third-parties. See our privacy policy for more information.