Bottom Line - Practice SQL by solving real problems.
Table of Contents
Chinook Sample Database
I’m going to use a modified Chinook Sample Database that I created for the examples below. Because I’m working with Ruby on Rails, I created a Rails project that sets up all the Chinook tables and data.
I modified the tables slightly to make them fit the Rails standard by:
- pluralizing table names.
- renaming the primary key in each table to ‘id’.
- changing column names to lowercase and adding underscores between words (snake case).
- adding Rails timestamps to each table (created_at and updated_at)
The original Chinook Database is also available.
You’re going to fetch data using the select statement in SQL. Select is what you want SQL to do for you. Let’s try asking SQL to do some basic math for us.
It’s a bit annoying to have to do all these calculations separately. You can add them all to one select statement as long as you separate them by a comma.
select 50, 5+3, 2*10, 2^4;
We’re not limited to only numbers in the select statement. We can also use SQL functions and display text strings.
Let’s use the
concat function to concatenate the value 50 and the string ‘cents’.
select 2+2, 'fifty', concat(50, ' ', 'cents');
Aliasing Column Names
Our table names aren’t very descriptive. Let’s change (alias) the name of the first and the last column using
select 2+2 as simple_arithmetic, 'fifty', concat(50, ' ', 'cents') as concat_result;
You could also use double quotes for aliasing. For example, instead of
simple_arithmetic, you could write
You can use spaces with double quotes.
In the real world, you’re going to want to
select data from database tables and do something with it. We do this with a
Let’s select the id and the name columns from the artists table.
(Show table schematic)
select id, name from artists;
You can easily change the order of the listed columns. Let’s say I want to show the name followed by ID. No problem.
select name, id from artists;
What if you want to display all the columns in that table without having to write them out manually in the
That’s where the
* (splat) comes in.
select * from artists;
|1||AC/DC||2021-02-13 22:05:35.844418||2021-02-13 22:05:35.844418|
|2||Accept||2021-02-13 22:05:35.847811||2021-02-13 22:05:35.847811|
|3||Aerosmith||2021-02-13 22:05:35.849931||2021-02-13 22:05:35.849931|
You’ll see a few ways of writing these select statement in the wild. One way is to specify the table name, dot, and then a splat or column name.
select artists.* from artists
You can also explicitly define the table where you’ll be getting your columns from right in the select statement.
select artists.id, artists.name from artists
It’s not needed if you’re just querying one table. However, we’ll be joining multiple tables very soon and this technique comes handy.
SQL will let you order your rows ascending and descending with the
order by clause.
Let’s say I want to select the id and the name from the artists table and order by name in ascending order (alphabetical order).
select artists.id, artists.name from artists order by name
|43||A Cor Do Som|
|230||Aaron Copland & London Symphony Orchestra|
You have two options to order by ascending
asc and descending
Let’s order by name but in a descending order (reverse-alphabetical).
select id, name from artists order by name desc
By default, SQL orders the rows by the primary key of the table, which is typically the
Note that when using
order by, SQL will order in ascending order by default. For example:
order by name and
order by name asc are exactly the same.
Try to select all columns from the artists table and order the id in descending order:
select * from artists order by id desc
|275||Philip Glass Ensemble||2021-02-13 22:05:36.197593||2021-02-13 22:05:36.197593|
|274||Nash Ensemble||2021-02-13 22:05:36.196375||2021-02-13 22:05:36.196375|
|273||C. Monteverdi, Nigel Rogers - Chiaroscuro; London Baroque; London Cornett & Sackbu||2021-02-13 22:05:36.195165||2021-02-13 22:05:36.195165|
|272||Emerson String Quartet||2021-02-13 22:05:36.193975||2021-02-13 22:05:36.193975|
Limits and Offsets
Running a query and listing all the rows can be risky if your table has millions of rows. You’ll needlessly stress that database server and piss off the database administrator.
Instead we can
limit our queries to give us a limited number of rows.
select * from artists order by id desc limit 4
We can also
offset rows. Let’s say we want to get rows 3 to 7. Offsetting rows is useful when paginating results on a website.
select id, name from artists limit 5 offset 2
|5||Alice In Chains|
|6||Antnio Carlos Jobim|
where clause is powerful because you can start to filter down what you want to see in your SQL query.
For example, let’s select a name column from the artists table where the name is Metallica. Read this sentence and compare to how similar the SQL query below sounds.
select name from albums where name = 'Metallica'
Multiple Where / AND
Can you select the id and the name columns from artists where id is equal to 5, 10, 30?
One way to do it is to give multiple where clauses separated by
or. You’re telling SQL that the
can be 5 or 10 or 30.
select id, name from artists where id = 5 or id = 10 or id = 30
A less verbose and more straightforward way to do this is to use the
in takes an array of 5, 10, 30 and gives a result.
select id, name from artists where id in (5, 10, 30)
|5||Alice In Chains|
Let’s take a look at the
AND operator. Select the tracks where the composer is ‘Foo Fighters’ and the track length is greater than 300000 milliseconds.
select * from tracks where composer = 'Foo Fighters' and milliseconds > 300000
|1014||Tired Of You||81||1||4||Foo Fighters||311353||10094743||0.99||2020-10-23 23:38:36.402027||2020-10-23 23:38:36.402027|
|1015||Halo||81||1||4||Foo Fighters||306442||10026371||0.99||2020-10-23 23:38:36.402027||2020-10-23 23:38:36.402027|
|1019||Come Back||81||1||4||Foo Fighters||469968||15371980||0.99||2020-10-23 23:38:36.402027||2020-10-23 23:38:36.402027|
Now narrow down that table and select only the name, composer, milliseconds, and bytes. Also order the table by name in the ascending order.
select name, composer, milliseconds, bytes from tracks where composer = 'Foo Fighters' and milliseconds > 300000 order by name
|Come Back||Foo Fighters||469968||15371980|
|Tired Of You||Foo Fighters||311353||10094743|
Milliseconds are not really intuitive. Can you instead convert milliseconds into seconds show that as an additional column in the table report?
select name, composer, milliseconds, (milliseconds / 1000) as seconds, bytes from tracks where composer = 'Foo Fighters' and milliseconds > 300000 order by name
|Come Back||Foo Fighters||469968||469||15371980|
|Tired Of You||Foo Fighters||311353||311||10094743|
Let’s make the report even better and convert the bytes column into megabytes and seconds into minutes.
select name, composer, milliseconds, (milliseconds / 1000) / 60 as minutes, (bytes / 1024) as megabytes, bytes from tracks where composer = 'Foo Fighters' and milliseconds > 300000 order by name
|Come Back||Foo Fighters||469968||7||15011||15371980|
|Tired Of You||Foo Fighters||311353||5||9858||10094743|
You can easily mix
and statements. There are different ways to do this and some are cleaner than others.
Can you select name, composer, milliseconds, bytes from the tracks table where the composer is either Chuck Berry or Little Richard AND the track size in bytes is smaller than 8,000,000?
First way to do it is below. Notice how we use parenthesis to isolate the or statements together.
select name, composer, milliseconds, bytes from tracks where bytes < 8000000 and (composer = 'Chuck Berry' or composer = 'Little Richard')
A cleaner way is to use the
select name, composer, milliseconds, bytes from tracks where bytes < 8000000 and composer in ('Chuck Berry', 'Little Richard')
NOT (negation) and ! (not equal to)
You can also invert a where clause. There are two ways of doing this.
Select the id and name from the artists table where the artist name is not Aerosmith. Limit the rows to 5.
select * from artists where not name = 'Aerosmith' limit 5
|5||Alice In Chains|
|6||Antnio Carlos Jobim|
Note that the id equal to 3 is missing from the table. That was Aerosmith.
In many programming languages,
!= is not equal to. You can also write
<> instead of
!=. Let’s use it in our example below.
select id, name from artists where name != 'Aerosmith' limit 5
Nulls are empty cell in the database. To test whether a cell is empty, you can use either
is null or
is not null to test that it’s not empty. You can’t use
!= with nulls.
Let’s select the tracks where the composer fields are null.
select id, name, composer from tracks where composer is null
|2||Balls to the Wall|
|64||Garota De Ipanema|
|65||Samba De Uma Nota S (One Note Samba)|
|66||Por Causa De Voc|
Below are common operators used with SQL. They are used with the
|Operator||What it does|
|>=||greater than or equal|
|<=||less than or equal|
Pattern matching (string operators)
Pattern string matching is a powerful feature of a database. You can use
LIKE, where a string matches a pattern. There’s
ILIKE, which a case insensitive version of
Find the Artists that have the word ‘Orchestra’ in the name.
-- This is incorrect as it will search exactly for the word Orchestra. -- There are zero results select * from artists where name = 'Orchestra'
This is where you want to pattern match:
select id, name from artists where name like '%Orchestra%'
|192||DJ Dolores & Orchestra Santa Massa|
|210||Hilary Hahn, Jeffrey Kahane, Los Angeles Chamber Orchestra & Margaret Batjer|
|217||Royal Philharmonic Orchestra & Sir Thomas Beecham|
|220||Chicago Symphony Chorus, Chicago Symphony Orchestra & Sir Georg Solti|
|223||London Symphony Orchestra & Sir Charles Mackerras|
|224||Barry Wordsworth & BBC Concert Orchestra|
|229||Boston Symphony Orchestra & Seiji Ozawa|
|230||Aaron Copland & London Symphony Orchestra|
|233||Chicago Symphony Orchestra & Fritz Reiner|
|234||Orchestra of The Age of Enlightenment|
|235||Emanuel Ax, Eugene Ormandy & Philadelphia Orchestra|
|241||Felix Schmidt, London Symphony Orchestra & Rafael Frhbeck de Burgos|
|243||Antal Dorti & London Symphony Orchestra|
It gives you the artists name that have the word “Orchestra”. But what the heck is
%? It’s a wildcard. By putting
‘Orchestra’ says it doesn’t matter what comes before the word ‘Orchestra’. By putting it afterwards also says it doesn’t matter
what comes after ‘Orchestra’.
I usually use
ilike unless the word I’m pattern matching is case specific. For example, I could use
ilike below and
orchestra (lower case). It would return the same thing as above.
select id, name from artists where name ilike '%OrCHEstRA%'
The table below is taken from the postgresql docs. Here are some examples of
|‘abc’ LIKE ‘abc’||true|
|‘abc’ LIKE ‘a%’||true|
|‘abc’ LIKE ‘b’||true|
|‘abc’ LIKE ‘c’||false|
If you want to use regular expressions (REGEX), you can use
SIMILAR TO clause. That’s a topic beyond this scope. See more here.
To aggregate means to combine several elements together to get a whole. When building reports, you want sums, averages, and counts of many rows. This is where aggregate functions come into play.
Let’s start with the
count. Let’s say you want to count the number of rows in the artists table.
select count(*) from artists
The result is 275 rows! The
count is a function that takes a parameter. We’re passing in the
* splat, which represents a whole row.
What if we want to count the number of composers in the tracks tables and leave out the null values?
select count(*) from tracks; -- The count is 3503 select count(composer) from tracks; -- The count is 2525
Note that this is NOT the same count as counting all the rows in the tracks table because we’re counting the composer rows that are not NULL.
We can do multiple aggregate calculations.
Let’s count all the rows in tracks and sum up the unit_price for all the rows. Let’s also alias them to meaningful names.
select count(*) as track_count, sum(unit_price) as unit_price_sum from tracks
Here are some common aggregate functions you can use:
|Function||What it does|
|MIN||The minimum number|
|MAX||The maximum Number|
|SUM||sum of all the values|
|COUNT||count of the # of values|
|COUNT DISTINCT||count of the # of unique (distinct) values.|
Count DISTINCT will not double count rows that have the same value.
How many unique prices are there in the tracks table?
select count(distinct unit_price) from tracks -- Count is 2.
Can you aggregate the average, minimum, and maximum unit_price from the tracks table. Also alias them to more readable column names.
select avg(unit_price) as avg_price, max(unit_price) as maximum_price, min(unit_price) as minimum_price from tracks
Aside If you are a Ruby or Ruby on Rails developer, don’t use Ruby to aggregate thousands of rows of data. This is a common mistake. SQL is incredibly fast compared to Ruby when it comes to manipulating database values. Now that you know how to aggregate and group data, use SQL to do the calculation heavy lifting for you instead of iterating with an each block in Ruby.
Group By function is incredibly powerful in generating reports. It’s also a bit confusing to understand at first. Group by tells the database how to group a result set.
The question is — which composer has the most number of tracks? To answer this, let’s sum the number of tracks each composer has done in descending order.
select composer, count(*) from tracks where composer is not null group by composer order by count desc
Gotcha: If you’re mixing aggregate functions and normal column names, you’ll need a group by function in your statement otherwise you’ll get an error.
select composer, count(*) from tracks -- This will result in an error as you've mixed -- composer (column name) with an aggregate function (count*)
Can you find the minimum file size (bytes) of a track and group by composer?
select composer, min(bytes) from tracks group by composer
|Delroy “Chris” Cooper, , Luke Smith, Paul Watson||3426106|
|Bruce Dickinson/Janick Gers||3151872|
|James Hetfield, Lars Ulrich and Kirk Hammett||8024047|
|Pearl Jam & Eddie Vedder||9991106|
Let’s order that table by the minimum file size in ascending order.
select composer, min(bytes) from tracks group by composer order by min
You can see that Samuel Rosa has the smallest track size.
Let’s group by multiple values. Can you group composers by the maximum track time (milliseconds) and then by maximum file size (bytes)?
select composer, max(milliseconds) as max_time, max(bytes) as max_bytes from tracks where composer is not null group by composer order by max_time desc, max_bytes desc
It looks like Jimmy Page has tracks with the most track time and largest file size.
|Jimmy Page/Led Zeppelin||1116734||36052247|
|Gillan/Glover/Lord/Nix - Blackmore/Paice||913658||29846063|
|1||For Those About To Rock (We Salute You)||1||1||1||Angus Young, Malcolm Young, Brian Johnson||343719||11170334||0||05:37.6||05:37.6|
|2||Balls to the Wall||2||2||1||342562||5510424||0||05:37.6||05:37.6|
|3||Fast As a Shark||3||2||1||F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman||230619||3990994||0||05:37.6||05:37.6|
|4||Restless and Wild||3||2||1||F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman||252051||4331779||0||05:37.6||05:37.6|