Group

You can groups rows with the same value into one group or bucket.

group by
having
Task
Pending

`group by`

If you want to organize your data in groups and calculate some kind of aggregate statistics for these groups, the GROUP BY clause is what you need.

Suppose we’re running a bookstore and want to know how many books of different genres we have in stock. Our database includes a table that lists the books’ titles, genres, and stock quantity.

The visualization below shows how the GROUP BY clause creates groups from table data. We want to know the total quantity of books for each genre; thus, GROUP BY groups the books of the same genre and sums up the corresponding quantities. This creates a result table that lists genres and their total quantity of books in our stock.

Now it’s time for more specific examples of SQL queries with a GROUP BY clause.

We’ll use the book table, which stores the id, title, author, genre, language, price, and quantity of each novel we stock.

Example

create table book (
    id      int unique not null,
    title   text not null,
    author  text not null,
    genre   text not null,
    lang    char(2) not null,
    price   numeric(5,2) not null check (price > 0),
    qty     int not null check (qty >= 0)
);

insert into book (id, title, author, genre, lang, price, qty) values
(1, 'Les Trois Mousquetaires', 'Alexandre Dumas', 'adventure', 'fr', 11.90, 4),
(2, 'A Game of Thrones', 'George R.R. Martin', 'fantasy', 'en', 8.49, 5),
(3, 'Pride and Prejudice', 'Jane Austen', 'romance', 'en', 9.99, 2),
(4, 'Vampire Academy', 'Richelle Mead', 'fantasy', 'en', 7.99, 3),
(5, 'Ivanhoe', 'Walter Scott', 'adventure', 'en', 9.99, 3),
(6, 'Armance', 'Stendhal', 'romance', 'fr', 5.88, 1);

group by is used to bring together those rows in the table that have the same value in all the selected columns.

When you start building a query with group by, you can no longer use select * from table because you are creating a new table with the grouped rows.

And which values will be in the columns that are not grouped?

You can use aggregate functions to compute a single value for each group.

The most common aggregate functions is count which counts the number of rows in each group:

select genre, count(*)
from book
group by genre

Here, in the GROUP BY clause, we select our rows to be grouped by the column genre. Then the count(genre) function in the SELECT statement counts the number of rows within each group (i.e. each book genre), with the result displayed in the corresponding count(*) field:

genre	count(*)
adventure	2
fantasy	2
romance	2

Instead of counting the number of rows, you can also use other aggregate functions like sum to calculate the total number of books of each genre:

select genre, sum(qty) as total
from book
group by genre

The SUM(qty) function in the SELECT statement sums the qty values within each group (i.e. each book genre), with the result displayed in the corresponding total field:

genre	total
adventure	7
fantasy	8
romance	3

Using Aggregate Functions with `GROUP BY`

GROUP BY puts rows with the same value into one bucket. We usually want to compute some statistics for this group of rows, like the average value or the total quantity. To this end, SQL provides aggregate functions that combine values from a certain column into one value for the respective group.

So far, we’ve only used SUM() as our aggregate function for grouping the book titles in stock. However, this is not the only aggregate function you can use with GROUP BY. SQL also offers:

COUNT() to calculate the number of rows in each group.
AVG() to find the average value for each group.
MIN() to return the minimum value in each group.
MAX() to return the maximum value in each group.

Let’s see how the AVG() function works with GROUP BY. This time, we want to calculate the average price for books in each genre. We’ll start by visualizing the output we want to get.

We’re again grouping our books by genre, but this time we want to calculate the average book price in each genre. The SQL query for this looks as follows:

select genre, avg(price) AS avg_price
from book
group by genre;

This query creates a table with two columns (genre and avg_price), where the average price is calculated by averaging the price values for the books of each genre:

genre	avg_price
adventure	10.945
fantasy	8.24
romance	7.935

Actually, we are not limited to using only one aggregate function with a GROUP BY clause.

So, let’s add the information about the minimum and maximum price of the books in each genre:

select genre, min(price) as min_price, avg(price) as avg_price, max(price) as max_price
from book
group by genre;

The result table now contains four columns: genre, min_price, avg_price, and max_price.

genre	min_price	avg_price	max_price
adventure	9.99	10.945	11.9
fantasy	7.99	8.24	8.49
romance	5.88	7.935	9.99

Note that when using GROUP BY the SELECT statement may only include fields that are either used in the aggregate function or listed in the GROUP BY clause.

For example, in our case, we cannot add title or author to our result set. To do so make no sense, as each row in our output table includes information on several books with different titles and authors.

Two columns

In SQL, you can also group your data using several columns. For example, let’s say we want to group our books not only by gender but also by language.

select genre, lang, count(*)
from book
group by genre, lang;

The result table now contains three columns: genre, lang, and count(*). Each row in the table represents the number of books of a certain genre in a certain language.

genre	lang	titles
adventure	en	1
adventure	fr	1
fantasy	en	2
romance	en	1
romance	fr	1

`having`

The HAVING clause filters groups of rows created by GROUP BY.

It’s similar to WHERE, but instead of filtering individual rows, it filters the aggregated results.

You typically use HAVING when you want conditions on things like:

total sales per customer
number of orders per day
average score per student

In all those cases, the condition refers to an aggregate (SUM, COUNT, AVG, …), so it must go into HAVING, not WHERE.

Basic Example: Sales by Customer

Imagine a sales table:

create table sale (
    customer    text not null,
    product     text not null,
    quantity    int not null,
    price       numeric(10,2) not null,
    sale_date   date not null
);

insert into sale (customer, product, quantity, price, sale_date) values
    ('david', 'Laptop',   1, 1200.00, '2024-01-05'),
    ('david', 'Mouse',    2,   25.00, '2024-01-05'),
    ('susan', 'Monitor',  1,  250.00, '2024-01-06'),
    ('susan', 'Laptop',   1, 1200.00, '2024-01-10'),
    ('emma', 'Keyboard', 1,   80.00, '2024-01-15'),
    ('emma', 'Mouse',    1,   25.00, '2024-01-16');

We want all customers whose total spending is at least 500.

First, group by customer and compute total spending:

select customer, sum(quantity * price) as total_spent
from sale
group by customer;

Now add HAVING to keep only customers with total_spent >= 500:

select customer, sum(quantity * price) as total_spent
from sale
group by customer
having sum(quantity * price) >= 500;

HAVING filters after grouping. It sees one row per customer (with total_spent) and applies the condition there.

With SQLite (and other databases) SELECT aliases are allowed to be used in the GROUP BY clause as well.

So the previous query can also be written as:

select customer, sum(quantity * price) as total_spent
from sale
group by customer
having total_spent >= 500;

`WHERE` vs `HAVING`

WHERE filters rows before grouping.

HAVING filters groups after grouping.

You can use both in one query:

select customer, sum(quantity * price) as total_spent
from sale
where sale_date >= '2024-01-10'
group by customer
having sum(quantity * price) >= 500;

Tasks

Online Orders

create table orders (
    id        integer unique not null,
    order_date date not null,
    status    text not null
);

insert into orders (id, order_date, status) values
    (1, '2024-02-01', 'completed'),
    (2, '2024-02-01', 'completed'),
    (3, '2024-02-01', 'cancelled'),
    (4, '2024-02-02', 'completed'),
    (5, '2024-02-02', 'completed'),
    (6, '2024-02-02', 'completed'),
    (7, '2024-02-03', 'pending'),
    (8, '2024-02-03', 'completed'),
    (9, '2024-02-03', 'completed'),
    (10,'2024-02-03', 'completed');

Days with More Than 3 Orders

Days with At Least 3 Completed Orders

select order_date, count(*) as completed_orders
from orders
where status = 'completed'
group by order_date
having count(*) >= 3
order by order_date;

Task

Titanic

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

Import the Titanic dataset from titanic.csv.

Survival count by class

Find how many passengers survived and how many died in each passenger class.

Average fare by embarkation port

For each embarkation port, calculate the average fare passengers paid.

Children vs adults by sex

Count how many children, women, and men there were, broken down by sex.

Average age by survival and class

For each combination of survival status and passenger class, calculate the average age.

Classes with at least 50 survivors

For each passenger class, count how many passengers survived. Only show classes where at least 50 passengers survived.

Ports with high average fare

For each embarkation port, compute the average fare. Show only ports where the average fare is greater than 30.

Groups with at least 10 passengers

Group passengers by passenger class and sex. For each group, show: class sex number of passengers Only keep groups that have at least 10 passengers.

Pending

How to Group by Month in SQLite
Your Year in Data: How SQL Helps You Summarize 12 Months of Information
Using GROUP BY in SQL Now that you know SQL’s core commands, power up your queries with the GROUP BY clause and aggregate functions.
5 Examples of GROUP BY. When you start learning SQL, you quickly come across the GROUP BY clause. Data grouping—or data aggregation—is an important concept in the world of databases. In this article, we’ll demonstrate how you can use the GROUP BY clause in practice. We’ve gathered five GROUP BY examples, from easier to more complex ones so you can see data grouping in a real-life scenario. As a bonus, you’ll also learn a bit about aggregate functions and the HAVING clause.
SQL HAVING Tutorial. Learn how to use the SQL HAVING clause to filter groups using your own specified conditions.
What Is the SQL HAVING Clause?. Are you learning SQL? Are you wondering what you can use the HAVING clause for? Or, perhaps, have you tried to use WHERE on your GROUP BY aggregates? You are in the right place! In this article, we explain how to use HAVING with plenty of examples.
HAVING vs. WHERE in SQL: What You Should Know. This article is about SQL’s WHERE and HAVING clauses. Both clauses are part of the foundations of the SQL SELECT command. They have similar uses, but there are also important differences that every person who uses SQL should know. Let’s see what’s behind the HAVING vs. WHERE debate.

Sqlite - Group

group by