Aggregate Functions & GROUP BY
Learn how to summarize and analyze data using SQL's powerful aggregate functions. Master COUNT, SUM, AVG, MIN, MAX, GROUP BY, and HAVING to transform raw data into meaningful insights.
What are Aggregate Functions?
Aggregate functions perform calculations on multiple rows of data and return a single result. They're essential for data analysis, reporting, and business intelligence.
Real-Life Analogy: The Report Card
Imagine you're a teacher with 100 students. You have all their individual test scores, but you need to summarize them:
- COUNT: How many students took the test? (100 students)
- SUM: What's the total of all scores combined? (8,450 points)
- AVG: What's the average score? (84.5 points)
- MAX: What's the highest score? (98 points)
- MIN: What's the lowest score? (62 points)
Instead of looking at 100 individual scores, aggregate functions give you the summary statistics you need!
🔍 Behind the Scenes: How Aggregates Work
When you run an aggregate function, the database:
- Scans all rows that match your WHERE conditions (if any)
- Collects the values from the specified column
- Performs the calculation (count, sum, average, etc.)
- Returns a single aggregated result
This happens incredibly fast, even on millions of rows, thanks to database optimization!
COUNT()
Counts the number of rows
Use case: "How many customers do we have?"
SUM()
Adds up numeric values
Use case: "What's our total revenue?"
AVG()
Calculates the average
Use case: "What's the average order value?"
MIN()
Finds the minimum value
Use case: "What's our lowest price?"
MAX()
Finds the maximum value
Use case: "What's our highest salary?"
GROUP BY
Groups rows for aggregation
Use case: "Sales by department"
COUNT() Function
The COUNT() function returns the number of rows that match your criteria. It's one of
the most frequently used aggregate functions.
COUNT() Syntax:
COUNT(*) -- Counts all rows
COUNT(column_name) -- Counts non-NULL values in a column
COUNT(DISTINCT column) -- Counts unique non-NULL values
COUNT(*): Counts all rows, including those with NULL valuesCOUNT(column): Counts only rows where the column is NOT NULLCOUNT(DISTINCT column): Counts unique values only
Try It: Count All Employees
How many employees work at our company?
Try It: Count Employees by Department
How many employees are in Engineering?
Try It: Count Distinct Departments
How many unique departments do we have?
🔍 COUNT(*) vs COUNT(column) - What's the Difference?
COUNT(*):
- Counts every row in the result set
- Includes rows where all columns are NULL
- Generally faster because it doesn't need to check column values
COUNT(column_name):
- Counts only rows where the specified column is NOT NULL
- Useful when you want to know how many rows have actual data in a specific column
- Example: COUNT(email) tells you how many employees have an email address on file
Real-World Example: E-commerce Analytics
Imagine you run an online store. Here's how COUNT() helps:
-- How many orders did we receive today?
SELECT COUNT(*) AS orders_today
FROM orders
WHERE order_date = CURRENT_DATE;
-- How many unique customers made purchases this month?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= DATE('now', 'start of month');
-- How many products have reviews?
SELECT COUNT(review_text) AS products_with_reviews
FROM product_reviews;
SUM() Function
The SUM() function adds up all numeric values in a column. It's perfect for calculating
totals like revenue, expenses, or quantities.
SUM() Syntax:
SUM(column_name)
- Only works with numeric columns (INTEGER, DECIMAL, FLOAT, etc.)
- Ignores NULL values automatically
- Returns NULL if no rows match (not zero!)
Real-Life Analogy: Your Bank Account
Think of SUM() like calculating your total savings:
- Checking account: $1,200
- Savings account: $5,000
- Investment account: $3,500
SUM() = $9,700 (total across all accounts)
In SQL, you'd be adding up rows instead of bank accounts!
Try It: Total Salary Expense
What's the total annual salary expense for all employees?
Try It: Sum with WHERE Condition
What's the total salary for the Engineering department?
Try It: Multiple Calculations
Let's calculate both total and average salary in one query:
🔍 Watch Out for NULL Values!
SUM() automatically ignores NULL values:
-- If you have these values: 100, 200, NULL, 300
-- SUM() returns: 600 (not 600 + NULL)
-- COUNT(*) returns: 4 rows
-- COUNT(column) returns: 3 rows (excludes NULL)
This is usually what you want, but be aware that NULL values don't affect the sum!
Real-World Example: Sales Dashboard
-- Total revenue for the year
SELECT SUM(order_total) AS annual_revenue
FROM orders
WHERE STRFTIME('%Y', order_date) = '2024';
-- Total quantity of products sold
SELECT SUM(quantity) AS total_units_sold
FROM order_items;
-- Revenue by product category
SELECT
category,
SUM(price * quantity) AS category_revenue
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY category;
AVG() Function
The AVG() function calculates the arithmetic mean (average) of numeric values. It's
incredibly useful for understanding typical values in your data.
AVG() Syntax:
AVG(column_name)
- Only works with numeric columns
- Ignores NULL values (just like SUM)
- Returns NULL if no rows match
- Formula: AVG = SUM(values) / COUNT(non-null values)
Real-Life Analogy: Test Scores
You got these quiz scores: 85, 90, 88, 92
Average = (85 + 90 + 88 + 92) / 4 = 355 / 4 = 88.75
SQL's AVG() does exactly this calculation across all rows in your table!
Try It: Average Salary
What's the average salary across all employees?
Try It: Average by Department
What's the average salary in Sales?
Try It: Rounded Average
Let's round the average to 2 decimal places for cleaner results:
🔍 AVG() and NULL Values
Understanding how AVG() handles NULL is crucial:
-- Suppose you have these salaries: 50000, 60000, NULL, 80000
-- AVG() calculates: (50000 + 60000 + 80000) / 3 = 63,333
-- NOT: (50000 + 60000 + 0 + 80000) / 4 = 47,500
-- NULL is ignored entirely!
-- This is usually correct, but be aware of it.
To treat NULL as zero, use: AVG(COALESCE(salary, 0))
Real-World Example: Business Intelligence
-- Average customer order value
SELECT ROUND(AVG(order_total), 2) AS avg_order_value
FROM orders;
-- Average product rating
SELECT
product_name,
ROUND(AVG(rating), 1) AS avg_rating
FROM products
JOIN reviews ON products.id = reviews.product_id
GROUP BY product_name;
-- Average days to ship orders
SELECT AVG(JULIANDAY(shipped_date) - JULIANDAY(order_date)) AS avg_ship_days
FROM orders
WHERE shipped_date IS NOT NULL;
MIN() and MAX() Functions
The MIN() and MAX() functions find the smallest and largest values in a
column. They work with numbers, dates, and even text!
MIN() and MAX() Syntax:
MIN(column_name) -- Returns the smallest value
MAX(column_name) -- Returns the largest value
- Work with numeric, date/time, and text columns
- Ignore NULL values
- For text: MIN = alphabetically first, MAX = alphabetically last
- For dates: MIN = earliest date, MAX = latest date
Real-Life Analogy: The Competition
Imagine a marathon race:
- MIN(finish_time): Who finished first? (fastest time)
- MAX(finish_time): Who finished last? (slowest time)
- MIN(age): Who's the youngest runner?
- MAX(age): Who's the oldest runner?
MIN and MAX help you find the extremes in your data!
Try It: Salary Range
What are the highest and lowest salaries in the company?
Try It: Salary Range with Context
Let's add more context with multiple aggregates:
Try It: MIN/MAX with Dates
Find the newest and oldest employees (by hire date):
🔍 MIN/MAX on Different Data Types
Numbers: Smallest/largest numeric value (10 < 100 < 1000)
Dates: Earliest/latest date (2020-01-01 < 2024-12-31)
Text: Alphabetical order (Apple < Banana < Cherry)
-- Numbers
MIN(price) -- Could return 9.99
MAX(price) -- Could return 999.99
-- Dates
MIN(order_date) -- Returns earliest order
MAX(order_date) -- Returns most recent order
-- Text (alphabetical)
MIN(name) -- Could return "Alice"
MAX(name) -- Could return "Zoe"
Real-World Example: Product Catalog
-- Price range for products
SELECT
category,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;
-- Date range of orders
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS latest_order,
JULIANDAY(MAX(order_date)) - JULIANDAY(MIN(order_date)) AS days_in_business
FROM orders;
-- Inventory levels
SELECT
MIN(stock_quantity) AS lowest_stock,
MAX(stock_quantity) AS highest_stock
FROM products
WHERE stock_quantity > 0;
GROUP BY Clause
The GROUP BY clause is where aggregate functions become truly powerful! It groups rows
with the same values so you can calculate aggregates for each group separately.
GROUP BY Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
- Groups rows that have the same values in specified columns
- Allows you to calculate aggregates per group
- Columns in SELECT must be either in GROUP BY or inside an aggregate function
Real-Life Analogy: Sorting Laundry
Imagine you have a pile of mixed laundry:
- Without GROUP BY: "I have 50 pieces of clothing total" (one big pile)
- With GROUP BY: "I have 20 shirts, 15 pants, 10 socks, 5 towels" (sorted into categories)
GROUP BY organizes your data into meaningful categories before counting/summing!
Try It: Count by Department
How many employees are in each department?
Try It: Average Salary by Department
What's the average salary in each department?
Try It: Multiple Aggregates by Group
Let's see comprehensive statistics for each department:
🔍 How GROUP BY Works Step-by-Step
When you run a GROUP BY query, the database:
- Filters rows using WHERE clause (if present)
- Groups rows that have identical values in GROUP BY columns
- Calculates aggregates for each group independently
- Returns one row per group with the aggregate results
-- Example data:
-- Engineering: Alice (95k), Bob (85k), Charlie (90k)
-- Sales: Diana (75k), Eve (80k)
-- With GROUP BY department:
-- Group 1 (Engineering): COUNT=3, AVG=90k
-- Group 2 (Sales): COUNT=2, AVG=77.5k
Important Rule: SELECT and GROUP BY
Every column in SELECT must be either:
- Listed in the GROUP BY clause, OR
- Inside an aggregate function
✅ Correct
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
department is in GROUP BY
❌ Wrong
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;
name is not in GROUP BY or aggregate
Try It: Group by Multiple Columns
You can group by multiple columns! Count employees by department AND hire year:
Real-World Example: Sales Analytics
-- Monthly revenue
SELECT
STRFTIME('%Y-%m', order_date) AS month,
COUNT(*) AS total_orders,
SUM(order_total) AS revenue
FROM orders
GROUP BY STRFTIME('%Y-%m', order_date)
ORDER BY month DESC;
-- Top selling products
SELECT
product_id,
product_name,
SUM(quantity) AS units_sold,
SUM(price * quantity) AS revenue
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY product_id, product_name
ORDER BY revenue DESC
LIMIT 10;
-- Customer purchase frequency
SELECT
customer_id,
COUNT(*) AS order_count,
AVG(order_total) AS avg_order_value,
SUM(order_total) AS lifetime_value
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC;
HAVING Clause
The HAVING clause filters groups after aggregation, similar to how WHERE filters
individual rows before aggregation.
HAVING Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING aggregate_condition;
- Filters groups based on aggregate values
- Used after GROUP BY
- Can use aggregate functions in conditions
Real-Life Analogy: The Guest List
Imagine you're organizing a party and grouping guests by city:
- WHERE: "Only invite people over 21" (filter individuals before grouping)
- HAVING: "Only include cities with 5+ guests" (filter groups after counting)
You filter individuals with WHERE, then filter groups with HAVING!
Try It: Departments with Many Employees
Which departments have more than 2 employees?
Try It: High Average Salary Departments
Which departments have an average salary above $85,000?
Try It: Multiple HAVING Conditions
Departments with 3+ employees AND average salary > $80,000:
🔍 HAVING vs WHERE: Execution Order
Understanding when each clause executes is crucial:
- FROM: Select the table
- WHERE: Filter individual rows (before grouping)
- GROUP BY: Group the remaining rows
- HAVING: Filter groups (after aggregation)
- SELECT: Choose columns to display
- ORDER BY: Sort the final results
This is why HAVING can use aggregate functions but WHERE cannot!
Real-World Example: Business Intelligence
-- High-value customers (spent over $1000)
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_total) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(order_total) > 1000
ORDER BY total_spent DESC;
-- Popular products (sold 100+ units)
SELECT
product_id,
product_name,
SUM(quantity) AS units_sold
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY product_id, product_name
HAVING SUM(quantity) >= 100
ORDER BY units_sold DESC;
-- Active cities (10+ orders this month)
SELECT
city,
COUNT(*) AS order_count,
SUM(order_total) AS revenue
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE order_date >= DATE('now', 'start of month')
GROUP BY city
HAVING COUNT(*) >= 10
ORDER BY revenue DESC;
WHERE vs HAVING: Key Differences
Understanding when to use WHERE vs HAVING is crucial for writing correct queries.
WHERE Clause
Filters: Individual rows
Timing: Before GROUP BY
Use: Filter data before aggregation
Functions: Cannot use aggregates
HAVING Clause
Filters: Grouped results
Timing: After GROUP BY
Use: Filter groups after aggregation
Functions: Can use aggregate functions
Comparison Examples
WHERE (Before Grouping)
-- Filter rows first, then group
SELECT department, AVG(salary)
FROM employees
WHERE salary > 70000
GROUP BY department;
Calculates avg for employees earning 70k+
HAVING (After Grouping)
-- Group first, then filter groups
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000;
Shows depts where avg salary is 70k+
Try It: WHERE Only
Count high earners (>$85k) in each department:
Try It: HAVING Only
Departments where average salary is above $85,000:
Try It: Both WHERE and HAVING
Filter individuals first, then filter groups:
🔍 Performance Tip: WHERE is Faster!
When possible, use WHERE instead of HAVING for better performance:
-- SLOWER: Filter after grouping
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING department = 'Engineering';
-- FASTER: Filter before grouping
SELECT department, COUNT(*)
FROM employees
WHERE department = 'Engineering'
GROUP BY department;
WHERE reduces the dataset before expensive GROUP BY operations!
Quick Decision Guide:
- Use WHERE when: Filtering individual rows (salary > 80000, department = 'Sales')
- Use HAVING when: Filtering groups with aggregates (AVG(salary) > 80000, COUNT(*) > 5)
- Use both when: You need to filter rows AND groups
Multiple Aggregates in One Query
You can combine multiple aggregate functions in a single query to build comprehensive reports and dashboards.
Try It: Complete Department Statistics
Get all statistics for each department in one query:
Try It: Conditional Aggregates
Count different categories using CASE inside aggregates:
🔍 Creating Custom Metrics
You can create calculated fields using aggregate results:
SELECT
department,
COUNT(*) AS employees,
SUM(salary) AS total_payroll,
-- Average salary per employee
ROUND(SUM(salary) / COUNT(*), 2) AS avg_salary,
-- Percent of total company payroll
ROUND(100.0 * SUM(salary) / (SELECT SUM(salary) FROM employees), 2) AS pct_of_total
FROM employees
GROUP BY department;
Real-World Example: Executive Dashboard
-- Comprehensive sales report
SELECT
STRFTIME('%Y-%m', order_date) AS month,
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(order_total) AS revenue,
ROUND(AVG(order_total), 2) AS avg_order_value,
MIN(order_total) AS min_order,
MAX(order_total) AS max_order,
-- Revenue per customer
ROUND(SUM(order_total) / COUNT(DISTINCT customer_id), 2) AS revenue_per_customer
FROM orders
WHERE order_date >= DATE('now', '-12 months')
GROUP BY STRFTIME('%Y-%m', order_date)
ORDER BY month DESC;
Combining Aggregates with JOINs
The real power of SQL comes from combining JOINs with aggregate functions to analyze related data across multiple tables.
Pattern: JOIN then GROUP
SELECT table1.column, aggregate_function(table2.column)
FROM table1
JOIN table2 ON table1.id = table2.foreign_key
GROUP BY table1.column;
First JOIN the tables, then GROUP BY and aggregate the results!
Try It: Count Employees and Projects
How many employees work on each project?
Try It: Department Summary with Location
Show department statistics along with their location:
Try It: Employee Salary Distribution
Show employee count distribution by salary ranges:
🔍 Watch Out for JOIN Multiplication!
Be careful when joining tables with one-to-many relationships:
-- WRONG: This can count duplicates!
SELECT department, SUM(salary)
FROM employees
JOIN project_assignments ON employees.id = project_assignments.employee_id
GROUP BY department;
-- If an employee has 3 projects, their salary gets counted 3 times!
-- CORRECT: Use DISTINCT or structure your query differently
SELECT department, SUM(DISTINCT salary)
FROM employees
JOIN project_assignments ON employees.id = project_assignments.employee_id
GROUP BY department;
Real-World Example: E-commerce Analytics
-- Customer purchase summary
SELECT
c.id,
c.name,
c.email,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.order_total), 0) AS lifetime_value,
ROUND(AVG(o.order_total), 2) AS avg_order_value,
MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name, c.email
ORDER BY lifetime_value DESC
LIMIT 20;
-- Product performance by category
SELECT
cat.name AS category,
COUNT(DISTINCT p.id) AS product_count,
COUNT(oi.id) AS times_ordered,
SUM(oi.quantity) AS units_sold,
SUM(oi.price * oi.quantity) AS revenue
FROM categories cat
JOIN products p ON cat.id = p.category_id
LEFT JOIN order_items oi ON p.id = oi.product_id
GROUP BY cat.id, cat.name
ORDER BY revenue DESC;
Practice Exercises
Test your understanding of aggregate functions with these hands-on exercises!
Exercise 1: Total and Average Salary
Calculate the total and average salary for all employees. Round the average to 2 decimal places.
Exercise 2: Count Employees in Sales
How many employees work in the Sales department?
Exercise 3: Salary Statistics by Department
For each department, show the minimum, maximum, and average salary. Order by average salary descending.
Exercise 4: Large Departments
Find departments with more than 3 employees. Show department name and employee count.
Exercise 5: High-Paying Departments
Find departments where the average salary exceeds $80,000. Include department name, employee count, and average salary.
Exercise 6: Projects by Status
Count how many projects exist for each status. Show status and count, ordered by count descending.
Exercise 7: Salary Range Analysis
For employees earning over $70,000, count how many are in each department. Only show departments with at least 2 such employees.
Exercise 8: Complete Department Report
Create a comprehensive report showing: department, employee count, total payroll, average salary, and salary range (max - min). Order by total payroll descending.
Knowledge Check Quiz
Test your understanding of aggregate functions:
1. What does the COUNT(*) function return?
2. Which clause groups rows for aggregation?
3. What's the difference between WHERE and HAVING?
4. How does SUM() handle NULL values?
5. Which query finds departments with more than 5 employees?
6. What does COUNT(DISTINCT column) do?
7. In a SELECT with GROUP BY, columns must be either:
8. What does AVG(salary) calculate?