Chapter 10

Advanced SQL Techniques

Master the most powerful SQL features! Learn window functions, CTEs, and advanced patterns used by professionals.

🚀 Welcome to Advanced SQL

Congratulations on making it to the final chapter! 🎉 You've learned the fundamentals of SQL. Now it's time to unlock the advanced techniques that separate beginners from experts!

🎯 What Makes These Techniques "Advanced"?

Think of SQL like learning a musical instrument:

  • Beginner: You know the notes (SELECT, WHERE, JOIN)
  • Intermediate: You can play songs (Complex queries, subqueries)
  • Advanced: You can improvise and compose (Window functions, CTEs, complex analytics)

Advanced SQL lets you solve problems that would be nearly impossible with basic queries!

Window Functions

Perform calculations across related rows

Use case: Rankings, running totals, moving averages

CTEs

Create temporary named result sets

Use case: Simplify complex queries, recursive operations

Advanced Patterns

Clever techniques for common problems

Use case: Pivoting, deduplication, gaps & islands

Why Learn These Techniques?

  • 📊 Analytics: Generate reports and insights that basic SQL can't
  • Performance: Often faster than multiple queries or application logic
  • 💼 Career: These skills are highly valued in data roles
  • 🧩 Elegance: Solve complex problems with clean, readable code

🪟 Window Functions Introduction

Window functions perform calculations across a set of rows that are related to the current row - like looking through a "window" at related data!

🎯 Real-Life Analogy: Class Rankings

Imagine a teacher calculating student rankings:

  • Regular aggregate: "The class average is 85" - one number for everyone
  • Window function: "You scored 92, which ranks you #3 in the class" - each student sees their rank AND keeps their individual score

Key difference: Window functions give you aggregated info WITHOUT collapsing rows like GROUP BY does!

Window Function Syntax:

SELECT
    column1,
    column2,
    WINDOW_FUNCTION() OVER (
        PARTITION BY column3
        ORDER BY column4
    ) AS result
FROM table;
  • OVER: Indicates this is a window function
  • PARTITION BY: Divide rows into groups (optional, like GROUP BY)
  • ORDER BY: Define the order within each partition (optional)

Difference Between GROUP BY and Window Functions:

-- GROUP BY (collapses rows)
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
-- Result: One row per department

-- Window Function (keeps all rows)
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;
-- Result: All rows, each showing their department's average

🔍 Common Window Functions:

  • ROW_NUMBER(): Assigns unique sequential numbers
  • RANK(): Assigns ranks with gaps for ties
  • DENSE_RANK(): Assigns ranks without gaps
  • SUM/AVG/COUNT(): Aggregate functions as window functions
  • LEAD/LAG(): Access next/previous row values

🔢 ROW_NUMBER() - Assigning Sequential Numbers

ROW_NUMBER() assigns a unique number to each row within a partition!

🎯 Real-Life Analogy: Race Positions

In a race, each runner gets a position number:

  • 1st place, 2nd place, 3rd place, etc.
  • Even if two runners have the same time, they get different positions
  • ROW_NUMBER() works the same way - always unique, sequential numbers!

Try It: ROW_NUMBER() Example

Output will appear here...

🔍 What Just Happened?

  1. PARTITION BY department: Split data into groups (Engineering, Sales)
  2. ORDER BY salary DESC: Within each department, sort by salary (highest first)
  3. ROW_NUMBER(): Assigned 1, 2, 3... within each department
  4. Alice is #1 in Engineering ($95k), Bob is #2, Eve is #3
  5. Frank is #1 in Sales ($75k), Carol is #2, David is #3

Practical Use Case: Finding Top N Per Group

Want to find the top 2 highest-paid employees in each department?

-- Step 1: Use ROW_NUMBER in a subquery
WITH ranked_employees AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) as rn
    FROM employees
)
-- Step 2: Filter for top 2
SELECT name, department, salary
FROM ranked_employees
WHERE rn <= 2;

This is a very common interview question! 💡

🏆 RANK() and DENSE_RANK()

RANK() and DENSE_RANK() handle ties differently than ROW_NUMBER()!

🎯 Real-Life Analogy: Olympic Medals

Imagine a race where two runners tie for 1st place:

  • ROW_NUMBER(): 1, 2, 3, 4 (one gets 1st, other gets 2nd arbitrarily)
  • RANK(): 1, 1, 3, 4 (both get 1st, next is 3rd - skips 2)
  • DENSE_RANK(): 1, 1, 2, 3 (both get 1st, next is 2nd - no skip)

Try It: Compare ROW_NUMBER, RANK, and DENSE_RANK

Output will appear here...

Results Breakdown:

Student Score ROW_NUMBER RANK DENSE_RANK
Alice 95 1 1 1
Bob 90 2 2 2
Carol 90 3 ⬅️ unique 2 ⬅️ tie 2 ⬅️ tie
David 85 4 4 ⬅️ skipped 3 3 ⬅️ no skip
Eve 85 5 4 3
Frank 80 6 6 ⬅️ skipped 5 4 ⬅️ no skip

🔍 When to Use Each:

  • ROW_NUMBER(): When you need unique numbers (deduplication, pagination)
  • RANK(): When ties should share the same rank, and you want gaps (traditional ranking)
  • DENSE_RANK(): When ties should share the same rank, but you don't want gaps (compact ranking)

📊 Aggregate Window Functions

Use familiar aggregates (SUM, AVG, COUNT) as window functions to calculate running totals, moving averages, and more!

🎯 Real-Life Analogy: Running Total

Imagine tracking your daily spending:

  • Day 1: Spent $20 → Running total: $20
  • Day 2: Spent $35 → Running total: $55
  • Day 3: Spent $15 → Running total: $70

Window functions let you calculate this running total in SQL without complex self-joins!

Try It: Running Total (Cumulative Sum)

Output will appear here...

🔍 What's Happening:

  • SUM(amount) OVER (ORDER BY date): Calculates cumulative sum
  • Day 1: 100
  • Day 2: 100 + 150 = 250
  • Day 3: 250 + 200 = 450
  • Each row sees the sum of all rows up to and including itself!

Try It: Moving Average

Output will appear here...

Window Frame Clauses:

Control which rows are included in the calculation:

  • ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: Current + 2 previous rows
  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: All rows from start to current (running total)
  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: Previous + current + next row

Real-World Example: Compare to Department Average

SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg,
    salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg
FROM employees;

Shows each employee's salary compared to their department average!

📝 Common Table Expressions (CTEs)

CTEs let you create temporary named result sets that you can reference in your main query!

🎯 Real-Life Analogy: Cooking with Prep Steps

Think of making a complex dish:

  • Without CTEs: "Mix eggs with flour, sugar, butter, vanilla while simultaneously chopping vegetables and marinating chicken..." (confusing!)
  • With CTEs: "Step 1: Make batter. Step 2: Chop vegetables. Step 3: Combine everything." (clear steps!)

CTEs break complex queries into readable, manageable steps!

CTE Syntax:

WITH cte_name AS (
    -- Your SELECT query here
    SELECT ...
)
SELECT * FROM cte_name;
  • WITH: Starts the CTE definition
  • cte_name: Name you give to your temporary result set
  • AS (...): The query that populates the CTE
  • Main query: Uses the CTE like a regular table

Try It: Basic CTE Example

Output will appear here...

🔍 Why Use CTEs?

  • Readability: Break complex queries into logical steps
  • Reusability: Reference the same CTE multiple times in one query
  • Maintainability: Easier to debug and modify
  • Recursive queries: Can reference themselves (advanced!)

Multiple CTEs in One Query:

WITH
high_value_customers AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
),
recent_orders AS (
    SELECT *
    FROM orders
    WHERE order_date > '2024-01-01'
)
SELECT *
FROM recent_orders
WHERE customer_id IN (SELECT customer_id FROM high_value_customers);

You can define multiple CTEs separated by commas!

💡 CTE Practical Examples

Let's see real-world use cases for CTEs!

Example 1: Deduplication - Find Duplicates and Remove Them

Output will appear here...

🔍 Deduplication Explained:

  1. CTE ranks each email: First occurrence gets rn=1, duplicates get rn=2, 3, etc.
  2. WHERE rn = 1: Keeps only the first occurrence
  3. Alice's record from 2024-01-01 kept, 2024-01-05 removed
  4. Bob's record from 2024-01-02 kept, 2024-01-07 removed

Example 2: Comparing to Previous Row

Output will appear here...

LAG() and LEAD() Functions:

  • LAG(column): Gets value from previous row
  • LEAD(column): Gets value from next row
  • LAG(price, 2): Gets value from 2 rows back

Great for comparing sequential data (time series, rankings, etc.)

🔁 Recursive CTEs

Recursive CTEs can reference themselves - perfect for hierarchical data like organization charts or file trees!

🎯 Real-Life Analogy: Family Tree

To find all your descendants:

  1. Base case: Start with yourself
  2. Recursive step: Find your children
  3. Repeat: For each child, find their children
  4. Continue: Until no more descendants found

Recursive CTEs work the same way!

Recursive CTE Syntax:

WITH RECURSIVE cte_name AS (
    -- Base case (anchor)
    SELECT ...
    WHERE ...

    UNION ALL

    -- Recursive case
    SELECT ...
    FROM cte_name  -- References itself!
    WHERE ...
)
SELECT * FROM cte_name;

Try It: Generate Number Sequence

Output will appear here...

🔍 How It Works:

  1. Iteration 1: Base case returns 1
  2. Iteration 2: Recursive part adds 1 → 2
  3. Iteration 3: 2 + 1 → 3
  4. ... continues until...
  5. Iteration 10: WHERE n < 10 becomes false, stops

Real-World Example: Organization Hierarchy

Output will appear here...

Common Recursive CTE Use Cases:

  • 🌳 Organizational charts: Find all reports under a manager
  • 📁 File systems: List all files in nested folders
  • 🛣️ Graph traversal: Find all connected nodes
  • 📅 Date sequences: Generate calendar dates
  • 🔢 Number series: Fibonacci, factorials, etc.

🎨 Advanced SQL Patterns

Clever techniques for solving common real-world problems!

1. Pivot Tables

Turn rows into columns

Use case: Sales reports, crosstab data

2. Unpivot

Turn columns into rows

Use case: Normalize wide tables

3. Gaps and Islands

Find missing sequences

Use case: Missing dates, broken sequences

4. Running Differences

Calculate changes over time

Use case: Trends, growth rates

Pattern: Pivot - Turn Rows into Columns

Output will appear here...

🔍 How Pivot Works:

Original data (rows):

Laptop, Jan, 5000
Laptop, Feb, 6000
Phone, Jan, 3000
Phone, Feb, 3500

After pivot (columns):

Product  | Jan  | Feb  | Total
Laptop   | 5000 | 6000 | 11000
Phone    | 3000 | 3500 | 6500

The CASE statement creates conditional columns!

Pattern: Find Missing Dates (Gaps)

Output will appear here...

More Advanced Patterns:

  • Percentiles: Find median, quartiles (NTILE function)
  • Year-over-year growth: Compare same period different years
  • First/Last value: FIRST_VALUE(), LAST_VALUE() window functions
  • Conditional aggregation: Different aggregates based on conditions

Practice Exercises

Part 1: Guided Exercises

Guided Exercise 1: Rank Products by Sales

Task: Use DENSE_RANK() to rank products by total sales amount.

Output will appear here...

Part 2: Independent Practice

Challenge 1: Calculate Running Average

Difficulty: Hard

Task: Create a query that shows each day's temperature along with the running average of all days up to that point.

Output will appear here...

📝 Knowledge Check Quiz

1. What's the key difference between window functions and GROUP BY?

2. What does a CTE (Common Table Expression) do?

3. When would you use DENSE_RANK() instead of RANK()?

🎓 Course Conclusion - You Did It!

Congratulations! 🎉🎊

You've completed all 10 chapters of the SQL course! You've gone from absolute beginner to mastering advanced SQL techniques. That's a huge accomplishment!

What You've Learned:

Chapter 1-2

SQL basics, SELECT, WHERE, filtering data

Chapter 3-4

JOINs, relationships, aggregate functions, GROUP BY

Chapter 5-6

Subqueries, INSERT, UPDATE, DELETE

Chapter 7-8

CREATE/ALTER tables, indexes, performance optimization

Chapter 9

Transactions, ACID properties, data integrity

Chapter 10

Window functions, CTEs, advanced patterns

🚀 Where to Go From Here:

  • 💼 Practice, practice, practice: The best way to master SQL is by using it on real projects
  • 📊 Work with real datasets: Try Kaggle datasets, public databases, or your own projects
  • 🔍 Learn database-specific features: PostgreSQL, MySQL, SQL Server each have unique capabilities
  • Study query optimization: Learn about execution plans, query tuning
  • 🏗️ Database design: Master normalization, schema design patterns
  • 🔐 Security: Learn about SQL injection, permissions, encryption

Keep Learning:

  • 📚 Documentation: Read official docs for PostgreSQL, MySQL, SQLite
  • 💻 Practice platforms: LeetCode, HackerRank, SQLZoo
  • 🎯 Real projects: Build a database for a personal project
  • 👥 Community: Join SQL forums, Stack Overflow, Reddit r/SQL

Thank You! 🙏

Thank you for completing this course! You now have the SQL skills to build powerful data-driven applications. Keep practicing, keep learning, and most importantly - have fun with SQL!

Happy Querying! 🎉