Chapter 10: Working with Real Data

🚀 Welcome to Advanced SQL

Congratulations on making it to the final chapter! 🎉 You've learned the fundamentals of SQL. Now it's time to unlock the advanced techniques that separate beginners from experts!

🎯 What Makes These Techniques "Advanced"?

Think of SQL like learning a musical instrument:

Beginner: You know the notes (SELECT, WHERE, JOIN)
Intermediate: You can play songs (Complex queries, subqueries)
Advanced: You can improvise and compose (Window functions, CTEs, complex analytics)

Advanced SQL lets you solve problems that would be nearly impossible with basic queries!

Window Functions

Perform calculations across related rows

Use case: Rankings, running totals, moving averages

CTEs

Create temporary named result sets

Use case: Simplify complex queries, recursive operations

Advanced Patterns

Clever techniques for common problems

Use case: Pivoting, deduplication, gaps & islands

Why Learn These Techniques?

📊 Analytics: Generate reports and insights that basic SQL can't
⚡ Performance: Often faster than multiple queries or application logic
💼 Career: These skills are highly valued in data roles
🧩 Elegance: Solve complex problems with clean, readable code

🪟 Window Functions Introduction

Window functions perform calculations across a set of rows that are related to the current row - like looking through a "window" at related data!

🎯 Real-Life Analogy: Class Rankings

Imagine a teacher calculating student rankings:

Regular aggregate: "The class average is 85" - one number for everyone
Window function: "You scored 92, which ranks you #3 in the class" - each student sees their rank AND keeps their individual score

Key difference: Window functions give you aggregated info WITHOUT collapsing rows like GROUP BY does!

Window Function Syntax:

SELECT
    column1,
    column2,
    WINDOW_FUNCTION() OVER (
        PARTITION BY column3
        ORDER BY column4
    ) AS result
FROM table;

OVER: Indicates this is a window function
PARTITION BY: Divide rows into groups (optional, like GROUP BY)
ORDER BY: Define the order within each partition (optional)

Difference Between GROUP BY and Window Functions:

-- GROUP BY (collapses rows)
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
-- Result: One row per department

-- Window Function (keeps all rows)
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;
-- Result: All rows, each showing their department's average

🔍 Common Window Functions:

ROW_NUMBER(): Assigns unique sequential numbers
RANK(): Assigns ranks with gaps for ties
DENSE_RANK(): Assigns ranks without gaps
SUM/AVG/COUNT(): Aggregate functions as window functions
LEAD/LAG(): Access next/previous row values

🔢 ROW_NUMBER() - Assigning Sequential Numbers

ROW_NUMBER() assigns a unique number to each row within a partition!

🎯 Real-Life Analogy: Race Positions

In a race, each runner gets a position number:

1st place, 2nd place, 3rd place, etc.
Even if two runners have the same time, they get different positions
ROW_NUMBER() works the same way - always unique, sequential numbers!

Try It: ROW_NUMBER() Example

Output will appear here...

🔍 What Just Happened?

PARTITION BY department: Split data into groups (Engineering, Sales)
ORDER BY salary DESC: Within each department, sort by salary (highest first)
ROW_NUMBER(): Assigned 1, 2, 3... within each department
Alice is #1 in Engineering ($95k), Bob is #2, Eve is #3
Frank is #1 in Sales ($75k), Carol is #2, David is #3

Practical Use Case: Finding Top N Per Group

Want to find the top 2 highest-paid employees in each department?

-- Step 1: Use ROW_NUMBER in a subquery
WITH ranked_employees AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) as rn
    FROM employees
)
-- Step 2: Filter for top 2
SELECT name, department, salary
FROM ranked_employees
WHERE rn <= 2;

This is a very common interview question! 💡

🏆 RANK() and DENSE_RANK()

RANK() and DENSE_RANK() handle ties differently than ROW_NUMBER()!

🎯 Real-Life Analogy: Olympic Medals

Imagine a race where two runners tie for 1st place:

ROW_NUMBER(): 1, 2, 3, 4 (one gets 1st, other gets 2nd arbitrarily)
RANK(): 1, 1, 3, 4 (both get 1st, next is 3rd - skips 2)
DENSE_RANK(): 1, 1, 2, 3 (both get 1st, next is 2nd - no skip)

Try It: Compare ROW_NUMBER, RANK, and DENSE_RANK

Output will appear here...

Results Breakdown:

Student	Score	ROW_NUMBER	RANK	DENSE_RANK
Alice	95	1	1	1
Bob	90	2	2	2
Carol	90	3 ⬅️ unique	2 ⬅️ tie	2 ⬅️ tie
David	85	4	4 ⬅️ skipped 3	3 ⬅️ no skip
Eve	85	5	4	3
Frank	80	6	6 ⬅️ skipped 5	4 ⬅️ no skip

🔍 When to Use Each:

ROW_NUMBER(): When you need unique numbers (deduplication, pagination)
RANK(): When ties should share the same rank, and you want gaps (traditional ranking)
DENSE_RANK(): When ties should share the same rank, but you don't want gaps (compact ranking)

📊 Aggregate Window Functions

Use familiar aggregates (SUM, AVG, COUNT) as window functions to calculate running totals, moving averages, and more!

🎯 Real-Life Analogy: Running Total

Imagine tracking your daily spending:

Day 1: Spent $20 → Running total: $20
Day 2: Spent $35 → Running total: $55
Day 3: Spent $15 → Running total: $70

Window functions let you calculate this running total in SQL without complex self-joins!

Try It: Running Total (Cumulative Sum)

Output will appear here...

🔍 What's Happening:

SUM(amount) OVER (ORDER BY date): Calculates cumulative sum
Day 1: 100
Day 2: 100 + 150 = 250
Day 3: 250 + 200 = 450
Each row sees the sum of all rows up to and including itself!

Try It: Moving Average

Output will appear here...

Window Frame Clauses:

Control which rows are included in the calculation:

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: Current + 2 previous rows
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: All rows from start to current (running total)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: Previous + current + next row

Real-World Example: Compare to Department Average

SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg,
    salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg
FROM employees;

Shows each employee's salary compared to their department average!

📝 Common Table Expressions (CTEs)

CTEs let you create temporary named result sets that you can reference in your main query!

🎯 Real-Life Analogy: Cooking with Prep Steps

Think of making a complex dish:

Without CTEs: "Mix eggs with flour, sugar, butter, vanilla while simultaneously chopping vegetables and marinating chicken..." (confusing!)
With CTEs: "Step 1: Make batter. Step 2: Chop vegetables. Step 3: Combine everything." (clear steps!)

CTEs break complex queries into readable, manageable steps!

CTE Syntax:

WITH cte_name AS (
    -- Your SELECT query here
    SELECT ...
)
SELECT * FROM cte_name;

WITH: Starts the CTE definition
cte_name: Name you give to your temporary result set
AS (...): The query that populates the CTE
Main query: Uses the CTE like a regular table

Try It: Basic CTE Example

Output will appear here...

🔍 Why Use CTEs?

Readability: Break complex queries into logical steps
Reusability: Reference the same CTE multiple times in one query
Maintainability: Easier to debug and modify
Recursive queries: Can reference themselves (advanced!)

Multiple CTEs in One Query:

WITH
high_value_customers AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 1000
),
recent_orders AS (
    SELECT *
    FROM orders
    WHERE order_date > '2024-01-01'
)
SELECT *
FROM recent_orders
WHERE customer_id IN (SELECT customer_id FROM high_value_customers);

You can define multiple CTEs separated by commas!

💡 CTE Practical Examples

Let's see real-world use cases for CTEs!

Example 1: Deduplication - Find Duplicates and Remove Them

Output will appear here...

🔍 Deduplication Explained:

CTE ranks each email: First occurrence gets rn=1, duplicates get rn=2, 3, etc.
WHERE rn = 1: Keeps only the first occurrence
Alice's record from 2024-01-01 kept, 2024-01-05 removed
Bob's record from 2024-01-02 kept, 2024-01-07 removed

Example 2: Comparing to Previous Row

Output will appear here...

LAG() and LEAD() Functions:

LAG(column): Gets value from previous row
LEAD(column): Gets value from next row
LAG(price, 2): Gets value from 2 rows back

Great for comparing sequential data (time series, rankings, etc.)

🔁 Recursive CTEs

Recursive CTEs can reference themselves - perfect for hierarchical data like organization charts or file trees!

🎯 Real-Life Analogy: Family Tree

To find all your descendants:

Base case: Start with yourself
Recursive step: Find your children
Repeat: For each child, find their children
Continue: Until no more descendants found

Recursive CTEs work the same way!

Recursive CTE Syntax:

WITH RECURSIVE cte_name AS (
    -- Base case (anchor)
    SELECT ...
    WHERE ...

    UNION ALL

    -- Recursive case
    SELECT ...
    FROM cte_name  -- References itself!
    WHERE ...
)
SELECT * FROM cte_name;

Try It: Generate Number Sequence

Output will appear here...

🔍 How It Works:

Iteration 1: Base case returns 1
Iteration 2: Recursive part adds 1 → 2
Iteration 3: 2 + 1 → 3
... continues until...
Iteration 10: WHERE n < 10 becomes false, stops

Real-World Example: Organization Hierarchy

-- Create employees with manager relationships
DROP TABLE IF EXISTS employees_org;
CREATE TABLE employees_org (
    id INTEGER,
    name TEXT,
    manager_id INTEGER  -- NULL for CEO
);

INSERT INTO employees_org VALUES (1, 'Alice (CEO)', NULL);
INSERT INTO employees_org VALUES (2, 'Bob (VP)', 1);
INSERT INTO employees_org VALUES (3, 'Carol (VP)', 1);
INSERT INTO employees_org VALUES (4, 'David (Manager)', 2);
INSERT INTO employees_org VALUES (5, 'Eve (Manager)', 2);
INSERT INTO employees_org VALUES (6, 'Frank (Employee)', 4);

-- Find all reports under Bob (including indirect)
WITH RECURSIVE reports AS (
    -- Base: Start with Bob
    SELECT id, name, manager_id, 0 as level
    FROM employees_org
    WHERE id = 2

UNION ALL

-- Recursive: Find their reports
    SELECT e.id, e.name, e.manager_id, r.level + 1
    FROM employees_org e
    INNER JOIN reports r ON e.manager_id = r.id
)
SELECT
    name,
    level,
    CASE
        WHEN level = 0 THEN 'Manager'
        WHEN level = 1 THEN 'Direct Report'
        ELSE 'Indirect Report'
    END as relationship
FROM reports
ORDER BY level;

Output will appear here...

Common Recursive CTE Use Cases:

🌳 Organizational charts: Find all reports under a manager
📁 File systems: List all files in nested folders
🛣️ Graph traversal: Find all connected nodes
📅 Date sequences: Generate calendar dates
🔢 Number series: Fibonacci, factorials, etc.

🎨 Advanced SQL Patterns

Clever techniques for solving common real-world problems!

1. Pivot Tables

Turn rows into columns

Use case: Sales reports, crosstab data

2. Unpivot

Turn columns into rows

Use case: Normalize wide tables

3. Gaps and Islands

Find missing sequences

Use case: Missing dates, broken sequences

4. Running Differences

Calculate changes over time

Use case: Trends, growth rates

Pattern: Pivot - Turn Rows into Columns

Output will appear here...

🔍 How Pivot Works:

Original data (rows):

Laptop, Jan, 5000
Laptop, Feb, 6000
Phone, Jan, 3000
Phone, Feb, 3500

After pivot (columns):

Product  | Jan  | Feb  | Total
Laptop   | 5000 | 6000 | 11000
Phone    | 3000 | 3500 | 6500

The CASE statement creates conditional columns!

Pattern: Find Missing Dates (Gaps)

Output will appear here...

More Advanced Patterns:

Percentiles: Find median, quartiles (NTILE function)
Year-over-year growth: Compare same period different years
First/Last value: FIRST_VALUE(), LAST_VALUE() window functions
Conditional aggregation: Different aggregates based on conditions

Practice Exercises

Part 1: Guided Exercises

Guided Exercise 1: Rank Products by Sales

Task: Use DENSE_RANK() to rank products by total sales amount.

Output will appear here...

Part 2: Independent Practice

Challenge 1: Calculate Running Average

Difficulty: Hard

Task: Create a query that shows each day's temperature along with the running average of all days up to that point.

Output will appear here...

SELECT
    day,
    temperature,
    AVG(temperature) OVER (ORDER BY day) as running_avg
FROM temps
ORDER BY day;

📝 Knowledge Check Quiz

🎓 Course Conclusion - You Did It!

Congratulations! 🎉🎊

You've completed all 10 chapters of the SQL course! You've gone from absolute beginner to mastering advanced SQL techniques. That's a huge accomplishment!

What You've Learned:

Chapter 1-2

SQL basics, SELECT, WHERE, filtering data

Chapter 3-4

JOINs, relationships, aggregate functions, GROUP BY

Chapter 5-6

Subqueries, INSERT, UPDATE, DELETE

Chapter 7-8

CREATE/ALTER tables, indexes, performance optimization

Chapter 9

Transactions, ACID properties, data integrity

Chapter 10

Window functions, CTEs, advanced patterns

🚀 Where to Go From Here:

💼 Practice, practice, practice: The best way to master SQL is by using it on real projects
📊 Work with real datasets: Try Kaggle datasets, public databases, or your own projects
🔍 Learn database-specific features: PostgreSQL, MySQL, SQL Server each have unique capabilities
⚡ Study query optimization: Learn about execution plans, query tuning
🏗️ Database design: Master normalization, schema design patterns
🔐 Security: Learn about SQL injection, permissions, encryption

Keep Learning:

📚 Documentation: Read official docs for PostgreSQL, MySQL, SQLite
💻 Practice platforms: LeetCode, HackerRank, SQLZoo
🎯 Real projects: Build a database for a personal project
👥 Community: Join SQL forums, Stack Overflow, Reddit r/SQL

Thank You! 🙏

Thank you for completing this course! You now have the SQL skills to build powerful data-driven applications. Keep practicing, keep learning, and most importantly - have fun with SQL!

Happy Querying! 🎉

Advanced SQL Techniques

🚀 Welcome to Advanced SQL

🎯 What Makes These Techniques "Advanced"?

Window Functions

CTEs

Advanced Patterns

Why Learn These Techniques?

🪟 Window Functions Introduction

🎯 Real-Life Analogy: Class Rankings

Window Function Syntax:

Difference Between GROUP BY and Window Functions:

🔍 Common Window Functions:

🔢 ROW_NUMBER() - Assigning Sequential Numbers

🎯 Real-Life Analogy: Race Positions

Try It: ROW_NUMBER() Example

🔍 What Just Happened?

Practical Use Case: Finding Top N Per Group

🏆 RANK() and DENSE_RANK()

🎯 Real-Life Analogy: Olympic Medals

Try It: Compare ROW_NUMBER, RANK, and DENSE_RANK

Results Breakdown:

🔍 When to Use Each:

📊 Aggregate Window Functions

🎯 Real-Life Analogy: Running Total

Try It: Running Total (Cumulative Sum)

🔍 What's Happening:

Try It: Moving Average

Window Frame Clauses:

Real-World Example: Compare to Department Average

📝 Common Table Expressions (CTEs)

🎯 Real-Life Analogy: Cooking with Prep Steps

CTE Syntax:

Try It: Basic CTE Example

🔍 Why Use CTEs?

Multiple CTEs in One Query:

💡 CTE Practical Examples

Example 1: Deduplication - Find Duplicates and Remove Them

🔍 Deduplication Explained:

Example 2: Comparing to Previous Row

LAG() and LEAD() Functions:

🔁 Recursive CTEs

🎯 Real-Life Analogy: Family Tree

Recursive CTE Syntax:

Try It: Generate Number Sequence

🔍 How It Works:

Real-World Example: Organization Hierarchy

Common Recursive CTE Use Cases:

🎨 Advanced SQL Patterns

1. Pivot Tables

2. Unpivot

3. Gaps and Islands

4. Running Differences

Pattern: Pivot - Turn Rows into Columns

🔍 How Pivot Works:

Pattern: Find Missing Dates (Gaps)

More Advanced Patterns:

Practice Exercises

Part 1: Guided Exercises

Guided Exercise 1: Rank Products by Sales

Explanation:

Part 2: Independent Practice

Challenge 1: Calculate Running Average

📝 Knowledge Check Quiz

🎓 Course Conclusion - You Did It!

Congratulations! 🎉🎊

What You've Learned:

Chapter 1-2

Chapter 3-4

Chapter 5-6

Chapter 7-8

Chapter 9

Chapter 10

🚀 Where to Go From Here:

Keep Learning:

Thank You! 🙏