Chapter 8

Indexes & Performance

Learn how to make your SQL queries lightning fast! Master the secret weapon of database performance.

🚀 What Are Indexes?

Imagine searching for a specific name in a 1,000-page phone book. Without an alphabetical index, you'd have to flip through every single page! Indexes in databases work the same way - they help the database find data super quickly without scanning every row.

🎯 Real-Life Analogy: Indexes are Like a Book's Index Page

Think about how you use a textbook:

  • Without an index: You flip through every page to find "Photosynthesis" (slow!)
  • With an index: You look at the index page, see "Photosynthesis - Page 87", jump directly there (fast!)

Database indexes work exactly like this! They create a quick lookup table so the database doesn't have to scan every row.

What is a Database Index?

An index is a special data structure that stores a sorted copy of selected columns from a table, along with pointers to the actual rows.

  • Purpose: Speed up data retrieval (SELECT queries)
  • Trade-off: Uses extra disk space and slows down INSERT/UPDATE/DELETE slightly
  • When it helps: When searching, sorting, or filtering large tables

Real-World Example: Finding a Customer

Scenario: You have a table with 1 million customer records.

Without an index:

SELECT * FROM customers WHERE email = 'alice@example.com';

❌ Database scans all 1,000,000 rows one-by-one (Full Table Scan) - SLOW!

With an index on email:

SELECT * FROM customers WHERE email = 'alice@example.com';

✅ Database uses the index, finds the row instantly - FAST! (like using a phone book's alphabetical ordering)

🔍 Quick Summary:

  • Index = Speed Boost for searching and filtering
  • Works like a sorted lookup table
  • Most useful on columns you frequently search/filter by (WHERE, JOIN, ORDER BY)

🔍 How Indexes Work Behind the Scenes

Let's peek under the hood and understand exactly how indexes make queries faster!

🎯 Real-Life Analogy: Library Card Catalog

Think of an old-school library:

  • Books on shelves: Your actual data (table rows)
  • Card catalog: The index (sorted by title, author, or subject)
  • Finding a book: Instead of walking through every aisle, you look in the card catalog, which tells you exactly which shelf to go to!

The card catalog doesn't contain the entire book - just a reference to where the book is located. Same with database indexes!

Step-by-Step: How a Query Uses an Index

Example Table: employees

id name email salary
1 Alice alice@company.com 75000
2 Bob bob@company.com 82000
3 Carol carol@company.com 91000

Imagine this table has 100,000 rows instead of just 3...

Query WITHOUT an Index:

SELECT * FROM employees WHERE email = 'bob@company.com';

What the database does (Full Table Scan):

  1. Start at Row 1 → Check if email = 'bob@company.com' → No, keep going
  2. Go to Row 2 → Check if email = 'bob@company.com' → Yes! Found it!
  3. Problem: Had to check rows sequentially until it found the match
  4. With 100,000 rows, it might scan 50,000 rows on average (very slow!)

Query WITH an Index on email:

Index Structure (simplified):

Index on email column:
alice@company.com  → Row 1
bob@company.com    → Row 2
carol@company.com  → Row 3
(sorted alphabetically!)

What the database does (Index Lookup):

  1. Look at the index (which is sorted like a dictionary)
  2. Use binary search to find 'bob@company.com' in the index (very fast!)
  3. Index says "bob@company.com is in Row 2"
  4. Jump directly to Row 2, retrieve the data
  5. Result: Found in just a few comparisons instead of scanning all rows!

🔍 Technical Detail: B-Tree Index Structure

Most databases use a B-Tree (Balanced Tree) data structure for indexes:

  • Sorted structure: Data is kept in sorted order
  • Fast lookups: Can find any value in O(log N) time (very fast, even for millions of rows)
  • Balanced: Tree stays balanced, ensuring consistent performance

Analogy: Think of it like a game of "20 Questions" - each step eliminates half the possibilities, so you find the answer super quickly!

Full Table Scan

Without Index

Time: O(N) - Linear

Checks every single row

Index Lookup

With Index

Time: O(log N) - Logarithmic

Binary search in sorted index

Speed Comparison Example:

For a table with 1,000,000 rows:

  • Full Table Scan: ~500,000 comparisons on average
  • Index Lookup: ~20 comparisons (log₂ 1,000,000 ≈ 20)
  • Speed improvement: ~25,000x faster!

🛠️ Creating Indexes

Now let's learn how to actually create indexes in SQL!

CREATE INDEX Syntax:

CREATE INDEX index_name
ON table_name (column_name);

Try It: Create Your First Index

Output will appear here...

🔍 What Just Happened?

  1. We created a table called products
  2. We created an index named idx_category on the category column
  3. Now when we search by category, the database uses the index for faster lookups!
  4. The index is automatically maintained - when you INSERT/UPDATE/DELETE, the index updates too

Creating Indexes on Multiple Columns (Composite Index):

Try It: Composite Index

Output will appear here...

🎯 Composite Index Explained:

A composite index on (customer_id, status) works like a phone book sorted by:

  1. First: Last Name (customer_id in our case)
  2. Then: First Name (status in our case)

Good for queries that filter by:

  • ✅ customer_id only
  • ✅ customer_id AND status
  • ❌ status only (can't use the index efficiently - like trying to find all "Johns" in a phone book sorted by last name)

Unique Indexes:

A UNIQUE index not only speeds up queries but also enforces uniqueness (like a UNIQUE constraint):

CREATE UNIQUE INDEX idx_email ON users(email);

This ensures no two users can have the same email AND makes email lookups fast!

Removing Indexes:

DROP INDEX index_name;

Use this if you no longer need an index or if it's slowing down INSERT/UPDATE operations too much.

⚖️ When to Use Indexes (And When NOT To)

Indexes are powerful, but they're not always the right choice. Let's learn when to use them!

🎯 Real-Life Analogy: Index Trade-offs

Think about organizing a small personal library vs. a huge city library:

  • Small library (10 books): No need for a card catalog - just look at the shelf!
  • Huge library (1 million books): Definitely need a card catalog - you'd never find anything otherwise!

Same with database indexes: Small tables don't need indexes, but large tables do!

✅ GOOD Reasons to Create an Index:

1. Large Tables

Tables with thousands or millions of rows

Benefit: Huge speed improvement

2. Frequent Searches

Columns used often in WHERE clauses

Example: WHERE email = '...'

3. JOIN Columns

Foreign keys used in JOINs

Example: customer_id, product_id

4. ORDER BY

Columns used for sorting

Example: ORDER BY created_date

5. Unique Constraints

Enforce uniqueness AND speed up lookups

Example: username, email

❌ BAD Reasons to Create an Index:

1. Small Tables

Tables with < 1000 rows

Why? Full scan is already fast enough

2. Rarely Queried Columns

Columns you never search by

Why? Wastes space, slows down writes

3. High Write/Low Read Tables

Tables with many INSERT/UPDATE operations

Why? Index maintenance overhead

4. Low Selectivity Columns

Columns with few unique values

Example: gender (M/F) - index won't help much

🔍 The Trade-offs of Indexes:

Benefits Costs
SELECT ✅ Much faster queries -
INSERT - ❌ Slower (must update index)
UPDATE - ❌ Slower (must update index)
DELETE - ❌ Slower (must update index)
Storage - ❌ Uses extra disk space

Rule of Thumb:

Create indexes on columns where:

  • You frequently use WHERE, JOIN, or ORDER BY
  • The table has many rows (> 1000)
  • The column has high selectivity (many unique values)
  • You read data more often than you write it

📚 Types of Indexes

Different types of indexes for different needs!

1. Primary Key Index

Automatically created when you define PRIMARY KEY

CREATE TABLE users (
    id INTEGER PRIMARY KEY
);

Always unique, never NULL

2. Unique Index

Enforces uniqueness AND speeds up lookups

CREATE UNIQUE INDEX
idx_email
ON users(email);

No duplicate values allowed

3. Single-Column Index

Index on one column

CREATE INDEX idx_name
ON users(last_name);

Most common type

4. Composite Index

Index on multiple columns

CREATE INDEX idx_name_age
ON users(last_name, age);

Good for multi-column filters

5. Partial Index

Index only certain rows (SQLite 3.8+)

CREATE INDEX idx_active
ON users(email)
WHERE status = 'active';

Saves space, faster updates

6. Full-Text Index

For text search (FTS in SQLite)

CREATE VIRTUAL TABLE
articles_fts USING fts5(title, content);

Great for search features

🔍 Which Index Type Should You Use?

  • Primary Key: Always use for the main identifier
  • Unique Index: For columns that must be unique (email, username)
  • Single-Column: For columns you frequently search by alone
  • Composite: When you often filter by multiple columns together
  • Partial: When you only care about certain rows (e.g., active users)
  • Full-Text: For searching within text content (articles, comments)

⚡ Query Optimization Basics

Learn to write queries that run blazing fast!

🎯 What is Query Optimization?

Query optimization is the art of writing SQL queries that execute as efficiently as possible. It's like taking the fastest route on a road trip instead of the scenic route!

Optimization Techniques:

1. Use Indexes

Create indexes on frequently queried columns

Impact: 100x-1000x faster

2. SELECT Only What You Need

Avoid SELECT *

-- Bad
SELECT * FROM users;

-- Good
SELECT id, name FROM users;

3. Use LIMIT

Limit results when you don't need all rows

SELECT * FROM logs
ORDER BY created_at DESC
LIMIT 100;

4. Avoid Functions in WHERE

Functions prevent index usage

-- Bad (can't use index)
WHERE UPPER(name) = 'ALICE'

-- Good (can use index)
WHERE name = 'Alice'

5. Use EXISTS Instead of IN

For subqueries, EXISTS is often faster

-- Good
WHERE EXISTS (
  SELECT 1 FROM orders
  WHERE customer_id = c.id
)

6. Avoid OR, Use UNION

OR can prevent index usage

-- Instead of OR, use UNION
SELECT * FROM users
WHERE city = 'NYC'
UNION
SELECT * FROM users
WHERE age > 65;

Before & After Optimization Example:

-- ❌ SLOW Query (No index, SELECT *, unnecessary sorting)
SELECT * FROM customers
WHERE LOWER(email) LIKE '%@gmail.com%'
ORDER BY created_at;

-- ✅ FAST Query (Uses index, selects only needed columns, added LIMIT)
CREATE INDEX idx_email ON customers(email);
CREATE INDEX idx_created ON customers(created_at);

SELECT id, email, name FROM customers
WHERE email LIKE '%@gmail.com'
ORDER BY created_at DESC
LIMIT 100;

🔍 EXPLAIN QUERY PLAN (Advanced)

In SQLite, you can see how a query will be executed:

EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'alice@example.com';

This shows if the database is using an index or doing a full table scan!

  • "SCAN TABLE" = Bad (full table scan)
  • "SEARCH TABLE USING INDEX" = Good (using index)

💡 Performance Tips & Best Practices

Practical tips to keep your database running at peak performance!

✅ DO: Index Foreign Keys

Always index columns used in JOINs

CREATE INDEX idx_customer
ON orders(customer_id);

✅ DO: Keep Indexes Small

Index smaller data types when possible

INTEGER index is faster than TEXT

✅ DO: Monitor Performance

Use EXPLAIN to check query plans

Measure before/after optimization

❌ DON'T: Over-Index

Too many indexes slow down writes

Only index what you actually use

❌ DON'T: Index Small Tables

Tables with < 1000 rows don't benefit

Full scan is fast enough

❌ DON'T: Index Low-Selectivity

Columns with few unique values

Example: boolean flags, gender

Quick Performance Checklist:

  • ✅ Do I have indexes on columns used in WHERE clauses?
  • ✅ Do I have indexes on foreign key columns?
  • ✅ Am I selecting only the columns I need?
  • ✅ Am I using LIMIT when I don't need all rows?
  • ✅ Are my queries avoiding functions on indexed columns?
  • ✅ Have I tested with EXPLAIN QUERY PLAN?

🔍 Real-World Example: E-Commerce Database

Common Query: "Show me all orders for customer #12345"

-- Without index: Scans all 10 million order rows (SLOW!)
SELECT * FROM orders WHERE customer_id = 12345;

-- With index: Uses index to find matching rows instantly (FAST!)
CREATE INDEX idx_customer_id ON orders(customer_id);
SELECT * FROM orders WHERE customer_id = 12345;

Impact: Query time reduced from 5 seconds to 0.01 seconds!

Practice Exercises

Part 1: Guided Exercises

Guided Exercise 1: Create an Index for Faster Lookups

Task: Create a table called "customers" with id, name, email, and city. Then create an index on the email column to speed up email searches.

Output will appear here...

Part 2: Independent Practice

Challenge 1: Optimize a Slow Query

Difficulty: Medium

Task: You have a "products" table with product_id, name, category, and price. Create a composite index that would speed up queries filtering by both category and price range.

Output will appear here...

📝 Knowledge Check Quiz

1. What is the main purpose of a database index?

2. What is a trade-off of adding many indexes to a table?

3. When is it NOT beneficial to create an index?