The Ultimate Guide to Indexing in Database Design

On this week, I attended an interview for the position of WordPress developer. As a WordPress developer, I have traditionally relied on default WordPress tables to store data in projects, avoiding the use of custom tables. However, I recently started using custom tables in my new plugin projects. During this process, I came across the concept of “indexing.” At the time, I didn’t fully understand its real use cases and benefits, so I skipped that part. In the recent interview, the interviewer asked about indexing, and I realized I couldn’t provide a proper answer. This motivated me to collect as much detailed information as possible and gain a deeper understanding of indexing. This blog post serves as a comprehensive summary and reference for me in the future.

What is Indexing?

In a database, an index is a separate data structure that improves query performance by minimizing the amount of data the database has to scan to retrieve results. Think of it like an index at the back of a book that helps you quickly locate a topic instead of flipping through every page.

How Indexing Works Internally

Index Data Structures

Databases use various data structures to implement indexing, with the most common being:

B-Tree (Balanced Tree):
- Organizes data hierarchically in sorted order.
- Provides efficient lookups, insertions, and deletions in logarithmic time.
- Ideal for range queries and sorting.
Structure Example:
```
         30
      /      \
    20        40
  /   \      /   \
 10   25    35    50
```
Hash Table:
- Maps index values to fixed-size buckets using hash functions.
- Highly efficient for exact-match queries.
- Not suitable for range queries or sorting.
Other Specialized Indexes:
- Bitmap Index: Efficient for low-cardinality columns.
- R-Tree: Used for spatial and geographic data.
- Gin/GiST (PostgreSQL): Used for full-text search and custom data types.

Primary vs. Secondary Storage

Table Data (Heap): Stores the actual rows of the table.
Index Data: A separate structure containing indexed column values and pointers to the corresponding rows.

Example:

Consider a table Employees:

EmployeeID	Name	Age	Department
1	Alice	30	HR
2	Bob	25	IT
3	Charlie	35	Finance
4	David	30	IT

If we create an index on Age:

CREATE INDEX idx_age ON Employees(Age);

The index might look like this:

Age (Key)	Pointer to Row
25	Row 2
30	Row 1, Row 4
35	Row 3

When you query WHERE Age = 30, the database uses this index to directly fetch Rows 1 and 4, bypassing a full table scan.

How Indexes Are Used

Index Scan vs. Sequential Scan

Index Scan: Searches the index structure for matching keys, reducing the number of rows accessed.
Sequential Scan: Scans all rows in the table. It’s used when most rows match the query or no index is available.

Query Execution Example

Suppose you run:

SELECT * FROM Employees WHERE Age > 30;

Without Index: The database reads all rows to find matches.
With Index: It uses the index to locate rows with Age > 30, then fetches the corresponding rows from the table.

How Databases Automatically Manage Indexes

1. Insertions

When a new row is added, the database:

Inserts the row into the table.
Updates the index by inserting the new value in the correct position (e.g., rebalancing a B-tree if necessary).

2. Updates

If an indexed column is updated:

The database removes the old index entry.
Adds a new entry for the updated value.

3. Deletions

When a row is deleted:

The corresponding index entry is also removed.
The database may rebalance the index to maintain performance.

4. Query Optimization

The query optimizer evaluates whether to use an index based on factors like:

Query conditions (WHERE, ORDER BY clauses).
Estimated cost of accessing the index versus scanning the table.

Types of Indexes

1. Primary Index

Automatically created on a table’s primary key.
Ensures unique values.

CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Name VARCHAR(50)
);

2. Composite Index

Indexes multiple columns together.
The column order affects its efficiency.

CREATE INDEX idx_name_age ON Employees(Name, Age);

3. Unique Index

Ensures all values in the indexed column are unique.

CREATE UNIQUE INDEX idx_email ON Employees(Email);

4. Full-Text Index

Optimized for text search.

CREATE FULLTEXT INDEX idx_name ON Employees(Name);

5. Clustered Index

Sorts and stores data rows based on the indexed column.
Only one clustered index per table is allowed.

CREATE CLUSTERED INDEX idx_empid ON Employees(EmployeeID);

Advanced Index Concepts

1. Covering Index

Contains all columns required for a query, avoiding table lookups.

CREATE INDEX idx_covering ON Employees(Age, Department);

2. Partial Index

Indexes only rows meeting a specific condition.

CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active';

3. Function-Based Index

Indexes the result of a function applied to a column.

CREATE INDEX idx_lower_name ON Employees(LOWER(Name));

Indexing Trade-Offs and Limitations

Advantages

Faster query performance for reads.
Efficient sorting and filtering.
Reduced disk I/O for large datasets.

Disadvantages

Storage Overhead: Indexes require additional disk space.
Write Performance Impact: Insert, update, and delete operations take longer because the index must be updated.
Maintenance Costs: Fragmentation over time can degrade performance.

Best Practices

Index frequently queried columns.
Avoid over-indexing; too many indexes slow down writes.
Use composite indexes for multi-column filtering.
Regularly analyze and rebuild indexes to reduce fragmentation.

Tools for Monitoring Index Performance

EXPLAIN/EXPLAIN PLAN: Analyze how a query uses indexes.

EXPLAIN SELECT * FROM Employees WHERE Age > 25;

Database-Specific Tools:
- MySQL: pt-index-usage
- PostgreSQL: pg_stat_user_indexes
Statistics Collection: Use commands like ANALYZE to update index statistics.

Practical Example: Index Optimization Workflow

Step 1: Create Table

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    Status VARCHAR(20),
    TotalAmount DECIMAL(10, 2)
);

Step 2: Analyze Queries

SELECT * FROM Orders WHERE CustomerID = 101;
SELECT * FROM Orders WHERE OrderDate > '2023-01-01' AND Status = 'Shipped';

Step 3: Add Indexes

CREATE INDEX idx_customer ON Orders(CustomerID);
CREATE INDEX idx_date_status ON Orders(OrderDate, Status);

Step 4: Monitor Performance

Use EXPLAIN to verify index usage and adjust as needed.

Indexing is a powerful tool for optimizing database performance, but it requires careful design and maintenance. By understanding the internal workings, types, and trade-offs, we can create indexes that strike the right balance between query speed and storage efficiency.

Ready to improve the database queries? Start analyzing the workloads and applying these indexing techniques today!

Looking for a skilled WordPress/WooCommerce developer? I'm currently available for freelance, contract, or full-time remote opportunities! Let's create something amazing together. Send me a quick message, and I'll respond within 24 hours!

Your Questions / Comments

If you found this article interesting, found errors, or just want to discuss about it, please get in touch.

Sarathlal N