Sarathlal N

The Ultimate Guide to Indexing in Database Design

This week, I attended an interview for a WordPress developer position. As a WordPress developer, I have traditionally relied on the default WordPress tables to store project data and avoided custom tables. However, I recently started using custom tables in my new plugin projects. Along the way, I came across the concept of “indexing.” At the time, I didn’t fully understand its real use cases and benefits, so I skipped that part. When the interviewer asked about indexing, I realized I couldn’t give a proper answer. That motivated me to collect as much detailed information as I could and gain a deeper understanding of indexing. This blog post serves as a comprehensive summary and a reference for my future self.


What is Indexing?

In a database, an index is a separate data structure that improves query performance by minimizing the amount of data the database has to scan to retrieve results. Think of it like an index at the back of a book that helps you quickly locate a topic instead of flipping through every page.


How Indexing Works Internally

Index Data Structures

Databases use various data structures to implement indexing, with the most common being:

  1. B-Tree (Balanced Tree):
    • Organizes data hierarchically in sorted order.
    • Provides efficient lookups, insertions, and deletions in logarithmic time.
    • Ideal for range queries and sorting.

    Structure Example (simplified to one key per node; real B-tree nodes usually hold many keys):

             30
          /      \
        20        40
      /   \      /   \
     10   25    35    50
    
  2. Hash Table:
    • Maps index values to fixed-size buckets using hash functions.
    • Highly efficient for exact-match queries.
    • Not suitable for range queries or sorting.
  3. Other Specialized Indexes:
    • Bitmap Index: Efficient for low-cardinality columns.
    • R-Tree: Used for spatial and geographic data.
    • GIN/GiST (PostgreSQL): Used for full-text search and custom data types.
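
To make the difference between the first two structures concrete, here is a minimal Python sketch. It does not implement a real B-tree: a sorted list searched with `bisect` stands in for the sorted order a B-tree maintains, and a plain `dict` stands in for a hash index. The keys come from the tree diagram above; the row labels and the function names `range_lookup` and `exact_lookup` are made up for illustration.

```python
import bisect

# Keys from the diagram above, each mapped to a hypothetical row label.
rows_by_key = {10: "row-a", 20: "row-b", 25: "row-c", 30: "row-d",
               35: "row-e", 40: "row-f", 50: "row-g"}

# Stand-in for a B-tree: keys kept in sorted order, searched by bisection.
sorted_keys = sorted(rows_by_key)

def range_lookup(low, high):
    """Return keys with low <= key <= high using two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]

# Stand-in for a hash index: exact-match lookup only, no useful ordering.
hash_index = dict(rows_by_key)

def exact_lookup(key):
    """Return the row label for one key, or None if the key is absent."""
    return hash_index.get(key)

print(range_lookup(20, 35))  # keys between 20 and 35, in sorted order
print(exact_lookup(40))      # single row found by hashing the key
```

The sorted structure answers the range question with two searches and a slice, while the hash structure can only answer "is this exact key present?", which is why hash indexes are not suitable for range queries or sorting.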

Primary vs. Secondary Storage

Example:

Consider a table Employees:

EmployeeID  Name     Age  Department
1           Alice    30   HR
2           Bob      25   IT
3           Charlie  35   Finance
4           David    30   IT

If we create an index on Age:

CREATE INDEX idx_age ON Employees(Age);

The index might look like this:

Age (Key)  Pointer to Row
25         Row 2
30         Row 1, Row 4
35         Row 3

When you query WHERE Age = 30, the database uses this index to directly fetch Rows 1 and 4, bypassing a full table scan.
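
The same idea can be sketched in a few lines of Python. This is only a conceptual model, not how any database actually stores an index: a dictionary maps each Age value to the IDs of the rows that contain it. The names `age_index` and `find_by_age` are hypothetical.

```python
from collections import defaultdict

# The Employees rows from the example: (EmployeeID, Name, Age, Department).
employees = [
    (1, "Alice", 30, "HR"),
    (2, "Bob", 25, "IT"),
    (3, "Charlie", 35, "Finance"),
    (4, "David", 30, "IT"),
]

# Build the index: each Age value points to the IDs of the rows that hold it.
age_index = defaultdict(list)
for employee_id, _name, age, _department in employees:
    age_index[age].append(employee_id)

def find_by_age(age):
    """Look up matching row IDs through the index instead of scanning every row."""
    return age_index.get(age, [])

print(sorted(age_index.items()))
print(find_by_age(30))
```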


How Indexes Are Used

Index Scan vs. Sequential Scan

Query Execution Example

Suppose you run:

SELECT * FROM Employees WHERE Age > 30;
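
One way to see the difference between a sequential scan and an index scan for this query is to ask the database for its plan before and after creating the index. The sketch below uses Python's built-in `sqlite3` module purely for convenience; that is my assumption, since this post does not target a specific engine. The helper `query_plan` is a hypothetical name, and the exact plan wording varies between engines and SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text

query = "SELECT * FROM Employees WHERE Age > 30"

plan_without_index = query_plan(query)   # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_age ON Employees(Age)")
plan_with_index = query_plan(query)      # the planner can now search idx_age

matching = conn.execute(query).fetchall()

print("before:", plan_without_index)
print("after: ", plan_with_index)
print("rows:  ", matching)
```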

How Databases Automatically Manage Indexes

1. Insertions

When a new row is added, the database:

  1. Inserts the row into the table.
  2. Updates the index by inserting the new value in the correct position (e.g., rebalancing a B-tree if necessary).

2. Updates

If an indexed column is updated:

  1. The database removes the old index entry.
  2. Adds a new entry for the updated value.

3. Deletions

When a row is deleted:

  1. The corresponding index entry is also removed.
  2. The database may rebalance the index to maintain performance.
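
The insert, update, and delete steps above can be modeled with a toy table in Python. This is a sketch under heavy simplification: a sorted list of `(value, row_id)` pairs stands in for the index, and `bisect` keeps it ordered instead of real B-tree rebalancing. The class name `IndexedTable` and its methods are hypothetical.

```python
import bisect

class IndexedTable:
    """Toy table that keeps a sorted (value, row_id) index on one column."""

    def __init__(self, column):
        self.column = column
        self.rows = {}    # row_id -> row dict
        self.index = []   # sorted list of (column value, row_id)

    def insert(self, row_id, row):
        # Store the row, then place its index entry at the correct position.
        self.rows[row_id] = row
        bisect.insort(self.index, (row[self.column], row_id))

    def update(self, row_id, new_value):
        # Remove the old index entry, then add one for the new value.
        old_value = self.rows[row_id][self.column]
        self._remove_entry(old_value, row_id)
        self.rows[row_id][self.column] = new_value
        bisect.insort(self.index, (new_value, row_id))

    def delete(self, row_id):
        # Remove the row and its corresponding index entry.
        row = self.rows.pop(row_id)
        self._remove_entry(row[self.column], row_id)

    def _remove_entry(self, value, row_id):
        position = bisect.bisect_left(self.index, (value, row_id))
        del self.index[position]

table = IndexedTable("Age")
table.insert(1, {"Name": "Alice", "Age": 30})
table.insert(2, {"Name": "Bob", "Age": 25})
table.insert(3, {"Name": "Charlie", "Age": 35})
table.update(2, 40)   # Bob's entry moves from 25 to 40 in the index
table.delete(1)       # Alice's row and index entry both disappear

print(table.index)
```

Every write touches both the row storage and the index, which is the extra cost that indexes add to insert-heavy and update-heavy workloads.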

4. Query Optimization

The query optimizer evaluates whether to use an index based on factors like:


Types of Indexes

1. Primary Index

CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Name VARCHAR(50)
);

2. Composite Index

CREATE INDEX idx_name_age ON Employees(Name, Age);
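
A composite index is ordered by its first column, then its second, so it mainly helps queries that filter on the leading column. The sketch below checks this with Python's built-in `sqlite3` module, which is an assumption on my part; other engines follow the same leftmost-prefix idea but may plan differently. `query_plan` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)
conn.execute("CREATE INDEX idx_name_age ON Employees(Name, Age)")

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Filters on the leading column (Name) can use the composite index.
plan_name = query_plan("SELECT * FROM Employees WHERE Name = 'Alice'")
plan_both = query_plan("SELECT * FROM Employees WHERE Name = 'Alice' AND Age = 30")
# A filter on Age alone skips the leading column, so the index is not searched.
plan_age = query_plan("SELECT * FROM Employees WHERE Age = 30")

print("Name only:   ", plan_name)
print("Name and Age:", plan_both)
print("Age only:    ", plan_age)
```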

3. Unique Index

-- Assumes Employees has an Email column.
CREATE UNIQUE INDEX idx_email ON Employees(Email);
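
Besides speeding up lookups, a unique index rejects duplicate values. Here is a small sketch using Python's built-in `sqlite3` module (my choice for a runnable demo). It assumes an Employees table with an Email column, which the earlier example table does not have, and the sample addresses are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the Email column is added here for illustration.
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)"
)
conn.execute("CREATE UNIQUE INDEX idx_email ON Employees(Email)")

conn.execute(
    "INSERT INTO Employees (Name, Email) VALUES ('Alice', 'alice@example.com')"
)

try:
    # Same email again: the unique index should refuse this row.
    conn.execute(
        "INSERT INTO Employees (Name, Email) VALUES ('Alicia', 'alice@example.com')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

print("duplicate rejected:", duplicate_rejected)
print("rows stored:", row_count)
```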

4. Full-Text Index

-- MySQL syntax; other databases define full-text indexes differently.
CREATE FULLTEXT INDEX idx_name ON Employees(Name);

5. Clustered Index

-- SQL Server syntax.
CREATE CLUSTERED INDEX idx_empid ON Employees(EmployeeID);

Advanced Index Concepts

1. Covering Index

CREATE INDEX idx_covering ON Employees(Age, Department);
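
An index "covers" a query when every column the query needs is already in the index, so the table itself never has to be read. The sketch below checks this with Python's built-in `sqlite3` module; that engine choice is my assumption, and other databases report the same situation with different wording (for example an "index only" scan). `query_plan` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)
conn.execute("CREATE INDEX idx_covering ON Employees(Age, Department)")

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Only Age and Department are needed: the index alone can answer this.
plan_covered = query_plan("SELECT Age, Department FROM Employees WHERE Age = 30")
# Name is not in the index: the index finds the rows, then the table is read.
plan_not_covered = query_plan("SELECT Name FROM Employees WHERE Age = 30")

print("covered:    ", plan_covered)
print("not covered:", plan_not_covered)
```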

2. Partial Index

-- PostgreSQL and SQLite syntax; MySQL does not support partial indexes.
CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active';
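
A partial index only contains the rows that match its WHERE clause, so it can only serve queries whose filter falls inside that clause. The sketch below tries this with Python's built-in `sqlite3` module, since SQLite accepts the same statement. It assumes a Users table with a Status column, which the earlier Users definition does not include. `query_plan` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: Status is added here for illustration.
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT, Status TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "Alice", "active"), (2, "Bob", "inactive"), (3, "Charlie", "active")],
)
conn.execute(
    "CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active'"
)

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# The filter matches the index's WHERE clause, so the index is usable.
plan_active = query_plan("SELECT * FROM Users WHERE Status = 'active'")
# Inactive rows are not stored in the index, so it cannot help here.
plan_inactive = query_plan("SELECT * FROM Users WHERE Status = 'inactive'")

print("active:  ", plan_active)
print("inactive:", plan_inactive)
```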

3. Function-Based Index

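-- PostgreSQL and SQLite accept this form; MySQL 8.0.13+ needs extra parentheses around the expression.
CREATE INDEX idx_lower_name ON Employees(LOWER(Name));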
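
A function-based index stores the result of the expression, so it helps only when the query uses that same expression. Here is a sketch with Python's built-in `sqlite3` module, which supports expression indexes; this engine choice is my assumption and `query_plan` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)
conn.execute("CREATE INDEX idx_lower_name ON Employees(LOWER(Name))")

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Same expression as the index definition: the index can be searched.
plan_lower = query_plan("SELECT * FROM Employees WHERE LOWER(Name) = 'alice'")
# Plain column comparison: the expression index does not apply.
plan_plain = query_plan("SELECT * FROM Employees WHERE Name = 'Alice'")

case_insensitive_match = conn.execute(
    "SELECT EmployeeID FROM Employees WHERE LOWER(Name) = 'alice'"
).fetchall()

print("LOWER(Name):", plan_lower)
print("Name:       ", plan_plain)
print("match:      ", case_insensitive_match)
```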

Indexing Trade-Offs and Limitations

Advantages

Disadvantages

Best Practices


Tools for Monitoring Index Performance

  1. EXPLAIN/EXPLAIN PLAN: Analyze how a query uses indexes.

    EXPLAIN SELECT * FROM Employees WHERE Age > 25;
    
  2. Database-Specific Tools:
    • MySQL: pt-index-usage
    • PostgreSQL: pg_stat_user_indexes
  3. Statistics Collection: Use commands like ANALYZE to update index statistics.
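
To get a feel for what statistics collection produces, the sketch below runs ANALYZE in SQLite through Python's built-in `sqlite3` module and reads back what it recorded. The `sqlite_stat1` table is specific to SQLite; MySQL and PostgreSQL keep their statistics elsewhere, so treat this as one illustration rather than a general recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)
conn.execute("CREATE INDEX idx_age ON Employees(Age)")

# Gather statistics about the table and its indexes for the query planner.
conn.execute("ANALYZE")

# SQLite stores the results in sqlite_stat1: (table, index, statistics text).
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
indexed_names = {idx for _tbl, idx, _stat in stats}

for tbl, idx, stat in stats:
    print(tbl, idx, stat)
```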

Practical Example: Index Optimization Workflow

Step 1: Create Table

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    Status VARCHAR(20),
    TotalAmount DECIMAL(10, 2)
);

Step 2: Analyze Queries

SELECT * FROM Orders WHERE CustomerID = 101;
SELECT * FROM Orders WHERE OrderDate > '2023-01-01' AND Status = 'Shipped';

Step 3: Add Indexes

CREATE INDEX idx_customer ON Orders(CustomerID);
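-- Equality column (Status) first, range column (OrderDate) second,
-- so both conditions can narrow the index search.
CREATE INDEX idx_status_date ON Orders(Status, OrderDate);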

Step 4: Monitor Performance

Use EXPLAIN to verify index usage and adjust as needed.
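
The monitoring step can be sketched end to end for the first query. I use Python's built-in `sqlite3` module here as an assumption, so the command is `EXPLAIN QUERY PLAN` rather than plain `EXPLAIN`, and the sample orders are invented. `query_plan` is a hypothetical helper; the same check applies to the second query and its index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT, "
    "Status TEXT, TotalAmount REAL)"
)
# Invented sample rows, only to have something to query.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [(1, 101, "2023-02-10", "Shipped", 49.90),
     (2, 102, "2022-12-05", "Pending", 15.00),
     (3, 101, "2023-03-01", "Pending", 120.00)],
)

def query_plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT * FROM Orders WHERE CustomerID = 101"

plan_before = query_plan(query)                     # no index on CustomerID yet
conn.execute("CREATE INDEX idx_customer ON Orders(CustomerID)")
plan_after = query_plan(query)                      # index should now be used

order_ids = [row[0] for row in conn.execute(query + " ORDER BY OrderID")]

print("before:", plan_before)
print("after: ", plan_after)
print("orders for customer 101:", order_ids)
```

If the plan still shows a full scan after adding an index, that is the signal to revisit the index definition or the query itself.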


Indexing is a powerful tool for optimizing database performance, but it requires careful design and maintenance. By understanding the internal workings, types, and trade-offs, we can create indexes that strike the right balance between query speed and storage efficiency.

Ready to improve your database queries? Start analyzing your workloads and applying these indexing techniques today!

