The Ultimate Guide to Indexing in Database Design
This week, I interviewed for a WordPress developer position. As a WordPress developer, I have traditionally relied on the default WordPress tables to store project data and avoided custom tables. However, I recently started using custom tables in my new plugin projects. During this process, I came across the concept of “indexing.” At the time, I didn’t fully understand its real use cases and benefits, so I skipped that part. In the interview, I was asked about indexing and realized I couldn’t give a proper answer. This motivated me to collect as much detailed information as I could and gain a deeper understanding of indexing. This blog post serves as a comprehensive summary and a reference for my future self.
What is Indexing?
In a database, an index is a separate data structure that improves query performance by minimizing the amount of data the database has to scan to retrieve results. Think of it like an index at the back of a book that helps you quickly locate a topic instead of flipping through every page.
How Indexing Works Internally
Index Data Structures
Databases use various data structures to implement indexing, with the most common being:
- B-Tree (Balanced Tree):
  - Organizes data hierarchically in sorted order.
  - Provides efficient lookups, insertions, and deletions in logarithmic time.
  - Ideal for range queries and sorting.
  Structure Example (simplified to two children per node; real B-tree nodes hold many keys):
          30
         /  \
       20    40
      /  \  /  \
    10  25 35  50
- Hash Table:
  - Maps index values to fixed-size buckets using hash functions.
  - Highly efficient for exact-match queries.
  - Not suitable for range queries or sorting.
- Other Specialized Indexes:
  - Bitmap Index: Efficient for low-cardinality columns.
  - R-Tree: Used for spatial and geographic data.
  - GIN/GiST (PostgreSQL): Used for full-text search and custom data types.
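The difference between an ordered index and a hash index can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a sorted list stands in for the ordered leaf level of a B-tree, a dict stands in for a hash index, and the names `range_lookup` and `exact_lookup` are made up for this example.

```python
import bisect

# Hypothetical (age, row_id) pairs taken from a small table.
rows = [(30, 1), (25, 2), (35, 3), (30, 4)]

# Ordered index: keys kept sorted, like the leaf level of a B-tree.
sorted_index = sorted(rows)

def range_lookup(index, low, high):
    """Return row ids whose key lies in [low, high], found by binary search."""
    start = bisect.bisect_left(index, (low, float("-inf")))
    end = bisect.bisect_right(index, (high, float("inf")))
    return [row_id for _, row_id in index[start:end]]

# Hash index: key -> row ids, with no ordering between keys.
hash_index = {}
for age, row_id in rows:
    hash_index.setdefault(age, []).append(row_id)

def exact_lookup(index, key):
    """Return row ids for a single key with one hash probe."""
    return index.get(key, [])

print(range_lookup(sorted_index, 30, 35))  # rows with 30 <= Age <= 35
print(exact_lookup(hash_index, 30))        # rows with Age = 30
```

The ordered structure answers the range query by locating two boundaries and reading everything between them. The hash structure has no notion of "between", so a range query would have to probe every possible key or fall back to scanning.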
Primary vs. Secondary Storage
- Table Data (Heap): Stores the actual rows of the table.
- Index Data: A separate structure containing indexed column values and pointers to the corresponding rows.
Example:
Consider a table Employees:
EmployeeID | Name | Age | Department |
---|---|---|---|
1 | Alice | 30 | HR |
2 | Bob | 25 | IT |
3 | Charlie | 35 | Finance |
4 | David | 30 | IT |
If we create an index on Age:
CREATE INDEX idx_age ON Employees(Age);
The index might look like this:
Age (Key) | Pointer to Row |
---|---|
25 | Row 2 |
30 | Row 1, Row 4 |
35 | Row 3 |
When you query WHERE Age = 30, the database uses this index to directly fetch Rows 1 and 4, bypassing a full table scan.
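The key-to-pointer mapping in the table above can be mimicked in plain Python. This is a conceptual sketch, not how any particular database stores its indexes: dictionary keys stand in for physical row pointers, and `build_index` and `find_by_age` are hypothetical helper names.

```python
from collections import defaultdict

# The Employees table; each dictionary key plays the role of a row pointer.
employees = {
    1: {"EmployeeID": 1, "Name": "Alice", "Age": 30, "Department": "HR"},
    2: {"EmployeeID": 2, "Name": "Bob", "Age": 25, "Department": "IT"},
    3: {"EmployeeID": 3, "Name": "Charlie", "Age": 35, "Department": "Finance"},
    4: {"EmployeeID": 4, "Name": "David", "Age": 30, "Department": "IT"},
}

def build_index(table, column):
    """Map each distinct value of `column` to the rows that contain it."""
    index = defaultdict(list)
    for row_id, row in table.items():
        index[row[column]].append(row_id)
    return dict(sorted(index.items()))  # keep keys in sorted order

age_index = build_index(employees, "Age")

def find_by_age(age):
    """Follow the index pointers instead of checking every row."""
    return [employees[row_id] for row_id in age_index.get(age, [])]

print(age_index)
print([row["Name"] for row in find_by_age(30)])
```

The lookup touches only the rows the index points to, which is the whole benefit: the work is proportional to the number of matches rather than the size of the table.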
How Indexes Are Used
Index Scan vs. Sequential Scan
- Index Scan: Searches the index structure for matching keys, reducing the number of rows accessed.
- Sequential Scan: Scans all rows in the table. It’s used when most rows match the query or no index is available.
Query Execution Example
Suppose you run:
SELECT * FROM Employees WHERE Age > 30;
- Without Index: The database reads all rows to find matches.
- With Index: It uses the index to locate rows with Age > 30, then fetches the corresponding rows from the table.
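One way to watch the planner switch from a sequential scan to an index lookup is to ask it for its plan before and after creating the index. The sketch below assumes SQLite (through Python's built-in sqlite3 module) purely because it needs no server; other engines word their plans differently. `query_plan` is a hypothetical helper.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)

query = "SELECT * FROM Employees WHERE Age > 30"

plan_without_index = query_plan(conn, query)  # should describe a full scan
conn.execute("CREATE INDEX idx_age ON Employees(Age)")
plan_with_index = query_plan(conn, query)     # should mention idx_age

print(plan_without_index)
print(plan_with_index)
```

The exact plan text varies between SQLite versions, so it is safer to look for the word SCAN and for the index name than to compare whole strings.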
How Databases Automatically Manage Indexes
1. Insertions
When a new row is added, the database:
- Inserts the row into the table.
- Updates the index by inserting the new value in the correct position (e.g., rebalancing a B-tree if necessary).
2. Updates
If an indexed column is updated:
- The database removes the old index entry.
- Adds a new entry for the updated value.
3. Deletions
When a row is deleted:
- The corresponding index entry is also removed.
- The database may rebalance the index to maintain performance.
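The bookkeeping behind insertions, updates, and deletions can be sketched with a toy table that keeps one sorted index in step with its rows. This is a deliberately simplified model (a sorted list instead of a real B-tree, no rebalancing, no concurrency), and the class name `IndexedTable` is invented for the example.

```python
import bisect

class IndexedTable:
    """Toy table with a single sorted secondary index on one column."""

    def __init__(self, column):
        self.column = column
        self.rows = {}    # row_id -> row dict (the table data)
        self.index = []   # sorted list of (key, row_id) entries

    def insert(self, row_id, row):
        self.rows[row_id] = row
        # Place the new entry at its sorted position in the index.
        bisect.insort(self.index, (row[self.column], row_id))

    def update(self, row_id, new_value):
        old_value = self.rows[row_id][self.column]
        self._remove_entry(old_value, row_id)   # drop the old index entry
        self.rows[row_id][self.column] = new_value
        bisect.insort(self.index, (new_value, row_id))  # add the new one

    def delete(self, row_id):
        row = self.rows.pop(row_id)
        self._remove_entry(row[self.column], row_id)

    def _remove_entry(self, key, row_id):
        pos = bisect.bisect_left(self.index, (key, row_id))
        if pos < len(self.index) and self.index[pos] == (key, row_id):
            del self.index[pos]

table = IndexedTable("Age")
table.insert(1, {"Name": "Alice", "Age": 30})
table.insert(2, {"Name": "Bob", "Age": 25})
table.update(2, 40)   # Bob's entry moves from key 25 to key 40
table.delete(1)       # Alice's row and index entry both disappear
print(table.index)
```

Every write to the table implies at least one extra write to each index on it, which is where the write overhead of indexing comes from.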
4. Query Optimization
The query optimizer evaluates whether to use an index based on factors like:
- Query conditions (WHERE, ORDER BY clauses).
- Estimated cost of accessing the index versus scanning the table.
Types of Indexes
1. Primary Index
- Automatically created on a table’s primary key.
- Ensures unique values.
CREATE TABLE Users (
UserID INT PRIMARY KEY,
Name VARCHAR(50)
);
2. Composite Index
- Indexes multiple columns together.
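- Column order matters: the index can serve queries that filter on a leading prefix of its columns (here Name, or Name and Age), but generally not queries that filter on Age alone.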
CREATE INDEX idx_name_age ON Employees(Name, Age);
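The effect of column order can be checked by comparing query plans. The sketch below assumes SQLite via Python's sqlite3 module, with a hypothetical `query_plan` helper; other databases follow the same leading-column principle but report their plans differently.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.execute("CREATE INDEX idx_name_age ON Employees(Name, Age)")

# Filters on the leading column, or on both columns, can use the index.
plan_name = query_plan(conn, "SELECT * FROM Employees WHERE Name = 'Alice'")
plan_both = query_plan(
    conn, "SELECT * FROM Employees WHERE Name = 'Alice' AND Age = 30"
)
# A filter on the second column alone has no leading-column value to seek on.
plan_age_only = query_plan(conn, "SELECT * FROM Employees WHERE Age = 30")

print(plan_name)
print(plan_both)
print(plan_age_only)
```

If queries filtering only on Age are common, that is a sign the table needs either a separate index on Age or a composite index that starts with Age.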
3. Unique Index
- Ensures all values in the indexed column are unique.
CREATE UNIQUE INDEX idx_email ON Employees(Email);
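A unique index is enforced at write time, which a short experiment can show. The sketch assumes SQLite via Python's sqlite3 module and a version of the Employees table that has an Email column; the sample addresses are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)"
)
conn.execute("CREATE UNIQUE INDEX idx_email ON Employees(Email)")
conn.execute("INSERT INTO Employees VALUES (1, 'Alice', 'alice@example.com')")

try:
    # Same email as row 1, so the unique index should reject this insert.
    conn.execute("INSERT INTO Employees VALUES (2, 'Alicia', 'alice@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(duplicate_rejected, row_count)
```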
4. Full-Text Index
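- Optimized for text search. The statement below uses MySQL syntax; other databases have their own full-text mechanisms (PostgreSQL, for example, typically uses a GIN index).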
CREATE FULLTEXT INDEX idx_name ON Employees(Name);
5. Clustered Index
- Sorts and stores data rows based on the indexed column.
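- Only one clustered index per table is allowed, because the rows can be physically ordered in only one way. The statement below uses SQL Server syntax; MySQL's InnoDB clusters each table on its primary key automatically.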
CREATE CLUSTERED INDEX idx_empid ON Employees(EmployeeID);
Advanced Index Concepts
1. Covering Index
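- Contains all columns required for a query, avoiding table lookups. For example, the index below covers a query that selects only Age and Department and filters on Age.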
CREATE INDEX idx_covering ON Employees(Age, Department);
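Whether an index actually covers a query shows up in the query plan. The sketch below assumes SQLite via Python's sqlite3 module, where a covered query is reported with the words COVERING INDEX; `query_plan` is a hypothetical helper.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.execute("CREATE INDEX idx_covering ON Employees(Age, Department)")

# Every column the query needs is in the index, so no table lookup is required.
plan_covered = query_plan(
    conn, "SELECT Age, Department FROM Employees WHERE Age = 30"
)
# Name is not in the index, so each match needs a trip to the table row.
plan_not_covered = query_plan(conn, "SELECT Name FROM Employees WHERE Age = 30")

print(plan_covered)
print(plan_not_covered)
```

The second query still benefits from the index for filtering, but it pays for an extra lookup per matching row.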
2. Partial Index
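- Indexes only rows meeting a specific condition. PostgreSQL and SQLite support the syntax below, and SQL Server offers the same idea as "filtered indexes"; MySQL has no equivalent.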
CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active';
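A partial index only helps queries whose condition matches the index's own WHERE clause. The sketch below assumes SQLite via Python's sqlite3 module and a made-up Users table with UserID, Name, and Status columns; `query_plan` is a hypothetical helper.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT, Status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active'"
)

# The query condition matches the index condition, so the index is usable.
plan_active = query_plan(conn, "SELECT * FROM Users WHERE Status = 'active'")
# Inactive rows were never indexed, so the planner cannot use it here.
plan_inactive = query_plan(conn, "SELECT * FROM Users WHERE Status = 'inactive'")

print(plan_active)
print(plan_inactive)
```

The payoff is a smaller index that costs nothing to maintain for rows outside the condition.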
3. Function-Based Index
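- Indexes the result of a function applied to a column. PostgreSQL and Oracle accept the syntax below; MySQL 8.0.13 and later requires an extra pair of parentheses around the expression.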
CREATE INDEX idx_lower_name ON Employees(LOWER(Name));
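An expression index is only considered when the query uses the same expression. The sketch below assumes SQLite (which has supported indexes on expressions since version 3.9) via Python's sqlite3 module; `query_plan` is a hypothetical helper.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.execute("CREATE INDEX idx_lower_name ON Employees(LOWER(Name))")

# Same expression as the index definition: the index can be used.
plan_lower = query_plan(
    conn, "SELECT * FROM Employees WHERE LOWER(Name) = 'alice'"
)
# Plain column comparison: the index stores LOWER(Name), not Name.
plan_plain = query_plan(conn, "SELECT * FROM Employees WHERE Name = 'Alice'")

print(plan_lower)
print(plan_plain)
```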
Indexing Trade-Offs and Limitations
Advantages
- Faster query performance for reads.
- Efficient sorting and filtering.
- Reduced disk I/O for large datasets.
Disadvantages
- Storage Overhead: Indexes require additional disk space.
- Write Performance Impact: Insert, update, and delete operations take longer because the index must be updated.
- Maintenance Costs: Fragmentation over time can degrade performance.
Best Practices
- Index frequently queried columns.
- Avoid over-indexing; too many indexes slow down writes.
- Use composite indexes for multi-column filtering.
- Regularly analyze and rebuild indexes to reduce fragmentation.
Tools for Monitoring Index Performance
- EXPLAIN/EXPLAIN PLAN: Analyze how a query uses indexes.
EXPLAIN SELECT * FROM Employees WHERE Age > 25;
- Database-Specific Tools:
  - MySQL: pt-index-usage
  - PostgreSQL: pg_stat_user_indexes
- Statistics Collection: Use commands like ANALYZE to update index statistics.
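To see what statistics collection produces, the sketch below runs ANALYZE in SQLite (via Python's sqlite3 module) and reads back what it stored. The `sqlite_stat1` table is specific to SQLite; other engines keep their statistics elsewhere, so treat this as one concrete instance rather than a general recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Department TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "HR"), (2, "Bob", 25, "IT"),
     (3, "Charlie", 35, "Finance"), (4, "David", 30, "IT")],
)
conn.execute("CREATE INDEX idx_age ON Employees(Age)")

# Gather statistics about tables and indexes for the query planner.
conn.execute("ANALYZE")

# SQLite stores one row per analyzed index: table name, index name, summary.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
for tbl, idx, stat in stats:
    print(tbl, idx, stat)
```

The planner uses these numbers to estimate how selective each index is, which feeds the cost comparison between an index lookup and a table scan.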
Practical Example: Index Optimization Workflow
Step 1: Create Table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
Status VARCHAR(20),
TotalAmount DECIMAL(10, 2)
);
Step 2: Analyze Queries
SELECT * FROM Orders WHERE CustomerID = 101;
SELECT * FROM Orders WHERE OrderDate > '2023-01-01' AND Status = 'Shipped';
Step 3: Add Indexes
CREATE INDEX idx_customer ON Orders(CustomerID);
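CREATE INDEX idx_status_date ON Orders(Status, OrderDate);
Placing the equality column (Status) before the range column (OrderDate) lets the index narrow the search on both conditions; with OrderDate first, only the date range could be used to seek into the index.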
Step 4: Monitor Performance
Use EXPLAIN to verify index usage and adjust as needed.
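The monitoring step can be automated by checking which of the workload's queries still fall back to a full scan. The sketch below assumes SQLite via Python's sqlite3 module, adapts the column types to SQLite, and adds only the CustomerID index so that the before/after difference is visible; `query_plan` and `unindexed_queries` are hypothetical helpers.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

def unindexed_queries(conn, queries):
    """Return the queries whose plan still reports a full scan."""
    return [q for q in queries if "SCAN" in query_plan(conn, q)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT, "
    "Status TEXT, TotalAmount NUMERIC)"
)

workload = [
    "SELECT * FROM Orders WHERE CustomerID = 101",
    "SELECT * FROM Orders WHERE OrderDate > '2023-01-01' AND Status = 'Shipped'",
]

scans_before = unindexed_queries(conn, workload)   # both queries scan
conn.execute("CREATE INDEX idx_customer ON Orders(CustomerID)")
scans_after = unindexed_queries(conn, workload)    # only the date query scans

print(len(scans_before), len(scans_after))
```

Any query that remains in the list after indexing is a candidate for another index, or a sign that the existing index does not match the query's conditions.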
Indexing is a powerful tool for optimizing database performance, but it requires careful design and maintenance. By understanding the internal workings, types, and trade-offs, we can create indexes that strike the right balance between query speed and storage efficiency.
Ready to improve your database queries? Start analyzing your workloads and applying these indexing techniques today!