This week, I attended an interview for a WordPress developer position. As a WordPress developer, I have traditionally relied on the default WordPress tables to store project data and avoided custom tables. Recently, though, I started using custom tables in my new plugin projects, and along the way I came across the concept of “indexing.” At the time, I didn’t fully understand its real use cases and benefits, so I skipped that part. In the interview, the interviewer asked about indexing, and I realized I couldn’t give a proper answer. That motivated me to gather as much detailed information as I could and build a deeper understanding of indexing. This blog post is a comprehensive summary of what I learned and a reference for my future self.
In a database, an index is a separate data structure that improves query performance by minimizing the amount of data the database has to scan to retrieve results. Think of it like an index at the back of a book that helps you quickly locate a topic instead of flipping through every page.
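Databases use various data structures to implement indexing. The most common is the B-Tree (and its B+ Tree variant), a balanced tree that keeps keys in sorted order, so a lookup touches only a handful of nodes even in a large table and range scans can read keys in order. Hash indexes are another option: they are fast for exact-match lookups but cannot serve range queries.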
Structure example (simplified to two children per node; real B-Tree nodes each hold many keys and children):
        30
       /  \
     20    40
    /  \  /  \
   10  25 35  50
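To see why a balanced tree keeps lookups cheap, here is a minimal Python sketch that searches the example tree and counts the nodes it visits. It is a plain binary search tree, not a real B-Tree (whose nodes each hold many keys), so treat it only as an illustration of the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], key: int) -> tuple[bool, int]:
    """Return (found, number_of_nodes_visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        # Smaller keys live in the left subtree, larger keys in the right one.
        node = node.left if key < node.key else node.right
    return False, visited


# The tree from the diagram: 30 at the root, 10..50 at the leaves.
root = Node(30,
            Node(20, Node(10), Node(25)),
            Node(40, Node(35), Node(50)))

print(search(root, 25))  # (True, 3): visits 30, 20, 25
print(search(root, 45))  # (False, 3): visits 30, 40, 50, then stops
```

Each level halves the remaining candidates, which is why even millions of keys need only a few dozen comparisons in a balanced tree.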
Consider a table named Employees:
EmployeeID | Name | Age | Department |
---|---|---|---|
1 | Alice | 30 | HR |
2 | Bob | 25 | IT |
3 | Charlie | 35 | Finance |
4 | David | 30 | IT |
If we create an index on Age:
CREATE INDEX idx_age ON Employees(Age);
The index might look like this:
Age (Key) | Pointer to Row |
---|---|
25 | Row 2 |
30 | Row 1, Row 4 |
35 | Row 3 |
When you query WHERE Age = 30, the database uses this index to fetch Rows 1 and 4 directly, bypassing a full table scan.
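The pointer table above can be mimicked with a Python dictionary that maps each Age value to its row numbers. This is a hash-style sketch of the lookup idea only: a dictionary does not keep its keys sorted, so unlike a B-Tree it cannot serve range queries.

```python
from collections import defaultdict

# The Employees rows from the table above, keyed by row number.
employees = {
    1: {"Name": "Alice", "Age": 30, "Department": "HR"},
    2: {"Name": "Bob", "Age": 25, "Department": "IT"},
    3: {"Name": "Charlie", "Age": 35, "Department": "Finance"},
    4: {"Name": "David", "Age": 30, "Department": "IT"},
}

# Build the index: each Age value points to the rows that contain it.
idx_age: dict[int, list[int]] = defaultdict(list)
for row_id, row in employees.items():
    idx_age[row["Age"]].append(row_id)

# WHERE Age = 30: one index lookup instead of checking every row.
matches = [employees[r] for r in idx_age.get(30, [])]
print([m["Name"] for m in matches])  # ['Alice', 'David']
```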
Suppose you run:
SELECT * FROM Employees WHERE Age > 30;
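The database uses the index to locate the entries where Age > 30 (in this table, only Row 3), then fetches the corresponding rows from the table.
Indexes must also be kept in sync as data changes, which is why every index adds some write overhead.
When a new row is added, the database inserts a matching entry into every index on the table, in sorted position (which can require splitting a full B-Tree node).
If an indexed column is updated, the database removes the old index entry and inserts a new one for the new value.
When a row is deleted, the database removes its entry from every index on the table.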
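The range lookup and the insert, update, and delete bookkeeping can be sketched with a sorted Python list of (Age, row_id) pairs standing in for the leaf level of a B-Tree. This is a simplification: list insertion and removal cost O(n), whereas a real B-Tree does both in O(log n).

```python
import bisect

# Sorted (Age, row_id) pairs stand in for the B-Tree's leaf entries.
index = sorted([(30, 1), (25, 2), (35, 3), (30, 4)])


def rows_older_than(age: int) -> list[int]:
    """Rows with Age > age: find the first qualifying entry, then read rightward."""
    start = bisect.bisect_right(index, (age, float("inf")))
    return [row_id for _, row_id in index[start:]]


def on_insert(age: int, row_id: int) -> None:
    bisect.insort(index, (age, row_id))  # place the new entry in sorted position


def on_delete(age: int, row_id: int) -> None:
    index.remove((age, row_id))  # drop the deleted row's entry


def on_update(old_age: int, new_age: int, row_id: int) -> None:
    on_delete(old_age, row_id)  # the old key is no longer valid...
    on_insert(new_age, row_id)  # ...so re-insert under the new key


print(rows_older_than(30))  # [3]
on_insert(40, 5)            # a new employee aged 40
on_update(25, 31, 2)        # Bob's Age changes from 25 to 31
on_delete(35, 3)            # Charlie's row is deleted
print(rows_older_than(30))  # [2, 5]
```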
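The query optimizer evaluates whether to use an index based on factors like the size of the table, how selective a condition is (what fraction of rows it matches), and which columns the query uses in its WHERE and ORDER BY clauses.
Common types of indexes include the following.
Primary index, created automatically on the table's primary key:
CREATE TABLE Users (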
UserID INT PRIMARY KEY,
Name VARCHAR(50)
);
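A quick way to confirm that a primary key brings its own index is to create the table in SQLite (used here only as a convenient stand-in for MySQL or PostgreSQL) and list its indexes. In SQLite, "INT PRIMARY KEY" (unlike "INTEGER PRIMARY KEY") is not an alias for the internal rowid, so SQLite builds a separate automatic index for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same definition as the Users table above.
conn.execute("CREATE TABLE Users (UserID INT PRIMARY KEY, Name VARCHAR(50))")

# Indexes SQLite created on its own for this table.
auto_indexes = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Users'")]
print(auto_indexes)  # ['sqlite_autoindex_Users_1']

# The optimizer uses that index for a lookup by primary key;
# column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Users WHERE UserID = 7").fetchall()
print(plan[0][3])
```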
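Composite index on several columns. Because entries are sorted by Name first and Age second, it can serve filters on Name, or on Name and Age, but generally not on Age alone:
CREATE INDEX idx_name_age ON Employees(Name, Age);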
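The leftmost-prefix behavior can be checked with SQLite's EXPLAIN QUERY PLAN. This minimal sketch assumes an in-memory SQLite database with no collected statistics; other engines (and SQLite after ANALYZE) may choose differently, for example with a skip-scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, "
             "Name TEXT, Age INTEGER, Department TEXT)")
conn.execute("CREATE INDEX idx_name_age ON Employees(Name, Age)")


def plan(sql: str) -> str:
    """Return the optimizer's plan text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Filters on the leading column (Name) can seek into the index.
by_name = plan("SELECT * FROM Employees WHERE Name = 'Alice'")
both = plan("SELECT * FROM Employees WHERE Name = 'Alice' AND Age = 30")
# Age alone is not a prefix of (Name, Age), so this falls back to a table scan.
by_age = plan("SELECT * FROM Employees WHERE Age = 30")
print(by_name, both, by_age, sep="\n")
```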
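Unique index, which also rejects duplicate values (this assumes Employees has an Email column, which the example table above lacks):
CREATE UNIQUE INDEX idx_email ON Employees(Email);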
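A unique index doubles as a constraint. This sketch uses SQLite and a hypothetical Email column to show the second insert of the same address being refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Email column: the example Employees table above has none.
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_email ON Employees(Email)")

conn.execute("INSERT INTO Employees (Name, Email) VALUES ('Alice', 'alice@example.com')")
try:
    # Same email again: the unique index refuses the row.
    conn.execute("INSERT INTO Employees (Name, Email) VALUES ('Al', 'alice@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)
```

Note that SQLite, PostgreSQL, and MySQL all allow multiple NULLs in a unique index, since NULL is not considered equal to NULL.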
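Full-text index for searching words inside text columns (MySQL syntax; queried with MATCH ... AGAINST):
CREATE FULLTEXT INDEX idx_name ON Employees(Name);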
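Full-text indexes are typically built as an inverted index: each word maps to the rows that contain it. Here is a minimal Python sketch over a hypothetical bio column; real engines add tokenizers, stop words, stemming, and relevance ranking on top of this.

```python
import re
from collections import defaultdict

# Hypothetical text column: short bios keyed by EmployeeID.
bios = {
    1: "Alice leads the HR team",
    2: "Bob maintains the IT infrastructure",
    4: "David supports the IT help desk",
}

# Inverted index: each lower-cased word maps to the set of rows containing it.
inverted: dict[str, set[int]] = defaultdict(set)
for row_id, text in bios.items():
    for word in re.findall(r"\w+", text.lower()):
        inverted[word].add(row_id)


def search_words(*words: str) -> set[int]:
    """Rows containing every given word (a simple AND query)."""
    sets = [inverted.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


print(sorted(search_words("IT")))          # [2, 4]
print(sorted(search_words("IT", "desk")))  # [4]
```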
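Clustered index, which sets the physical order in which rows are stored, so a table can have only one (SQL Server syntax; MySQL's InnoDB always clusters rows by the primary key):
CREATE CLUSTERED INDEX idx_empid ON Employees(EmployeeID);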
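The effect of clustering can be sketched in Python by keeping full rows sorted by EmployeeID: a key range then maps to one contiguous slice. The secondary index here stores EmployeeID values rather than row addresses, which is how InnoDB's secondary indexes work; the in-memory structures themselves are illustrative only.

```python
import bisect

# Clustered storage: full rows kept physically sorted by EmployeeID.
clustered = [
    (1, "Alice", 30, "HR"),
    (2, "Bob", 25, "IT"),
    (3, "Charlie", 35, "Finance"),
    (4, "David", 30, "IT"),
]
keys = [row[0] for row in clustered]


def id_range(lo: int, hi: int) -> list[tuple]:
    """EmployeeID BETWEEN lo AND hi: the matching rows sit next to each other."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return clustered[start:end]


# Secondary index on Age that stores the clustering key (EmployeeID).
secondary_age = {25: [2], 30: [1, 4], 35: [3]}


def by_age(age: int) -> list[tuple]:
    # Each EmployeeID is looked up again in the clustered order.
    return [clustered[bisect.bisect_left(keys, emp_id)]
            for emp_id in secondary_age.get(age, [])]


print([r[1] for r in id_range(2, 3)])  # ['Bob', 'Charlie']
print([r[1] for r in by_age(30)])      # ['Alice', 'David']
```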
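Covering index, which contains every column a query needs, so a query such as SELECT Department FROM Employees WHERE Age = 30 can be answered from the index alone:
CREATE INDEX idx_covering ON Employees(Age, Department);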
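SQLite reports covering-index use explicitly, which makes it a handy way to see the difference. In this sketch the first query reads only indexed columns, while the second also needs Name, which forces a trip back to the table for every match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, "
             "Name TEXT, Age INTEGER, Department TEXT)")
conn.execute("CREATE INDEX idx_covering ON Employees(Age, Department)")

# Age and Department both live in the index: no table access needed.
covered = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Department FROM Employees WHERE Age = 30").fetchall()[0][3]
# Name is not in the index, so each matching entry leads back to the table.
not_covered = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM Employees WHERE Age = 30").fetchall()[0][3]
print(covered)
print(not_covered)
```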
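Partial (filtered) index, which indexes only the rows that match a condition (PostgreSQL and SQLite syntax; this assumes Users has a Status column, which the definition above lacks):
CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active';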
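A partial index only helps queries whose conditions imply the index's own WHERE clause. This SQLite sketch adds a hypothetical Status column to Users and compares two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Status column added to the Users table defined earlier.
conn.execute("CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT, Status TEXT)")
conn.execute("CREATE INDEX idx_active_users ON Users(Status) WHERE Status = 'active'")


def plan(sql: str) -> str:
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]


# Matches the index condition, so the small partial index can be searched.
active = plan("SELECT * FROM Users WHERE Status = 'active'")
# Inactive rows were never indexed, so the index is unusable here.
inactive = plan("SELECT * FROM Users WHERE Status = 'inactive'")
print(active)
print(inactive)
```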
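Expression (functional) index on a computed value, which supports case-insensitive lookups such as WHERE LOWER(Name) = 'alice' (PostgreSQL syntax; MySQL 8.0.13+ needs an extra pair of parentheses around the expression):
CREATE INDEX idx_lower_name ON Employees(LOWER(Name));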
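An expression index is used only when the query filters on the same expression. This SQLite sketch (SQLite supports indexes on expressions since version 3.9) stores a mixed-case name and finds it with a lower-case search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Employees (Name) VALUES (?)", [("Alice",), ("BOB",)])
conn.execute("CREATE INDEX idx_lower_name ON Employees(LOWER(Name))")

# The WHERE clause repeats the indexed expression exactly.
sql = "SELECT Name FROM Employees WHERE LOWER(Name) = 'bob'"
detail = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]
names = [name for (name,) in conn.execute(sql)]
print(detail)
print(names)  # ['BOB']
```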
EXPLAIN (EXPLAIN PLAN in Oracle) shows how a query uses indexes:
EXPLAIN SELECT * FROM Employees WHERE Age > 25;
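The same check can be run end to end with SQLite, whose counterpart to EXPLAIN is EXPLAIN QUERY PLAN (its output format differs from MySQL's and PostgreSQL's). This sketch uses an equality filter on Age and compares the plan before and after idx_age exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, "
             "Name TEXT, Age INTEGER, Department TEXT)")


def plan(sql: str) -> str:
    # Column 3 of each EXPLAIN QUERY PLAN row holds the readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM Employees WHERE Age = 30"
before = plan(query)  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_age ON Employees(Age)")
after = plan(query)   # now the optimizer can search idx_age
print(before)
print(after)
```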
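To find unused or rarely used indexes, use tools such as pt-index-usage (Percona Toolkit, for MySQL) or PostgreSQL's pg_stat_user_indexes view.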
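Run ANALYZE (ANALYZE TABLE in MySQL) after large data changes so the optimizer has up-to-date statistics when choosing indexes.
As a worked example, consider an Orders table:
CREATE TABLE Orders (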
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
Status VARCHAR(20),
TotalAmount DECIMAL(10, 2)
);
Suppose these are the most frequent queries:
SELECT * FROM Orders WHERE CustomerID = 101;
SELECT * FROM Orders WHERE OrderDate > '2023-01-01' AND Status = 'Shipped';
Indexes that support them:
CREATE INDEX idx_customer ON Orders(CustomerID);
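CREATE INDEX idx_status_date ON Orders(Status, OrderDate);
The equality column (Status) goes first and the range column (OrderDate) second, so the database can jump straight to the 'Shipped' entries and scan only their dates. With (OrderDate, Status), the range condition on the leading column means Status can only filter entries after they have been read.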
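The case study can be reproduced in SQLite as a rough sanity check. This sketch builds the Orders table, confirms the customer lookup uses idx_customer, and prints the plan for the date-and-status query under both column orders so they can be compared; plan wording is SQLite's and will differ in MySQL or PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate DATE,
    Status VARCHAR(20), TotalAmount DECIMAL(10, 2))""")
conn.execute("CREATE INDEX idx_customer ON Orders(CustomerID)")


def plan(sql: str) -> str:
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


q1 = "SELECT * FROM Orders WHERE CustomerID = 101"
q2 = "SELECT * FROM Orders WHERE OrderDate > '2023-01-01' AND Status = 'Shipped'"
customer_plan = plan(q1)

# Range column first: only the OrderDate condition can narrow the index seek.
conn.execute("CREATE INDEX idx_date_status ON Orders(OrderDate, Status)")
range_first = plan(q2)
conn.execute("DROP INDEX idx_date_status")

# Equality column first: seek to 'Shipped', then scan only its dates.
conn.execute("CREATE INDEX idx_status_date ON Orders(Status, OrderDate)")
equality_first = plan(q2)

print(customer_plan, range_first, equality_first, sep="\n")
```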
Use EXPLAIN to verify index usage and adjust as needed.
Indexing is a powerful tool for optimizing database performance, but it requires careful design and maintenance. By understanding the internal workings, types, and trade-offs, we can create indexes that strike the right balance between query speed and storage efficiency.
Ready to improve your database queries? Start analyzing your workloads and applying these indexing techniques today!
If you found this article interesting, spotted errors, or just want to discuss it, please get in touch.