Mastering Complex Database Queries- Best Practices
What "Complex" Queries Actually Means
When developers talk about complex queries, they usually mean one of three things:
- Joins spanning multiple tables (5+ tables isn't unusual)
- Subqueries nested inside subqueries
- Aggregations with multiple GROUP BY columns and window functions
Most "complex" queries are just poorly written simple queries. Before you blame the database engine, ask yourself if your query is doing more work than necessary. 90% of slow queries are slow because of bad design, not hardware limitations.
Read the Execution Plan First
You cannot optimize what you can't measure. Every major database system provides a way to inspect how your query actually executes:
- PostgreSQL: EXPLAIN ANALYZE
- MySQL: EXPLAIN
- SQL Server: SET STATISTICS IO ON / Showplan
- Oracle: EXPLAIN PLAN
The execution plan tells you where the database spends time. Look for these red flags:
- Sequential scans on large tables
- Nested loop joins when hash joins would work better
- High estimated row counts that don't match reality
- Sort operations spilling to disk
What the Numbers Actually Mean
Cost is relative to your database. A cost of 1000 might be fine on a beefy server and terrible on a shared hosting environment. Focus on relative differences between query versions, not absolute cost numbers.
Row estimates vs actual rows matter more than cost. When estimates are way off, your statistics are stale or your data distribution changed significantly.
Indexing: The Obvious Fix Nobody Does Right
Add an index. Everyone knows that. Here's what they don't tell you:
Composite Indexes Have Order
A composite index on (status, created_at) works for queries filtering by status first, then sorting by date. It does not help if you filter by created_at first.
Put the most selective column first. This isn't a hard rule—query patterns matter more than theory. Test both orderings with your actual queries.
Index Bloat Kills Write Performance
Every index you add slows down INSERT, UPDATE, and DELETE operations. A table with 15 indexes isn't optimized—it's over-indexed. Audit your indexes quarterly. Remove anything that hasn't been used in query plans for 6+ months.
Partial Indexes Save Space and Time
If you only query active users, index only active users:
CREATE INDEX idx_active_users_email
ON users(email)
WHERE status = 'active';
This index is smaller, fits in memory easier, and updates faster than a full table index.
Query Writing Patterns That Actually Matter
Avoid SELECT *
Nobody needs every column. SELECT * forces the database to read metadata for every column, prevents index-only scans, and breaks your application when schema changes. List the columns you actually need. That's not premature optimization—it's basic competence.
OR Conditions Break Index Usage
This query:
WHERE status = 'active' OR status = 'pending'
Should be:
WHERE status IN ('active', 'pending')
The OR version often forces a full scan. The IN version can use an index. Same result, completely different performance characteristics.
Functions on Indexed Columns Disable Indexes
This kills performance:
WHERE LOWER(email) = 'user@example.com'
Either store a lowercase version and index that, or use a case-insensitive collation. Functions on columns mean the database can't use the B-tree structure—it has to evaluate every row.
UNION vs UNION ALL
UNION removes duplicates. That requires sorting and comparing all rows. If you know there are no duplicates (or don't care), use UNION ALL. It simply appends rows. No deduplication overhead.
Join Strategies and When to Use Them
Different join types have different performance profiles. Your database picks one based on statistics, but you can force a specific type if needed.
| Join Type | Best For | Warning |
|---|---|---|
| Hash Join | Large tables, no indexes, equality conditions | Requires memory; spills to disk if table doesn't fit |
| Nested Loop | Small outer table, indexed inner table | Can be slow if inner table isn't indexed |
| Merge Join | Pre-sorted data, range conditions | Needs sort operation if data isn't pre-sorted |
Most of the time, let the query planner decide. Only override when you've measured and confirmed a better strategy exists.
Subqueries vs Joins: Pick the Right One
Correlated subqueries run once per row. That's usually terrible:
SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id)
FROM users
This runs the subquery for every user row. A JOIN or window function does the same work in a single pass:
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
Non-correlated subqueries are fine—the database evaluates them once and treats them like a temporary table.
Common Mistakes That Make Simple Queries Slow
- Pagination without cursor: OFFSET 10000 fetches and discards 10000 rows. Use a WHERE id > last_seen_id pattern instead.
- Implicit type casting: WHERE user_id = '123' forces a cast on every row if user_id is an integer. Pass the right type.
- Leading wildcards: WHERE email LIKE '%@gmail.com' can't use indexes. There's no fix for this one—just accept the scan.
- Missing LIMIT on subqueries: Correlated subqueries without limits can return massive result sets unexpectedly.
Getting Started: How to Fix a Slow Query
Here's the actual process:
- Find it. Enable slow query logging. Set threshold to 1 second initially. Review logs daily.
- Reproduce it. Run the query with EXPLAIN ANALYZE. Save the output.
- Identify the bottleneck. Look for scans, sorts, and high row estimates.
- Add an index. Test on a copy of production data first. Measure the improvement.
- Deploy and monitor. The fix worked in staging. It might not in production if data distribution differs.
- Document it. Why was this slow? What did you fix? Someone will need this context later.
Quick Wins Checklist
- Add covering indexes for frequently-run SELECT queries
- Replace OR with IN where possible
- Remove functions from WHERE clauses
- Switch to UNION ALL when duplicates don't matter
- Replace correlated subqueries with JOINs or window functions
- Use LIMIT on subqueries
Tools Worth Using
| Tool | Database | What It Does |
|---|---|---|
| pg_stat_statements | PostgreSQL | Tracks query performance over time |
| pt-query-digest | MySQL | Analyzes slow query logs |
| Query Store | SQL Server | Captures execution plans and performance |
| SQL Tuning Advisor | Oracle | Automated recommendations |
Manual optimization beats automated tools most of the time. Tools suggest indexes. You understand your access patterns. Combine both.
When to Rewrite vs When to Add Hardware
Hardware is cheap. Developer time isn't. But sometimes throwing hardware at the problem is the right call:
- One-time reporting queries? Scale up the replica.
- Regular dashboard loads? Cache the results.
- Real-time user-facing queries? Optimize the query.
Optimization has diminishing returns. If you've applied the basics and still have problems, consider whether the query belongs in a queue job rather than synchronous execution.