Analytic Functions- How to Test and Verify
What Are Analytic Functions and Why Testing Them Matters
Analytic functions are SQL functions that calculate values across a set of rows related to the current row. Unlike aggregate functions, they don't collapse your result setβthey keep individual rows intact while adding computed context.
If you're working with window functions like ROW_NUMBER(), RANK(), LEAD(), or LAG(), you need to test them properly. The problem? They're deceptively simple-looking but hide complex edge cases that will bite you in production.
This guide cuts through the theory. Here's how to actually test and verify your analytic functions work correctly.
The Core Analytic Functions You Need to Test
Before testing, know what you're working with. These are the functions that cause the most headaches:
- ROW_NUMBER() β Assigns unique sequential integers to rows within a partition
- RANK() β Assigns ranks with gaps when ties occur
- DENSE_RANK() β Assigns ranks without gaps
- LEAD() and LAG() β Accesses data from subsequent or previous rows
- SUM(), AVG(), COUNT() as window functions β Running totals and averages
- FIRST_VALUE() and LAST_VALUE() β Returns first or last value in a window frame
Testing Strategy: The Three-Part Approach
Don't wing it. A solid testing approach covers three areas:
1. Partition Boundary Testing
Analytic functions operate within partitions. If your PARTITION BY clause has bugs, everything downstream breaks. Test these scenarios:
- Single-row partitions (edge case)
- Empty partitions
- Partitions with all identical values
- Partitions with NULL values
2. Order By Verification
The ORDER BY inside your window frame determines which row is "current." Wrong ordering = wrong results. Always verify:
- Ascending vs descending sort produces expected row assignments
- Ties are handled according to your business logic
- NULL handling (FIRST vs LAST) matches your requirements
3. Frame Boundary Testing
If you're using ROWS BETWEEN or RANGE BETWEEN, test the boundaries explicitly:
- First and last row of each partition
- Sliding window scenarios
- Unbounded preceding/following behavior
How To: Writing Test Cases for Analytic Functions
Here's a practical approach you can use right now:
Step 1: Create a Test Dataset
Build a small, controlled dataset that covers your edge cases. Don't test against production dataβcreate synthetic data with known inputs and expected outputs.
-- Example test data setup
CREATE TABLE test_sales (
region VARCHAR(50),
sale_date DATE,
amount DECIMAL(10,2)
);
INSERT INTO test_sales VALUES
('North', '2024-01-01', 100.00),
('North', '2024-01-02', 150.00),
('North', '2024-01-03', 100.00), -- Tie case
('South', '2024-01-01', 200.00),
('South', '2024-01-02', 175.00);
Step 2: Write Your Query
Apply your analytic function and verify results match expectations:
SELECT
region,
sale_date,
amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) as row_num,
RANK() OVER (PARTITION BY region ORDER BY amount DESC) as rank_val,
LAG(amount) OVER (PARTITION BY region ORDER BY sale_date) as prev_amount
FROM test_sales;
Step 3: Verify Manually
For each row, trace through the logic by hand. Check:
- Which rows share the same partition key
- How ORDER BY affects the sequence
- What values LEAD/LAG should return for boundary rows
Step 4: Assert Expected Values
Write assertions comparing actual vs expected. In SQL, use a CTE or subquery to isolate your analytic function, then filter for unexpected results:
WITH ranked AS (
SELECT
region,
amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) as rn
FROM test_sales
)
SELECT * FROM ranked
WHERE rn = 1
AND NOT (region = 'North' AND amount = 150.00);
-- This query should return 0 rows if your expectations are correct
Verification Methods Compared
Different verification approaches have different tradeoffs:
| Method | Pros | Cons | Best For |
|---|---|---|---|
| Manual SQL Testing | Fast, no setup | Error-prone, not repeatable | Quick validation during development |
| Unit Tests (dbt, SQLFluff) | Repeatable, version-controlled | Setup overhead | Production pipelines |
| Sample Data Comparison | Easy to understand | Limited coverage | Stakeholder verification |
| Automated Assertions | Catches regressions | Maintenance burden | Critical business logic |
Common Pitfalls That Break Analytic Functions
These mistakes show up constantly. Don't make them:
- Assuming ROW_NUMBER() is deterministic β Without a unique ORDER BY, ties get arbitrary row assignments. If you need deterministic results, add a tiebreaker column.
- Ignoring NULL in PARTITION BY β All NULLs go into the same partition. This is usually not what you want.
- Wrong frame specification β RANGE BETWEEN and ROWS BETWEEN behave differently. RANGE looks at value ranges; ROWS counts physical rows.
- Off-by-one errors in sliding windows β Test your window boundaries explicitly. "1 preceding and 1 following" excludes the current rowβmake sure that's intentional.
- Forgetting the default frame β If you don't specify a frame, most databases use RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This catches many developers off guard.
Tools That Help You Test Analytic Functions
You don't have to do this manually. These tools make testing easier:
- dbt (data build tool) β Great for writing SQL tests on analytic functions within transformation pipelines
- Great Expectations β Validates data quality including window function results
- SQLFuzz β Property-based testing for SQL queries
- DuckDB β Lightweight in-memory database perfect for quick testing without spinning up a full instance
Quick Validation Checklist
Before you ship any query with analytic functions, run through this:
- β Does each partition contain the rows you expect?
- β Does the ORDER BY produce the sequence you need?
- β Have you tested with NULL values in the partition key?
- β Have you tested with single-row partitions?
- β Do LEAD/LAG return NULL at partition boundaries as expected?
- β Does your frame specification match your intent?
- β Are tie-breaking rules explicit and correct?
The Brutal Truth About Testing Analytic Functions
Most developers test analytic functions poorly because the results "look right" on small datasets. The bugs only surface when you hit production volumes or real-world data distributions.
The only way to catch these issues: write explicit test cases covering partition boundaries, tie scenarios, and NULL handling. No amount of visual inspection substitutes for assertions against known inputs.
Set up automated tests. Use small, controlled datasets. Verify your assumptions before the data grows.