Analytic Functions- How to Test and Verify

What Are Analytic Functions and Why Testing Them Matters

Analytic functions are SQL functions that calculate values across a set of rows related to the current row. Unlike aggregate functions, they don't collapse your result set—they keep individual rows intact while adding computed context.

If you're working with window functions like ROW_NUMBER(), RANK(), LEAD(), or LAG(), you need to test them properly. The problem? They're deceptively simple-looking but hide complex edge cases that will bite you in production.

This guide cuts through the theory. Here's how to actually test and verify your analytic functions work correctly.

The Core Analytic Functions You Need to Test

Before testing, know what you're working with. These are the functions that cause the most headaches:

ROW_NUMBER() — Assigns unique sequential integers to rows within a partition
RANK() — Assigns ranks with gaps when ties occur
DENSE_RANK() — Assigns ranks without gaps
LEAD() and LAG() — Accesses data from subsequent or previous rows
SUM(), AVG(), COUNT() as window functions — Running totals and averages
FIRST_VALUE() and LAST_VALUE() — Returns first or last value in a window frame

Testing Strategy: The Three-Part Approach

Don't wing it. A solid testing approach covers three areas:

1. Partition Boundary Testing

Analytic functions operate within partitions. If your PARTITION BY clause has bugs, everything downstream breaks. Test these scenarios:

Single-row partitions (edge case)
Empty partitions
Partitions with all identical values
Partitions with NULL values

2. Order By Verification

The ORDER BY inside your window frame determines which row is "current." Wrong ordering = wrong results. Always verify:

Ascending vs descending sort produces expected row assignments
Ties are handled according to your business logic
NULL handling (FIRST vs LAST) matches your requirements

3. Frame Boundary Testing

If you're using ROWS BETWEEN or RANGE BETWEEN, test the boundaries explicitly:

First and last row of each partition
Sliding window scenarios
Unbounded preceding/following behavior

How To: Writing Test Cases for Analytic Functions

Here's a practical approach you can use right now:

Step 1: Create a Test Dataset

Build a small, controlled dataset that covers your edge cases. Don't test against production data—create synthetic data with known inputs and expected outputs.

-- Example test data setup
CREATE TABLE test_sales (
    region VARCHAR(50),
    sale_date DATE,
    amount DECIMAL(10,2)
);

INSERT INTO test_sales VALUES
('North', '2024-01-01', 100.00),
('North', '2024-01-02', 150.00),
('North', '2024-01-03', 100.00),  -- Tie case
('South', '2024-01-01', 200.00),
('South', '2024-01-02', 175.00);

Step 2: Write Your Query

Apply your analytic function and verify results match expectations:

SELECT 
    region,
    sale_date,
    amount,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) as row_num,
    RANK() OVER (PARTITION BY region ORDER BY amount DESC) as rank_val,
    LAG(amount) OVER (PARTITION BY region ORDER BY sale_date) as prev_amount
FROM test_sales;

Step 3: Verify Manually

For each row, trace through the logic by hand. Check:

Which rows share the same partition key
How ORDER BY affects the sequence
What values LEAD/LAG should return for boundary rows

Step 4: Assert Expected Values

Write assertions comparing actual vs expected. In SQL, use a CTE or subquery to isolate your analytic function, then filter for unexpected results:

WITH ranked AS (
    SELECT 
        region,
        amount,
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) as rn
    FROM test_sales
)
SELECT * FROM ranked 
WHERE rn = 1 
AND NOT (region = 'North' AND amount = 150.00);
-- This query should return 0 rows if your expectations are correct

Verification Methods Compared

Different verification approaches have different tradeoffs:

Method	Pros	Cons	Best For
Manual SQL Testing	Fast, no setup	Error-prone, not repeatable	Quick validation during development
Unit Tests (dbt, SQLFluff)	Repeatable, version-controlled	Setup overhead	Production pipelines
Sample Data Comparison	Easy to understand	Limited coverage	Stakeholder verification
Automated Assertions	Catches regressions	Maintenance burden	Critical business logic

Common Pitfalls That Break Analytic Functions

These mistakes show up constantly. Don't make them:

Assuming ROW_NUMBER() is deterministic — Without a unique ORDER BY, ties get arbitrary row assignments. If you need deterministic results, add a tiebreaker column.
Ignoring NULL in PARTITION BY — All NULLs go into the same partition. This is usually not what you want.
Wrong frame specification — RANGE BETWEEN and ROWS BETWEEN behave differently. RANGE looks at value ranges; ROWS counts physical rows.
Off-by-one errors in sliding windows — Test your window boundaries explicitly. "1 preceding and 1 following" excludes the current row—make sure that's intentional.
Forgetting the default frame — If you don't specify a frame, most databases use RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This catches many developers off guard.

Tools That Help You Test Analytic Functions

You don't have to do this manually. These tools make testing easier:

dbt (data build tool) — Great for writing SQL tests on analytic functions within transformation pipelines
Great Expectations — Validates data quality including window function results
SQLFuzz — Property-based testing for SQL queries
DuckDB — Lightweight in-memory database perfect for quick testing without spinning up a full instance

Quick Validation Checklist

Before you ship any query with analytic functions, run through this:

✅ Does each partition contain the rows you expect?
✅ Does the ORDER BY produce the sequence you need?
✅ Have you tested with NULL values in the partition key?
✅ Have you tested with single-row partitions?
✅ Do LEAD/LAG return NULL at partition boundaries as expected?
✅ Does your frame specification match your intent?
✅ Are tie-breaking rules explicit and correct?

The Brutal Truth About Testing Analytic Functions

Most developers test analytic functions poorly because the results "look right" on small datasets. The bugs only surface when you hit production volumes or real-world data distributions.

The only way to catch these issues: write explicit test cases covering partition boundaries, tie scenarios, and NULL handling. No amount of visual inspection substitutes for assertions against known inputs.

Set up automated tests. Use small, controlled datasets. Verify your assumptions before the data grows.