Mastering Cumulative Revenue in SQL: Your Complete Interview Guide

Mastering Cumulative Revenue in SQL: Your Interview Guide

Calculating a cumulative sum, or a "running total," is one of the most practical and frequently asked SQL questions in data analytics and business intelligence interviews. It demonstrates your understanding of modern SQL features and your ability to solve real-world business problems.

This guide will walk you through every scenario, from the basics to advanced challenges, all framed from the perspective of a technical interview.

The Setup: Our Sample `sales` Table

For all our examples, we'll work with a simple `sales` table. This is our ground truth.

Input Table: `sales`

CREATE TABLE sales (
    order_id INT PRIMARY KEY,
    product_category VARCHAR(50),
    order_date DATE,
    revenue DECIMAL(10, 2)
);

INSERT INTO sales (order_id, product_category, order_date, revenue) VALUES
(1, 'Electronics', '2023-10-01', 150.00),
(2, 'Books',       '2023-10-01', 25.50),
(3, 'Electronics', '2023-10-02', 200.00),
(4, 'Home Goods',  '2023-10-02', 75.00),
(5, 'Books',       '2023-10-03', 40.00),
(6, 'Electronics', '2023-10-03', 500.00),
(7, 'Books',       '2023-10-03', 15.00);
order_idproduct_categoryorder_daterevenue
1Electronics2023-10-01150.00
2Books2023-10-0125.50
3Electronics2023-10-02200.00
4Home Goods2023-10-0275.00
5Books2023-10-0340.00
6Electronics2023-10-03500.00
7Books2023-10-0315.00

Scenario 1: The Basic Cumulative Sum

Interviewer Question 💬: "Write a SQL query to show a running total of revenue over time, ordered by the transaction date."

The Solution: Using Window Functions

The most efficient and modern way to solve this is with the `SUM() OVER()` window function.

SELECT
    order_date,
    revenue,
    SUM(revenue) OVER (ORDER BY order_date ASC) AS cumulative_revenue
FROM
    sales
ORDER BY
    order_date;

Output Table:

order_daterevenuecumulative_revenue
2023-10-01150.00175.50
2023-10-0125.50175.50
2023-10-02200.00450.50
2023-10-0275.00450.50
2023-10-0340.001005.50
2023-10-03500.001005.50
2023-10-0315.001005.50

💡 Logical Reasoning (Your Interview Answer)

"I'll solve this using a window function, specifically `SUM() OVER()`, because it's highly efficient and designed for this type of calculation.

  1. `SUM(revenue)` specifies the aggregation I want to perform.
  2. The `OVER()` clause defines the 'window' or set of rows for the sum.
  3. `ORDER BY order_date ASC` inside the `OVER()` clause is the key. It tells SQL to sum the revenue for the current row and all preceding rows based on the date.
  4. Notice that rows with the same date are treated as peers. The cumulative value is the same for both rows on '2023-10-01' because the window includes all rows up to that date. The default window frame includes all peers of the current row."

Visualizing the Window Frame

For the row with revenue 200.00 on 2023-10-02, the window looks like this:

2023-10-01
150.00
2023-10-01
25.50
2023-10-02
200.00
2023-10-02
75.00

Sum = 150 + 25.50 + 200 + 75 = 450.50


Scenario 2: Cumulative Revenue Per Day (Aggregated)

Interviewer Question 💬: "That's good, but we usually want to see a single cumulative figure for each day, not for each transaction. How would you show the total daily revenue and the cumulative revenue at the end of each day?"

The Solution: Aggregate First with a CTE

This is a two-step problem. The best practice is to use a Common Table Expression (CTE) to first aggregate the data, then apply the window function.

WITH daily_revenue AS (
    -- Step 1: Aggregate revenue by day
    SELECT
        order_date,
        SUM(revenue) AS total_daily_revenue
    FROM
        sales
    GROUP BY
        order_date
)
-- Step 2: Calculate the cumulative sum on the aggregated data
SELECT
    order_date,
    total_daily_revenue,
    SUM(total_daily_revenue) OVER (ORDER BY order_date ASC) AS cumulative_revenue
FROM
    daily_revenue
ORDER BY
    order_date;

Output Table:

order_datetotal_daily_revenuecumulative_revenue
2023-10-01175.50175.50
2023-10-02275.00450.50
2023-10-03555.001005.50

💡 Logical Reasoning (Your Interview Answer)

"This requires a two-step logical process, which I'll implement using a Common Table Expression (CTE) for readability and maintainability.

  1. The CTE `daily_revenue`: First, I need to get the total revenue for each day. I do this by `GROUPING BY order_date` and using `SUM(revenue)`. This creates a temporary table with one row per day.
  2. The Final `SELECT`: Then, I can treat this CTE as a regular table. I apply the same `SUM() OVER (ORDER BY ...)` window function to the `total_daily_revenue` column. This correctly calculates the running total on the daily aggregated figures.
This approach cleanly separates the aggregation logic from the window function logic."


Scenario 3: Cumulative Revenue with Partitions

Interviewer Question 💬: "Excellent. Now, the product team wants to track growth for each category independently. Can you calculate the cumulative revenue, but have it restart for each `product_category`?"

The Solution: The `PARTITION BY` Clause

This is the perfect use case for `PARTITION BY`, which acts like a `GROUP BY` for window functions.

SELECT
    order_date,
    product_category,
    revenue,
    SUM(revenue) OVER (PARTITION BY product_category ORDER BY order_date ASC) AS category_cumulative_revenue
FROM
    sales
ORDER BY
    product_category,
    order_date;

Output Table:

order_dateproduct_categoryrevenuecategory_cumulative_revenue
2023-10-01Books25.5025.50
2023-10-03Books40.0065.50
2023-10-03Books15.0080.50
2023-10-01Electronics150.00150.00
2023-10-02Electronics200.00350.00
2023-10-03Electronics500.00850.00
2023-10-02Home Goods75.0075.00

💡 Logical Reasoning (Your Interview Answer)

"Absolutely. I can achieve this by adding the `PARTITION BY` clause to the window function.

  • `PARTITION BY product_category`: This clause logically splits the data into separate partitions, one for 'Books', one for 'Electronics', and one for 'Home Goods'.
  • The `SUM()` calculation is then performed independently within each of these partitions. When the query moves from the 'Books' partition to the 'Electronics' partition, the running total resets and starts from scratch.
  • The `ORDER BY order_date` still dictates the summation order, but it operates *within* each partition."

Visualizing PARTITION BY
Partition: Books
25.50 → 25.50
40.00 → 65.50
15.00 → 80.50
Partition: Electronics
150.00 → 150.00
200.00 → 350.00
500.00 → 850.00
Partition: Home Goods
75.00 → 75.00

The calculations in each box are completely separate.


🚀 Advanced Scenarios & Bonus Points

Scenario 4: Handling Days with No Sales

Interviewer Question 💬: "This is great, but our business report needs to show a row for *every single day* in October, even if there were no sales. The cumulative revenue should carry forward from the last day with sales. How would you solve this?"

The Solution: Calendar Table + `LEFT JOIN`

This is a senior-level question that tests your ability to handle sparse data. The key is to generate a complete date series (a calendar) and `LEFT JOIN` your sales data to it.

-- This CTE generates a date series. Syntax may vary by SQL dialect.
-- (This is a portable recursive CTE version)
WITH RECURSIVE calendar AS (
    SELECT CAST('2023-10-01' AS DATE) AS report_date
    UNION ALL
    SELECT DATE_ADD(report_date, INTERVAL 1 DAY) -- Use DATEADD for SQL Server
    FROM calendar
    WHERE report_date < '2023-10-31'
),
daily_revenue AS (
    SELECT
        order_date,
        SUM(revenue) AS total_daily_revenue
    FROM sales
    GROUP BY order_date
),
-- Join the complete calendar to our actual sales data
filled_daily_revenue AS (
    SELECT
        cal.report_date,
        COALESCE(dr.total_daily_revenue, 0) AS daily_revenue
    FROM calendar cal
    LEFT JOIN daily_revenue dr ON cal.report_date = dr.order_date
)
-- Finally, calculate the cumulative sum on the complete, filled-in data
SELECT
    report_date,
    daily_revenue,
    SUM(daily_revenue) OVER (ORDER BY report_date ASC) AS cumulative_revenue
FROM
    filled_daily_revenue
ORDER BY
    report_date;

💡 Logical Reasoning (Your Interview Answer)

"This is a common data warehousing problem. My approach involves four steps to ensure every day is represented:

  1. Generate a Calendar: First, I'll create a complete list of dates for the desired period using a recursive CTE. This acts as my master list.
  2. Aggregate Sales Data: I'll use the same daily aggregation CTE from before to get the total revenue for days that actually had sales.
  3. `LEFT JOIN` and `COALESCE`: The crucial step is to `LEFT JOIN` the calendar to the sales data. This keeps all dates from the calendar. For days with no sales, the revenue will be `NULL`. I then use `COALESCE(revenue, 0)` to convert these `NULL`s to zero, which is essential for the sum to work correctly.
  4. Final Cumulative Sum: Now that I have a clean, complete dataset with a row for every day, I can apply the standard `SUM() OVER()` window function to get the correct running total that carries forward over the gaps.
"

Scenario 5: The "Old School" Method (No Window Functions)

Interviewer Question 💬: "Just for fun, let's say you're working with an old version of MySQL that doesn't support window functions. How would you calculate the cumulative revenue per day then?"

The Solution: Correlated Subquery

This tests your knowledge of more fundamental SQL techniques. A correlated subquery is a classic way to solve this, though less efficient.

WITH daily_revenue AS (
    SELECT
        order_date,
        SUM(revenue) AS total_daily_revenue
    FROM sales
    GROUP BY order_date
)
SELECT
    d1.order_date,
    d1.total_daily_revenue,
    (SELECT SUM(d2.total_daily_revenue)
     FROM daily_revenue d2
     WHERE d2.order_date <= d1.order_date) AS cumulative_revenue
FROM
    daily_revenue d1
ORDER BY
    d1.order_date;

💡 Logical Reasoning (Your Interview Answer)

"Without window functions, I'd use a correlated subquery. Here's how it works:

  • First, I'd still aggregate the daily revenue into a CTE, just like before.
  • For the main query, I select the date and daily revenue from the CTE, which I'll alias as `d1`.
  • The cumulative revenue column is calculated with a subquery. For each row (each day) in the outer query (`d1`), the subquery runs again. It sums up all `total_daily_revenue` from the CTE (aliased as `d2`) where the date (`d2.order_date`) is less than or equal to the current row's date (`d1.order_date`).
  • Crucial Follow-up: It's important to note that while this works, it's far less performant than window functions, especially on large datasets. The subquery is executed for every single row, leading to a complexity that approaches O(N²), whereas window functions are typically much more optimized, closer to O(N log N) or O(N)."


🏆 Top 5 Interview Questions: Quick Recap

1. How do you calculate a basic running total?

Use SUM(column) OVER (ORDER BY date_column).

2. How do you find the cumulative sum per day (not per transaction)?

Use a CTE or subquery to GROUP BY day first, then apply the window function to the aggregated daily totals.

3. How do you restart the running total for different groups (e.g., categories)?

Add PARTITION BY category_column inside the OVER() clause.

4. How do you handle missing days in a report?

Generate a calendar table, LEFT JOIN your sales data to it, use COALESCE to replace NULLs with 0, then apply the window function.

5. Why are window functions preferred over self-joins/correlated subqueries for running totals?

Performance. Window functions are significantly more efficient and scalable as they typically require only one pass over the data, while self-joins can lead to quadratic complexity (O(N²)) and become very slow on large tables.

❓ Frequently Asked Questions (FAQs)

What's the difference between `ROWS` and `RANGE` in the window frame?

This is an advanced detail. The default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. With `RANGE`, rows with the same `ORDER BY` value are treated as peers (as seen in Scenario 1). If you used `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, it would only sum up to the physical current row, giving a different result if the ordering of peer rows is not guaranteed. For most cumulative totals by date, `RANGE` is what you want.

Can you use other functions like `AVG()` or `COUNT()` as window functions?

Yes! You can calculate a cumulative average (`AVG() OVER (...)`), a cumulative count (`COUNT() OVER (...)`), or use other functions like `MIN()`, `MAX()`, `LAG()`, and `LEAD()` to get amazing insights across your data window.

What if my `order_date` column is actually a `DATETIME` or `TIMESTAMP` with time?

That makes the `ORDER BY` more precise. If you want to group by day, you should first `CAST(order_timestamp AS DATE)` in your aggregation CTE. If you want a running total by the exact timestamp, you can just use `ORDER BY order_timestamp` directly.


Discover more from Data Engineer Journey

Subscribe to get the latest posts sent to your email.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top

Discover more from Data Engineer Journey

Subscribe now to keep reading and get access to the full archive.

Continue reading