Python cumulative sum dataframe column


The cumsum() function can be used to calculate the cumulative sum of values in a column of a pandas DataFrame.
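As a minimal sketch, here is the forward (top-to-bottom) version. The short DataFrame and the column name running_total are illustrative assumptions, not part of the tutorial's data:

```python
import pandas as pd

# Illustrative data: daily sales values
df = pd.DataFrame({'sales': [3, 6, 0, 2, 4]})

# Running total from the first row downward, stored as a new column
df['running_total'] = df['sales'].cumsum()

print(df['running_total'].tolist())  # [3, 9, 9, 11, 15]
```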

You can use the following syntax to calculate a reversed cumulative sum of values in a column:

df['cumsum_reverse'] = df.loc[::-1, 'my_column'].cumsum()[::-1]

This particular syntax adds a new column called cumsum_reverse to a pandas DataFrame that shows the reversed cumulative sum of values in the column titled my_column.
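To see why the one-liner works, it can be split into its three steps. This sketch uses a small illustrative Series, and the intermediate variable names are hypothetical. It uses .iloc[::-1] for an explicitly positional reversal:

```python
import pandas as pd

s = pd.Series([3, 6, 0, 2], name='my_column')

# Step 1: reverse the row order (the index becomes 3, 2, 1, 0)
reversed_s = s.iloc[::-1]

# Step 2: accumulate starting from what was the last row
reversed_cumsum = reversed_s.cumsum()

# Step 3: reverse again so the rows are back in their original order
result = reversed_cumsum.iloc[::-1]

print(result.tolist())  # [11, 8, 2, 2]
```

Because the index labels travel with the values, the result lines up with the original rows when it is assigned back to the DataFrame.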

The following example shows how to use this syntax in practice.

Example: Calculate a Reversed Cumulative Sum in Pandas

Suppose we have the following pandas DataFrame that shows the total sales made by a store during 10 consecutive days:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [3, 6, 0, 2, 4, 1, 0, 1, 4, 7]})

#view DataFrame
df

	day	sales
0	1	3
1	2	6
2	3	0
3	4	2
4	5	4
5	6	1
6	7	0
7	8	1
8	9	4
9	10	7

We can use the following syntax to calculate a reversed cumulative sum of the sales column:

#add new column that shows reverse cumulative sum of sales
df['cumsum_reverse_sales'] = df.loc[::-1, 'sales'].cumsum()[::-1]

#view updated DataFrame
df

	day	sales	cumsum_reverse_sales
0	1	3	28
1	2	6	25
2	3	0	19
3	4	2	19
4	5	4	17
5	6	1	13
6	7	0	12
7	8	1	12
8	9	4	11
9	10	7	7

The new column titled cumsum_reverse_sales shows the cumulative sum of sales, accumulated upward starting from the last row.

Here’s how we would interpret the values in the cumsum_reverse_sales column:

  • The cumulative sum of sales for day 10 is 7.
  • The cumulative sum of sales for day 10 and day 9 is 11.
  • The cumulative sum of sales for day 10, day 9, and day 8 is 12.
  • The cumulative sum of sales for day 10, day 9, day 8, and day 7 is 12.

And so on.
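As a cross-check, the reversed cumulative sum for a row equals the column total minus the forward running total, plus the row's own value. This is an alternative formulation rather than the approach used above, and the column name alt_reverse is a hypothetical label:

```python
import pandas as pd

df = pd.DataFrame({'day': range(1, 11),
                   'sales': [3, 6, 0, 2, 4, 1, 0, 1, 4, 7]})

# Approach from this tutorial: reverse, accumulate, reverse back
df['cumsum_reverse_sales'] = df.loc[::-1, 'sales'].cumsum()[::-1]

# Alternative: total sales minus everything accumulated before this row
df['alt_reverse'] = df['sales'].sum() - df['sales'].cumsum() + df['sales']

print(df['alt_reverse'].tolist())  # [28, 25, 19, 19, 17, 13, 12, 12, 11, 7]
```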

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Sum Specific Columns in Pandas
How to Perform a GroupBy Sum in Pandas
How to Sum Columns Based on a Condition in Pandas

Onyejiaku Theophilus Chidalu

Overview

The cumsum() function of a DataFrame object is used to obtain the cumulative sum over a DataFrame axis.

Note: The axis determines the direction in which values are accumulated. With axis 0 ('index'), values are accumulated vertically, down the rows of each column. With axis 1 ('columns'), values are accumulated horizontally, across the columns of each row.

Syntax

DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)

Syntax for the cumsum() function in pandas

Parameters

  • axis: The axis to accumulate along: 0 or 'index' to run down the rows, or 1 or 'columns' to run across the columns. The default, None, is equivalent to 0.
  • skipna: An optional boolean, True by default, indicating whether NA/null values are excluded. If an entire row or column is NA, the result is NA.
  • *args, **kwargs: Additional keywords that have no effect but may be accepted for compatibility with NumPy. These are optional.
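A short sketch of how skipna changes the result when a column contains a missing value. The data is illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, 3.0])

# skipna=True (default): the NaN stays NaN, but accumulation continues past it
print(s.cumsum().tolist())              # [1.0, nan, 3.0, 6.0]

# skipna=False: once a NaN is reached, every later value is NaN
print(s.cumsum(skipna=False).tolist())  # [1.0, nan, nan, nan]
```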

Return value

This function returns a Series or DataFrame object containing the cumulative sum along the given axis.
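As a small sketch with illustrative data, calling cumsum() on a single column returns a Series, while calling it on the whole DataFrame returns a DataFrame of the same shape:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

col_result = df['A'].cumsum()   # Series in, Series out
frame_result = df.cumsum()      # DataFrame in, DataFrame out

print(type(col_result).__name__)    # Series
print(type(frame_result).__name__)  # DataFrame
print(frame_result.shape)           # (3, 2), the same shape as df
```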

Example

# A code to illustrate the cumsum() function in pandas

# importing the pandas library
import pandas as pd

# creating a DataFrame
df = pd.DataFrame([[5, 10, 4, 15, 3],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))
# printing the DataFrame
print(df)

# obtaining the cumulative sum down the rows of each column (axis 0)
print(df.cumsum(axis="index"))

# obtaining the cumulative sum across the columns of each row (axis 1)
print(df.cumsum(axis="columns"))

Using the cumsum() function along both axes

Explanation

  • Line 4: We import the pandas library.
  • Lines 7–10: We create a DataFrame, df.
  • Line 12: We print the DataFrame, df.
  • Line 15: We use the cumsum() function to obtain the cumulative sum running down the rows of each column (axis 0). We print the result to the console.
  • Line 18: We use the cumsum() function to obtain the cumulative sum running across the columns of each row (axis 1). We print the result to the console.
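One way to sanity-check the output is that the last row of the axis-0 result should equal the column totals from sum(), and the last column of the axis-1 result should equal the row totals. A sketch reusing the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame([[5, 10, 4, 15, 3],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))

down_rows = df.cumsum(axis="index")      # running totals within each column
across_cols = df.cumsum(axis="columns")  # running totals within each row

# Last row of the axis-0 result: the total of each column
print(down_rows.iloc[-1].tolist())       # [9.0, 28.0, 22.0, 38.0, 15.5]

# Last column of the axis-1 result: the total of each row
print(across_cols.iloc[:, -1].tolist())  # [37.0, 22.5, 53.0]
```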


How do you do a cumulative sum in a DataFrame in Python?

By default (axis=0), the cumsum() method returns a DataFrame of the same shape that holds the running total of each column. It goes through the DataFrame from the top, row by row, adding each value to the running total from the previous row, so the last row contains the sum of all values in each column.
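The row-by-row description can be mirrored with an explicit loop. This is an illustrative sketch only: the helper name running_total_by_column is hypothetical, it ignores the NaN handling that skipna provides, and the built-in cumsum() should be preferred in practice:

```python
import pandas as pd

def running_total_by_column(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add each row to the running totals of the rows above it."""
    totals = [0] * df.shape[1]
    rows = []
    for _, row in df.iterrows():
        # New running total = previous running total + current row's value
        totals = [t + v for t, v in zip(totals, row)]
        rows.append(totals)
    return pd.DataFrame(rows, index=df.index, columns=df.columns)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
manual = running_total_by_column(df)

print(manual.values.tolist())  # [[1, 10], [3, 30], [6, 60]]
```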

How do you make a cumulative column in Python?

The cumulative sum of a column in pandas can be calculated with the built-in cumsum() method.
Syntax: cumsum(axis=None, skipna=True, *args, **kwargs)
Parameters:
  • axis: {index (0), columns (1)}
  • skipna: Exclude NA/null values. ...
Returns: Cumulative sum of the column.

How do you find the sum of a column in pandas?

The DataFrame sum() method adds all values in each column and returns one sum per column. When you specify the column axis (axis='columns'), sum() adds the values across each row instead and returns one sum per row.
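A minimal sketch of both directions, plus the total of a single column, using illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Default: one total per column
print(df.sum().tolist())                # [6, 60]

# axis='columns': one total per row
print(df.sum(axis='columns').tolist())  # [11, 22, 33]

# Total of a single column
print(df['B'].sum())                    # 60
```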

How does cumulative sum work?

A cumulative sum (running total) of a sequence is a new sequence whose k-th value is the sum of the first k numbers, so each value equals the previous running total plus the current number.
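Written as a formula, with x_1, ..., x_n denoting the values in order, the running total and the reversed version from the first example can be sketched as follows. The symbols S and R are notation introduced here, and the numbers are the sales values from the first example:

```latex
S_0 = 0, \qquad S_k = \sum_{i=1}^{k} x_i = S_{k-1} + x_k \quad (k = 1, \dots, n)

% Forward cumulative sum of the sales 3, 6, 0, 2, 4, 1, 0, 1, 4, 7
S_1 = 3, \quad S_2 = 3 + 6 = 9, \quad S_3 = 9 + 0 = 9, \quad S_4 = 9 + 2 = 11, \quad \dots, \quad S_{10} = 21 + 7 = 28

% Reversed cumulative sum (the cumsum_reverse_sales column)
R_k = \sum_{i=k}^{n} x_i = S_n - S_{k-1}, \qquad \text{e.g. } R_9 = 28 - 17 = 11
```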