Python cumulative sum dataframe column

The cumsum() method calculates the cumulative sum of the values in a column of a pandas DataFrame.

You can use the following syntax to calculate a reversed cumulative sum of values in a column:

df['cumsum_reverse'] = df.loc[::-1, 'my_column'].cumsum()[::-1]

This particular syntax adds a new column called cumsum_reverse to a pandas DataFrame that shows the reversed cumulative sum of values in the column titled my_column.

The following example shows how to use this syntax in practice.

Example: Calculate a Reversed Cumulative Sum in Pandas

Suppose we have the following pandas DataFrame that shows the total sales made by some store during 10 consecutive days:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [3, 6, 0, 2, 4, 1, 0, 1, 4, 7]})

#view DataFrame
df

   day  sales
0    1      3
1    2      6
2    3      0
3    4      2
4    5      4
5    6      1
6    7      0
7    8      1
8    9      4
9   10      7

We can use the following syntax to calculate a reversed cumulative sum of the sales column:

#add new column that shows reverse cumulative sum of sales
df['cumsum_reverse_sales'] = df.loc[::-1, 'sales'].cumsum()[::-1]

#view updated DataFrame
df

   day  sales  cumsum_reverse_sales
0    1      3                    28
1    2      6                    25
2    3      0                    19
3    4      2                    19
4    5      4                    17
5    6      1                    13
6    7      0                    12
7    8      1                    12
8    9      4                    11
9   10      7                     7

The new column titled cumsum_reverse_sales shows the cumulative sales starting from the last row.

Here’s how we would interpret the values in the cumsum_reverse_sales column:

  • The cumulative sum of sales for day 10 is 7.
  • The cumulative sum of sales for day 10 and day 9 is 11.
  • The cumulative sum of sales for day 10, day 9, and day 8 is 12.
  • The cumulative sum of sales for day 10, day 9, day 8, and day 7 is 12.

And so on.
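As a cross-check, the reversed cumulative sum can also be computed without reversing the rows. The sketch below rebuilds the same sales data and compares the two approaches. The names reverse_a and reverse_b are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [3, 6, 0, 2, 4, 1, 0, 1, 4, 7]})

# Approach from the text: reverse the rows, accumulate, reverse back
reverse_a = df.loc[::-1, 'sales'].cumsum()[::-1]

# Alternative: grand total minus the forward running sum, plus the current value
reverse_b = df['sales'].sum() - df['sales'].cumsum() + df['sales']

print(reverse_a.tolist())
print((reverse_a == reverse_b).all())  # both approaches agree row by row
```

The second form avoids the double reversal, which some readers find easier to follow. For integer data like this the two give identical results.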


Onyejiaku Theophilus Chidalu

Overview

The cumsum() function of a DataFrame object is used to obtain the cumulative sum along one of its axes.

Note: The axis selects the direction in which values are accumulated. A value of 0 (or 'index') accumulates vertically, moving down the rows within each column. A value of 1 (or 'columns') accumulates horizontally, moving across the columns within each row.

Syntax

DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)

Syntax for the cumsum() function in pandas

Parameters

  • axis: This selects the axis to accumulate along: the index (designated as 0 or 'index') or the columns (designated as 1 or 'columns'). When it is omitted, the sum runs down the index.
  • skipna: This takes a boolean value indicating whether null values are excluded. This is an optional parameter that defaults to True.
  • *args, **kwargs: These arguments have no effect but may be accepted for compatibility with NumPy. These are optional.
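To make the effect of skipna concrete, here is a minimal sketch on a small made-up Series containing one missing value. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0, 3.0])

# Default (skipna=True): the NaN stays in place and the sum continues after it
skipped = s.cumsum()

# skipna=False: every value from the first NaN onward becomes NaN
propagated = s.cumsum(skipna=False)

print(skipped.tolist())
print(propagated.tolist())
```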

Return value

This function returns a Series or DataFrame object of the same size as the caller, showing the cumulative sum along the axis.
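The return type follows the caller. A short sketch on a made-up two-column frame shows this:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

frame_result = df.cumsum()        # DataFrame in, DataFrame out (same shape)
series_result = df['A'].cumsum()  # Series in, Series out (same length)

print(type(frame_result).__name__, frame_result.shape)
print(type(series_result).__name__, series_result.tolist())
```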

Example

# Code to illustrate the cumsum() function in pandas

# importing the pandas library
import pandas as pd

# creating a DataFrame
df = pd.DataFrame([[5, 10, 4, 15, 3],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))

# printing the DataFrame
print(df)

# obtaining the cumulative sum vertically, down the rows
print(df.cumsum(axis="index"))

# obtaining the cumulative sum horizontally, across the columns
print(df.cumsum(axis="columns"))

Example code for the cumsum() function

Explanation

  • The import statement: We import the pandas library.
  • The pd.DataFrame call: We create a DataFrame, df, with three rows and columns A to E.
  • The first print statement: We print the DataFrame, df.
  • The second print statement: We use the cumsum() function to obtain the cumulative sum running downwards across the rows (axis 0). We print the result to the console.
  • The third print statement: We use the cumsum() function to obtain the cumulative sum running horizontally across the columns (axis 1). We print the result to the console.
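To see what the two axis settings produce, the sketch below rebuilds the same DataFrame and inspects the final row and final column of each result. The names down_rows and across_cols are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[5, 10, 4, 15, 3],
                   [1, 7, 5, 9, 0.5],
                   [3, 11, 13, 14, 12]],
                  columns=list('ABCDE'))

down_rows = df.cumsum(axis="index")      # accumulate down each column
across_cols = df.cumsum(axis="columns")  # accumulate across each row

# The last row of the vertical result holds each column's total
print(down_rows.iloc[-1].tolist())

# The last column of the horizontal result holds each row's total
print(across_cols.iloc[:, -1].tolist())
```

For this data the column totals are 9, 28, 22, 38, and 15.5, and the row totals are 37, 22.5, and 53.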


How do you do a cumulative sum in a DataFrame in Python?

The cumsum() method returns a DataFrame with the cumulative sum of each column. By default, it goes through the values from the top, row by row, adding each value to the running total from the previous row, so the last row contains the sum of all values in each column.
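The claim about the last row can be checked directly. This is a minimal sketch on a small made-up frame, comparing the final row of the result with the plain column sums:

```python
import pandas as pd

df = pd.DataFrame([[5, 10, 4],
                   [1, 7, 5],
                   [3, 11, 13]],
                  columns=list('ABC'))

running = df.cumsum()        # default: accumulate down the rows
last_row = running.iloc[-1]  # final running total of each column

print(last_row.tolist())
print((last_row == df.sum()).all())  # matches the plain column sums
```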

How do you make a cumulative column in Python?

The cumulative sum of a column in pandas can be calculated with the built-in cumsum() function.
Syntax: cumsum(axis=None, skipna=True, *args, **kwargs)
Parameters:
axis: {index (0), columns (1)}
skipna: Exclude NA/null values.
Returns: Cumulative sum of the column.
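A minimal sketch of adding a cumulative column, using a small made-up frame and a hypothetical column name running_sales:

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 3, 4],
                   'sales': [3, 6, 0, 2]})

# Store the running total of 'sales' in a new column
df['running_sales'] = df['sales'].cumsum()

print(df)
```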

How do you find the sum of a column in pandas?

The pandas DataFrame sum() method adds all values in each column and returns the sum for each column. By specifying the column axis (axis='columns'), the sum() method adds across the columns instead and returns the sum of each row.
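A short sketch of both directions on a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

col_totals = df.sum()                # one total per column
row_totals = df.sum(axis='columns')  # one total per row

print(col_totals.to_dict())
print(row_totals.tolist())
```

To total a single column, select it first, for example df['A'].sum().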

How does cumulative sum work?

A cumulative sum is a running total: each value in the result is the sum of all the numbers in the sequence up to and including that position.
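The idea does not depend on pandas. As a minimal sketch, the standard library's itertools.accumulate produces the same running total for a plain list (the sample values are made up):

```python
from itertools import accumulate

values = [3, 6, 0, 2, 4]

# Each element is the sum of all values up to and including that position
running_total = list(accumulate(values))

print(running_total)  # [3, 9, 9, 11, 15]
```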
