Python calculate percentile from histogram

A histogram is a graph of a single continuous variable. The variable is first grouped into bins, which are listed along the x (horizontal) axis. A rectangle is then drawn over each bin, with a height proportional to the frequency of that bin.
The percentiles of a distribution are the values that separate the variable into 100 groups of equal frequency.

    1. Find the frequency of each bin. To do this, draw a horizontal line from the top of each rectangle to the y-axis (the vertical axis) and read off the frequency. If the line falls between two tick marks, you may need to estimate the value.
    Suppose you have a histogram with 5 bins, and the frequencies are 5, 15, 20, 7 and 3.

    2. Add the frequencies found in step 1. In the example, the total is 5 + 15 + 20 + 7 + 3 = 50.

    3. Divide the frequency for each bin by the total frequency. In the example: 5/50, 15/50, 20/50, 7/50 and 3/50.

    4. Divide 100 by the total frequency. In the example 100/50 = 2.

    5. Multiply the numerator (top part) of each fraction in step 3 by the quotient in step 4. The results are the percentages of the data that fall in each bin. In the example 5*2 = 10, 15*2 = 30, 20*2 = 40, 7*2 = 14 and 3*2 = 6.

    6. Sum the results cumulatively. That is, add the first two numbers, then the first three, and so on until you have added them all. These running totals are the percentile ranks of the upper boundary of each bin. In the example: 10, 10 + 30 = 40, 40 + 40 = 80, 80 + 14 = 94 and 94 + 6 = 100.

    Warnings

    • The histogram is not really intended for finding percentiles, and you will often have to approximate.
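
The arithmetic in the steps above can be sketched in a few lines of NumPy. The frequencies are those of the worked example. The bin edges are an assumption made purely for illustration (the example does not give any), and the final interpolation assumes that values are spread evenly inside each bin, so the result is an estimate only.

```python
import numpy as np

# Bin frequencies from the worked example.
freqs = np.array([5, 15, 20, 7, 3])

# Convert each frequency to a percentage of the total (50 observations).
percent_per_bin = freqs * (100 / freqs.sum())

# Running total: the percentile rank reached at the upper edge of each bin.
cumulative = np.cumsum(percent_per_bin)
print(cumulative.tolist())  # [10.0, 40.0, 80.0, 94.0, 100.0]

# Hypothetical bin edges (NOT part of the example): five bins of width 10.
edges = np.array([0, 10, 20, 30, 40, 50])

# Estimate the value at the 75th percentile by interpolating linearly
# between the edges, assuming data is uniform within each bin.
ranks = np.concatenate(([0.0], cumulative))
estimate_75 = float(np.interp(75, ranks, edges))
print(estimate_75)  # 28.75
```

With these assumed edges, the 75th percentile falls in the third bin (ranks 40 to 80), which is why the estimate lies between 20 and 30.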


What are Percentiles?

Percentiles are used in statistics to describe the value below which a given percentage of the data falls.

Example: Let's say we have an array of the ages of all the people who live on a street.

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

What is the 75th percentile? The answer is 43, meaning that 75% of the people are 43 or younger.

The NumPy module has a function for finding a specified percentile:

Example

Use the NumPy percentile() function to find the percentiles:

import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

x = numpy.percentile(ages, 75)

print(x)


Example

What is the age that 90% of the people are younger than?

import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

x = numpy.percentile(ages, 90)

print(x)
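
As a small extension of the two examples, numpy.percentile() also accepts a sequence of percentiles, so several can be computed in one call. The variable name `results` is just an illustrative choice.

```python
import numpy as np

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

# Passing a list of percentiles returns one value per requested percentile.
results = np.percentile(ages, [25, 50, 75, 90])

print(results.tolist())  # [11.0, 31.0, 43.0, 61.0]
```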




Get a histogram, mean, median, stddev, and percentiles from a pipe on the command line with numpy


#!/usr/bin/env python
"""
Note: requires numpy. `sudo pip install numpy`
Example:
$ echo -e "1\n2\n5\n10\n20\n" | get-stats
mean=7.6
median=5.0
std=6.94550214167
95th=18.0
99th=19.6
2 1.0 - 3.0
1 3.0 - 8.0
2 8.0 - 21.0
0 21.0 - 55.0
0 55.0 - 144.0
0 144.0 - 377.0
0 377.0 - 987.0
0 987.0 - 2584.0
0 2584.0 - 6765.0
0 6765.0 - 17711.0
0 17711.0 - 1000000.0
"""
import argparse
import sys

from numpy import histogram, mean, median, percentile, std

# Summary statistics to print, as (label, function) pairs.
STATS = (('mean', mean), ('median', median), ('std', std))


def csv_list(s):
    """Parse a comma-separated string such as '1,5,10' into a list of floats."""
    try:
        return [float(i) for i in s.split(',')]
    except ValueError:
        raise argparse.ArgumentTypeError('expected comma-separated numbers, got %r' % s)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='compute stats from newline separated stdin')
    # argparse passes string defaults through `type`, so these become lists of floats.
    parser.add_argument('-b', '--bins', type=csv_list, default='1,5,10,20,40,80')
    parser.add_argument('-p', '--percentiles', type=csv_list, default='50,95,99')
    args = parser.parse_args()

    # Read one number per line, skipping blank or non-numeric lines.
    vals = []
    for line in sys.stdin:
        try:
            vals.append(float(line.strip()))
        except ValueError:
            pass
    if not vals:
        sys.exit('no numeric input on stdin')

    for name, func in STATS:
        print('%s=%s' % (name, func(vals)))
    for pct in args.percentiles:
        print('%gth=%s' % (pct, percentile(vals, pct)))

    # Histogram counts over the bin edges given by --bins.
    hist, bin_edges = histogram(vals, bins=args.bins)
    for i, count in enumerate(hist):
        print('%s\t%s - %s' % (count, bin_edges[i], bin_edges[i + 1]))

How do you find the percentile of data in Python?

The numpy.percentile() function computes the n-th percentile of the given data (array elements) along the specified axis.
Syntax: numpy.percentile(arr, n, axis=None, out=None)
Parameters:
arr: input array.
n: percentile value, between 0 and 100.
axis: axis along which to calculate the percentile; by default the array is flattened.
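
To illustrate the axis parameter, here is a minimal sketch on a small 2-D array. The array contents and variable names are arbitrary illustrative choices.

```python
import numpy as np

arr = np.array([[10, 7, 4],
                [3, 2, 1]])

# No axis: the array is flattened and one value is returned.
overall = np.percentile(arr, 50)

# axis=0: one percentile per column; axis=1: one per row.
per_column = np.percentile(arr, 50, axis=0)
per_row = np.percentile(arr, 50, axis=1)

print(overall)              # 3.5
print(per_column.tolist())  # [6.5, 4.5, 2.5]
print(per_row.tolist())     # [7.0, 2.0]
```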

How does Numpy calculate percentile?

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V. If that position does not fall exactly on an element, the percentile is determined by the values and distances of the two nearest neighbors together with the interpolation method (linear by default).
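
The default linear behaviour can be reproduced by hand. The sketch below is not NumPy's actual implementation; `percentile_linear` is a hypothetical helper written only to show the idea, and it is compared against numpy.percentile() on the ages data used earlier.

```python
import math
import numpy as np

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile using linear interpolation."""
    data = sorted(values)
    # Fractional position of the percentile within the sorted data.
    pos = (q / 100) * (len(data) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    # Weight the two nearest neighbours by their distance from pos.
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

for q in (33, 75, 90):
    print(q, percentile_linear(ages, q), np.percentile(ages, q))
```

For the 75th percentile the position is 0.75 * 20 = 15, which lands exactly on the sixteenth sorted value (43), so no interpolation is needed; for q = 33 the position falls between two elements and the neighbours are blended.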

How do you get the 75th percentile in pandas?

Build a DataFrame and call quantile(0.75) on the column of interest:
import pandas as pd
import random

A = [random.randint(0, 100) for i in range(10)]
B = [random.randint(0, 100) for i in range(10)]
df = pd.DataFrame({'field_A': A, 'field_B': B})

# quantile() takes a fraction between 0 and 1, so 0.75 is the 75th percentile.
print(df['field_A'].quantile(0.75))