How to use PMF and PDF in Python

The line df["AGW"].sort_values() doesn't change df; it returns a new sorted Series and leaves df as it was. Maybe you meant df.sort_values(by=['AGW'], inplace=True). In that case the full code will be:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# Draw 1000 samples from a normal distribution with mean 50 and standard deviation 3
x = np.random.normal(50, 3, 1000)
source = {"Genotype": ["CV1"]*1000, "AGW": x}
df = pd.DataFrame(source)

# Sort in place so the curve is drawn from left to right
df.sort_values(by=['AGW'], inplace=True)
df_mean = np.mean(df["AGW"])
df_std = np.std(df["AGW"])
# Evaluate the fitted normal density at each sorted sample
pdf = stats.norm.pdf(df["AGW"], df_mean, df_std)

plt.plot(df["AGW"], pdf)
plt.show()

Which gives a smooth bell-shaped curve: the fitted normal density plotted against the sorted AGW values.

Edit:

I think here we already have the distribution (x is normally distributed), so we don't need to generate the pdf of x. The pdf is normally used for something like this:

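import math
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

mu = 50
variance = 3
sigma = math.sqrt(variance)  # norm.pdf expects the standard deviation, not the variance
x = np.linspace(mu - 5*sigma, mu + 5*sigma, 1000)
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.show()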

Here we don't need to generate the distribution from the x points; we only need to plot the density of the distribution we already have. So you might use this:

Discrete Random Variables

Binomial

Make a Binomial Random variable $X$ and compute its probability mass function (PMF) or cumulative distribution function (CDF). We love the scipy stats library because it defines all the functions you would care about for a random variable, including expectation, variance, and even things we haven't talked about in CS109, like entropy. This example declares $X \sim \text{Bin}(n = 10, p = 0.2)$. It calculates a few statistics on $X$. It then calculates $P(X = 3)$ and $P(X \leq 4)$. Finally it generates a few random samples from $X$:

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
print(X.pmf(3))          # P(X = 3)
print(X.cdf(4))          # P(X <= 4)
print(X.mean())          # E[X]
print(X.var())           # Var(X)
print(X.std())           # Std(X)
print(X.rvs())           # Get a random sample from X
print(X.rvs(10))         # Get 10 random samples from X
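
As a sanity check on what the pmf and cdf calls return, the same probabilities can be computed directly from the binomial formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The sketch below uses only the standard library; the helper names binom_pmf and binom_cdf are made up for illustration and are not part of scipy.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k): add up the PMF over 0..k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.2
print(binom_pmf(3, n, p))   # should agree with X.pmf(3)
print(binom_cdf(4, n, p))   # should agree with X.cdf(4)
print(n * p)                # E[X] = np
print(n * p * (1 - p))      # Var(X) = np(1 - p)
```

Summing the PMF to get the CDF is the point of the example: for a discrete variable the CDF is nothing more than a running total of probability masses.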

From a Python session you can always use the "help" function to see a full list of methods defined on a variable (or for a package):

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
help(X)                  # List all methods defined for X

Poisson

Make a Poisson Random variable $Y$. This example declares $Y \sim \text{Poi}(\lambda = 2)$. It then calculates $P(Y = 3)$:

from scipy import stats
Y = stats.poisson(2) # Declare Y to be a poisson random variable
print(Y.pmf(3))      # P(Y = 3)
print(Y.rvs())       # Get a random sample from Y
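
To see where the value of $P(Y = 3)$ comes from, the Poisson formula $P(Y = k) = \lambda^k e^{-\lambda} / k!$ can be evaluated by hand. This is a minimal standard-library sketch; poisson_pmf is a hypothetical helper name, not a scipy function.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poi(lam), straight from the formula."""
    return lam**k * exp(-lam) / factorial(k)

print(poisson_pmf(3, 2))                          # should agree with Y.pmf(3)
print(sum(poisson_pmf(k, 2) for k in range(50)))  # the masses add up to (almost exactly) 1
```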

Geometric

Make a Geometric Random variable $X$, the number of trials until a success. This example declares $X \sim \text{Geo}(p = 0.75)$:

from scipy import stats
X = stats.geom(0.75) # Declare X to be a geometric random variable
print(X.pmf(3))      # P(X = 3)
print(X.rvs())       # Get a random sample from X
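
Because $X$ counts the trials up to and including the first success, $P(X = k)$ is the probability of $k - 1$ failures followed by one success, $(1 - p)^{k-1} p$ for $k = 1, 2, \dots$. A small standard-library sketch of that formula, with geom_pmf as a made-up helper name:

```python
def geom_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success, for k = 1, 2, ..."""
    if k < 1:
        return 0.0  # the first success cannot happen before trial 1
    return (1 - p)**(k - 1) * p

p = 0.75
print(geom_pmf(3, p))   # should agree with X.pmf(3)
print(1 / p)            # E[X] = 1/p, the expected number of trials
```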


Continuous Random Variables

Normal

Make a Normal Random variable $A$. This example declares $A \sim N(\mu = 3, \sigma^2 = 16)$. It then calculates $f_A(0)$ and $F_A(0)$. Very important: in class the second parameter to a normal was the variance ($\sigma^2$). In the scipy library the second parameter is the standard deviation ($\sigma$):
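
The code for this example is not included in the text here. As a stand-in, the sketch below uses the standard library's statistics.NormalDist instead of scipy, so it runs without extra packages; like scipy, it expects the standard deviation rather than the variance. With scipy, the counterpart declaration would be stats.norm(3, 4).

```python
from math import sqrt
from statistics import NormalDist

variance = 16
# Pass sigma = sqrt(16) = 4, not the variance 16
A = NormalDist(mu=3, sigma=sqrt(variance))

print(A.pdf(0))      # the density of A at 0
print(A.cdf(0))      # P(A <= 0)
print(A.mean)        # mean
print(A.stdev)       # standard deviation
print(A.variance)    # variance
```

Passing 16 as the second argument would silently declare a much wider distribution, which is why the conversion is written out explicitly.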

Random variables and the various distribution functions which form the foundations of Machine Learning

Table of contents

  • Introduction
  • Random Variable and its types
  • PDF (probability density function)
  • PMF (Probability Mass function)
  • CDF (Cumulative distribution function)
  • Example
  • Further Reading

Introduction

The PDF and CDF are commonly used in exploratory data analysis to find probabilistic relationships between variables.

Before going through the rest of this page, first review the fundamental concepts: random variable, PMF, PDF, and CDF.

Random variable

A random variable is a variable whose value is not known in advance; that is, its value depends on the outcome of an experiment.

For example, when throwing a die, the value of the variable depends on the outcome of the throw.

Random variables are often used in regression analysis to determine the statistical relationships between them. There are 2 types of random variable:

1 — Continuous random variable

2 — Discrete random variable

Continuous random variable:- A variable that can take any value within a range or interval, and therefore has infinitely many possible values, is called a continuous random variable. Equivalently, it is a variable whose values are obtained by measuring. For example, the average height of 100 people, or a measurement of rainfall.

Discrete random variable:- A variable that takes a countable number of distinct values is called a discrete random variable. Equivalently, it is a variable whose values are obtained by counting. For example, the number of students present in a class.
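
The counting versus measuring distinction can be sketched with the standard library's random module. The class size of 30 and the height model (mean 170 cm, standard deviation 10 cm) are invented for illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Discrete: obtained by counting, e.g. students present in a class of 30
students_present = random.randint(0, 30)

# Continuous: obtained by measuring, e.g. a height in cm from a normal model
height_cm = random.gauss(170, 10)

print(students_present, type(students_present).__name__)  # a whole number
print(height_cm, type(height_cm).__name__)                # any real value in a range
```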

PDF (Probability Density Function):-

Fig:- Formula for PDF

PDF is a statistical term that describes the probability distribution of a continuous random variable.

A very commonly encountered PDF is the Gaussian distribution: if the features (random variables) are Gaussian distributed, then the PDF is the Gaussian bell curve. On a PDF graph the probability of a single outcome is always zero. This is because a single point corresponds to a line of zero width, which covers no area under the curve; probabilities come from the area over an interval.
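
The area interpretation, $P(a \le X \le b) = \int_a^b f_X(x)\,dx$, can be checked numerically. The sketch below assumes a standard normal distribution purely as an example and uses a hand-written trapezoidal rule; area_under_pdf is a hypothetical helper, not a library function.

```python
from statistics import NormalDist

X = NormalDist(mu=0, sigma=1)  # standard normal, chosen only for illustration

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    total += sum(dist.pdf(a + i * h) for i in range(1, steps))
    return total * h

print(area_under_pdf(X, -1, 1))      # P(-1 <= X <= 1)
print(X.cdf(1) - X.cdf(-1))          # the same probability from the CDF
print(area_under_pdf(X, 0.5, 0.5))   # zero-width interval, so zero probability
print(X.pdf(0.5))                    # the density at that point is still positive
```

The last two lines show the distinction the paragraph makes: the density at a point can be positive while the probability of hitting exactly that point is zero.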

PMF (Probability Mass Function):-

Fig:- Formula for PMF

PMF is a statistical term that describes the probability distribution of a discrete random variable.

People often confuse the PDF and the PMF. The PDF applies to continuous random variables, while the PMF applies to discrete random variables. For example, when throwing a die, the outcome can only be one of the numbers 1 to 6 (countable), so it is described by a PMF.
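
For the die example, the PMF $p_X(x) = P(X = x)$ is small enough to write out as a table. The sketch assumes a fair die, which the text does not state, and uses exact fractions so the sum is exactly 1.

```python
from fractions import Fraction

# PMF of a fair six-sided die (fairness is an assumption): each face has mass 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(die_pmf[3])                    # P(X = 3)
print(sum(die_pmf.values()))         # all masses together add up to 1
print(die_pmf.get(7, Fraction(0)))   # values outside 1..6 have probability 0
print(die_pmf[1] + die_pmf[2])       # P(X <= 2), adding masses directly
```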

CDF (Cumulative Distribution Function):-

Fig:- Formula for CDF

The PMF is one way to describe a distribution, but it applies only to discrete random variables, not to continuous ones. The cumulative distribution function can describe the distribution of any random variable, whether it is continuous or discrete.

For example, if X is the height of a person selected at random, then F(x) is the chance that the person will be shorter than x. If F(180 cm) = 0.8, then there is an 80% chance that a person selected at random will be shorter than 180 cm (equivalently, a 20% chance that they will be taller than 180 cm).
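
The height example can be mimicked with an empirical CDF, $F(x) = P(X \le x)$ estimated as the fraction of a sample at or below $x$. The ten heights below are invented so that the result matches the 80% figure in the text; empirical_cdf is a hypothetical helper name.

```python
# Hypothetical heights in cm, invented for this example
heights_cm = [158, 162, 165, 168, 171, 174, 176, 179, 183, 190]

def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)

F_180 = empirical_cdf(heights_cm, 180)
print(F_180)        # share of people no taller than 180 cm
print(1 - F_180)    # share of people taller than 180 cm
```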