Prerequisites: Matplotlib
Matplotlib is a plotting library for Python that works with the numerical arrays provided by the NumPy library. The cumulative distribution function (CDF) of a real-valued random variable X, or just the distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x.
Properties of CDF:
- Every cumulative distribution function F(x) is non-decreasing.
- If x is at or above the largest value the random variable can take, F(x) = 1.
- The CDF ranges from 0 to 1.
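These properties can be checked numerically on a sample. The sketch below is only an illustration: the seed, the sample size, and the variable names are assumptions, and the CDF used is the empirical one built from the sample rather than the true CDF of the distribution.

```python
import numpy as np

# Hypothetical sample; any 1-D array of real values would do.
rng = np.random.default_rng(0)
sample = rng.standard_normal(500)

# Empirical CDF at the sorted sample points:
# the i-th smallest value has i / N of the data at or below it.
x = np.sort(sample)
F = np.arange(1, x.size + 1) / x.size

is_non_decreasing = bool(np.all(np.diff(F) >= 0))
stays_in_unit_interval = bool(F.min() >= 0 and F.max() <= 1)
reaches_one_at_max = bool(F[-1] == 1.0)

print(is_non_decreasing, stays_in_unit_interval, reaches_one_at_max)
```

All three checks should print True for any non-empty sample.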
Method 1: Using the histogram
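The CDF can be calculated from the PDF (probability density function) or, for a discrete random variable, from the probability of each individual value. The CDF at a point is the running total of the probabilities of all values up to and including that point.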
Example:
Consider a set of 2 balls, each of which can be either red (R) or blue (B). The four equally likely combinations are:
{RR, RB, BR, BB}
Let X be the number of red balls.
Probability of each value, P(X = t):
t = 0 : 1 / 4 [BB]
t = 1 : 2 / 4 [RB, BR]
t = 2 : 1 / 4 [RR]
CDF, F(t) = P(X <= t):
t = 0 : P(0) -> 1 / 4
t = 1 : P(0) + P(1) -> 3 / 4
t = 2 : P(0) + P(1) + P(2) -> 1
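The same arithmetic can be reproduced in a few lines of plain Python. This is a minimal sketch of the two-ball example; the names `outcomes`, `pmf`, and `cdf` are illustrative, and exact fractions are used so the values match the hand calculation.

```python
from fractions import Fraction
from itertools import accumulate, product

# All equally likely outcomes for two balls, each red (R) or blue (B).
outcomes = ["".join(p) for p in product("RB", repeat=2)]  # RR, RB, BR, BB

# P(X = t), where X is the number of red balls in the outcome.
pmf = [
    Fraction(sum(o.count("R") == t for o in outcomes), len(outcomes))
    for t in range(3)
]

# F(t) = P(X <= t): the running total of the probabilities.
cdf = list(accumulate(pmf))

for t in range(3):
    print(t, pmf[t], cdf[t])
```

Note that `Fraction` reduces 2 / 4 to 1/2, so the middle probability prints in lowest terms.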
Approach
- Import modules
- Declare number of data points
- Initialize random values
- Plot histogram using above data
- Get histogram data
- Find the PDF using the histogram data
- Calculate CDF
- Plot CDF
Example:
Python3
```python
# import the libraries
import numpy as np
import matplotlib.pyplot as plt

# number of data points
N = 500

# initialize random values drawn from a standard normal distribution
data = np.random.randn(N)

# get the histogram data: counts per bin and the bin edges
count, bins_count = np.histogram(data, bins=10)

# find the PDF of the histogram by normalizing the count values
pdf = count / count.sum()

# use np.cumsum to calculate the CDF
# (the same values can be found by looping over the PDF and adding)
cdf = np.cumsum(pdf)

# plot the PDF and CDF against the right-hand bin edges
plt.plot(bins_count[1:], pdf, color="red", label="PDF")
plt.plot(bins_count[1:], cdf, label="CDF")
plt.legend()
plt.show()
```
Output:
[Figure: plot of the PDF and CDF obtained from the histogram]
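The approach lists plotting the histogram as a step, which the example itself does not draw. A minimal sketch of that step is given here, assuming Matplotlib's `Axes.hist`, which returns the counts and bin edges it used. The fixed seed, the second y-axis, and the labels are illustrative choices and not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 500 standard normal values with a fixed seed.
rng = np.random.default_rng(0)
data = rng.standard_normal(500)

fig, ax = plt.subplots()

# Draw the histogram; hist returns the per-bin counts and the bin edges.
count, bins_count, _ = ax.hist(data, bins=10, alpha=0.4, label="Histogram (counts)")

# Normalize the counts to get the PDF, then accumulate to get the CDF.
pdf = count / count.sum()
cdf = np.cumsum(pdf)

# Probabilities live on a 0 to 1 scale, so plot them on a second y-axis.
ax2 = ax.twinx()
ax2.plot(bins_count[1:], pdf, color="red", label="PDF")
ax2.plot(bins_count[1:], cdf, label="CDF")
ax2.set_ylim(0, 1.05)

ax.set_xlabel("value")
ax.set_ylabel("count")
ax2.set_ylabel("probability")
ax.legend(loc="upper left")
ax2.legend(loc="center left")
```

In a script, call `plt.show()` or `fig.savefig(...)` afterwards to see the figure; a notebook displays it automatically.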
Method 2: Data sort
This method calculates and plots the CDF directly from sorted data, without binning. The data is first sorted in ascending order, and the CDF at each value is then the fraction of data points that do not exceed it.
Approach
- Import module
- Declare number of data points
- Create data
- Sort data in ascending order
- Get CDF
- Plot CDF
- Display plot
Example:
Python3
```python
# import the libraries
import numpy as np
import matplotlib.pyplot as plt

# number of data points used
N = 500

# normal distribution
data = np.random.randn(N)

# sort the data in ascending order
x = np.sort(data)

# CDF value at each sorted point: the fraction of samples <= that point
y = np.arange(1, N + 1) / N

# plotting
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('CDF using sorting the data')
plt.plot(x, y, marker='o')
plt.show()
```
Output:
[Figure: CDF of the sorted data, titled "CDF using sorting the data"]
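Once the data is sorted, the CDF can also be read off at any point, not only at the sample values. The sketch below assumes a hypothetical helper, `ecdf_at`, built on `np.searchsorted`; the helper name and the tiny data set are illustrative only.

```python
import numpy as np

def ecdf_at(sorted_data, q):
    """Fraction of sample values <= q (hypothetical helper).

    sorted_data must already be sorted in ascending order.
    """
    # side="right" counts every element that is <= q.
    return np.searchsorted(sorted_data, q, side="right") / sorted_data.size

# Small hand-checkable sample.
x = np.sort(np.array([3.0, 1.0, 2.0, 2.0]))

print(ecdf_at(x, 2.0))  # 0.75, since three of the four values are <= 2

# Works for several query points at once: the values are 0, 0.75 and 1.
print(ecdf_at(x, np.array([0.5, 2.0, 3.0])))
```

Using `side="right"` matches the definition of the CDF as P(X <= x); `side="left"` would give P(X < x) instead.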