Relative Error in the Central Limit Theorem

6 September 2024

0

If you average a large number independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply. See notes here.)

But the central limit theorem says nothing about relative error. Relative error can diverge to infinity while absolute error converges to zero. We’ll illustrate this with an example.

The average of N independent exponential(1) random variables has a gamma distribution with shape N and scale 1/N.

As N increases, the average becomes more like a normal in distribution. That is, the absolute error in approximating the distribution function of gamma random variable with that of a normal random variable decreases. (Note that we’re talking about distribution functions (CDFs) and not densities (PDFs). The previous post discussed a surprise with density functions in this example.)

The following plot shows that the difference between the distributions functions get smaller as N increases.

But when we look at the ratio of the tail probabilities, that is Pr(X > t) / Pr(Y > t) where Xis the average of N exponential r.v.s and Y is the corresponding normal approximation from the central limit theorem, we see that the ratios diverge, and they diverge faster as N increases.

To make it clear what’s being plotted, here is the Python code used to draw the graphs above.

import matplotlib.pyplot as plt
from scipy.stats import gamma, norm
from scipy import linspace, sqrt

def tail_ratio(ns):

    x = linspace(0, 4, 400)
    
    for n in ns:
        gcdf = gamma.sf(x, n, scale = 1/n)
        ncdf = norm.sf(x, loc=1, scale=sqrt(1/n))
        plt.plot(x, gcdf/ncdf

    plt.yscale("log")        
    plt.legend(["n = {}".format(n) for n in ns])
    plt.savefig("gamma_normal_tail_ratios.svg")

def cdf_error(ns):

    x = linspace(0, 6, 400)
    
    for n in ns:
        gtail = gamma.cdf(x, n, scale = 1/n)
        ntail = norm.cdf(x, loc=1, scale=sqrt(1/n))
        plt.plot(x, gtail-ntail)

    plt.legend(["n = {}".format(n) for n in ns])
    plt.savefig("gamma_normal_cdf_diff.svg")
    
ns = [1, 4, 16]
tail_ratio([ns)
cdf_error(ns)

Original Source

Relative Error in the Central Limit Theorem

Run Local AWS Cloud Stack using LocalStack on Linux

Learn Terraform Automation in 3 days using Video Courses

How To Expose Ansible AWX Service using Nginx Ingress

LEAVE A REPLY Cancel reply

Most Popular

Verizon will basically pay you to buy the new, awesome Barbie phone

8 Best VPNs for Apple TV in 2024: Fast & Secure by Penka Hristovska

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Recent Comments

EDITOR PICKS

Verizon will basically pay you to buy the new, awesome Barbie phone

8 Best VPNs for Apple TV in 2024: Fast & Secure by Penka Hristovska

Samsung offers free screen replacements for users still suffering green line issues

POPULAR POSTS

Verizon will basically pay you to buy the new, awesome Barbie phone

8 Best VPNs for Apple TV in 2024: Fast & Secure by Penka Hristovska

Samsung offers free screen replacements for users still suffering green line issues

POPULAR CATEGORY

ABOUT US

FOLLOW US