If you average a large number of independent versions of the same random variable, the central limit theorem says the average will be approximately normal. That is, the absolute error in approximating the density of the average by the density of a normal random variable will be small. (Terms and conditions apply. See notes here.)
But the central limit theorem says nothing about relative error. Relative error can diverge to infinity while absolute error converges to zero. We’ll illustrate this with an example.
The average of N independent exponential(1) random variables has a gamma distribution with shape N and scale 1/N.
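The reason is that the sum of N independent exponential(1) variables is gamma with shape N and scale 1, and dividing by N rescales it to scale 1/N. Here is a rough simulation check of that claim, not a proof. The helper name average_of_exponentials, the sample size, and the seed are arbitrary choices made for this sketch.

```python
import numpy as np
from scipy.stats import gamma

def average_of_exponentials(n, size, seed=0):
    """Draw `size` averages, each of n independent exponential(1) variables."""
    rng = np.random.default_rng(seed)  # seed chosen arbitrarily for repeatability
    return rng.exponential(scale=1.0, size=(size, n)).mean(axis=1)

n = 4
samples = average_of_exponentials(n, 100_000)

# Claimed distribution of the average: gamma with shape n and scale 1/n
dist = gamma(n, scale=1/n)

# Compare a few empirical quantiles with the theoretical ones
qs = [0.1, 0.5, 0.9]
empirical = np.quantile(samples, qs)
theoretical = dist.ppf(qs)
for q, e, t in zip(qs, empirical, theoretical):
    print(f"quantile {q}: simulated {e:.4f}, gamma {t:.4f}")
```

The simulated and theoretical quantiles should agree to a couple of decimal places, with the agreement limited by sampling noise.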
As N increases, the average becomes more like a normal in distribution. That is, the absolute error in approximating the distribution function of a gamma random variable with that of a normal random variable decreases. (Note that we’re talking about distribution functions (CDFs) and not densities (PDFs). The previous post discussed a surprise with density functions in this example.)
The following plot shows that the difference between the distribution functions gets smaller as N increases.
But when we look at the ratio of the tail probabilities, that is, Pr(X > t) / Pr(Y > t) where X is the average of N exponential r.v.s and Y is the corresponding normal approximation from the central limit theorem, we see that the ratios diverge, and they diverge faster as N increases.
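To put a number on the shrinking difference, here is a small sketch that computes the largest absolute gap between the two CDFs over a grid. The function name max_cdf_error and the grid on [0, 6] are assumptions made for this illustration, so the result is a maximum over the grid rather than a true supremum.

```python
import numpy as np
from scipy.stats import gamma, norm

def max_cdf_error(n, grid=None):
    """Largest |gamma CDF - normal CDF| over a grid of points."""
    if grid is None:
        grid = np.linspace(0, 6, 2001)  # grid range and density are arbitrary
    exact = gamma.cdf(grid, n, scale=1/n)
    approx = norm.cdf(grid, loc=1, scale=np.sqrt(1/n))
    return np.max(np.abs(exact - approx))

errors = {n: max_cdf_error(n) for n in [1, 4, 16]}
for n, err in errors.items():
    print(f"n = {n:2d}: max |gamma CDF - normal CDF| = {err:.4f}")
```

The maximum error should drop each time n is quadrupled.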
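The divergence is easy to see numerically as well. The sketch below evaluates the ratio at a few values of t. The helper name tail_ratio_at and the particular values of t are choices made here for illustration.

```python
from math import sqrt
from scipy.stats import gamma, norm

def tail_ratio_at(t, n):
    """Pr(X > t) / Pr(Y > t) for the gamma average X and its normal approximation Y."""
    exact = gamma.sf(t, n, scale=1/n)             # sf is the survival function, 1 - CDF
    approx = norm.sf(t, loc=1, scale=sqrt(1/n))   # normal with mean 1, variance 1/n
    return exact / approx

for n in [1, 4, 16]:
    ratios = [tail_ratio_at(t, n) for t in (2, 3, 4)]
    print(n, ["{:.3g}".format(r) for r in ratios])
```

Using sf rather than 1 - cdf matters here: far out in the tail the CDF is so close to 1 that subtracting it from 1 would lose all precision.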
To make it clear what’s being plotted, here is the Python code used to draw the graphs above.
    import matplotlib.pyplot as plt
    from numpy import linspace, sqrt
    from scipy.stats import gamma, norm

    def tail_ratio(ns):
        # Ratio of exact (gamma) to approximate (normal) tail probabilities
        x = linspace(0, 4, 400)
        plt.figure()
        for n in ns:
            gtail = gamma.sf(x, n, scale=1/n)
            ntail = norm.sf(x, loc=1, scale=sqrt(1/n))
            plt.plot(x, gtail/ntail)
        plt.yscale("log")
        plt.legend(["n = {}".format(n) for n in ns])
        plt.savefig("gamma_normal_tail_ratios.svg")
        plt.close()

    def cdf_error(ns):
        # Difference between exact (gamma) and approximate (normal) CDFs
        x = linspace(0, 6, 400)
        plt.figure()
        for n in ns:
            gcdf = gamma.cdf(x, n, scale=1/n)
            ncdf = norm.cdf(x, loc=1, scale=sqrt(1/n))
            plt.plot(x, gcdf - ncdf)
        plt.legend(["n = {}".format(n) for n in ns])
        plt.savefig("gamma_normal_cdf_diff.svg")
        plt.close()

    ns = [1, 4, 16]
    tail_ratio(ns)
    cdf_error(ns)