
When the bootstrap doesn’t work

The bootstrap always works, except sometimes.

By ‘works’ here, I mean in the weakest sense that the large-sample bootstrap variance correctly estimates the variance of the statistic, or that the large-sample percentile bootstrap intervals have their nominal coverage. I don’t mean the stronger sense that someone like Peter Hall might use, that the bootstrap gives higher-order accurate confidence intervals. So the bootstrap ‘works’ for the median, even though not as well as for smooth functions of the mean.
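For concreteness, here is a minimal sketch of those two weak senses, the bootstrap variance and the percentile interval, for a sample median. The choice of statistic, distribution, sample size, replicate count, and seed are all arbitrary illustrative assumptions:

```python
# A minimal sketch of the two weak senses of "works": the bootstrap
# variance of a statistic and the percentile interval. The statistic
# (median), distribution, n, B, and seed are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, B = 200, 2000
x = rng.exponential(size=n)

boot = np.array([np.median(rng.choice(x, size=n, replace=True))
                 for _ in range(B)])

print("bootstrap variance of the median:", boot.var(ddof=1))
print("95% percentile interval:", np.percentile(boot, [2.5, 97.5]))
```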

Here are the reasons I know of why the bootstrap might fail:

0. Correlation. The one that everyone knows about nowadays. If your data have structure, such as a time series, a spatial map, a carefully structured experimental design, a multistage survey, or a network, then you can’t hope to get the right distribution by resampling in a way that doesn’t respect that structure.
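As a rough illustration, here is a sketch contrasting i.i.d. resampling with a simple moving-block bootstrap for the mean of an AR(1) series. The AR coefficient, block length, and sizes below are assumptions for the sketch, not recommendations:

```python
# Illustrative sketch: for an AR(1) series, i.i.d. resampling ignores
# the serial correlation and understates the variability of the mean;
# a moving-block bootstrap respects it within blocks.
import numpy as np

rng = np.random.default_rng(1)
n, B, b = 500, 1000, 25          # sample size, replicates, block length

e = rng.normal(size=n)           # innovations
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):            # AR(1) with coefficient 0.8
    x[t] = 0.8 * x[t - 1] + e[t]

def iid_boot_mean():
    return rng.choice(x, size=n, replace=True).mean()

def block_boot_mean():
    starts = rng.integers(0, n - b + 1, size=n // b)
    return np.concatenate([x[s:s + b] for s in starts]).mean()

print("iid bootstrap SE:  ", np.std([iid_boot_mean() for _ in range(B)]))
print("block bootstrap SE:", np.std([block_boot_mean() for _ in range(B)]))
```

With strong positive correlation, the i.i.d. bootstrap standard error comes out much too small; the block version is closer to honest.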

1. Constraints: Suppose $X_i \sim N(\theta, 1)$ and we know $\theta \ge 0$. The maximum likelihood estimator of $\theta$ is $\hat\theta = \max(\bar X, 0)$. If $\theta > 0$ there isn’t a problem asymptotically (or, on a more sophisticated analysis, if $\theta \gg 1/\sqrt{n}$ there isn’t). But if $\theta = 0$ the sampling distribution of $\hat\theta$ is a 50:50 mixture of a spike at zero and the positive half of a $N(0, n^{-1})$ distribution. The bootstrap distribution is also a mixture of a spike at zero and a half-normal, but the mass on the spike does not converge to 0.5 (or to anything else) as the sample size increases. The problem is that the height of the spike is $\Phi(-\sqrt{n}\,\bar X_n)$, so the height converges in distribution to $U(0,1)$.
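A quick simulation makes the point; the sample size, replicate count, number of datasets, and seed are arbitrary:

```python
# Simulation of the spike height in the constrained-mean example with
# theta = 0: across independent datasets, the bootstrap mass at zero is
# Phi(-sqrt(n) * xbar), i.e. roughly Uniform(0,1), not 0.5.
import numpy as np

rng = np.random.default_rng(2)
n, B = 400, 2000

def spike_mass(x):
    """Fraction of bootstrap replicates with max(mean(x*), 0) == 0."""
    means = rng.choice(x, size=(B, len(x)), replace=True).mean(axis=1)
    return np.mean(means <= 0)

masses = [spike_mass(rng.normal(0.0, 1.0, size=n)) for _ in range(20)]
print(np.round(masses, 2))   # scattered across (0, 1), not near 0.5
```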

2. Extrema. Consider $X \sim U(\theta, 1)$, where the maximum likelihood estimator $\hat\theta$ is the smallest observation. The bootstrap replicates $\hat\theta^*$ have a distribution that puts mass $1 - e^{-1} \approx 0.632$ on the smallest observation, $e^{-1}(1 - e^{-1}) \approx 0.233$ on the second smallest, and so on geometrically. We always have $\hat\theta^* \ge \hat\theta$, and the bootstrap distribution stays very discrete as the sample size increases.
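A short simulation (with $\theta = 0$ and arbitrary choices of $n$ and $B$) shows the geometric pile-up on the smallest order statistics:

```python
# The discreteness in the extremes example: bootstrap minima put mass
# roughly 0.632, 0.233, 0.086, ... on the first, second, third order
# statistics, no matter how large n is.
import numpy as np

rng = np.random.default_rng(3)
n, B = 500, 5000
x = np.sort(rng.uniform(0, 1, size=n))   # theta = 0 in U(theta, 1)

boot_min = rng.choice(x, size=(B, n), replace=True).min(axis=1)
for k in range(3):
    print(f"P*(min* = X_({k+1})) ~ {np.mean(boot_min == x[k]):.3f}")
```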

3. Lack of smoothness (cube-root asymptotics). Tukey’s shorth, the mean of the shortest half of the data, converges to the mean at an $n^{-1/3}$ rate instead of the usual $n^{-1/2}$. The same is true for the least-median-of-squares regression line, the isotonic regression estimator, and other estimators with cube-root asymptotics; the ordinary bootstrap distribution is not consistent for these.
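To make the estimator concrete, here is a minimal implementation of the shorth. The convention of taking a window of $\lfloor n/2 \rfloor + 1$ sorted observations as the “shortest half” is one of several; it is assumed here for simplicity:

```python
# A minimal sketch of Tukey's shorth: the mean of the observations in
# the shortest interval containing half of the data.
import numpy as np

def shorth(x):
    """Mean of the shortest window of floor(n/2) + 1 sorted points."""
    x = np.sort(np.asarray(x))
    h = len(x) // 2 + 1                    # window size: the "half"
    widths = x[h - 1:] - x[:len(x) - h + 1]
    i = int(np.argmin(widths))             # shortest window starts here
    return x[i:i + h].mean()

rng = np.random.default_rng(4)
print(shorth(rng.normal(size=1001)))       # near 0 for a symmetric sample
```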
