When the bootstrap doesn’t work

16 June 2025

The bootstrap always works, except sometimes.

By ‘works’ here, I mean in the weakest senses: that the large-sample bootstrap variance correctly estimates the variance of the statistic, or that the large-sample percentile bootstrap intervals have their nominal coverage. I don’t mean the stronger sense that someone like Peter Hall might use, that the bootstrap gives higher-order accurate confidence intervals. So the bootstrap ‘works’ for the median, even though not as well as for smooth functions of the mean.

Here are the reasons I know of why the bootstrap might fail:

0. Correlation. The one that everyone knows about nowadays. If your data have structure, such as a time series, a spatial map, a carefully-structured experimental design, a multistage survey, a network, then you can’t hope to get the right distribution by resampling in a way that doesn’t respect that structure.
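As a rough illustration of the correlation problem, the sketch below resamples a time series as if its observations were independent and compares the resulting standard error of the mean with a Monte Carlo estimate of the true sampling standard deviation. The AR(1) model, the coefficient 0.8, the series length, and all function names are assumptions chosen for the example, not anything from a particular library.

```python
import math
import random
import statistics


def ar1_series(n, phi, rng):
    """Simulate a stationary AR(1) series x[t] = phi * x[t-1] + e[t], e[t] ~ N(0, 1)."""
    # Start from the stationary distribution so no burn-in is needed.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return out


def naive_bootstrap_se(data, n_boot, rng):
    """Bootstrap SE of the mean, resampling observations as if they were independent."""
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)


def monte_carlo_sd_of_mean(n, phi, n_rep, rng):
    """Sampling SD of the mean, estimated from n_rep fresh series."""
    means = [statistics.fmean(ar1_series(n, phi, rng)) for _ in range(n_rep)]
    return statistics.stdev(means)


if __name__ == "__main__":
    rng = random.Random(2025)
    series = ar1_series(200, 0.8, rng)
    print("naive bootstrap SE:", naive_bootstrap_se(series, 300, rng))
    print("Monte Carlo SD    :", monte_carlo_sd_of_mean(200, 0.8, 300, rng))
```

With positive autocorrelation this strong, the naive bootstrap standard error should come out much smaller than the Monte Carlo value, because resampling individual observations destroys the dependence that inflates the variance of the mean.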

1. Constraints. Suppose $X_1, \dots, X_n \sim N(\theta, 1)$ and we know $\theta \geq 0$. The maximum likelihood estimator of $\theta$ is $\hat\theta = \max(\bar X, 0)$. If $\theta > 0$ there isn’t a problem asymptotically (or at a more sophisticated analysis, if $\theta \gg 1/\sqrt{n}$ there isn’t). But if $\theta = 0$ the sampling distribution of $\hat\theta$ is a 50:50 mixture of a spike at zero and the positive half of a $N(0, n^{-1})$ distribution. The bootstrap distribution is also a mixture of a spike at zero and a half-normal, but the mass on the spike does not converge to 0.5 (or to anything else) as the sample size increases. The problem is that the mass on the spike is approximately $\Phi(-\bar X \sqrt{n})$, the bootstrap probability that $\bar X^{*} \leq 0$, so it converges in distribution to $U(0, 1)$.
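A small simulation can make the unstable spike visible. The sketch below draws several independent datasets with $\theta = 0$, computes the fraction of bootstrap replicates of $\max(\bar X^{*}, 0)$ that are exactly zero, and pairs it with the normal approximation $\Phi(-\bar X \sqrt{n})$ to the bootstrap probability that $\bar X^{*} \leq 0$, a quantity that is uniform on $(0, 1)$ when $\theta = 0$. The sample size, number of replicates, and function names are assumptions made for illustration.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bootstrap_spike_mass(data, n_boot, rng):
    """Fraction of bootstrap replicates of max(mean, 0) that are exactly zero."""
    n = len(data)
    zeros = sum(
        statistics.fmean(rng.choices(data, k=n)) <= 0.0 for _ in range(n_boot)
    )
    return zeros / n_boot


def spike_masses(n, n_datasets, n_boot, rng):
    """For independent N(0, 1) datasets (theta = 0), return pairs of
    (bootstrap spike mass, normal approximation Phi(-xbar * sqrt(n)))."""
    results = []
    for _ in range(n_datasets):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = statistics.fmean(data)
        approx = normal_cdf(-xbar * math.sqrt(n))
        results.append((bootstrap_spike_mass(data, n_boot, rng), approx))
    return results


if __name__ == "__main__":
    rng = random.Random(2025)
    for mass, approx in spike_masses(100, 10, 1000, rng):
        print(f"bootstrap spike mass {mass:.3f}   normal approximation {approx:.3f}")
```

Across datasets the spike mass should wander over most of the unit interval rather than settling near 0.5, and rerunning with a larger `n` should not tighten it.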

2. Extrema. Consider $X \sim U(\theta, 1)$, with $\theta$ estimated by the sample minimum $\hat\theta$. The bootstrap replicates $\theta^{*}$ have a distribution that puts mass $1 - e^{-1} \approx 0.632$ on the smallest observation, $e^{-1}(1 - e^{-1}) \approx 0.233$ on the second smallest, and so on geometrically. We always have $\theta^{*} \geq \hat\theta$, and the bootstrap distribution stays very discrete as the sample size increases.
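The masses quoted above can be checked directly. For a sample of $n$ distinct values, the bootstrap minimum equals the $(k+1)$-th smallest observation with probability $(1 - k/n)^n - (1 - (k+1)/n)^n$, which tends to $e^{-k}(1 - e^{-1})$. The sketch below compares that formula with simulated frequencies; it resamples ranks instead of values, which is equivalent when there are no ties, and the function names and sizes are assumptions for the example.

```python
import random


def exact_mass(n, k):
    """Probability that the bootstrap minimum is the (k+1)-th smallest of n distinct values."""
    return (1 - k / n) ** n - (1 - (k + 1) / n) ** n


def bootstrap_min_ranks(n, n_boot, rng):
    """Rank (0 = smallest observation) attained by the minimum of each bootstrap resample."""
    # Resampling the ranks 0..n-1 with replacement mirrors resampling the sorted data.
    return [min(rng.choices(range(n), k=n)) for _ in range(n_boot)]


def rank_frequencies(n, n_boot, rng, top=3):
    """Simulated frequency with which the bootstrap minimum lands on each of the `top` smallest ranks."""
    ranks = bootstrap_min_ranks(n, n_boot, rng)
    return [ranks.count(k) / n_boot for k in range(top)]


if __name__ == "__main__":
    rng = random.Random(2025)
    n = 100
    for k, freq in enumerate(rank_frequencies(n, 4000, rng)):
        print(f"rank {k}: simulated {freq:.3f}   exact {exact_mass(n, k):.3f}")
```

Increasing `n` leaves these frequencies essentially unchanged, which is the sense in which the bootstrap distribution stays discrete.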

3. Lack of smoothness (cube-root asymptotics). Tukey’s shorth, the mean of the shortest half of the data, converges to the mean at the $n^{-1/3}$ rate instead of the usual $n^{-1/2}$. The same is true for the least-median-of-squares regression line, the isotonic
