Why Do Tree Ensembles Work?

Ensembles of decision trees (e.g., the random forest and AdaBoost algorithms) are powerful and well-known methods of classification and regression. We will survey work aimed at understanding the statistical properties of decision tree ensembles, with the goal of explaining why they work.

An elementary probabilistic motivation for ensemble methods comes from Hoeffding’s inequality, which bounds the probability that a sum of bounded independent random variables deviates from its expectation by a given amount. The inequality implies that a sufficiently large collection of better-than-random classifiers, whose individual predictions are aggregated e.g. by majority rule, can be expected to have very high accuracy, provided their errors are independent. It is then natural to ask how one can encourage classifiers to make independent errors.
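To make this concrete, here is a minimal simulation (my own illustration, not part of the original argument) of M hypothetical classifiers that are each correct independently with probability p, compared against the Hoeffding bound exp(-2M(p - 1/2)^2) on the majority vote's error; the values of M, p, and n_trials are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, p, n_trials = 101, 0.6, 100_000   # illustrative ensemble size, per-classifier accuracy, trials

# Each row is one trial; each column records whether one classifier was correct.
votes = rng.random((n_trials, M)) < p
majority_correct = votes.sum(axis=1) > M / 2

print("single-classifier accuracy:", p)
print("simulated majority-vote accuracy:", majority_correct.mean())
print("Hoeffding bound on majority-vote error:", np.exp(-2 * M * (p - 0.5) ** 2))
```

Even though the bound is loose, the gap between the single-classifier accuracy and the simulated majority-vote accuracy illustrates why independence of errors matters.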

[Related article: Understanding the Hoeffding Inequality]

The random forest approach is to use randomness to encourage independence: by training each tree on a different bootstrap sample of the training set, and by injecting randomness into the training procedure itself (for example, by restricting each split to a random subset of the features), one can construct a collection of decision trees whose majority vote is far more accurate than any single decision tree.
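As a rough illustration of that gap, the sketch below compares a single unpruned tree with a forest on synthetic data; the dataset, seeds, and hyperparameters are arbitrary choices for the example, not settings from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; sizes and seeds are arbitrary.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

print("single tree test accuracy:", single_tree.score(X_test, y_test))
print("random forest test accuracy:", forest.score(X_test, y_test))
```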

In the boosting approach, by contrast, one constructs a sequence of (usually shallow) trees on reweighted versions of the training set, increasing the weights of misclassified training examples after each round. The next classifier is thereby encouraged to focus on the training examples that have so far proven difficult to classify correctly. Weighted voting is used to aggregate the predictions.
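The reweighting and weighted voting can be sketched in a few lines. The following is a simplified discrete-AdaBoost loop over decision stumps, assuming labels in {-1, +1}; the function names and the numerical guard are my own illustrative choices, not a reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                  # example weights, uniform at the start
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)           # weighted training error
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))   # vote weight for this stump
        w *= np.exp(-alpha * y * pred)       # up-weight misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted vote: sign of the alpha-weighted sum of stump predictions.
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```

A real implementation would also handle the degenerate cases err = 0 and err >= 0.5; scikit-learn's AdaBoostClassifier does, and is what one would normally use in practice.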

In terms of the bias-variance decomposition of classifier error, the random forest apparently succeeds by averaging a collection of low-bias, high-variance classifiers (reducing the variance without disturbing the bias too much), whereas boosting – or at least some initial versions – forms a weighted combination of high-bias, low-variance classifiers (decision stumps or similar).
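A quick (and admittedly crude) way to see the variance-reduction side of this picture is to train many deep trees on bootstrap resamples, look at how much their individual test accuracies fluctuate, and compare with their majority vote; the synthetic setup below is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

rng = np.random.default_rng(1)
tree_preds, tree_accs = [], []
for _ in range(200):                                   # 200 deep trees on bootstrap resamples
    idx = rng.integers(0, len(y_train), len(y_train))  # bootstrap sample of the training set
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    pred = tree.predict(X_test)
    tree_preds.append(pred)
    tree_accs.append((pred == y_test).mean())

# Labels are 0/1 here, so the majority vote is "more than half the trees predict 1".
vote = (np.mean(tree_preds, axis=0) > 0.5).astype(int)
print("mean single-tree test accuracy:", np.mean(tree_accs))
print("std of single-tree test accuracy:", np.std(tree_accs))
print("majority-vote test accuracy:", (vote == y_test).mean())
```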

My talk at ODSC East 2019, “Why Do Tree Ensembles Work?”, will attempt to shed some light on mysteries that are not quite explained by the preceding paragraphs. Some are as follows. Why can a boosted ensemble’s generalization performance continue to improve even after the training error has reached zero, and is it possible to overfit by performing too many boosting iterations? If it helps to grow the individual trees to large depth for either classifier type, what are the implications? Very broadly, why do these methods work so well?
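The first of these questions can at least be observed empirically: scikit-learn's staged_predict yields the ensemble's predictions after each boosting round, so one can track training and test error as rounds accumulate (whether the training error actually reaches zero depends on the base learner and the data). The setup below is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

booster = AdaBoostClassifier(n_estimators=500, random_state=2).fit(X_train, y_train)

# staged_predict iterates over the ensemble's predictions after each boosting round.
for i, (train_pred, test_pred) in enumerate(
        zip(booster.staged_predict(X_train), booster.staged_predict(X_test)), start=1):
    if i % 100 == 0:
        print(f"round {i:4d}"
              f"  train error {(train_pred != y_train).mean():.4f}"
              f"  test error {(test_pred != y_test).mean():.4f}")
```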

[Related article: How Sampling Bias can Ruin Your Model]

To this end, we will explore several more advanced topics around interpretation and explanation of tree ensembles, such as:

  • the role of the margin: boosting can be seen as a margin equalizer, but the margin view does not perfectly align with observed generalization error;
  • a kernel interpretation of random forest: this provides a geometric interpretation of different regimes in the parameter space and the tradeoff between correlation (among the trees) and strength (of the individual trees) (a minimal proximity-kernel sketch follows this list);
  • a statistical characterization of the boosting procedure; and
  • interpolation (the ability to fit the training set perfectly): the idea that a well-chosen average of interpolating models can be resistant to overfitting; this may serve as a possible explanation for the success of both of these methods.
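For the kernel point above, here is a minimal sketch of the proximity kernel a random forest induces: two points are "close" when many trees route them to the same leaf. The dataset and forest settings are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=3)
forest = RandomForestClassifier(n_estimators=200, random_state=3).fit(X, y)

leaves = forest.apply(X[:5])   # shape (5, n_trees): the leaf each tree assigns to each point
# proximity[i, j] = fraction of trees in which points i and j land in the same leaf
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
print(np.round(proximity, 2))  # 1.0 on the diagonal; off-diagonal entries act as kernel values
```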


Learn more at ODSC East 2019 this April 30 to May 3 in my talk, “Why Do Tree Ensembles Work?”
