
Statistical · Intermediate

Caterpillar Plot

A chart that ranks random effects and their confidence intervals along a horizontal axis, with units sorted vertically by effect size. The resulting pattern resembles a caterpillar, with hairy legs (the intervals) stretching outward from a central body (the sorted point estimates).

// 01 — The chart

What it looks like

Example: School performance random effects (12 schools ranked)

A caterpillar plot ranking 12 schools by their random effect estimates. Each dot shows the point estimate and the horizontal line shows the 95% confidence interval. The vertical dashed line marks the overall mean.

// 02 — Definition

What is a caterpillar plot?

A caterpillar plot is a visualization that displays random effects from a hierarchical (multilevel) model, sorted by their estimated values. Each unit — a school, hospital, region, or any grouping variable — appears as a dot (point estimate) with a horizontal line extending outward to show the confidence or credible interval. When all units are stacked vertically and sorted from lowest to highest, the resulting shape resembles a caterpillar.
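The description above comes down to a small amount of data preparation before any plotting. The sketch below assumes you already have each unit's estimated random effect and its standard error (for example, extracted from a fitted multilevel model) and that a normal approximation is acceptable for the interval. The function name `caterpillar_rows` and the sample numbers are hypothetical.

```python
from statistics import NormalDist


def caterpillar_rows(labels, estimates, std_errors, level=0.95):
    """Return (label, estimate, lower, upper) tuples sorted by estimate.

    Assumes each estimate is approximately normal, so the interval is
    estimate +/- z * SE with z from the standard normal distribution.
    """
    # Two-sided critical value: about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rows = [
        (label, est, est - z * se, est + z * se)
        for label, est, se in zip(labels, estimates, std_errors)
    ]
    # Sorting by the point estimate is what produces the caterpillar shape.
    return sorted(rows, key=lambda row: row[1])


# Hypothetical random effects (deviations from the grand mean) and SEs.
rows = caterpillar_rows(["School A", "School B", "School C"],
                        [0.4, -0.3, 0.1], [0.2, 0.1, 0.3])
for label, est, lower, upper in rows:
    print(f"{label}: {est:+.2f} [{lower:+.2f}, {upper:+.2f}]")
```

Each tuple is one row of the chart, already in plotting order, so any plotting library only has to draw a dot and a horizontal line per row.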

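The chart’s central feature is a vertical reference line at zero (or at the overall mean, if effects are shown on the original scale), which represents the average effect. Units whose intervals do not cross this line are considered statistically distinguishable from average. Units to the left of the line have below-average effects and units to the right have above-average effects; whether that means worse or better performance depends on the outcome (for mortality, for instance, higher is worse).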

Caterpillar plots are especially popular in education research (ranking schools), healthcare (comparing hospital performance), and ecology (comparing site-level effects). They provide a compact, honest way to rank many units while showing the uncertainty around each estimate — preventing the misleading practice of ranking by point estimates alone.

Origin: The caterpillar plot emerged from the Bayesian multilevel modeling literature in the 1990s. Harvey Goldstein and colleagues at the Institute of Education (London) popularized the chart for school league tables, arguing that rankings without uncertainty intervals are irresponsible. The name “caterpillar” was coined informally by statisticians who noticed the visual resemblance of the sorted intervals to a caterpillar’s body and legs.

// 03 — Anatomy

Parts of a caterpillar plot

A — Unit labels: Each row identifies a unit (school, hospital, region) being compared, sorted by effect size

B — Reference line: The vertical dashed line at zero (or the overall mean) separating below-average from above-average units

C — Point estimate: The dot showing each unit's estimated random effect — its deviation from the grand mean

D — Confidence interval: The horizontal whisker showing the 95% interval; wider intervals mean more uncertainty

E — Sorted order: Units sorted from lowest to highest effect, creating the characteristic caterpillar shape
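To see how parts A to E fit together without depending on a plotting library, here is a hypothetical text-only renderer. It assumes each row already holds (label, estimate, lower, upper) on a common scale with the reference line at zero. A real chart would use a plotting package, but the mapping from data to marks is the same.

```python
def render_caterpillar(rows, ref=0.0, width=41):
    """Draw (label, estimate, lower, upper) rows as a text caterpillar plot.

    Hypothetical helper: '-' marks the interval whisker (D), 'o' the point
    estimate (C), '|' the reference line (B); labels (A) sit on the left.
    """
    left = min(min(r[2] for r in rows), ref)
    right = max(max(r[3] for r in rows), ref)
    span = (right - left) or 1.0  # guard against a zero-width axis

    def col(x):
        # Map a value on the effect scale to a character column.
        return round((x - left) / span * (width - 1))

    label_w = max(len(r[0]) for r in rows)
    lines = []
    # Sorted order (E): highest effect on the top line, lowest at the bottom.
    for label, est, lower, upper in sorted(rows, key=lambda r: r[1], reverse=True):
        chars = [" "] * width
        for c in range(col(lower), col(upper) + 1):
            chars[c] = "-"
        chars[col(ref)] = "|"
        chars[col(est)] = "o"  # drawn last so the dot stays visible
        lines.append(f"{label:>{label_w}} {''.join(chars)}")
    return "\n".join(lines)


# Hypothetical rows: estimate with 95% interval on the deviation scale.
rows = [
    ("School A", 0.40, 0.01, 0.79),
    ("School B", -0.30, -0.50, -0.10),
    ("School C", 0.10, -0.49, 0.69),
]
print(render_caterpillar(rows))
```

Because the reference column is set on every line, the `|` characters line up into the vertical reference line, and the sorted dots trace the caterpillar's body.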

// 04 — Usage

When to use it — and when not to

✓ Use a caterpillar plot when…

Ranking units (schools, hospitals, regions) from a multilevel model
You need to show that many units are statistically indistinguishable from average
Presenting random effects with their uncertainty intervals from a hierarchical model
Arguing against naive league-table rankings that ignore statistical uncertainty
Comparing performance of 10–200+ units on a common scale
Visualizing shrinkage — how random effect estimates are pulled toward the grand mean

× Avoid a caterpillar plot when…

You have fewer than 5 units — too few points to form a meaningful ranking
Your data comes from a fixed-effects model with no random grouping structure
You need to show change over time — use a line chart or spaghetti plot
Your audience expects a simple league table and won't appreciate uncertainty
The units are not exchangeable (e.g., comparing countries with vastly different contexts)
You want to emphasize a pooled overall result — use a forest plot instead

// 05 — Reading guide

How to read a caterpillar plot

Follow these steps whenever you encounter a caterpillar plot in a report or publication.

Locate the reference line

Find the vertical dashed line at zero (or the grand mean). This is the overall average across all units. Every unit's random effect is measured as a deviation from this average.

Scan the sorted order

Units are usually stacked with the lowest effect at the bottom and the highest at the top, although some plots reverse the order. This sorting creates the caterpillar shape and makes it easy to identify the best and worst performers at the extremes.

Read the intervals, not just the dots

The horizontal whiskers are the confidence (or credible) intervals. If an interval crosses the reference line, that unit is not statistically distinguishable from the overall average. Don't over-interpret small ranking differences when intervals heavily overlap.
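The crossing rule in this step is easy to automate when you want to flag units in a table or color them on the chart. A minimal sketch, assuming each interval is given as a (lower, upper) pair on the same scale as the reference line; `classify` is a hypothetical helper name. It compares each unit with the average only and is not a test of whether two units differ from each other.

```python
def classify(lower, upper, ref=0.0):
    """Describe where a unit's interval sits relative to the reference line."""
    if upper < ref:
        return "below average"  # entire interval lies left of the line
    if lower > ref:
        return "above average"  # entire interval lies right of the line
    return "indistinguishable from average"  # interval touches or crosses it


# Hypothetical 95% intervals on the deviation scale (reference line at 0).
intervals = {"School A": (0.01, 0.79), "School B": (-0.50, -0.10),
             "School C": (-0.49, 0.69)}
for name, (lower, upper) in intervals.items():
    print(name, classify(lower, upper))
```

Pass `ref` explicitly (for example, the grand mean) when the chart shows unit means on the original scale rather than deviations.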

Compare interval widths

Wider intervals mean less certainty, often because the unit has a small sample size (e.g., a small school). Narrow intervals mean more precise estimates. Be wary of ranking a unit with a wide interval above one with a narrow interval.
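The link between sample size and interval width can be made concrete under a simple normal-normal model, assumed here for illustration: observations within a unit have standard deviation σ, true unit effects vary with between-unit standard deviation τ, and both are treated as known. The posterior standard deviation of a unit's effect is then sqrt(τ²σ²/(nτ² + σ²)), which falls as n grows and never exceeds τ. The function names and numbers below are hypothetical.

```python
import math


def effect_sd(n, sigma, tau):
    """Posterior SD of one unit's random effect in a normal-normal model.

    Assumes known within-unit SD `sigma` and between-unit SD `tau`.
    Equivalent to 1 / sqrt(1/tau**2 + n/sigma**2).
    """
    return math.sqrt(tau**2 * sigma**2 / (n * tau**2 + sigma**2))


def interval_width(n, sigma, tau, z=1.96):
    """Full width of an approximate 95% interval: 2 * z * SD."""
    return 2 * z * effect_sd(n, sigma, tau)


# Hypothetical small and large schools with sigma = 20 and tau = 5.
for n in (4, 100):
    print(n, round(interval_width(n, sigma=20, tau=5), 2))
```

With these inputs the school with 4 pupils gets an interval more than twice as wide as the school with 100, which is why its rank deserves less trust.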

Look for clusters

Often you'll see a dense cluster of units near the center (statistically average) with only a few outliers at the extremes. This is expected in hierarchical models due to shrinkage — estimates are pulled toward the mean. The truly distinctive performers are those whose intervals don't overlap with the bulk of the others.
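The pull toward the mean described in this step has a closed form in the textbook normal-normal case. The sketch below assumes known variances and a known grand mean, whereas real software estimates these from the data, so the model-based estimates you plot will differ somewhat. `shrink` and the numbers are hypothetical.

```python
def shrink(unit_mean, n, grand_mean, sigma, tau):
    """Shrunken (partially pooled) estimate of a unit's mean.

    Assumes a normal-normal model with known within-unit SD `sigma`
    and between-unit SD `tau`. The weight on the unit's own data is
    tau^2 / (tau^2 + sigma^2 / n): near 1 for large n, near 0 for small n.
    """
    weight = tau**2 / (tau**2 + sigma**2 / n)
    return grand_mean + weight * (unit_mean - grand_mean)


# Two hypothetical schools with the same raw mean of 70 (grand mean 50).
small = shrink(70, n=4, grand_mean=50, sigma=20, tau=5)
large = shrink(70, n=100, grand_mean=50, sigma=20, tau=5)
print(round(small, 2), round(large, 2))  # → 54.0 67.24
```

Both schools have the same raw mean, but the small school's estimate is pulled most of the way back to the grand mean. This is why small units rarely sit at the extremes of a caterpillar plot built from model-based estimates.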

// 06 — Pitfalls

Common mistakes

Ranking by point estimates and ignoring intervals

Fix: The whole purpose of a caterpillar plot is to show that ranking by point estimates alone is unreliable. Always display the intervals, and treat heavily overlapping intervals as a warning that the difference in rank may not be meaningful.
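One way to show directly how unreliable point-estimate rankings are is to attach an interval to each unit's rank. The sketch below uses Monte Carlo simulation under the assumption that the estimates are independent and normally distributed with the given standard errors; with a Bayesian model you would rank the posterior draws instead. The function name and inputs are hypothetical.

```python
import random


def rank_intervals(estimates, std_errors, draws=2000, seed=1):
    """Approximate 95% intervals for each unit's rank (1 = lowest).

    Assumes independent, normally distributed estimates; posterior draws
    from a fitted model could replace the simulated normal draws.
    """
    rng = random.Random(seed)
    k = len(estimates)
    ranks = [[] for _ in range(k)]
    for _ in range(draws):
        # Simulate one plausible set of true effects and rank them.
        sample = [rng.gauss(m, s) for m, s in zip(estimates, std_errors)]
        order = sorted(range(k), key=lambda i: sample[i])
        for rank, i in enumerate(order, start=1):
            ranks[i].append(rank)
    intervals = []
    for r in ranks:
        r.sort()
        # Approximate 2.5th and 97.5th percentiles of the simulated ranks.
        intervals.append((r[int(0.025 * draws)], r[int(0.975 * draws) - 1]))
    return intervals


# Hypothetical estimates and SEs for four units.
for lo, hi in rank_intervals([-0.3, 0.1, 0.4, 0.45], [0.1, 0.3, 0.2, 0.2]):
    print(f"rank between {lo} and {hi}")
```

Units whose intervals overlap widely typically get rank intervals spanning several positions, which is exactly the uncertainty a bare league table hides.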

Using raw means instead of shrunken estimates

Fix: Random effects from a hierarchical model are 'shrunken' toward the grand mean. If you plot raw unit means instead, the extremes of the ranking will be dominated by units with small samples, whose means are the noisiest. Always use the model-based estimates.

Cramming too many units without adequate spacing

Fix: With hundreds of units, labels become unreadable and intervals blur together. Use interactive tooltips, highlight only outlier units, or split into faceted panels for clarity.

Not sorting by effect size

Fix: An unsorted caterpillar plot loses its defining characteristic — the smooth shape that makes patterns visible. Always sort units by their point estimate so the reader can immediately see the ranking.

Omitting the overall-mean reference line

Fix: Without the reference line, readers cannot judge whether a unit is above or below average. This line is essential context for interpreting each random effect as a deviation.

// 07 — In the wild

Real-world examples

School league tables in England

The UK Department for Education and researchers like Harvey Goldstein have used caterpillar plots to show school exam performance with uncertainty intervals. These plots demonstrate that most schools are statistically indistinguishable from average, undermining simplistic league-table rankings that parents and media often rely on.

Hospital mortality comparisons

Healthcare regulators use caterpillar plots to compare risk-adjusted mortality rates across hospitals. The plots show that only a handful of hospitals are statistically 'outliers' — either significantly better or worse than expected — while the vast majority fall within the expected range of variation.

Ecological site-level variation

In ecology, caterpillar plots display site-level random effects from species abundance models. Each monitoring site appears as a dot with an interval, revealing which locations have unusually high or low species counts after accounting for environmental covariates.

// 08 — Quick reference

Key facts

Also known as: Ranked random effects plot, interval plot

Best for: Ranking units from a hierarchical/multilevel model

Data types: Random effects with confidence or credible intervals

Key elements: Sorted dots, CI whiskers, reference line at grand mean

Scale: Linear (standard deviations or original units)

Classic use: School performance, hospital comparisons, site effects

Common tools: R (lme4, brms, ggplot2), Stata, MLwiN, Python (statsmodels)

Common mistakes: Ranking by point estimate alone, unsorted units, missing reference line

// 09 — Variations

Types of caterpillar plots

The basic caterpillar adapts to different modeling contexts while keeping the sorted-intervals core.

Standard caterpillar plot

The classic format with sorted dots and 95% CI whiskers. Each unit shows a single random effect estimate from a multilevel model.

Nested interval caterpillar

Shows both 50% and 95% intervals as thick and thin lines. Common in Bayesian analyses where credible intervals at multiple levels are informative.
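With posterior draws from a Bayesian model, both intervals come straight from quantiles of each unit's draws. A minimal sketch, assuming you have a list of draws per unit (for example, extracted from a brms fit); `quantile` and `nested_intervals` are hypothetical helpers, and linear interpolation is only one of several common quantile definitions.

```python
def quantile(sorted_draws, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] + frac * (sorted_draws[hi] - sorted_draws[lo])


def nested_intervals(draws, levels=(0.50, 0.95)):
    """Central credible intervals at several levels from posterior draws.

    Returns {level: (lower, upper)}; the 50% interval is drawn as the
    thick line and the 95% interval as the thin line.
    """
    s = sorted(draws)
    return {
        level: (quantile(s, (1 - level) / 2), quantile(s, (1 + level) / 2))
        for level in levels
    }


# Hypothetical draws for one unit: the integers 0..100.
print(nested_intervals(range(101)))
```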

Comparative caterpillar

Overlays two sets of random effects (e.g., before and after an intervention) for the same units, enabling direct comparison of changes.

Color-coded caterpillar

Colors intervals by statistical significance — e.g., red for units significantly different from the mean and muted for non-significant units.

// 10 — FAQs

Frequently asked questions

What is a caterpillar plot?

A caterpillar plot displays the random effects from a hierarchical (multilevel) model as one row per unit: a dot for the point estimate and a horizontal whisker for its confidence or credible interval. Sorting the rows by estimate makes the stacked intervals look like a caterpillar.

When should you use a caterpillar plot?

Use a caterpillar plot when ranking units (schools, hospitals, regions) from a multilevel model. It also works well when you need to show that many units are statistically indistinguishable from average, and when presenting random effects with their uncertainty intervals from a hierarchical model.

When should you avoid a caterpillar plot?

Avoid a caterpillar plot when you have fewer than 5 units — too few points to form a meaningful ranking. It is also a poor fit when your data comes from a fixed-effects model with no random grouping structure, or when you need to show change over time — use a line chart or spaghetti plot.

Is a caterpillar plot suitable for dashboards?

Yes, as long as the panel is tall enough to give each unit a readable row (or the chart highlights only outlier units and leaves the rest to tooltips), has a clear title, and labels both the axis scale and the reference line.

What category of chart is a caterpillar plot?

The caterpillar plot belongs to the Statistical family of charts. Charts in that family answer similar kinds of questions, so they often work as alternatives when one doesn't quite fit your data.

How do you read a caterpillar plot?

Find the reference line first, then scan the sorted order to see which units sit at the extremes. Read the intervals rather than just the dots: a unit whose interval crosses the reference line is not statistically distinguishable from average. Finally, compare interval widths, since wide intervals usually reflect small samples and make a unit's rank less certain.
