Caterpillar Plot
A chart that plots random effects and their confidence intervals along a horizontal axis. With the units stacked vertically and sorted by effect size, the pattern resembles a caterpillar with hairy legs stretching outward from a central body.
// 01 — The chart
What it looks like
A caterpillar plot ranking 12 schools by their random effect estimates. Each dot shows the point estimate and the horizontal line shows the 95% confidence interval. The vertical dashed line marks the overall mean.
// 02 — Definition
What is a caterpillar plot?
A caterpillar plot is a visualization that displays random effects from a hierarchical (multilevel) model, sorted by their estimated values. Each unit — a school, hospital, region, or any grouping variable — appears as a dot (point estimate) with a horizontal line extending outward to show the confidence or credible interval. When all units are stacked vertically and sorted from lowest to highest, the resulting shape resembles a caterpillar.
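For concreteness, the simplest model that produces such random effects is a random-intercept model. The notation below is an assumption made for illustration; the symbols are not taken from this page.

```latex
y_{ij} = \beta_0 + u_j + e_{ij}, \qquad
u_j \sim \mathcal{N}(0, \tau^2), \qquad
e_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the outcome for individual $i$ in unit $j$, $\beta_0$ is the overall mean, and $u_j$ is the random effect of unit $j$. A caterpillar plot shows the estimated $u_j$ with its interval, one row per unit.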
The chart's central feature is a vertical reference line at zero (or the overall mean), which represents the average effect. Units whose intervals do not cross this line are considered statistically distinguishable from average. Units to the left of the line have estimated effects below average, and units to the right have estimated effects above average; whether that is good or bad depends on the outcome being modeled.
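The data behind such a chart can be prepared in a few lines. The following is a minimal sketch, assuming each unit comes with a point estimate and a standard error and that a normal-approximation 95% interval is acceptable; the `Row` class, the `caterpillar_rows` helper, and the school values are hypothetical.

```python
from dataclasses import dataclass

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


@dataclass
class Row:
    unit: str
    estimate: float
    lower: float
    upper: float
    status: str  # "below", "above", or "indistinguishable"


def caterpillar_rows(effects, z=Z_95):
    """Turn {unit: (estimate, standard_error)} into sorted plot rows."""
    rows = []
    for unit, (est, se) in effects.items():
        lower, upper = est - z * se, est + z * se
        if upper < 0:
            status = "below"
        elif lower > 0:
            status = "above"
        else:
            status = "indistinguishable"  # interval crosses the zero line
        rows.append(Row(unit, est, lower, upper, status))
    # Sorting by point estimate is what produces the caterpillar shape.
    return sorted(rows, key=lambda r: r.estimate)


# Hypothetical random effects: (estimate, standard error) per school.
effects = {
    "School A": (0.42, 0.15),
    "School B": (-0.10, 0.20),
    "School C": (-0.55, 0.18),
    "School D": (0.08, 0.30),
}

rows = caterpillar_rows(effects)
for r in rows:
    print(f"{r.unit}: {r.estimate:+.2f} [{r.lower:+.2f}, {r.upper:+.2f}] {r.status}")
```

In this made-up example only two of the four schools have intervals that exclude zero, which is the kind of pattern the chart is designed to expose.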
Caterpillar plots are especially popular in education research (ranking schools), healthcare (comparing hospital performance), and ecology (comparing site-level effects). They provide a compact, honest way to rank many units while showing the uncertainty around each estimate — preventing the misleading practice of ranking by point estimates alone.
Origin: The caterpillar plot emerged from the Bayesian multilevel modeling literature in the 1990s. Harvey Goldstein and colleagues at the Institute of Education (London) popularized the chart for school league tables, arguing that rankings without uncertainty intervals are irresponsible. The name “caterpillar” was coined informally by statisticians who noticed the visual resemblance of the sorted intervals to a caterpillar’s body and legs.
// 03 — Anatomy
Parts of a caterpillar plot
// 04 — Usage
When to use it — and when not to
Use it when:
- Ranking units (schools, hospitals, regions) from a multilevel model
- Showing that many units are statistically indistinguishable from average
- Presenting random effects with their uncertainty intervals from a hierarchical model
- Arguing against naive league-table rankings that ignore statistical uncertainty
- Comparing performance of 10–200+ units on a common scale
- Visualizing shrinkage, that is, how random effect estimates are pulled toward the grand mean
Avoid it when:
- You have fewer than 5 units, which is too few points to form a meaningful ranking
- Your data comes from a fixed-effects model with no random grouping structure
- You need to show change over time; use a line chart or spaghetti plot
- Your audience expects a simple league table and won't appreciate uncertainty
- The units are not exchangeable (e.g., comparing countries with vastly different contexts)
- You want to emphasize a pooled overall result; use a forest plot instead
// 05 — Reading guide
How to read a caterpillar plot
Follow these steps whenever you encounter a caterpillar plot in a report or publication.
Locate the reference line
Find the vertical dashed line at zero (or the grand mean). This is the overall average across all units. Every unit's random effect is measured as a deviation from this average.
Scan the sorted order
Units are stacked in order of their point estimates, either with the highest effect at the bottom and the lowest at the top or the reverse, so check which end is which. This sorting creates the caterpillar shape and makes it easy to identify the highest and lowest estimates at the extremes.
Read the intervals, not just the dots
The horizontal whiskers are the confidence (or credible) intervals. If an interval crosses the reference line, that unit is not statistically distinguishable from the overall average. Don't over-interpret small ranking differences when intervals heavily overlap.
Compare interval widths
Wider intervals mean less certainty, often because the unit has a small sample size (e.g., a small school). Narrow intervals mean more precise estimates. Be wary of ranking a unit with a wide interval above one with a narrow interval.
Look for clusters
Often you'll see a dense cluster of units near the center (statistically average) with only a few outliers at the extremes. This is expected in hierarchical models due to shrinkage — estimates are pulled toward the mean. The truly distinctive performers are those whose intervals don't overlap with the bulk of the others.
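The link between interval width and sample size described under "Compare interval widths" can be sketched for a random-intercept model. This is an illustration under strong assumptions: the variance components are treated as known, and the function name and numbers are hypothetical. In practice the fitted model reports these uncertainties directly.

```python
import math


def interval_half_width(n, tau2, sigma2, z=1.96):
    """Half-width of the interval for one unit's random effect.

    Assumes a random-intercept model with known between-unit variance
    tau2 and within-unit variance sigma2; n is the unit's sample size.
    """
    # Conditional variance of the random effect given the unit's data.
    cond_var = 1.0 / (1.0 / tau2 + n / sigma2)
    return z * math.sqrt(cond_var)


# Hypothetical variance components.
TAU2, SIGMA2 = 0.25, 4.0

for n in (5, 50, 500):
    print(n, round(interval_half_width(n, TAU2, SIGMA2), 3))
```

The half-width falls steadily as `n` grows, which is why small units tend to show the longest whiskers.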
// 06 — Pitfalls
Common mistakes
Ranking by point estimates and ignoring intervals
Fix: The whole purpose of a caterpillar plot is to show that ranking by point estimates alone is unreliable. Always display the confidence intervals, and treat units whose intervals overlap heavily as not clearly separable in rank.
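One way to make this concrete is to simulate how far each unit's rank could move given its uncertainty. The sketch below is not a method described on this page: it assumes each effect can be treated as an independent normal draw around its estimate, which is a simplification, and the `rank_intervals` helper and the data are hypothetical.

```python
import random


def rank_intervals(effects, n_draws=2000, seed=0):
    """Simulate 95% intervals for each unit's rank (1 = lowest effect).

    effects maps unit -> (estimate, standard_error). Each effect is
    treated as an independent normal draw, which is a simplification.
    """
    rng = random.Random(seed)
    units = list(effects)
    ranks = {u: [] for u in units}
    for _ in range(n_draws):
        draw = {u: rng.gauss(*effects[u]) for u in units}
        ordered = sorted(units, key=draw.get)
        for position, u in enumerate(ordered, start=1):
            ranks[u].append(position)
    # Integer arithmetic avoids floating-point surprises in the indices.
    lo_idx = n_draws * 25 // 1000
    hi_idx = n_draws * 975 // 1000 - 1
    result = {}
    for u, rs in ranks.items():
        rs.sort()
        result[u] = (rs[lo_idx], rs[hi_idx])
    return result


# Hypothetical random effects: (estimate, standard error) per school.
effects = {
    "School A": (0.42, 0.15),
    "School B": (-0.10, 0.20),
    "School C": (-0.55, 0.18),
    "School D": (0.08, 0.30),
}

rank_ci = rank_intervals(effects)
for unit, (lo, hi) in rank_ci.items():
    print(f"{unit}: rank between {lo} and {hi}")
```

Units in the middle of the pack typically end up with rank intervals spanning several positions, even when their point estimates give a clean ordering.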
Using raw means instead of shrunken estimates
Fix: Random effects from a hierarchical model are 'shrunken' toward the grand mean. If you plot raw unit means, extreme values (often from small samples) will be over-represented. Always use the model-based estimates.
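The effect of shrinkage can be illustrated with the standard formula for a random-intercept model. This is a sketch that assumes the variance components are known; the function name and all numbers are hypothetical, and a real analysis would take the shrunken estimates from the fitted model.

```python
def shrunken_effect(raw_mean, n, grand_mean, tau2, sigma2):
    """Shrunken estimate of a unit's deviation from the grand mean.

    Assumes a random-intercept model with known between-unit variance
    tau2 and within-unit variance sigma2.
    """
    reliability = tau2 / (tau2 + sigma2 / n)  # between 0 and 1
    return reliability * (raw_mean - grand_mean)


# Hypothetical: two schools with the same raw mean but different sizes.
GRAND_MEAN, TAU2, SIGMA2 = 50.0, 4.0, 100.0
small = shrunken_effect(58.0, n=5, grand_mean=GRAND_MEAN, tau2=TAU2, sigma2=SIGMA2)
large = shrunken_effect(58.0, n=200, grand_mean=GRAND_MEAN, tau2=TAU2, sigma2=SIGMA2)
print(f"raw deviation: +8.00, small school: {small:+.2f}, large school: {large:+.2f}")
```

Both schools sit 8 points above the grand mean in the raw data, but the small school's estimate is pulled much closer to zero because its raw mean is less reliable.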
Cramming too many units without adequate spacing
Fix: With hundreds of units, labels become unreadable and intervals blur together. Use interactive tooltips, highlight only outlier units, or split into faceted panels for clarity.
Not sorting by effect size
Fix: An unsorted caterpillar plot loses its defining characteristic — the smooth shape that makes patterns visible. Always sort units by their point estimate so the reader can immediately see the ranking.
Omitting the overall-mean reference line
Fix: Without the reference line, readers cannot judge whether a unit is above or below average. This line is essential context for interpreting each random effect as a deviation.
// 07 — In the wild
Real-world examples
School league tables in England
The UK Department for Education and researchers like Harvey Goldstein have used caterpillar plots to show school exam performance with uncertainty intervals. These plots demonstrate that most schools are statistically indistinguishable from average, undermining simplistic league-table rankings that parents and media often rely on.
Hospital mortality comparisons
Healthcare regulators use caterpillar plots to compare risk-adjusted mortality rates across hospitals. The plots show that only a handful of hospitals are statistically 'outliers' — either significantly better or worse than expected — while the vast majority fall within the expected range of variation.
Ecological site-level variation
In ecology, caterpillar plots display site-level random effects from species abundance models. Each monitoring site appears as a dot with an interval, revealing which locations have unusually high or low species counts after accounting for environmental covariates.
// 08 — Quick reference
Key facts
// 09 — Variations
Types of caterpillar plots
The basic caterpillar adapts to different modeling contexts while keeping the sorted-intervals core.
Standard caterpillar plot
The classic format with sorted dots and 95% CI whiskers. Each unit shows a single random effect estimate from a multilevel model.
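A standard caterpillar plot could be drawn along these lines with Matplotlib. This is a minimal sketch, not a reference implementation: `draw_caterpillar` is a hypothetical helper, it expects an existing Matplotlib `Axes` object, and it assumes normal-approximation intervals built from standard errors.

```python
def draw_caterpillar(ax, units, estimates, std_errors, z=1.96):
    """Draw a caterpillar plot on a Matplotlib Axes.

    units, estimates and std_errors are parallel sequences. Returns the
    unit labels in plotted order (bottom to top).
    """
    order = sorted(range(len(units)), key=lambda i: estimates[i])
    xs = [estimates[i] for i in order]
    half_widths = [z * std_errors[i] for i in order]
    ys = list(range(len(order)))
    labels = [units[i] for i in order]

    # Dots are point estimates; horizontal whiskers are the intervals.
    ax.errorbar(xs, ys, xerr=half_widths, fmt="o", capsize=0)
    # Vertical dashed reference line at zero (the overall mean).
    ax.axvline(0, linestyle="--", color="gray")
    ax.set_yticks(ys)
    ax.set_yticklabels(labels)
    ax.set_xlabel("Random effect (deviation from overall mean)")
    return labels
```

With Matplotlib installed, one way to use it is to create a figure with `fig, ax = plt.subplots()` (where `plt` is `matplotlib.pyplot`), call `draw_caterpillar(ax, units, estimates, std_errors)`, and then call `plt.show()`. Passing the `Axes` in keeps the helper easy to embed in a larger figure.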
Nested interval caterpillar
Shows both 50% and 95% intervals as thick and thin lines. Common in Bayesian analyses where credible intervals at multiple levels are informative.
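The two interval levels can be computed as follows. This sketch uses a normal approximation around an estimate and standard error; in a Bayesian analysis the credible intervals would more typically be read off posterior quantiles. The `nested_intervals` helper and the numbers are hypothetical.

```python
from statistics import NormalDist


def nested_intervals(estimate, std_error, levels=(0.50, 0.95)):
    """Return {level: (lower, upper)} for a normal approximation."""
    out = {}
    for level in levels:
        # Two-sided quantile, about 0.674 for 50% and 1.96 for 95%.
        z = NormalDist().inv_cdf(0.5 + level / 2)
        out[level] = (estimate - z * std_error, estimate + z * std_error)
    return out


# Hypothetical unit: estimate 0.30 with standard error 0.10.
bands = nested_intervals(0.30, 0.10)
inner, outer = bands[0.50], bands[0.95]
print(f"50%: [{inner[0]:.3f}, {inner[1]:.3f}]  95%: [{outer[0]:.3f}, {outer[1]:.3f}]")
```

The inner band would be drawn as the thick line and the outer band as the thin line.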
Comparative caterpillar
Overlays two sets of random effects (e.g., before and after an intervention) for the same units, enabling direct comparison of changes.
Color-coded caterpillar
Colors intervals by statistical significance — e.g., red for units significantly different from the mean and muted for non-significant units.
// 10 — FAQs
Frequently asked questions
When should you use a caterpillar plot?
Use a caterpillar plot when ranking units (schools, hospitals, regions) from a multilevel model. It also works well when you need to show that many units are statistically indistinguishable from average, and when presenting random effects with their uncertainty intervals from a hierarchical model.
When should you avoid a caterpillar plot?
Avoid a caterpillar plot when you have fewer than 5 units — too few points to form a meaningful ranking. It is also a poor fit when your data comes from a fixed-effects model with no random grouping structure, or when you need to show change over time — use a line chart or spaghetti plot.
Is a caterpillar plot suitable for dashboards?
Yes — a caterpillar plot can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.
What category of chart is a caterpillar plot?
The caterpillar plot belongs to the statistical family of charts. Charts in that family are designed to answer the same kind of question, so they often work as alternatives when one doesn't quite fit your data.
How do you read a caterpillar plot?
Locate the vertical reference line at zero (or the grand mean), scan the sorted order of units, and read the intervals rather than just the dots. A unit whose interval crosses the reference line is not statistically distinguishable from the overall average, and wider intervals signal less certainty, often because the unit has a small sample size.