Box Plot
A compact chart that summarizes a dataset’s distribution using five key numbers — minimum, first quartile, median, third quartile, and maximum — while flagging outliers. The statistician’s workhorse for comparing groups since John Tukey introduced it in the 1970s.
// 01 — The chart
What it looks like
Three box plots comparing salary distributions across departments. The boxes show the middle 50% of salaries, the whiskers extend to the most extreme salaries within 1.5 × IQR of the box, and circles mark outliers.
// 02 — Definition
What is a box plot?
A box plot (also called a box-and-whisker plot) is a standardised way of displaying the distribution of a numeric variable based on the five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It packs the centre, the spread, the skewness, and the outliers of a dataset into a single compact rectangle with two whiskers and a few dots, so a reader can compare many groups in the same vertical space that a single histogram would consume.
The box spans Q1 to Q3 — the interquartile range, or IQR — and contains the middle 50% of the data. A line inside the box marks the median, the value that divides the data into two equal halves. The whiskers reach from the box to the most extreme observations within 1.5 × IQR of the box edge under the standard Tukey convention. Any point that falls beyond the whiskers is drawn individually and counted as an outlier. Different tools use different whisker rules — min/max, 5th/95th percentiles, or 3 × IQR — so the convention should always travel with the chart.
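The arithmetic behind the box and whiskers can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `tukey_summary` and the sample values are hypothetical, and the quartiles use the standard library's "inclusive" interpolation, which is only one of several quartile methods that tools apply.

```python
from statistics import quantiles

def tukey_summary(values, k=1.5):
    """Return the five-number summary, whisker ends, and outliers."""
    data = sorted(values)
    # "inclusive" interpolates linearly between data points; other
    # quartile methods can shift Q1 and Q3 slightly.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "iqr": iqr,
        "whisker_low": inside[0],    # most extreme point inside the fences
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

salaries = [72, 80, 82, 88, 90, 95, 100, 115, 190]  # illustrative values
summary = tukey_summary(salaries)
print(summary)
```

With these values the box runs from 82 to 100, the upper whisker stops at 115 (the largest value inside the fence at 127), and 190 is drawn as an outlier. Note that the whisker ends at a real data point, not at the fence itself.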
Box plots excel at one specific job: comparing distributions across groups. Lining up five or ten boxes side by side lets the eye instantly read which group has the highest median, which is the most variable, which is the most skewed, and which has the most outliers — in roughly the space a single bar chart would use. The price of that compactness is that the chart hides the precise shape of each distribution, which is why box plots fail loudly on bimodal data and why modern variants like the violin plot, beeswarm, and letter-value plot exist.
The chart is at its best in the hands of a technical audience who already think in quartiles. For executives, journalists, and the general public, a bar chart with error bars or a violin plot with direct labels usually communicates more clearly. The rest of this guide is about how to live in the box plot’s sweet spot — comparing many groups in compact space — and how to recognise when you have walked outside it.
Origin: The box plot was introduced by American statistician John W. Tukey, first as the “schematic plot” in a 1970 technical report and then more widely in his 1977 book Exploratory Data Analysis. Tukey wanted a quick, sketch-friendly chart that summarized distributions without computers; the design was deliberately simple enough to draw by hand on graph paper, which is why the box, whiskers, and outlier dots are still its only required parts.
// 03 — When to use
When a box plot is the right call
Reach for a box plot whenever the question is about comparing the shape of several distributions and you have enough data per group for the quartiles to be stable. Below are the situations where it consistently wins against the alternatives.
- Comparing distributions across 3 or more groups in compact space (departments, regions, treatment arms)
- You need a fast read on center, spread, skew, and outliers in a single chart
- Detecting and highlighting outliers in a numeric variable
- Checking whether distributions are symmetric, left-skewed, or right-skewed
- You have a technical audience comfortable with median, quartiles, and IQR
- Comparing before/after distributions for the same groups with a paired layout
- Each group has at least ~20 observations so quartiles are stable
// 04 — When not to use
When a box plot is the wrong call
A box plot can technically display almost any distribution, but technically possible is not the same as good idea. Below are the cases where the box plot actively hides information you need to communicate.
- Your audience is non-technical — most readers don’t intuit quartiles
- You suspect a bimodal distribution — box plots collapse two peaks into one
- You need to see the detailed shape — use a histogram or violin plot
- Each group has fewer than ~20 observations — use a strip plot or beeswarm
- You need precise individual values — box plots show summaries, not exact numbers
- You only have one or two groups — a histogram or density plot tells more
- You have many narrow groups (>15) — boxes get too thin to compare
- Your data is non-numeric — box plots require a continuous value
// 05 — Data requirements
What your data needs to look like
Before building the chart, your dataset needs to fit a specific shape. Use this checklist to confirm yours does.
Shape
One row per observation, with a group label column and a numeric value column. Long-form data: do not pre-aggregate to means or medians.
Minimum rows
~20 observations per group for stable quartiles. Below 10 per group, prefer a strip plot or beeswarm.
Maximum rows
No upper bound, but past ~10 groups consider small multiples; past 100,000 rows per group consider a letter-value plot.
Group
A label that defines which box each observation belongs to: a department, region, treatment arm, or time bucket. The chart draws one box per distinct group value.
Value
The numeric measurement for the observation: salary, response time, lab result, sensor reading. All values across all groups must share the same unit so quartiles are comparable.
Weight (optional)
Sampling weight used when each row represents a different number of underlying observations. Some tools accept weights directly (ggplot2's geom_boxplot has a weight aesthetic); in others you need to compute weighted quartiles yourself before plotting.
Highlight (optional)
Flag that draws one or two boxes in the accent color so the eye lands on them first. Useful for callouts but never required.
| department | salary | highlight |
|---|---|---|
| Engineering | 120,000 | false |
| Engineering | 155,000 | false |
| Sales | 82,000 | false |
| Sales | 115,000 | false |
| Design | 78,000 | true |
| Design | 175,000 | true |
Tip: if your raw data is already aggregated to one mean per group, you cannot draw a box plot, because the quartiles need the underlying observations. Reach back to the source table or switch to a bar-with-error-bars chart that shows mean and standard deviation. Reshaping tools such as SQL UNNEST or pandas .melt() turn nested or wide raw data into long-form rows, but they cannot recover observations from an aggregate.
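A minimal sketch of the wide-to-long reshape, assuming a hypothetical wide table that still holds raw observations (one column per department, one salary per cell) rather than aggregates. The column names and values are illustrative.

```python
import pandas as pd

# Hypothetical wide layout: one column per department, raw salaries in
# thousands of USD. These are observations, not per-group means.
wide = pd.DataFrame({
    "Engineering": [120, 155, 135],
    "Sales": [82, 115, 90],
    "Design": [78, 175, 105],
})

# melt() stacks the columns into (department, salary) pairs,
# which is the long form a box plot expects.
long_form = wide.melt(var_name="department", value_name="salary")
print(long_form.shape)
print(long_form.head())
```

If the groups have different numbers of observations, the wide table will contain missing cells; dropping them after the melt with `dropna()` is one way to handle that.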
// 06 — Anatomy
Parts of a box plot
Every box plot is built from the same six parts. Knowing the names makes it easier to talk about which whisker rule you are using, which marks to keep, and which decorations you can drop without losing information.
// 07 — Step-by-step
Step-by-step: how to build a good box plot
A ten-step recipe that works regardless of the tool. Walk through it the first few times and the moves become automatic; skip steps and the chart usually shows it.
- 1
Pick the question you want the chart to answer
A box plot answers “how do these distributions compare?”: their typical value, their spread, their skew, and their outliers. Write that question down before you draw anything. If your question is about exact totals, change over time, or part-to-whole ratios, switch chart types now.
- 2
Reshape to one row per observation
Box plots are drawn from raw observations, not pre-aggregated means. Reshape your data so every row is one observation with two columns: a group label and a numeric value. If your data is already summarised, you need to reach back to the underlying samples.
- 3
Pick or order the groups
Order the groups by median (or by mean) so the boxes read as a ranking, unless the categories have an inherent order such as months or treatment dose. Drop or merge groups with fewer than ~10 observations into an “Other” bucket so noisy quartiles do not dominate.
- 4
Choose the orientation
Use vertical boxes for short labels and time-like sequences. Switch to horizontal boxes when group names are long or you have more than seven groups so labels read left-to-right rather than at a 45° rotation.
- 5
Pick the whisker convention and lock it
Decide whether whiskers extend to 1.5 × IQR (Tukey, the default), to the 5th/95th percentiles, or to the data minimum and maximum, and state the rule in the caption. Different tools default differently, and unannotated whiskers are the single biggest source of misinterpretation.
- 6
Draw outliers as individual dots
Plot points beyond the whiskers as individual marks so the reader can see their density and direction. Resist the temptation to clip them: outliers are often the most interesting story in the chart and the easiest to spot at a glance.
- 7
Overlay the raw points when sample size is small
When any group has fewer than ~30 observations, jitter the underlying points over the box or overlay a beeswarm. Plain boxes on small samples hide bimodality and make wildly different datasets look identical.
- 8
Choose color with intent
Default to a single neutral box outline and a single accent for the median line. Reserve fill color for grouping or for highlighting the one box your headline is about; multi-color rainbows make the chart harder to read, not easier.
- 9
Add a takeaway title and the unit
“Salary by department” is a label. “Design salaries vary 4× wider than Sales” is a takeaway. Lead with the takeaway, put the descriptive label as a subtitle, and put the unit on the value axis (“Annual salary, USD”).
- 10
Annotate, then ship
Mark the median and any notable outlier with a short caption, state which whisker rule you used, and verify the chart still works at the size readers will see it. Add a screen-reader-friendly table of the five-number summary per group beneath the chart.
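Step 3 of the recipe (merge small groups, then order by median) can be sketched with pandas as follows. The helper name `prepare_groups`, the threshold of 10, and the sample data are assumptions made for illustration, not a fixed rule.

```python
import pandas as pd

def prepare_groups(df, group_col, value_col, min_n=10, other_label="Other"):
    """Merge small groups into one bucket and return groups ordered by median."""
    out = df.copy()
    counts = out[group_col].value_counts()
    small = counts[counts < min_n].index
    # Keep the original label unless the group is too small to trust.
    out[group_col] = out[group_col].where(~out[group_col].isin(small), other_label)
    order = (
        out.groupby(group_col)[value_col].median()
        .sort_values(ascending=False).index.tolist()
    )
    return out, order

# Illustrative data: two well-populated teams and two tiny ones.
df = pd.DataFrame({
    "team": ["A"] * 12 + ["B"] * 12 + ["C"] * 2 + ["D"] * 3,
    "value": list(range(10, 22)) + list(range(30, 42)) + [5, 6] + [7, 8, 9],
})
prepared, order = prepare_groups(df, "team", "value")
print(order)
```

The returned `order` list can be passed to the plotting call (for example the `order=` argument of seaborn's `boxplot`) so the boxes read as a ranking. The merged bucket can itself end up small, so it is worth checking its size before drawing it.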
// 08 — Real-world examples
Where you’ll see box plots used
Box plots show up in four places more than anywhere else: clinical research, manufacturing quality control, education analytics, and software performance work. Each context has its own conventions, and they all reward the same fundamentals.
Medical research: Drug trial response times
Researchers compare patient response distributions across treatment groups and a placebo. Side-by-side boxes instantly show whether the treatment group has a lower median response time and tighter spread than control. Outlier dots flag patients who responded much better or worse than the rest, prompting follow-up case-by-case investigation.
Clinical Research
Manufacturing: Defect rates by production line
Quality teams compare daily defect distributions across production lines. A line with a high median or many outliers signals a process that needs attention. The compact format lets a plant manager compare a dozen lines on one chart, sorted by median so the worst-performing line sits at the top.
Manufacturing
Education: Test scores across schools
School administrators compare standardized test distributions across schools or districts. Box plots reveal not just which school has the highest median, but which has the tightest box (most consistent results) and which has the widest achievement gap, making them a much fairer chart than a bar of means.
Education
Software performance: Latency across deployments
Site reliability engineers compare per-request latency distributions across deployment versions. The box shows the typical experience, the upper whisker shows the bad-day case, and the outlier dots are the user-facing incidents. A new deploy that shifts the entire box up is much more obvious than a single mean-latency number.
Engineering
// 09 — Variations
Types of box plots
The classic Tukey box plot has several modern variants, each addressing a specific weakness of the original. The headline rule is the same: pick the variant whose strengths match your question and your sample size.
Violin plot
Wraps the box in a mirrored kernel-density curve so the full shape of the distribution is visible. Reveals bimodality and tail behaviour the plain box hides.
Beeswarm plot
Plots every observation packed without overlap. Shows the actual distribution shape and sample size, useful when groups are small.
Notched box plot
A notch around the median shows an approximate 95% confidence interval for the median. If two notches don’t overlap, that is informal evidence that the medians differ at roughly the 5% level.
Letter-value (boxen) plot
Nested boxes show extra quantiles beyond Q1/Q3. Designed for very large samples where Tukey whiskers misclassify too many points as outliers.
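Why Tukey whiskers flag so many points in very large samples can be shown with a short calculation. The sketch below assumes perfectly normal data, which is an idealisation; real data will differ, but the scaling with sample size is the point.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: an idealised assumption
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
# By symmetry, the share beyond either fence is twice the upper tail.
outlier_share = 2 * (1 - z.cdf(upper_fence))

print(f"fence at about {upper_fence:.3f} standard deviations")
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: about {outlier_share * n:,.0f} points flagged")
```

Under this assumption roughly 0.7% of points fall beyond the fences, so the number of outlier dots grows in proportion to the sample size even when nothing unusual is happening. That is the weakness the letter-value plot is meant to address.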
// 10 — Comparisons
Box plot vs other distribution charts
Box plots are easy to confuse with several neighbouring chart types because they all show the spread of a numeric variable. The differences matter — picking the wrong one changes what your reader is allowed to conclude.
Box plot vs violin plot
Both compare distributions across groups. A box plot summarises with five numbers; a violin plot wraps that summary in a kernel-density curve that reveals the full shape of the distribution — including bimodality that the box hides.
Box plot
Shows min, Q1, median, Q3, max, and outliers. Compact, fast to read, easy to draw on a whiteboard. Hides the shape of the distribution.
- Best for >5 groups in compact space
- Works well in print and grayscale
- Hides multi-modal data
Violin plot
Shows the full kernel-density curve mirrored around a vertical axis, often with a slim box plot inside. Reveals shape, peaks, and gaps that the box plot collapses.
- Best for 2–6 groups with enough data per group
- Reveals bimodality and tail shape
- Needs more vertical space per group
Box plot vs strip / beeswarm plot
A box plot summarises; a strip or beeswarm plot shows every individual observation. Strip and beeswarm plots win when sample size is small, when you need to see the actual data, or when readers are non-technical.
Box plot
Five-number summary plus outliers. Identical-looking boxes can hide very different underlying datasets, especially with fewer than ~30 observations per group.
- Compact summary across many groups
- Stable with large samples (n > 100)
- Hides sample size unless you label it
Strip / beeswarm plot
Plots every observation along the value axis, jittered (strip) or packed without overlap (beeswarm). Shows sample size, density, and shape directly, but gets crowded past ~200 points.
- Best for <100 observations per group
- Shows true sample size at a glance
- Less compact in dashboard tiles
Box plot vs histogram
A histogram shows the full shape of one distribution; a box plot summarises many distributions side by side. Use a histogram when shape matters; switch to a box plot when you need to compare four or more groups in the same space.
Box plot
One compact summary per group, easy to lay out side by side. Loses the shape of each distribution but keeps the spread, skew, and outliers visible.
- Compares many groups in one chart
- Hides multiple peaks and tail shape
- Compact, dashboard-friendly
Histogram
One bar per bin shows the precise shape of a single distribution. Best for one variable at a time; comparing groups requires overlay or small multiples.
- Best for one or two distributions
- Reveals shape, peaks, and gaps
- Bin width affects perceived shape
Box plot vs bar with error bars
Bars with error bars show one mean and one spread per group; box plots show five numbers plus outliers. Bars with error bars are friendlier for non-technical audiences, but they reduce each group to a mean and a symmetric spread, which hides skew, outliers, and the median.
Box plot
Encodes median, quartiles, whiskers, and outliers. Robust to skew because it uses quantiles rather than the mean and standard deviation.
- Robust to skewed data and outliers
- Shows asymmetry directly
- Requires technical literacy
Bar with error bars
A bar shows the mean and a thin whisker shows ± 1 standard deviation, standard error, or 95% CI. Easier to read for general audiences but distorted by skew.
- Friendly for non-technical readers
- Distorts skewed data
- Always state which spread the bars show
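A small worked example of how skew distorts a mean-and-error-bar summary while the quartiles stay put. The values are invented for illustration, and the quartiles use the standard library's "inclusive" method.

```python
from statistics import mean, median, quantiles, stdev

# Illustrative right-skewed sample: seven small values and one large one.
response_ms = [1, 2, 2, 3, 3, 3, 4, 50]

m, sd = mean(response_ms), stdev(response_ms)
q1, _, q3 = quantiles(response_ms, n=4, method="inclusive")

print(f"mean = {m}, mean - 1 SD = {m - sd:.2f}, mean + 1 SD = {m + sd:.2f}")
print(f"median = {median(response_ms)}, Q1 = {q1}, Q3 = {q3}")
```

Here the mean (8.5) sits above seven of the eight observations, and the lower error bar extends below zero, a value that never occurs in the data. The median (3) and the quartiles (2 and 3.25) still describe where most of the values lie.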
// 11 — Common mistakes
Mistakes to watch out for
Almost every broken box plot in the wild fails the same handful of ways. If you only memorize a few rules, make them these.
Hiding bimodal distributions
A box plot of two overlapping subgroups (say, male and female heights pooled together) shows a single symmetric box with no hint that the data has two peaks. The five-number summary literally cannot represent multi-modal distributions. Always preview the data with a histogram or violin first; switch chart type if the histogram has more than one mode.
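The point can be checked with two tiny datasets. This sketch uses invented values and the standard library's "inclusive" quartile method; the helper name `five_numbers` is hypothetical.

```python
from collections import Counter
from statistics import quantiles

def five_numbers(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # evenly spread
two_peaks = [1, 3, 3, 3, 5, 7, 7, 7, 9]   # clusters at 3 and 7

print(five_numbers(flat))
print(five_numbers(two_peaks))
print(Counter(two_peaks).most_common(2))  # the two peaks the box cannot show
```

Both datasets produce the same five-number summary, so their box plots are identical even though one is evenly spread and the other has two clear clusters.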
Using box plots for non-technical audiences
Most people don’t intuitively understand quartiles, the IQR, or what the box edges mean. If your audience includes executives, journalists, or the general public, consider simpler alternatives such as a bar chart with error bars, a dot plot of medians, or a violin plot with explicit labels for the median and tails.
Ignoring sample-size differences
Two boxes can look identical even if one represents 20 observations and the other 20,000. Print the sample size beneath each group, use varwidth = TRUE in ggplot2 so box width tracks √n, or overlay raw points as a strip plot so the reader can see the underlying counts at a glance.
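One low-effort way to print the sample size under each box is to build it into the tick labels. A minimal sketch, assuming long-form rows of (group, value) and an already chosen box order; the data is illustrative.

```python
from collections import Counter

# Illustrative long-form rows: (group, value).
rows = [
    ("Engineering", 120), ("Engineering", 155), ("Engineering", 135),
    ("Sales", 82), ("Sales", 115),
    ("Design", 78),
]

counts = Counter(group for group, _ in rows)
# One label per box, in the order the boxes will be drawn.
order = ["Engineering", "Sales", "Design"]
tick_labels = [f"{group}\n(n = {counts[group]})" for group in order]
print(tick_labels)
```

The resulting strings can be handed to whatever sets the category axis labels in your tool, for example `ax.set_xticklabels(tick_labels)` in Matplotlib.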
Over-interpreting overlapping IQRs
Box overlap is a weak guide to whether two groups differ. With small samples a visible gap between medians can be noise, and with large samples heavily overlapping boxes can still hide a real shift. Use notched boxes to make the median confidence interval explicit, or run a formal test before claiming two distributions differ.
Not labelling the whisker convention
Different tools use different whisker rules: 1.5 × IQR (Tukey, the default in ggplot2 and Matplotlib), min/max (the default in some Excel versions), or the 5th and 95th percentiles (common in journalism). Without a caption, two readers can look at the same chart and disagree about which points are outliers.
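How much the rule matters is easy to see by computing the whisker ends for one dataset under all three conventions. The function name `whisker_ends`, the rule labels, and the latency values are assumptions for illustration, and the percentiles use the standard library's "inclusive" method.

```python
from statistics import quantiles

def whisker_ends(values, rule="tukey"):
    """Return (low, high) whisker ends under the named convention."""
    data = sorted(values)
    if rule == "minmax":
        return data[0], data[-1]
    if rule == "p5_p95":
        cuts = quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
        return cuts[0], cuts[-1]
    if rule == "tukey":
        q1, _, q3 = quantiles(data, n=4, method="inclusive")
        iqr = q3 - q1
        inside = [x for x in data if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
        return inside[0], inside[-1]
    raise ValueError(f"unknown rule: {rule}")

# Illustrative latencies in milliseconds, with a long right tail.
latencies = [10, 12, 13, 14, 15, 15, 16, 17, 18, 19, 20,
             21, 22, 24, 25, 26, 28, 30, 35, 60, 95]

for rule in ("tukey", "p5_p95", "minmax"):
    print(rule, whisker_ends(latencies, rule))
```

For this data the upper whisker ends at 35, 60, or 95 depending on the rule, so the same point (60) is an outlier under one convention and inside the whisker under another. In Matplotlib the equivalent choice is made with the `whis` argument of `boxplot`, which accepts either a multiplier or a pair of percentiles.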
Hiding outliers to compress the axis
Turning off outlier marks (showfliers=False, outlier.shape=NA) makes the boxes look bigger but erases the most newsworthy story in the chart. Show outliers by default; only hide them with a clearly labelled “outliers removed” annotation.
Plotting box plots from pre-aggregated means
If your data is already collapsed to one mean per group, you cannot draw a box plot — you need the underlying observations. Reach back to the raw rows or switch to a bar-with-error-bars chart that shows mean and standard deviation explicitly.
// 12 — Accessibility
Accessibility checklist
Run through this list before publishing. The chart should still communicate its message to readers using assistive technology, color-blind users, keyboard navigation, and reduced-motion settings.
- ✓
Provide a text alternative listing the five-number summary
WCAG 1.1.1
Add an accessible name (alt text or aria-label) and a hidden table that lists, for each group, the minimum, Q1, median, Q3, maximum, and outlier count. This is the box plot’s text equivalent, not “a box plot of salaries.”
- ✓
Write an accessible name that describes skew and outliers
WCAG 1.1.1
The summary should call out direction: “Design has the widest spread (IQR $40k–$120k) and two high outliers above $200k. Sales is symmetric and tightly clustered around $90k.” Skew, spread, and outliers are the three things a sighted reader sees first.
- ✓
Color contrast for whiskers and the median line
WCAG 1.4.11
Whisker strokes and the median line must reach at least 3:1 contrast against the chart background and the box fill. The median line is the single most important mark in the chart, so do not let pale gray-on-cream wash it out.
- ✓
Color contrast for box fill and outline
WCAG 1.4.11
Use a clear outline around each box (3:1 against the background) so the box edges remain visible when the fill is light or when readers print the chart in grayscale.
- ✓
Do not rely on color alone to encode group
WCAG 1.4.1
If color encodes a second variable (treatment vs control, before vs after), reinforce it with a hatch pattern, a different outline weight, or a direct text label so colorblind readers can still tell the boxes apart.
- ✓
Expose the underlying data in a screen-reader-friendly table
WCAG 1.3.1
Place a real HTML table beneath the chart (or behind a “View data” toggle) listing every group’s five-number summary, sample size, and outlier count. Many readers will copy this rather than re-key it from the visual.
- ✓
State the whisker rule in the caption
WCAG 3.3.2
Add a one-line caption such as “Whiskers extend to 1.5 × IQR; points beyond are outliers (Tukey).” Without this, two readers can look at the same chart and reach different conclusions about which points count as outliers.
- ✓
Keyboard-accessible focus on each box
WCAG 2.1.1
If the chart is interactive, every box and every outlier dot should be reachable with the Tab key, with a visible focus ring and a tooltip that announces the group, the five-number summary, and the sample size on focus, not only on hover.
- ✓
Respect prefers-reduced-motion
WCAG 2.3.3
If boxes animate in on load, gate the animation behind a prefers-reduced-motion: no-preference media query so motion-sensitive readers see the final state immediately.
- ✓
Make the chart resizable and zoomable
WCAG 1.4.4
Use a responsive viewBox so the chart stays legible at 200% browser zoom and on narrow mobile columns. Avoid baking the SVG to a fixed pixel width that crops on small screens.
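The data table recommended in the checklist can be generated from the same observations that feed the chart. The sketch below is one possible shape, not a required one: the helper names, column headings, and caption text are assumptions, the salaries are illustrative, and the quartiles use the standard library's "inclusive" method with the 1.5 × IQR outlier rule.

```python
from html import escape
from statistics import quantiles

def summary_row(group, values, k=1.5):
    """Five-number summary, sample size, and outlier count for one group."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    n_out = sum(1 for x in data if x < q1 - k * iqr or x > q3 + k * iqr)
    return {"Group": group, "n": len(data), "Min": data[0], "Q1": q1,
            "Median": med, "Q3": q3, "Max": data[-1], "Outliers": n_out}

def to_html_table(rows, caption):
    """Render the summary rows as a plain HTML table with column headers."""
    headers = list(rows[0])
    head = "".join(f'<th scope="col">{escape(h)}</th>' for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[h]))}</td>" for h in headers) + "</tr>"
        for row in rows
    )
    return (f"<table><caption>{escape(caption)}</caption>"
            f"<thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>")

# Illustrative salaries in thousands of USD.
salaries = {
    "Engineering": [95, 110, 120, 135, 140, 155, 170, 200],
    "Sales": [72, 80, 82, 88, 90, 95, 100, 115],
    "Design": [58, 70, 78, 92, 105, 118, 140, 175],
}
rows = [summary_row(group, values) for group, values in salaries.items()]
table_html = to_html_table(rows, "Salary by department, USD thousands. Whiskers: 1.5 × IQR (Tukey).")
print(table_html)
```

Putting the whisker rule in the table caption keeps the convention attached to the numbers even when the table is copied away from the chart.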
// 13 — Best practices
Design and craft tips
The mistakes section above tells you what to avoid. The list below is the positive version: the small set of habits that separate a good box plot from a passable one.
Do: Order boxes by median
Don’t: Hide outliers
Do: Overlay individual points when sample size is small
Don’t: Use box plots without stating the whisker rule
Do: Use a single neutral fill
Don’t: Use a box plot when you suspect bimodality
Do: State the unit and the whisker rule
Don’t: Crowd more than ~10 boxes onto one chart
// 15 — Tool instructions
How to build it in your tool of choice
The box plot ships in every modern statistical tool, with subtle but important differences in defaults. The recipes below get you to a clean, sorted, point-overlay box plot in each of the most common platforms.
Microsoft Excel
Spreadsheet — ~4 min
- 01 Lay each group's numeric values out in its own column with a clear header row that names the group.
- 02 Select the entire data range, including the headers, and choose Insert → Insert Statistic Chart → Box and Whisker.
- 03 Right-click any box, choose Format Data Series, and tick Show outlier points and Show inner points so individual values are visible.
- 04 In the same panel, choose Quartile calculation → Exclusive median for Tukey-style boxes (Excel’s default Inclusive method matches the QUARTILE.INC function).
- 05 Sort the source columns by median (or any chosen statistic) so the resulting boxes read as a ranking.
- 06 Replace the default chart title with a takeaway sentence and add a caption stating the whisker rule and the sample size per group.
Tip: Excel’s built-in Box and Whisker chart was added in Excel 2016. In older versions you have to fake one with a stacked column chart and error bars — upgrade if you can.
Google Sheets
Spreadsheet — ~5 min
- 01 Sheets does not ship a native box plot, so build one as a Candlestick chart: arrange columns in the order group, min, Q1, Q3, max.
- 02 Use QUARTILE(range, 0–4) formulas to compute the five numbers per group from the underlying data.
- 03 Select the assembled summary table and choose Insert → Chart, then change Chart type to Candlestick.
- 04 Map the columns: low → min, open → Q1, close → Q3, high → max (the candlestick body, drawn from open to close, becomes the box, and the wicks, drawn out to low and high, become the whiskers).
- 05 Use a separate Series of dots, plotted on the same axis, to mark outliers identified with the 1.5 × IQR rule.
- 06 Add chart title, axis label with units, and a footnote stating the whisker convention.
Tip: if your team owns Looker Studio, switch there — it has a true box plot chart and reads from Sheets directly without the candlestick hack.
Python (Matplotlib / seaborn)
Code — ~5 min
- 01 Install matplotlib, seaborn, and pandas with pip install matplotlib seaborn pandas if they are not already in your environment.
- 02 Reshape your data into long form: a DataFrame with one column for the group label and one for the numeric value.
- 03 Order the group factor by median with a sort_values + Categorical step so seaborn draws the boxes as a ranking.
- 04 Call seaborn.boxplot(x='group', y='value', data=df, showfliers=True) for a quick chart, or plt.boxplot(values_per_group, labels=labels) when you want full control (Matplotlib 3.9 renamed labels to tick_labels).
- 05 Overlay raw points with seaborn.stripplot(..., color='black', alpha=0.4, size=3) to show sample size and density.
- 06 Set the y-axis label to include the unit, set a takeaway title with ax.set_title(title, loc='left'), and call ax.spines[['top','right']].set_visible(False) for a clean look.
Tip: seaborn also provides boxenplot() (the letter-value plot, available since seaborn 0.9), which is a good alternative for very large samples because it shows extra quantiles without much extra ink.
R (ggplot2)
Code — ~4 min
- 01 Install ggplot2 with install.packages('ggplot2') and load it with library(ggplot2).
- 02 Build a long-form data frame with two columns: a factor for the group and a numeric column for the value.
- 03 Reorder the group factor with reorder(group, value, FUN = median) so the boxes read as a ranking.
- 04 Call ggplot(df, aes(x = group, y = value)) + geom_boxplot(notch = TRUE, varwidth = TRUE).
- 05 Add geom_jitter(width = 0.15, alpha = 0.4, size = 1) when group sizes are small so individual points are visible.
- 06 Polish with labs() for the title, subtitle, and unit-tagged y-axis label, then theme_minimal() for a clean default look.
Tip: notch = TRUE adds median confidence intervals; varwidth = TRUE makes box width proportional to √n, so readers can see sample-size differences at a glance.
JavaScript (D3.js)
Code — ~12 min
- 01 Install D3 with npm i d3 or include the CDN script tag in your HTML.
- 02 Group your data by the category field with d3.group() and compute Q1, median, Q3, IQR, and the whisker bounds with d3.quantile().
- 03 Identify outliers as any point outside [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR]; keep them in a separate array.
- 04 Build a band scale for the category axis and a linear scale for the value axis, with both drawn via d3.axisBottom and d3.axisLeft.
- 05 For each group, draw the IQR box as a <rect>, the median as a horizontal <line>, the whiskers as two <line> elements, and the outliers as <circle> elements.
- 06 Add tooltips on focus that announce the five-number summary and the sample size; gate any animation behind prefers-reduced-motion.
Tip: Observable Plot ships a one-liner Plot.boxX() / Plot.boxY() if you want a working box plot in five lines of code and don’t need full SVG control.
Tableau
BI — ~5 min
- 01 Drop the categorical dimension on Columns and the numeric measure on Rows; Tableau will default to a SUM aggregation that you must change.
- 02 On the value pill, choose Dimension so individual rows are plotted instead of the aggregated total.
- 03 Right-click the value axis and choose Add Reference Line → Box Plot. Tableau adds whiskers, quartiles, and outliers in one step.
- 04 Open Edit Reference Line and choose the Whiskers extent (Tukey 1.5 × IQR is the default) and the Plot Options → hide underlying marks toggle.
- 05 Set the underlying mark layer to about 30% opacity so the dots are visible behind the box, which gives you a strip-and-box hybrid.
- 06 Use Format → Reference Line to set the box outline and median line to a high-contrast accent color.
Tip: Tableau’s box plot is implemented as a reference line on top of marks, not a standalone chart type — that is why you need raw rows, not aggregates.
Power BI
BI — ~6 min
- 01 Power BI does not ship a native box plot, so import one from AppSource: search for “Box and Whisker chart” (the MAQ Software or Microsoft custom visual works).
- 02 Add the imported visual to the report, then drag your group field to the Category bucket and your numeric field to the Sampling bucket.
- 03 In the Format pane, switch Quartile calculation between Inclusive (Excel default) and Exclusive (Tukey) and pick the convention you will state in the caption.
- 04 Toggle Show outliers on, set outlier color to a contrasting accent, and tick Show data points to overlay individual observations.
- 05 Set the Y-axis title to include the unit and write a takeaway-style chart title in the visual header.
- 06 If a custom visual is not allowed in your tenant, fall back to the built-in error-bar visual on a clustered column chart and label the spread clearly.
Tip: keep an eye on AppSource for newer free box-plot visuals — the Microsoft default still misses notched boxes and varwidth scaling that ggplot2 has had for years.
// 16 — Code examples
Working code in the most common stacks
A runnable Python snippet that produces a sorted, neutral-fill box plot of three departments’ salaries, with the median in the brand accent and individual points overlaid. Copy, paste, and replace the data with yours.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Three departments, salaries in thousands of USD.
data = pd.DataFrame({
    "department": ["Engineering"] * 8 + ["Sales"] * 8 + ["Design"] * 8,
    "salary": [ 95, 110, 120, 135, 140, 155, 170, 200,
                72,  80,  82,  88,  90,  95, 100, 115,
                58,  70,  78,  92, 105, 118, 140, 175],
})

# Order departments by median salary so boxes read as a ranking.
order = (
    data.groupby("department")["salary"].median()
    .sort_values(ascending=False).index.tolist()
)

fig, ax = plt.subplots(figsize=(8, 4.5))
sns.boxplot(data=data, x="department", y="salary",
            order=order, color="#f5ede9",
            medianprops={"color": "#c94a2e", "linewidth": 2.5},
            flierprops={"markerfacecolor": "none",
                        "markeredgecolor": "#c94a2e",
                        "markersize": 7},
            ax=ax)
sns.stripplot(data=data, x="department", y="salary", order=order,
              color="#1a1a18", alpha=0.5, size=3, jitter=0.18, ax=ax)

ax.set_title("Salary spread by department — 2025", loc="left", fontsize=14)
ax.set_ylabel("Annual salary (USD, thousands)")
ax.set_xlabel("")
ax.spines[["top", "right"]].set_visible(False)
ax.figure.text(0.01, -0.02,
               "Whiskers: 1.5 × IQR (Tukey). Dots overlay individual observations.",
               fontsize=9, color="#6b6b67")

plt.tight_layout()
plt.savefig("box_plot.png", dpi=200, bbox_inches="tight")
plt.show()
// 17 — FAQs
Frequently asked questions
What is a box plot?
A box plot (also called a box-and-whisker plot) is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. It provides a compact visual snapshot of a dataset's center, spread, skewness, and outliers, and is the statistician's default chart for comparing distributions across groups.
When should you use a box plot?
Use a box plot when comparing distributions across three or more groups in limited space. They work well for spotting outliers, checking skewness, and confirming whether group medians and spreads differ. Box plots reward technical audiences who already understand quartiles and the interquartile range.
When should you avoid a box plot?
Avoid box plots when your audience is non-technical, when you suspect a multi-modal (bimodal) distribution that the five-number summary will hide, when each group has fewer than ~20 observations, or when you need exact values rather than summary statistics. In those cases prefer a histogram, violin plot, strip plot, or beeswarm.
What is the difference between a box plot and a violin plot?
Both compare distributions across groups, but a box plot summarizes them with five numbers (min, Q1, median, Q3, max) while a violin plot wraps the box in a mirrored kernel-density curve that shows the full shape of the distribution. Violins reveal multi-modal data that box plots flatten; box plots are simpler to read and faster to draw.
What is the difference between a box plot and a histogram?
A histogram shows the distribution of one variable across many bins, exposing the precise shape, peaks, and gaps. A box plot collapses that shape into five numbers and a few outlier dots. Use a histogram when shape matters; use a box plot when comparing many groups side by side and you have only enough space for compact summaries.
What is the interquartile range (IQR) on a box plot?
The interquartile range is the distance between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile) — the height (or width) of the box itself. It captures the middle 50% of the data and is robust to outliers, which is why Tukey used it instead of the standard deviation.
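As a rough illustration of that robustness, the sketch below computes the IQR for a small made-up salary sample and then again after inflating the largest value. The sample values and the `iqr_summary` helper are hypothetical, and the quartiles use linear interpolation (`method="inclusive"`), which is one of several common conventions.

```python
import statistics


def iqr_summary(values):
    """Return (Q1, median, Q3, IQR) using linear-interpolation quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3, q3 - q1


# Made-up salaries in thousands; the last value is an obvious outlier.
salaries = [42, 45, 48, 50, 52, 55, 58, 61, 64, 70, 120]
inflated = salaries[:-1] + [1200]  # same data, far more extreme outlier

q1, median, q3, iqr = iqr_summary(salaries)
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")

# The IQR ignores how extreme the outlier is; the standard deviation does not.
print("IQR with inflated outlier:", iqr_summary(inflated)[3])
print("stdev original:", round(statistics.stdev(salaries), 1))
print("stdev inflated:", round(statistics.stdev(inflated), 1))
```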
How are outliers defined on a box plot?
By Tukey's convention, an outlier is any point that falls more than 1.5 × IQR below Q1 or above Q3. Whiskers extend to the most extreme non-outlier value. Some tools and journals use 3 × IQR for 'extreme' outliers, or the 5th/95th percentiles instead of Tukey fences — always state which rule you are using.
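The fence rule can be sketched in a few lines. This is an illustrative helper, not the exact routine any particular plotting library runs: the function name, the sample data, and the linear-interpolation quartile rule are assumptions, and the `k` parameter stands in for the 1.5 or 3 multiplier.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Classify points with k * IQR fences and find where the whiskers end."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        # Whiskers stop at the most extreme observed values inside the fences,
        # not at the fence values themselves.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Made-up salaries in thousands.
salaries = [42, 45, 48, 50, 52, 55, 58, 61, 64, 70, 120]
result = tukey_fences(salaries)
print(result)
```

Passing `k=3` applies the stricter "extreme outlier" rule mentioned above; with this sample the value 120 is flagged under both settings.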
What category of chart is a box plot?
The box plot belongs to the Distribution family of charts, alongside histograms, density plots, violin plots, strip plots, and beeswarm plots. Charts in that family answer the same kind of question ('what does my data look like?'), so they often work as alternatives when one doesn't fit your audience or your data shape.
How many observations do you need per group for a box plot?
Aim for at least 20 observations per group so the quartiles are stable. With fewer points, Q1 and Q3 become noisy and the box can shift dramatically with one or two extra rows. For very small samples (n < 10), prefer a strip plot or dot plot that shows every individual point.
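A small simulation gives a feel for how noisy the quartiles get at low sample sizes. This is a sketch under stated assumptions: the data are drawn from a normal distribution with mean 50 and standard deviation 10, the helper name is hypothetical, and the exact numbers depend on the seed and the quartile rule.

```python
import random
import statistics


def q1_spread(sample_size, trials=500, seed=0):
    """Standard deviation of the Q1 estimate across repeated random samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(50, 10) for _ in range(sample_size)]
        q1, _, _ = statistics.quantiles(sample, n=4, method="inclusive")
        estimates.append(q1)
    return statistics.stdev(estimates)


# The spread of the Q1 estimate shrinks as each group gets more observations.
for n in (5, 20, 100):
    print(f"n={n:>3}: Q1 varies by about ±{q1_spread(n):.2f}")
```

The larger the printed spread, the more the bottom edge of the box would move from one sample to the next.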
What is a notched box plot?
A notched box plot adds a narrow notch around the median that represents an approximate 95% confidence interval for the median. If the notches of two boxes do not overlap, that is informal evidence the medians differ at roughly the 0.05 level. In small samples the notches can extend past the box edges, which is why some tools clip them.
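A common way to size the notch is median ± 1.57 × IQR / √n, a rule of thumb usually attributed to McGill, Tukey, and Larsen; some tools use 1.58 instead. The sketch below assumes that formula and linear-interpolation quartiles, and the helper names and data are hypothetical.

```python
import math
import statistics


def notch_interval(values, factor=1.57):
    """Approximate 95% interval for the median: median ± factor * IQR / sqrt(n)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = factor * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True when the two notch intervals share at least one value."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


# Two made-up groups of 20 observations with clearly different medians.
group_a = list(range(40, 60))
group_b = list(range(60, 80))
print(notch_interval(group_a), notch_interval(group_b))
print("overlap:", notches_overlap(group_a, group_b))

# With only five points the notch is wider than the box (Q1 = 2, Q3 = 4 here).
tiny = [1, 2, 3, 4, 10]
print(notch_interval(tiny))
```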
Should a box plot show individual points?
Yes — when space allows, overlay the individual points (jittered or in a beeswarm) on top of the box. The combination shows both the summary and the actual sample size, which addresses the biggest weakness of a plain box plot: identical-looking boxes that hide very different underlying datasets.
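One possible way to build that overlay in Matplotlib is to draw the boxes first and then scatter the raw values with a little horizontal jitter. The group names, the random data, and the jitter width below are invented for illustration, and the headless `Agg` backend is an assumption so the sketch runs without a display.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

rng = random.Random(42)
# Hypothetical salary samples (in thousands) with deliberately unequal sizes.
groups = {
    "Engineering": [rng.gauss(95, 12) for _ in range(40)],
    "Sales": [rng.gauss(70, 18) for _ in range(25)],
    "Support": [rng.gauss(55, 8) for _ in range(12)],
}

fig, ax = plt.subplots(figsize=(6, 4))
# showfliers=False: every observation is drawn by the scatter below, so the
# default outlier markers would plot those points a second time.
ax.boxplot(list(groups.values()), showfliers=False)

# boxplot places the boxes at x = 1, 2, 3, ... by default.
for position, values in enumerate(groups.values(), start=1):
    jittered_x = [position + rng.uniform(-0.12, 0.12) for _ in values]
    ax.scatter(jittered_x, values, s=12, alpha=0.6, color="black", zorder=3)

# Put the sample size in each tick label so readers can see n at a glance.
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels([f"{name}\n(n={len(v)})" for name, v in groups.items()])
ax.set_ylabel("Salary (thousands)")
fig.tight_layout()
```

The jitter only spreads points sideways; the vertical positions are the true values, so the overlay never distorts the data.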
Are box plots good for dashboards?
Box plots can work in dashboards when the panel is large enough to show the box, whiskers, and outliers without overlap, when group counts stay below ~10, and when readers know the five-number summary. For a general business dashboard, consider bar charts with error bars or labelled medians as a friendlier alternative.
What's the best library for box plots in code?
For static publication figures, ggplot2's geom_boxplot in R and Matplotlib/seaborn's boxplot in Python are the standards. For interactive web charts, Plotly, Vega-Lite, and Observable Plot have built-in box plot support, while D3 leaves the drawing to you and provides d3.quantile for computing the five-number summary manually.
// 18 — References
References and further reading
Primary sources, reference texts, and the official documentation for the libraries and tools referenced throughout this guide.
- Wikipedia, Box plot [Reference]. Encyclopedia entry covering the history, variants, and visual encoding of box plots, with citations to Tukey, Wickham, and the modern letter-value plot. https://en.wikipedia.org/wiki/Box_plot
- John W. Tukey, Exploratory Data Analysis (1977) [Primary source]. The book that introduced the box-and-whisker plot as part of Tukey’s broader EDA toolkit. Hosted by the Internet Archive; chapter 2 has the original sketch. https://archive.org/details/exploratorydataa00tuke_0
- PDF that surveys the modern variants of the box plot (violin, beeswarm, letter-value) and the cases where each one beats Tukey’s original. https://vita.had.co.nz/papers/boxplots.pdf
- Hands-on tutorial with annotated real-world examples. Especially useful for the whisker-rule debate and the “overlay the points” recommendation. https://academy.datawrapper.de/article/325-what-to-consider-when-creating-a-box-plot
- Financial Times, Visual Vocabulary [Reference]. Open-source poster categorizing chart types by intent. Box plots sit firmly in the Distribution family, alongside histograms, violins, and beeswarms. https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary
- Tufte’s foundational text on data graphics. The chapter on data-ink ratio explains why minimalist Tukey-style boxes beat ornamented spreadsheet defaults. https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/
- WAI, Complex Images: Charts and Graphs [Accessibility]. Web Accessibility Initiative guidance on making charts accessible: text alternatives, long descriptions, and data tables. Use this when building the five-number summary table beneath your chart. https://www.w3.org/WAI/tutorials/images/complex/
- Official API reference for the Python box plot helper used in this guide’s code sample, including the showfliers, notch, and whis arguments. https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html
- Tidyverse documentation for the ggplot2 box plot geometry. Covers the notch, varwidth, and outlier styling options used in the R sample. https://ggplot2.tidyverse.org/reference/geom_boxplot.html
- seaborn API reference for boxplot and the related boxenplot (letter-value plot) used as a strict upgrade for very large samples. https://seaborn.pydata.org/generated/seaborn.boxplot.html
- Maintained Observable notebook that mirrors the JavaScript code sample in this guide, with d3.quantile-based five-number summary computation. https://observablehq.com/@d3/box-plot