Distribution · Intermediate

Violin Plot

A box plot that shows its work — the full shape of a distribution, mirrored symmetrically, revealing peaks, gaps, and clusters that summary statistics hide.

// 01 — The chart

What it looks like

[Figure: Example, response time by service tier. Vertical axis: milliseconds, 0 to 500. Categories: Free, Basic, Premium. The Basic violin is annotated “Bimodal!”]

Three violin plots comparing response time distributions. The highlighted “Basic” tier reveals a bimodal distribution — two separate peaks — that a box plot would completely hide.

// 02 — Definition

What is a violin plot?

A violin plot is a hybrid visualization that combines a box plot with a kernel density estimate (KDE). The shape of the “violin” is a mirrored density curve: at any given value on the vertical axis, the width of the shape shows how many data points fall near that value. Wider sections mean more data; narrower sections mean less.
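As a rough illustration of the mirrored density curve, the sketch below computes a Gaussian KDE by hand and turns it into left and right edges around a center line. The sample values, the fixed bandwidth of 20, and the helper names `gaussian_kde` and `violin_outline` are assumptions made for this example; plotting libraries choose the bandwidth and grid for you.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the density of `data` at a value x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data) / norm

    return density

def violin_outline(data, bandwidth, grid, max_half_width=0.4):
    """Return (value, left_edge, right_edge) rows for a violin centered at 0."""
    density = gaussian_kde(data, bandwidth)
    heights = [density(x) for x in grid]
    peak = max(heights)
    rows = []
    for x, h in zip(grid, heights):
        half = max_half_width * h / peak  # densest grid value gets the full width
        rows.append((x, -half, half))     # mirrored around the center line
    return rows

# Hypothetical response times in milliseconds: most near 100, a few near 300.
response_ms = [90, 95, 100, 100, 105, 110, 120, 290, 300, 310]
grid = list(range(50, 351, 50))
outline = violin_outline(response_ms, bandwidth=20, grid=grid)

# Crude text rendering: one row per grid value, bar length follows the width.
for value, left, right in outline:
    print(f"{value:>4} ms  {'#' * round((right - left) * 50)}")
```

The widest row lands at 100 ms, a narrower one at 300 ms, and almost nothing at 200 ms, which is the "wider means more data" reading described above.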

This is the key advantage over a box plot: while a box plot summarizes data into five numbers (min, Q1, median, Q3, max), a violin plot reveals the full distributional shape. It shows whether data is unimodal, bimodal, or multimodal; whether it’s skewed left or right; and where the true concentration of data lies — details that summary statistics erase.
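To make the point concrete, here is a small sketch with two made-up samples that were built to share the same five-number summary while having very different shapes. The data, the bin edges, and the helper names are illustrative assumptions; the samples are deliberately tiny so the arithmetic is easy to follow.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

def counts_per_bin(data, edges):
    """Count values falling in each half-open bin [edges[i], edges[i+1])."""
    return [sum(lo <= v < hi for v in data) for lo, hi in zip(edges, edges[1:])]

# Two invented samples constructed to share one five-number summary.
two_clusters = [0, 19, 19, 20, 20, 20, 21, 21, 50, 79, 79, 80, 80, 80, 81, 81, 100]
spread_out = [0, 10, 15, 18, 20, 30, 40, 45, 50, 55, 60, 70, 80, 82, 85, 90, 100]

# A box plot would draw these two samples identically.
print(five_number_summary(two_clusters))
print(five_number_summary(spread_out))

# Binned counts expose the difference that a density curve would show.
edges = [0, 25, 50, 75, 101]
print(counts_per_bin(two_clusters, edges))
print(counts_per_bin(spread_out, edges))
```

Both summaries come out the same, yet the first sample has almost nothing between 25 and 75 while the second is spread fairly evenly. A violin plot would show two bulges for the first sample and a single broad shape for the second.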

Violin plots are especially valuable when comparing distributions across groups. Placing several violins side by side lets you instantly see which group is tightly clustered versus widely spread, and whether different groups have fundamentally different shapes.

Origin: The violin plot was introduced by Jerry Hintze and Ray Nelson in 1998 in their paper “Violin Plots: A Box Plot-Density Trace Synergism.” They combined the information-rich density trace with the familiar summary of a box plot, naming the result for its resemblance to the instrument.

// 03 — Anatomy

Parts of a violin plot

A — Density curve (KDE): The outer shape — width at any point shows how many data points fall near that value
B — Median line: The middle value of the dataset, shown as a horizontal line or dot inside the violin
C — Interquartile range (IQR): The inner box spans Q1 to Q3 — the middle 50% of data points
D — Width = density: Wider sections have more data concentrated at that value; narrower sections have less
E — Category axis: Each violin represents one group or category being compared

// 04 — Usage

When to use it — and when not to

✓ Use a violin plot when…
  • You need to see the full shape of a distribution, not just a five-number summary
  • Your data may be bimodal or multimodal and you need to detect multiple peaks
  • Comparing distributions across 2–10 categories side by side
  • You have a large enough sample (n > 30) to produce a smooth density estimate
  • Your audience is comfortable with slightly technical visualizations
  • A box plot hides important details about your data’s shape
× Avoid a violin plot when…
  • Your sample size is very small (n < 20) — the density estimate will be unreliable
  • Your audience is non-technical and unfamiliar with density curves
  • You have more than 10–12 groups — the chart becomes too crowded
  • A simple box plot already tells the story you need
  • You’re comparing categorical proportions rather than continuous distributions
  • You need to show individual data points — consider a strip plot or beeswarm instead

// 05 — Reading guide

How to read a violin plot

Follow these steps whenever you encounter a violin plot in the wild.

1. Read the axes and identify the groups

Understand what’s being measured on the value axis and what categories are being compared. Each violin represents one group’s full distribution.

2. Look at the overall shape

Is the violin symmetric or skewed? A long tail upward means right-skewed data. A single bulge near the middle means values are concentrated around the center of the range. Two bulges mean bimodal data, which is often the most interesting finding.

3. Find the widest point

The widest part of the violin is the mode — where data is most concentrated. If there are multiple wide points, the data has multiple peaks, suggesting distinct subgroups.
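Finding the widest points can also be done numerically. The sketch below evaluates a hand-rolled Gaussian KDE on a grid and reports every grid value that is higher than both of its neighbors. The latency values, the bandwidth of 20, and the function names are assumptions for illustration only.

```python
import math

def kde_heights(data, bandwidth, grid):
    """Unnormalized Gaussian KDE evaluated at each grid value."""
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)
        for x in grid
    ]

def find_peaks(grid, heights):
    """Return grid values that are strictly higher than both neighbors."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if heights[i] > heights[i - 1] and heights[i] > heights[i + 1]
    ]

# Invented latencies with two subgroups, one near 100 ms and one near 300 ms.
latencies_ms = [90, 95, 100, 100, 105, 110, 290, 295, 300, 300, 305, 310]
grid = list(range(0, 401, 5))
heights = kde_heights(latencies_ms, bandwidth=20, grid=grid)
peaks = find_peaks(grid, heights)
print(peaks)  # two peaks, at 100 and 300
```

Two reported peaks correspond to the two wide points you would see in the violin, which hints at two distinct subgroups in the data.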

4. Check the median and IQR

Most violin plots include a small box plot or markers inside. The median line shows the center. The box (IQR) shows where the middle 50% of data lies. Compare these across violins for quick group-level comparison.

5. Compare violins side by side

Look at the range (top to bottom of each violin), the shape (symmetric vs. skewed), and where each violin is widest relative to its median marker. Together these comparisons tell you how groups differ in spread, distributional shape, and central tendency.

// 06 — Common mistakes

Mistakes to watch out for

Using with too few data points

A kernel density estimate needs enough data to produce a reliable curve. With fewer than 20–30 points, the violin shape is mostly mathematical smoothing, not a reflection of the actual data. Use a strip plot or jittered dot plot for small samples instead.

Implied density beyond the data range

KDE smoothing can create “tails” that extend beyond the actual data range — for example, showing density below zero for data that can’t be negative. Always check whether the violin extends into impossible values and trim if necessary.
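The leakage is easy to reproduce. The following sketch uses invented wait times that cannot be negative, shows that a Gaussian KDE still assigns density below zero, and then trims the evaluation grid to the observed range. The data, the bandwidth of 0.5, and the 0.01 visibility threshold are assumptions chosen for the example.

```python
import math

def kde_density(data, bandwidth, x):
    """Normalized Gaussian KDE of `data` evaluated at x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data) / norm

# Hypothetical wait times in seconds; none can be negative.
wait_s = [0.0, 0.1, 0.2, 0.3, 0.5, 1.0, 2.0]
bandwidth = 0.5

# An untrimmed grid extends three bandwidths past the data on both sides.
lo, hi = min(wait_s) - 3 * bandwidth, max(wait_s) + 3 * bandwidth
steps = 50
full_grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]

# Grid values below zero where the curve would still be visibly wide.
leaked = [x for x in full_grid if x < 0 and kde_density(wait_s, bandwidth, x) > 0.01]
print(f"{len(leaked)} grid values below zero still get visible density")

# Trimming: only draw the violin over the observed range.
trimmed_grid = [x for x in full_grid if min(wait_s) <= x <= max(wait_s)]
print(f"trimmed violin spans {trimmed_grid[0]:.2f} to {trimmed_grid[-1]:.2f}")
```

Plotting libraries usually offer a setting for this. In Seaborn, for example, the `cut` parameter of `violinplot` controls how far the density extends past the extreme data points, and `cut=0` limits the violin to the observed range.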

Choosing a poor bandwidth

The bandwidth parameter controls how smooth the density curve is. A bandwidth that is too large oversmooths the curve and erases real peaks; one that is too small undersmooths it and shows noise instead of signal. Most tools use sensible defaults, but always sanity-check whether the shape matches your data.
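One way to sanity-check a bandwidth is to count how many peaks the curve has at several settings. This sketch does that for a made-up sample with two real clusters; the three bandwidth values and the helper name `count_peaks` are assumptions, and real tools normally pick a data-driven default rather than a fixed number.

```python
import math

def count_peaks(data, bandwidth, grid):
    """Count strict local maxima of a Gaussian KDE evaluated on `grid`."""
    heights = [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)
        for x in grid
    ]
    return sum(
        1
        for i in range(1, len(grid) - 1)
        if heights[i] > heights[i - 1] and heights[i] > heights[i + 1]
    )

# Made-up sample with two real clusters, around 100 and around 300.
sample = [90, 95, 100, 100, 105, 110, 290, 295, 300, 300, 305, 310]
grid = list(range(0, 401))

# Small, moderate, and large bandwidths applied to the same data.
for bandwidth in (2, 20, 150):
    print(f"bandwidth={bandwidth:>3}: {count_peaks(sample, bandwidth, grid)} peak(s)")
```

With the smallest bandwidth the curve breaks into more than two spikes around individual points, the middle setting recovers the two clusters, and the largest one merges everything into a single hump.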

Comparing violins on different scales

If each violin is independently scaled to fill the same width, a group with high density at its peak looks the same as one with low density. Use a consistent scale across violins so width comparisons are meaningful.
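The difference between the two scaling choices can be sketched numerically. Below, two invented groups are given the same bandwidth (a simplifying assumption), and their density curves are converted to relative widths either per violin or on a scale shared by all violins. The function names and data are hypothetical.

```python
import math

def kde_curve(data, bandwidth, grid):
    """Normalized Gaussian KDE of `data` evaluated at each grid value."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data) / norm
        for x in grid
    ]

def scale_widths(curves, shared_scale):
    """Convert densities to relative widths, per violin or on one shared scale."""
    global_peak = max(max(curve) for curve in curves.values())
    scaled = {}
    for name, curve in curves.items():
        peak = global_peak if shared_scale else max(curve)
        scaled[name] = [d / peak for d in curve]
    return scaled

# Two invented groups: one tightly clustered, one widely spread.
groups = {
    "tight": [48, 49, 50, 50, 51, 52],
    "wide": [10, 30, 50, 50, 70, 90],
}
grid = list(range(0, 101, 2))
curves = {name: kde_curve(values, 5, grid) for name, values in groups.items()}

per_violin = scale_widths(curves, shared_scale=False)
shared = scale_widths(curves, shared_scale=True)
for name in groups:
    print(name, round(max(per_violin[name]), 2), round(max(shared[name]), 2))
```

With per-violin scaling both groups reach the same maximum width, so the tight group no longer looks denser. On the shared scale the spread-out group's widest point is only about a third as wide, which matches its lower peak density.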

Omitting the internal box plot

Without median and IQR markers inside the violin, readers lose the familiar summary statistics that make comparisons quick. Always include at least a median marker — and ideally the IQR box — inside each violin.

// 07 — Real-world examples

Where you’ll see violin plots used

01. Biomedical research: Gene expression across tissues

Researchers plot the expression level of a gene across different tissue types (brain, liver, muscle). The violin shape instantly reveals whether expression is uniform or bimodal — a crucial distinction when identifying tissue-specific genes. Box plots would mask this bimodality entirely.

Genomics

02. Tech: API response time by region

A site reliability team compares p50/p95/p99 latency across data centers. Violin plots reveal that one region has a bimodal distribution — most requests are fast, but a subpopulation hits a slow cache path. This insight drives targeted optimization.
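A minimal sketch of why percentiles alone can miss this pattern, using invented latencies in which 90 requests are fast and 10 hit a slow path. The numbers are assumptions for illustration, not measurements.

```python
import statistics

# Invented latencies: 90 fast requests and 10 that hit a slow path.
fast = [18 + i % 5 for i in range(90)]   # 18 to 22 ms
slow = [195 + i for i in range(10)]      # 195 to 204 ms
latencies_ms = fast + slow

# 99 cut points; index k - 1 holds the k-th percentile.
cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.1f} ms  p95={p95:.1f} ms  p99={p99:.1f} ms")

# Nothing was observed between the two groups, which the percentiles do not show.
gap = [v for v in latencies_ms if 30 <= v <= 190]
print(f"requests between 30 and 190 ms: {len(gap)}")
```

The three percentiles suggest a long tail, but they cannot distinguish a smooth tail from two separate populations. The empty range between the groups is what a violin plot would draw as a narrow waist between two bulges.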

Performance Engineering

03. Education: Test score distributions by school

An education policy study compares standardized test scores across 8 schools. Violin plots show that two schools with similar median scores have dramatically different shapes — one has a tight cluster around the median while another is bimodal with distinct high- and low-performing groups.

Education Research

// 08 — At a glance

Quick reference

Also known as: Violin chart, density box plot, KDE plot
Invented by: Jerry Hintze & Ray Nelson, 1998
Best for: Showing the full distributional shape of continuous data across categories
Data types: Continuous numeric variable grouped by a categorical variable
Recommended groups: 2–10 categories for readability
Min sample size: ~30+ per group for reliable density estimation
Common tools: Seaborn (Python), ggplot2, D3.js, Plotly, Matplotlib, Vega-Lite
Common mistakes: Too few data points, poor bandwidth, density beyond range, missing median

// 09 — Variations

Types of violin plots

The basic violin plot has several important variants, each suited to slightly different data situations.

Split violin plot

Shows two groups in a single violin — one on each side of the center line. Perfect for direct A/B comparisons.
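A split violin can be sketched as two density curves sharing one center line, with one group drawn to the left and the other to the right. The groups, the bandwidth of 10, and the helper names below are illustrative assumptions.

```python
import math

def kde_curve(data, bandwidth, grid):
    """Normalized Gaussian KDE of `data` evaluated at each grid value."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data) / norm
        for x in grid
    ]

def split_violin(group_a, group_b, bandwidth, grid, max_half_width=0.4):
    """Return (value, left_edge, right_edge): A drawn leftward, B rightward."""
    a = kde_curve(group_a, bandwidth, grid)
    b = kde_curve(group_b, bandwidth, grid)
    peak = max(max(a), max(b))  # shared scale so the two halves are comparable
    return [
        (x, -max_half_width * ha / peak, max_half_width * hb / peak)
        for x, ha, hb in zip(grid, a, b)
    ]

# Hypothetical A/B test: page load times in milliseconds.
control = [100, 105, 110, 110, 115, 120]
variant = [130, 135, 140, 140, 145, 150]
grid = list(range(80, 171, 10))
rows = split_violin(control, variant, bandwidth=10, grid=grid)

# The left half is widest where control is densest, the right half where variant is.
widest_left = min(rows, key=lambda r: r[1])[0]
widest_right = max(rows, key=lambda r: r[2])[0]
print(widest_left, widest_right)
```

Because the two halves share a value axis and a width scale, a shift between the groups shows up as the left and right bulges sitting at different heights.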

Raincloud plot

Combines a half-violin, a box plot, and jittered raw data points. Shows the density, summary stats, and individual observations all at once.

Violin with strip plot

Overlays individual data points inside the violin. Best for moderate sample sizes where seeing raw data adds insight.

Horizontal violin plot

Rotates violins to run left-to-right. Useful when category labels are long or when comparing many groups vertically.

// 10 — FAQs

Frequently asked questions

When should you use a violin plot?

Use a violin plot when you need to see the full shape of a distribution, not just a five-number summary. It also works well when your data may be bimodal or multimodal and you need to detect multiple peaks, and when comparing distributions across 2–10 categories side by side.

When should you avoid a violin plot?

Avoid a violin plot when your sample size is very small (n < 20) — the density estimate will be unreliable. It is also a poor fit when your audience is non-technical and unfamiliar with density curves, or when you have more than 10–12 groups — the chart becomes too crowded.

How is a violin plot different from a box plot?

A box plot reduces each group to a five-number summary (min, Q1, median, Q3, max), while a violin plot adds a mirrored density curve that shows the full shape of the distribution, including skew and multiple peaks. Choose a violin plot when the shape matters and each group has enough data for a reliable density estimate. Choose a box plot when the sample is small, the audience is unfamiliar with density curves, or the summary already tells the story you need.

Is a violin plot suitable for dashboards?

Yes — a violin plot can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.

What category of chart is a violin plot?

The violin plot belongs to the Distribution family of charts. Charts in that family are designed to answer the same kind of question, so they often work as alternatives when one doesn't quite fit your data.