Distribution · Intermediate

Histogram

A chart that groups one continuous variable into adjacent bins and shows how often values fall in each range. The default tool for understanding how data is distributed — and the chart most often confused with a bar chart.

// 01 The chart

What it looks like

Example: Customer age distribution (n = 2,400 customers, 7-year bins)
[Figure: histogram of customer ages. X-axis: Age (years, 7-year bins), ticks from 18 to 67+. Y-axis: count, 0 to 500. Peak bin labelled 442.]

A histogram showing customer ages grouped into 7-year bins. The bars touch because the underlying scale is continuous — there is no gap between bins.

// 02 Definition

What is a histogram?

A histogram is a chart that takes a single continuous variable, slices its range into adjacent intervals called bins, and draws a bar over each bin whose height encodes the number (or density) of observations that fall inside it. The chart answers a single question: where do my values pile up, and how spread out are they? Everything else — bin width, axis scale, overlay versus small multiples — is in service of getting an honest answer to that one question.

The single most important visual difference between a histogram and a bar chart is that histogram bars touch each other. There is no gap between the bin that ends at age 32 and the bin that starts at age 32 because the underlying scale is continuous: the two bins are adjacent intervals on the same number line. Bar charts, by contrast, are conventionally drawn with gaps between bars to communicate that their categories are discrete and unrelated. Mixing the two conventions is one of the most common mistakes in beginner work, and it is invisible until someone who knows the rule notices.

Because a histogram compresses every observation into a count per bin, it can reveal features that summary statistics hide. A symmetric bell shape suggests a process well-described by a normal distribution. A long right tail flags incomes, file sizes, response times: phenomena where a small number of cases dominate the average. Two distinct peaks (bimodality) suggest that the dataset is actually two populations mixed together, which has implications for everything from segmentation to hypothesis testing. None of this is visible in a five-number summary or a single mean.

The cost of that diagnostic power is sensitivity to one parameter: the bin width. Too few bins hides real structure; too many turns the chart into noise. There is no universally correct choice, only a small family of rules of thumb (Sturges, square-root, Freedman–Diaconis) and the discipline of trying two or three before committing. The rest of this guide is about how to live inside the histogram’s sweet spot, how to choose bins on purpose, and how to recognise when a density plot, box plot, or violin plot would tell the story better.

Origin: The word histogram was coined by Karl Pearson in 1895, in his paper Contributions to the Mathematical Theory of Evolution. The name combines the Greek histos (mast or web) with gramma (drawing). Pearson developed it as a tool for fitting frequency curves to biological data, and it quickly became one of the foundational charts of modern statistics.

// 03 When to use

When a histogram is the right call

Reach for a histogram whenever the question is about the shape, spread, or centre of one continuous variable, and the answer is informative at the level of bins rather than individual rows. Below are the situations where it consistently wins against the alternatives.

✓ Use a histogram when…
  • Inspecting how a single continuous variable is distributed (ages, prices, response times, scores)
  • Checking whether data follows a normal (bell-curve) distribution before running a parametric test
  • Looking for skewness — does the data pile up on the left or right?
  • Spotting multiple peaks (bimodality) that suggest mixed populations
  • Identifying outliers or unusual gaps in a moderate-to-large dataset (n ≥ 50)
  • Quality control — is a process centred on the target specification with acceptable spread?
  • Exploring a new dataset before any modelling or formal analysis

// 04 When not to use

When a histogram is the wrong call

A histogram can be drawn from almost any numeric column, but “technically possible” is not the same as “good idea.” Below are the cases where the histogram actively hides information you need to communicate.

× Avoid a histogram when…
  • The variable is categorical (countries, products, payment methods) — use a bar chart instead
  • You want to compare values across categories rather than inspect a distribution
  • You need to show change over continuous time — use a line chart
  • You have fewer than ~30 observations — too few for stable bin counts
  • You want to highlight individual observations — use a strip plot, dot plot, or beeswarm
  • You need to compare distributions of more than three groups — switch to small multiples, ridgeline, or violin plots
  • You need a compact summary of many groups in one chart — use a box plot
  • Bin widths must vary and you cannot switch the y-axis to density

// 05 Data requirements

What your data needs to look like

Before building the chart, your dataset needs to fit a specific shape. Use this checklist to confirm yours does.

Shape

One row per observation, with a single continuous numeric column. Optionally a second column for grouping and a weight column for survey-weighted data.

Minimum rows

~30 observations. Below that, individual bins are dominated by sampling noise rather than the underlying distribution.

Maximum rows

No upper limit. Histograms scale gracefully: millions of rows still produce a clean shape, though you may want to switch to a density plot for very large samples.

Required fields
value
number (continuous, required)

One numeric measurement per observation. The whole histogram is built from this single column — ages, prices, response times, lab measurements. The values must share a unit and a scale; mixing units silently produces a meaningless chart.

group
string (optional)

Optional second column used to split the data when comparing two or three distributions. Each group becomes its own histogram, either overlaid with transparency or shown as small multiples — keep groups to a handful so the visual stays readable.

weight
number (optional)

Optional importance weight per observation, used when each row represents more than one underlying unit (survey weights, population estimates, sampling probabilities). Most libraries accept a weights argument that scales each observation’s contribution to its bin.

bin_edges
number[] (optional)

Optional explicit list of bin edges. Override the default uniform binning when you want bins aligned to meaningful breakpoints (age brackets, salary bands) or when the data is bounded and the default rule overshoots.
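As a rough illustration of the weight and bin_edges fields, the sketch below bins a handful of made-up ages with NumPy's np.histogram, once unweighted and once with a hypothetical weight column. The ages, weights, and edges are all invented for the example.

```python
import numpy as np

# Hypothetical survey rows: one age per respondent plus a survey weight
# saying how many people in the population each respondent stands for.
ages = np.array([22, 27, 34, 41, 45, 58, 63, 72])
weights = np.array([1.0, 2.0, 1.5, 1.0, 0.5, 3.0, 1.0, 2.0])

# Explicit bin edges aligned to age brackets (an assumption for this sketch).
edges = [18, 30, 45, 60, 75]

# Unweighted: every row adds 1 to its bin.
raw_counts, _ = np.histogram(ages, bins=edges)
# Weighted: every row adds its weight to its bin.
weighted_counts, _ = np.histogram(ages, bins=edges, weights=weights)

for lo, hi, raw, w in zip(edges[:-1], edges[1:], raw_counts, weighted_counts):
    print(f"{lo}-{hi}: {raw} rows, weighted total {w:.1f}")
```

Each bin is half-open on the right (the last bin also includes its upper edge), so the age of exactly 45 lands in the 45-60 bin rather than the 30-45 bin.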

Example data
customer_id  age  segment
10421        27   new
10422        34   returning
10423        41   returning
10424        29   new
10425        58   returning
10426        72   lapsed

Tip: if your raw data is already aggregated into bin counts (e.g., a column of bin labels and a column of frequencies), most plotting libraries can’t draw it directly. Either expand the rows back into the original observations, or use the lower-level bar geometry with the bins as bar widths and frequencies as bar heights.
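One way to follow the second route in Matplotlib is sketched below, assuming you already hold the bin edges and one count per bin. The edges and counts here are invented placeholders, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical pre-aggregated data: bin edges and one count per bin.
edges = [18, 25, 32, 39, 46, 53, 60, 67]
counts = [120, 260, 410, 380, 250, 140, 60]

lefts = edges[:-1]
widths = [right - left for left, right in zip(edges[:-1], edges[1:])]

fig, ax = plt.subplots()
# align="edge" starts each bar at its left bin edge, and width is the bin
# width, so neighbouring bars touch the way histogram bars should.
bars = ax.bar(lefts, counts, width=widths, align="edge", edgecolor="white")
ax.set_xlabel("Age (years, 7-year bins)")
ax.set_ylabel("Customers")
```

Call fig.savefig(...) or plt.show() afterwards to see the result. If the bins differ in width, divide each count by its bin width and the total before plotting so the bar areas stay honest.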

// 06 Anatomy

Parts of a histogram

Every histogram is built from the same five parts. Recognising them by name is the fastest way to read someone else’s chart and to spot when one of the parts has been mislabelled or omitted.

[Figure: annotated histogram with callouts A to E, described below]
A — Y-axis (frequency or density): Counts of observations per bin, or density (count divided by bin width and total) when bin widths vary
B — X-axis (continuous variable): The measurement scale, sliced into equal-width intervals — always continuous, never categorical
C — Bar height: Proportional to how many observations fall in that bin — taller means more data points in that range
D — Bin width: The range each bar covers — the single most consequential choice you make when building a histogram
E — No gaps between bars: Bars touch on purpose because the underlying scale is continuous; a gap would wrongly imply a break in the number line

// 07 Step-by-step

Step-by-step: how to build a good histogram

A ten-step recipe that works regardless of the tool. The early steps are about understanding the data; the middle steps are about choosing bins on purpose; the last steps are about making the chart trustworthy at the size your readers will see it.

  1. Confirm the variable is continuous

    A histogram only makes sense for a continuous (or quasi-continuous integer) variable. If the column is categorical — country, product, payment method — stop and reach for a bar chart instead. Continuous means values sit on a number line where any value between two observed values is meaningful.
  2. Inspect the raw data first

    Before binning, look at the count, the min, the max, and any obvious outliers (a one-line summary in pandas or dplyr, or the Descriptive Statistics tool in Excel’s Analysis ToolPak). This tells you the natural range and warns you about extreme values that will distort an automatic bin width.
  3. Pick a bin-width rule

    Start with Sturges’ rule (⌈log₂ n⌉ + 1 bins) for normal-ish data, the square-root rule (⌈√n⌉ bins) as a quick default, or the Freedman–Diaconis rule (bin width 2 · IQR · n^(−1/3)) for skewed or heavy-tailed data. The first two give a number of bins while the third gives a bin width; divide the data range by the width to convert between them. The right rule is the one whose shape matches what the data actually does, so try two and compare.
  4. Sketch with three bin counts

    Draw the histogram with too few bins, with too many, and with your chosen count. Too few hides a second peak; too many turns it into static. The middle option is usually right, and the comparison is what tells you so. Save the chosen bin width — you will mention it in the caption.
  5. Decide between counts and density on the y-axis

    Use raw counts when the absolute number matters and all bin widths are equal. Switch to density (count divided by bin width and total) when bin widths vary or when you intend to overlay a smooth density curve. Density makes the area under the bars sum to one.
  6. Handle skew and long tails

    If the data piles up at one end and trails off, consider a logarithmic x-axis or trimming the tail at a published quantile (and labelling that you did so). A linear axis on heavy-tailed data wastes the visual on a few outliers and squashes the bulk of the distribution into a single column.
  7. Annotate centre, spread, and notable bars

    Add a vertical line for the mean or median, label the modal bin with its count, and call out any outlier-bin worth discussing. The reader can then anchor the shape to a number rather than estimating heights from the y-axis.
  8. Caption the bin width and sample size

    Write “n = 2,400 customers, 7-year bins” below the chart. The bin width and sample size are the two pieces of metadata that change how readers should interpret the shape, and they belong in the caption rather than buried in code.
  9. Lead with a takeaway title

    “Age distribution of customers” is a label. “Most customers are 32–46, with a long tail into early retirement” is a takeaway. Lead with the takeaway and use a smaller subtitle for the descriptive label and the dataset metadata.
  10. Verify the chart at the size readers will see

    A histogram with 20 bins reads beautifully at 800 pixels wide and turns into a wall of black at 320 pixels. Re-render the chart at the smallest target width — mobile column, dashboard tile, slide thumbnail — and adjust the bin count or aspect ratio if bars get unreadable.
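The three rules named in step 3 are short enough to write out by hand. The sketch below is one possible reading of them, with helper names (sturges_bins, sqrt_bins, fd_bin_width) that are made up for this example and synthetic ages standing in for real data.

```python
import math
import numpy as np

def sturges_bins(n: int) -> int:
    """Number of bins by Sturges' rule: ceil(log2 n) + 1."""
    return math.ceil(math.log2(n)) + 1

def sqrt_bins(n: int) -> int:
    """Number of bins by the square-root rule: ceil(sqrt n)."""
    return math.ceil(math.sqrt(n))

def fd_bin_width(values) -> float:
    """Bin width by Freedman-Diaconis: 2 * IQR * n^(-1/3)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    return 2.0 * (q3 - q1) * len(values) ** (-1.0 / 3.0)

# Synthetic ages, only for illustration.
rng = np.random.default_rng(0)
ages = rng.normal(loc=42, scale=12, size=2400).clip(18, 85)

n = len(ages)
width = fd_bin_width(ages)
# Convert the Freedman-Diaconis width into a bin count for comparison.
fd_bins = math.ceil((ages.max() - ages.min()) / width)

print("Sturges bins:", sturges_bins(n))
print("Square-root bins:", sqrt_bins(n))
print(f"Freedman-Diaconis width: {width:.2f} (about {fd_bins} bins)")
```

In practice you rarely need the helpers: np.histogram_bin_edges accepts 'sturges', 'sqrt', and 'fd' as its bins argument. Writing them out mainly shows why the three rules can disagree so much on the same sample.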

// 08 Real-world examples

Where you’ll see histograms used

Histograms show up wherever someone needs to inspect a single distribution: product analytics dashboards, scientific papers, quality-control reports, and journalism on inequality. Each context has its own conventions, and they all reward the same fundamentals.

01

Product: Page-load time distribution

A web performance team plots the histogram of page-load times across millions of sessions. The shape reveals a long right tail of slow loads that the median (a single number) hides completely. The 95th and 99th percentile lines are drawn in the brand accent color, framing the conversation around tail latency rather than averages.

Web Analytics
02

Science: Distribution of measurement error

A laboratory paper plots the histogram of differences between measured and reference values across 1,200 calibration runs. A roughly normal distribution centred on zero validates the instrument; a skew or a second peak would flag a systematic bias. The bin width matches the instrument’s smallest precision step.

Research
03

Journalism: Income distribution by country

A newsroom publishes a small-multiples grid: one mini-histogram per country showing household income distribution. All charts share an x-axis on a logarithmic scale (incomes are heavy-tailed) and a common y-axis (density). The reader can compare shapes across countries at a glance and see where each country’s middle class sits.

Data Journalism
04

Manufacturing: Quality control on a production line

A factory plots the histogram of part diameters from a sample of 500 produced per shift. Vertical lines mark the upper and lower spec limits. Anything piling up against either limit, or a second peak away from the target, is an early warning of process drift before any individual part fails inspection.

Quality Control

// 09 Variations

Types of histogram

The plain histogram has several common variants, each suited to a slightly different data situation. The headline rule is the same as ever: pick the variant whose strengths match your question.

Frequency polygon

Connects bin midpoints with a line instead of drawing bars. Same information as a histogram, but overlays cleanly when comparing two or three groups.
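A frequency polygon needs nothing beyond the histogram's own counts and edges. The sketch below derives the polygon's points with NumPy on synthetic values; padding both ends with a zero-count point is a common convention rather than a requirement.

```python
import numpy as np

# Synthetic values, only for illustration.
rng = np.random.default_rng(1)
values = rng.normal(loc=50, scale=10, size=500)

counts, edges = np.histogram(values, bins=12)

# One point per bin, placed at the bin's midpoint.
midpoints = (edges[:-1] + edges[1:]) / 2

# Optional: add a zero-count point one bin width beyond each end so the
# polygon returns to the baseline.
width = edges[1] - edges[0]
poly_x = np.concatenate([[midpoints[0] - width], midpoints, [midpoints[-1] + width]])
poly_y = np.concatenate([[0], counts, [0]])

for x, y in zip(poly_x, poly_y):
    print(f"{x:7.2f}  {y}")
```

Plotting poly_x against poly_y as a line gives the polygon; repeating the steps per group with shared edges gives lines that overlay without hiding one another.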

Cumulative histogram

Each bar shows the cumulative count up to its right edge. Reads percentiles directly off the y-axis and is useful for spec-limit and SLA reporting.
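The cumulative version is a running sum over the ordinary bin counts. The sketch below uses a few invented response times and hand-picked edges to show how the running total turns into a share of all observations.

```python
import numpy as np

# Hypothetical response times in milliseconds.
times_ms = np.array([120, 135, 150, 160, 180, 210, 240, 260, 310, 480])
edges = [100, 200, 300, 400, 500]

counts, _ = np.histogram(times_ms, bins=edges)
cumulative = np.cumsum(counts)                 # running total up to each right edge
cumulative_share = cumulative / counts.sum()   # the same total as a fraction of n

for right_edge, total, share in zip(edges[1:], cumulative, cumulative_share):
    print(f"up to {right_edge} ms: {total} of {counts.sum()} ({share:.0%})")
```

With these made-up numbers, 8 of the 10 requests finish within the first two bins, which is the kind of statement an SLA report reads straight off the cumulative bars.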

Overlaid (group) histogram

Two transparent histograms on the same axes for comparing two groups. Limit to two or three groups before switching to small multiples.

Bimodal / fine-binned histogram

Many narrow bins reveal subtle structure such as two peaks in the data. Useful for diagnostics; switch to a density plot when the noise becomes overwhelming.

// 10 Comparisons

Histogram vs other chart types

Histograms get confused with several other chart types because they all visualise distributions or use rectangles. The differences matter — picking the wrong one changes what your reader is allowed to conclude.

Histogram vs bar chart

Histograms and bar charts look almost identical, but they answer different questions. A histogram bins one continuous variable; a bar chart compares discrete categories. The visual giveaway is whether the bars touch.

Histogram

Each bar is a bin of one continuous variable. Bars touch because the underlying scale is continuous, and the bin order is fixed by the axis.

  • Variable is continuous (numeric)
  • Bars are flush with no gaps
  • Bin order is fixed; you can’t resort it

Bar chart

Each bar is a separate, unrelated category. Bars are usually drawn with gaps and can be sorted by value or by an inherent category order.

  • Categories are nominal or ordinal
  • Bars are drawn with gaps
  • Bars can be reordered freely

Histogram vs density plot (KDE)

A histogram counts observations into discrete bins. A kernel density estimate (KDE) draws a smooth curve over the same data. Histograms are honest about raw counts and bin choices; KDEs are easier to overlay and read.

Histogram

Discrete bars built from raw counts. Reveals exactly how many observations sit in each bin, so any smoothing comes from a visible bin width rather than a hidden kernel bandwidth.

  • Honest about raw counts and bin choices
  • Shape depends on bin width
  • Hard to overlay multiple groups

Density plot (KDE)

A smooth curve estimated from the same data with a kernel and a bandwidth. Easier to overlay several groups, but hides the underlying sample size.

  • Smooth, continuous curve
  • Easy to overlay 2–3 groups
  • Bandwidth choice can oversmooth

Histogram vs box plot

A box plot summarises a distribution to five numbers; a histogram shows the full shape. Box plots scale to many groups in one chart but hide multimodality; histograms reveal multimodality but only show one (or two) groups at a time.

Histogram

Full distributional shape, including modes, gaps, and tails. Best when the shape is the story — “there are two peaks, not one.”

  • Reveals multimodality and gaps
  • Shows full shape, not just summary
  • One or two groups per chart

Box plot

Five-number summary (min, Q1, median, Q3, max) plus outliers. Best when comparing many groups or when the median and IQR are the headline.

  • Compact summary per group
  • Compares many groups easily
  • Hides multimodality and gaps

Histogram vs frequency polygon

A frequency polygon connects the midpoints of histogram bins with straight lines. It carries the same information as a histogram but emphasises continuity, and it overlays more cleanly when comparing groups.

Histogram

Discrete bars per bin. Easier to read individual bin counts and to spot gaps in the data.

  • Per-bin counts are obvious
  • Bars highlight individual bins
  • Cluttered when overlaid

Frequency polygon

Line through bin midpoints. Emphasises overall shape and overlays multiple groups without the visual mess of stacked transparent bars.

  • Smooth shape emphasis
  • Overlays cleanly across groups
  • Hides individual bin counts

// 11 Common mistakes

Mistakes to watch out for

Almost every broken histogram in the wild fails the same handful of ways. If you only memorise seven rules, make them these.

Trusting the default bin count

Most plotting libraries silently default to ten or thirty bins regardless of the dataset. Ten bins hides bimodality on small samples; thirty bins oversmooths on large ones. The default is rarely right — override it on purpose with Sturges, square-root, or Freedman–Diaconis as a starting point and try at least two values before committing.

Mixing raw counts with unequal bin widths

If you customise bin edges and leave the y-axis as raw counts, the visual area lies about the underlying frequency. A wide bin collects more observations than a narrow bin in a region of the same density, so its bar is both taller and wider, suggesting a peak that does not exist. The fix is to switch the y-axis to density (count divided by bin width and total) whenever bins vary in width.
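A small numeric check makes the effect concrete. The sketch below uses deliberately flat synthetic data and unequal, hand-picked bin edges, then compares raw counts with density computed as count divided by (n times bin width).

```python
import numpy as np

# Synthetic, evenly spread data: the underlying density is flat on [0, 100).
values = np.arange(0, 100, 0.5)              # 200 observations

# Unequal bins: three narrow ones and one wide one.
edges = np.array([0, 10, 20, 30, 100])
widths = np.diff(edges)

counts, _ = np.histogram(values, bins=edges)
density = counts / (counts.sum() * widths)   # count / (n * bin width)

print("counts: ", counts)    # the wide bin towers over the narrow ones
print("density:", density)   # flat, matching the data

# NumPy computes the same quantity when asked for density=True.
np_density, _ = np.histogram(values, bins=edges, density=True)
```

The raw counts show a false peak in the last bin even though the data is uniform; the density values are identical across bins, and their areas (density times width) sum to one.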

Drawing the bars with gaps

Gaps between bars communicate that the categories are discrete and unrelated — the exact opposite of what a histogram needs to say. Tools that default to bar-chart styling (Excel’s legacy Data Analysis ToolPak histogram, some Tableau templates) need to have the gap explicitly set to zero so the bars touch.

Forgetting to handle the long tail

Heavy-tailed data — incomes, file sizes, network latencies, social media followers — squashes the bulk of the distribution into a single bar at the left when plotted on a linear axis. The fix is a logarithmic x-axis (or a log transform of the variable) and a caption noting the change in scale.
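To see the squashing in numbers rather than pixels, the sketch below bins the same synthetic log-normal sample twice: once with equal-width bins and once with geometrically spaced edges, which are the bins that look equal-width on a log axis. The sample and the choice of 20 bins are assumptions for illustration.

```python
import numpy as np

# Synthetic heavy-tailed values (log-normal), only for illustration.
rng = np.random.default_rng(2)
sizes = rng.lognormal(mean=3.0, sigma=1.5, size=5000)

# Linear bins: equal width on the original scale.
linear_counts, linear_edges = np.histogram(sizes, bins=20)

# Log bins: equal width on a log axis, so the edges grow geometrically.
# This requires strictly positive data.
log_edges = np.geomspace(sizes.min(), sizes.max(), num=21)
log_counts, _ = np.histogram(sizes, bins=log_edges)

print("share in first linear bin:", linear_counts[0] / len(sizes))
print("largest share in any log bin:", log_counts.max() / len(sizes))
```

On the linear scale nearly everything lands in the first bin, while the log-spaced bins spread the same observations across the axis. When plotting, pass log_edges as the bins and set the x-axis to a log scale so the bars keep a constant visual width.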

Overlaying too many groups

Two transparent histograms overlay readably; three is the upper limit; four or more turns into a muddy stack where no individual group is legible. For more groups, switch to small multiples (one mini-histogram per group on a shared axis), a ridgeline plot, or a violin plot.

Treating the histogram as a bar chart

Categorical data plotted as a histogram is the most common beginner error. “Count of customers by country” is a bar chart — the categories are unrelated and the bar order is arbitrary. “Count of customers by age” is a histogram — the order is fixed by the number line and the bars touch.

Forgetting the bin width and sample size in the caption

Bin width and sample size are the two pieces of metadata that most change how a reader should interpret the shape. Without them, the chart is impossible to reproduce or critique. Write “n = 2,400, 7-year bins” under the title — it costs nothing and saves the chart from accidental misuse.

// 12 Accessibility

Accessibility checklist

Run through this list before publishing. The chart should still communicate its message to readers using assistive technology, color-blind users, keyboard navigation, and reduced-motion settings.

  • ✓

    Color contrast meets WCAG AA

    WCAG 1.4.3, 1.4.11
    Bar fill against the chart background should reach at least 3:1 contrast, the threshold WCAG 1.4.11 sets for graphical objects, and any axis text, captions, or callouts should reach 4.5:1 for body text or 3:1 for large text under WCAG 1.4.3. Filled histograms with very light pastel bars often fail this check on white backgrounds.
  • ✓

    Do not rely on color alone

    WCAG 1.4.1
    When overlaying two or three group histograms, distinguish each group with a pattern, hatching, or border style as well as fill color. Roughly 1 in 12 men and 1 in 200 women have some form of color-vision deficiency and cannot tell two transparent fills apart by hue.
  • ✓

    Provide a text alternative for the chart

    WCAG 1.1.1
    Add an aria-label or alt text that describes the shape of the distribution, not the chart type. “Histogram of customer ages” is weak; “Customer ages are roughly normal with a peak around 40, range 18–85, n = 2,400” is strong and works for screen-reader users who cannot see the bars.
  • ✓

    Expose the bin edges and counts as a table

    WCAG 1.3.1
    Place a screen-reader-friendly table next to or beneath the chart listing each bin’s lower edge, upper edge, and count. Many readers will copy this data rather than re-key it from the visual, and a histogram is unusual in that the entire chart compresses to a small table without losing any information.
  • ✓

    State the bin-width rule in the axis caption

    WCAG 3.3.2
    Bin width is part of the data, not just a styling decision. Include the chosen bin width and (optionally) the rule used to pick it (“Freedman–Diaconis,” “Sturges,” or “5-year bins”) in the axis label or caption so readers can reproduce or critique the chart.
  • ✓

    Make tooltips keyboard-accessible

    WCAG 2.1.1
    If the chart is interactive, every bar should be focusable with Tab and its tooltip should appear on focus, not only on hover. The tooltip text must include the bin range and the count or density, in plain language, with a visible focus ring on the bar.
  • ✓

    Respect prefers-reduced-motion

    WCAG 2.3.3
    If bars animate in on load, gate the animation behind a prefers-reduced-motion: no-preference media query so motion-sensitive readers see the final state immediately. Animating bin heights from zero is especially nausea-inducing on long-tailed distributions.
  • ✓

    Make the chart resizable and zoomable

    WCAG 1.4.4
    Let the histogram container scale with the viewport and stay legible at 200% browser zoom. Use a responsive viewBox rather than a fixed pixel size, and make sure axis ticks adjust so they don’t overlap when the chart is shrunk to a mobile column.
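For the bin table recommended in the checklist, the rows can come from the same binning call that feeds the chart. The sketch below is a minimal version: bin_table is a hypothetical helper name, the ages are the six rows from the example data table, and the bin edges are an assumption.

```python
import numpy as np

def bin_table(values, edges):
    """Return one (lower edge, upper edge, count) row per bin."""
    counts, edges = np.histogram(values, bins=edges)
    return [(float(lo), float(hi), int(c))
            for lo, hi, c in zip(edges[:-1], edges[1:], counts)]

# Ages from the example data table; the bin edges are an assumption.
ages = [27, 34, 41, 29, 58, 72]
rows = bin_table(ages, edges=[18, 32, 46, 60, 74])

print(f"{'Lower':>6} {'Upper':>6} {'Count':>6}")
for lower, upper, count in rows:
    print(f"{lower:>6.0f} {upper:>6.0f} {count:>6d}")
```

The same rows can be written into an HTML table with proper header cells next to the chart, so screen-reader users get the bin ranges and counts without relying on the bars.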

// 13 Best practices

Design and craft tips

The mistakes section above tells you what to avoid. The list below is the positive version: the small set of habits that separate a defensible histogram from a passable one.

Do

Try several bin widths before committing

Render the histogram at three bin counts — a coarse one, a fine one, and your chosen middle — and ship the version whose shape matches the underlying data. The right bin count is rarely the library default.
× Don’t

Trust the default bin count

Most libraries pick 10 bins regardless of the dataset. Ten bins hides bimodality on small samples and oversmooths on large ones. Override the default with Sturges, square-root, or Freedman–Diaconis as a starting point.
Do

Caption the bin width and sample size

Write “n = 2,400, 7-year bins” directly under the chart. Both numbers change how the reader should interpret the shape, and they are too important to bury inside the source code.
× Don’t

Mix raw counts with unequal bin widths

If you customize bin edges, switch the y-axis to density. Raw counts with variable bin widths make the visual area misrepresent the underlying frequency, and showing frequency honestly is the one job a histogram exists to do.
Do

Use a log axis for skewed data

Heavy-tailed distributions (incomes, file sizes, network response times) almost always read better on a log x-axis or after a log transform. Label the axis ticks in the original units to keep the chart legible.
× Don’t

Add gaps between bars

Histogram bars touch on purpose: the underlying scale is continuous. Drawing them with gaps falsely suggests the variable jumps in steps and turns the chart into a (badly labelled) bar chart.
Do

Lead with a takeaway title

Use the chart title to state the conclusion (“Most customers are 32–46 with a long tail into retirement”) and a smaller subtitle for the descriptive label (“Customer ages, 7-year bins, n = 2,400”).
× Don’t

Compare more than three groups by overlay

Two or three transparent histograms overlay readably; four or more turns into a muddy stack. For more groups, switch to small multiples, a ridgeline plot, or a violin plot.

// 15 Tool instructions

How to build it in your tool of choice

Histograms are built into every modern analysis tool, but each one buries the bin-width control in a different place. The recipes below get you to a clean histogram with a deliberate bin width, a captioned axis, and an honest title in each of the most common platforms.

Microsoft Excel

Spreadsheet — ~4 min
  1. Place your raw continuous values in a single column with a header row. Excel’s Histogram needs the underlying observations, not pre-aggregated bin counts.
  2. Select the column, then choose Insert → Charts → Statistical → Histogram.
  3. Right-click the x-axis and choose Format Axis. Under “Axis options,” set Bin width or Number of bins to a value that reveals the distribution’s real shape, and try a few before committing.
  4. Use the Overflow bin and Underflow bin fields to cap extreme tails (e.g., everything above 200 becomes a single “200+” bin) so a few outliers don’t flatten the rest of the chart.
  5. For skewed data, add a logarithmic axis: Format Axis → Logarithmic scale → base 10. Note the change in the caption.
  6. To compare two distributions, build a second histogram on a separate copy of the chart with a contrasting fill, and align the x-axis ranges manually so the small multiples are comparable.
  7. Edit the title to a takeaway sentence and add an axis label that includes the chosen bin width and the sample size.

Excel hides the bin-width control behind two right-clicks. If you don’t see Bin width in Format Axis, make sure the chart type is the new “Histogram” under Statistical, not the legacy Data Analysis ToolPak histogram.

Google Sheets

Spreadsheet — ~4 min
  1. Lay out your continuous values in a single column with one header row.
  2. Select the column, then choose Insert → Chart. Sheets usually picks Histogram automatically; if not, switch the Chart type to Histogram in the Setup tab.
  3. Open the Customize tab and expand the Histogram section. Set Bucket size (Sheets’ word for bin width) to a meaningful value rather than the default Auto.
  4. Toggle “Show item dividers” on so the bin edges are visible; it makes the chart noticeably easier to read.
  5. Under Vertical axis, switch to Logarithmic scale for heavy-tailed data and update the caption to call out the change.
  6. For two groups, build small multiples by inserting a second histogram below the first with the same Bucket size and aligned axis ranges; Sheets cannot natively overlay two histograms.
  7. Use Chart & axis titles to write a takeaway and to label the x-axis with units and the bucket size.

Google’s default “Auto” bucket sizing rounds aggressively. Override it to a clean number (1, 2, 5, 10, 20…) to keep the bin labels readable.

Python (Matplotlib)

Code — ~5 min
  1. Install Matplotlib and NumPy: pip install matplotlib numpy.
  2. Read your continuous variable into a one-dimensional NumPy array or pandas Series and drop NaNs with .dropna() before plotting.
  3. Call plt.hist(values, bins='fd') to use the Freedman–Diaconis rule, or pass an integer count or an explicit list of edges for full control.
  4. Switch to density: plt.hist(values, bins=..., density=True) when you want the area under the bars to sum to 1, which is useful when overlaying a KDE or comparing groups of different sizes.
  5. Apply a log x-axis for skewed data with plt.xscale('log') (or ax.set_xscale('log') on an Axes object), pass log-spaced edges such as np.geomspace(values.min(), values.max(), 30) as bins so the bars keep a constant visual width, and use plt.gca().xaxis.set_major_formatter for human-readable tick labels.
  6. Compare two distributions by calling plt.hist() twice with alpha=0.5 and a label= argument, then plt.legend(), or use plt.subplots() for small multiples when you have more than three groups.
  7. Set plt.title(), plt.xlabel(), plt.ylabel(), and add plt.figtext() with the bin width and sample size before plt.tight_layout() and plt.show() / plt.savefig().

Use seaborn.histplot if you want a histogram and a KDE in one call — it wraps Matplotlib with sensible defaults for both.
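Put together, the Matplotlib recipe might look like the sketch below. It uses the object-oriented Axes interface rather than the plt.* calls listed in the steps, runs on synthetic ages rather than real data, and the titles and labels are placeholders to replace.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic ages standing in for a real column, only for illustration.
rng = np.random.default_rng(3)
ages = rng.normal(loc=42, scale=12, size=2400).clip(18, 85)
ages = ages[~np.isnan(ages)]          # drop missing values before binning

fig, ax = plt.subplots(figsize=(8, 4.5))
# bins="fd" asks for Freedman-Diaconis bin edges.
counts, edges, patches = ax.hist(ages, bins="fd", edgecolor="white")

# Mark the centre so readers can anchor the shape to a number.
median = float(np.median(ages))
ax.axvline(median, linestyle="--", color="black", label=f"Median {median:.0f}")

bin_width = edges[1] - edges[0]
ax.set_title("Customer ages (synthetic data)")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Customers")
ax.legend()
# Caption the sample size and the bin width the rule actually chose.
fig.text(0.01, 0.01, f"n = {len(ages):,}, bin width = {bin_width:.1f} years", fontsize=8)
fig.tight_layout()
```

Finish with fig.savefig(...) or plt.show(). Swapping bins="fd" for an integer or an explicit edge list is the quickest way to compare alternatives before committing to one.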

R (ggplot2)

Code — ~5 min
  1. Install ggplot2 with install.packages('ggplot2') and load it with library(ggplot2).
  2. Pass your data frame to ggplot() and add geom_histogram(aes(x = value)). Without a binwidth argument, ggplot uses 30 bins and warns you to override it.
  3. Set binwidth = explicitly (e.g., binwidth = 5 for 5-unit bins) or use bins = N; try several values and pick the one whose shape matches the data.
  4. Add scale_x_log10() with a labels = scales::comma argument when the variable is heavy-tailed, and update the axis label to flag the log scale.
  5. Switch to density on the y-axis with aes(y = after_stat(density)) when you intend to overlay a geom_density() curve or compare groups of unequal size.
  6. For two or three groups, add fill = group inside aes() and use geom_histogram(position = 'identity', alpha = 0.5). For more groups, switch to facet_wrap(~ group) for small multiples.
  7. Polish with labs() for a takeaway title, x-axis units and bin width, and theme_minimal() for a clean default look.

ggplot’s warning about “`stat_bin()` using `bins = 30`” is a feature, not a bug — it’s telling you to choose a bin width on purpose. Set binwidth or bins explicitly to silence it.

JavaScript (D3.js)

Code — ~10 min
  1. Install D3 (npm i d3) or include the CDN script tag in your HTML.
  2. Build a linear x-scale across [d3.min(data), d3.max(data)] and a linear y-scale you’ll set after binning.
  3. Use d3.bin() to compute bins: const bin = d3.bin().domain(x.domain()).thresholds(x.ticks(40)); const bins = bin(data);. Replace 40 with your chosen bin count or pass an explicit array of thresholds.
  4. Set the y-domain after binning: y.domain([0, d3.max(bins, b => b.length)]).nice();.
  5. Render bars with selectAll('rect').data(bins).join('rect') and set x to x(b.x0), width to x(b.x1) - x(b.x0) - 1, and y/height from the count.
  6. For a log axis on heavy-tailed data, swap x to d3.scaleLog() with a strictly positive domain and pass log-spaced thresholds to d3.bin(), for example thresholds(x.ticks()). A bare d3.scaleLog().ticks(40) only covers the scale’s default domain of [1, 10].
  7. Render axes with d3.axisBottom(x) and d3.axisLeft(y), append a <title> for accessibility, and add a transition() guarded by prefers-reduced-motion.

If you don’t need full control, Observable Plot (Plot.rectY(data, Plot.binX({y: 'count'}, {x: 'value'}))) gets you to a working histogram in a single line.

Tableau

BI — ~5 min
  1. Connect to your data and drag the continuous measure to the Columns shelf. Tableau will offer a Histogram in the Show Me panel; click it.
  2. Tableau auto-creates a bin field with a default size. Right-click the bin field in the Data pane, choose Edit, and set Size of bins to a meaningful value (or use the formula at the bottom for Freedman–Diaconis).
  3. Drag the original measure to Rows as COUNT() to confirm the y-axis is counts; add a Percent of Total quick table calculation or build a calculated density field if you need relative frequency or density.
  4. Right-click the x-axis, choose Edit Axis, and switch to Logarithmic for heavy-tailed data, then add a note in the worksheet caption.
  5. To compare two distributions, drag the group dimension to Color on the Marks card and set Mark transparency to ~50%. For more than three groups, build a small-multiples grid by dragging the group dimension to Rows or Columns.
  6. Click Format → Lines to remove unnecessary gridlines, and edit the worksheet title to a takeaway sentence with the bin size baked into the subtitle.

Tableau’s default bin field is computed once when you create it. If your data refreshes and the range changes, edit the bin field again or it will silently use the original size.

Power BI

BI — ~5 min
  1. In Power BI Desktop, drag your continuous column onto the canvas as a numeric field. Right-click the field and choose New group → Bin to define bin size.
  2. In the Groups dialog, set the Bin size manually, for example 5 for 5-unit bins. Power BI automatically creates a new field named “Your column (bins).”
  3. Add the new bin field to the X-axis well of a Clustered column chart, and add Count of <your column> to the Values well.
  4. Sort the visual by the bin axis (not by value!) using the More options menu so the bin order on the x-axis is preserved.
  5. Open Format → X-axis → Type and switch to Logarithmic if your data is heavy-tailed.
  6. Toggle Format → Data labels on and choose Outside end so each bar shows its count, and write a title that states the takeaway.
  7. For two or more groups, place a Small multiples field on the visual or use Group instead of Bin and then drag a separator dimension into Legend.

If the built-in binning isn’t flexible enough, install the Histogram custom visual from AppSource — it exposes the Freedman–Diaconis and Sturges rules directly.

// 16Code examples

Working code in the most common stacks

Three runnable snippets that produce the same chart — a histogram of a roughly bimodal age distribution with the modal bin highlighted. Copy, paste, and replace the synthetic data with your real values.

histogram.py
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
ages = np.concatenate([
    rng.normal(loc=32, scale=8,  size=1400),  # younger cluster
    rng.normal(loc=58, scale=10, size=1000),  # older cluster
])
ages = ages[(ages >= 18) & (ages <= 90)]

fig, ax = plt.subplots(figsize=(8, 4.5))

# 'fd' = Freedman-Diaconis rule; switch to an integer to override.
counts, edges, patches = ax.hist(
    ages, bins="fd", color="#e8c4b8", edgecolor="#c94a2e", linewidth=0.6,
)

# Highlight the modal bin in the brand accent color.
modal = counts.argmax()
patches[modal].set_facecolor("#c94a2e")
patches[modal].set_edgecolor("#c94a2e")

bin_width = edges[1] - edges[0]
ax.set_title("Customer age distribution", loc="left", fontsize=14)
ax.set_xlabel(f"Age (years, {bin_width:.1f}-year bins)")
ax.set_ylabel("Customers")
ax.spines[["top", "right"]].set_visible(False)
ax.figure.text(
    0.99, 0.01, f"n = {len(ages):,}", ha="right", va="bottom",
    fontsize=10, color="#6b6b67",
)

plt.tight_layout()
plt.savefig("histogram.png", dpi=200)
plt.show()
$ python histogram.py

// 17 — FAQs

Frequently asked questions

What is a histogram?

A histogram is a chart that divides one continuous variable into adjacent intervals called bins (usually of equal width), then draws touching bars whose heights show how many observations fall in each bin. It answers the question ‘how is my data distributed?’: where the bulk of the values sit, how spread out they are, and whether the shape is symmetric, skewed, or has multiple peaks.

When should you use a histogram?

Use a histogram when you need to inspect the distribution of a single continuous variable: ages, prices, response times, test scores, sensor readings. It is the default exploratory chart for checking whether data looks normal, identifying skew or outliers, and seeing whether a process sits inside a quality-control specification.

When should you avoid a histogram?

Avoid a histogram for categorical data — a bar chart is correct there. Skip it when you have fewer than ~30 observations (the bars become noise rather than signal), when you need to compare more than two or three groups (overlay plots get cluttered — use small multiples or a violin plot instead), or when the focus is on individual data points rather than aggregate shape.

What is the difference between a histogram and a bar chart?

A bar chart compares discrete, unrelated categories — the bar order can be changed and bars are usually drawn with gaps. A histogram bins one continuous variable, so the bars touch (the scale is continuous), the bin order is fixed by the axis, and you cannot reorder the bars without breaking the chart.

How is a histogram different from a density plot (KDE)?

Both show the shape of a distribution, but a histogram counts observations into discrete bins while a kernel density estimate (KDE) draws a smooth curve. Histograms are honest about the raw counts and bin choices; density plots smooth over those decisions and are easier to overlay when comparing groups, but they hide the underlying sample size.
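
To make the contrast concrete, here is a small sketch that summarises the same sample both ways using only the standard library. The ten ages, the 10-year bins, and the hand-picked bandwidth of 4 are illustrative assumptions; real KDE tools normally choose the bandwidth with a data-driven rule.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates density at x by averaging Gaussian bumps."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


sample = [21, 23, 24, 30, 31, 33, 52, 55, 58, 60]  # made-up ages
kde = gaussian_kde(sample, bandwidth=4.0)          # bandwidth picked by hand

# Histogram view: raw counts in 10-year bins starting at 20.
counts = [sum(1 for s in sample if lo <= s < lo + 10) for lo in range(20, 70, 10)]
print("counts per bin:", counts)

# KDE view: a smooth estimate that can be evaluated at any age.
print("density at 25, 45, 55:", [round(kde(x), 4) for x in (25, 45, 55)])
```

The histogram reports an empty 40 to 49 bin as a hard zero, while the curve stays small but positive there, and nothing in the curve's values tells you that only ten observations are behind it.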

How do you choose the number of bins?

Start with one of the rules of thumb. Sturges' rule (⌈log₂ n⌉ + 1) and the square-root rule (⌈√n⌉) give a bin count directly; the Freedman–Diaconis rule gives a bin width (2 · IQR · n^(−1/3)), which you convert to a count by dividing the data range by that width. Then iterate: too few bins hide important features, too many turn the chart into noise. Most plotting libraries pick a reasonable default; override it when the shape clearly suffers.
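
As a rough illustration, the three rules can be computed with the Python standard library. The function names are invented for this sketch, the data is a placeholder (the integers 1 to 1,000), and the quartiles come from statistics.quantiles with its default method, so other tools may report slightly different Freedman–Diaconis results.

```python
import math
import statistics


def sturges_bins(n: int) -> int:
    """Sturges' rule: ceil(log2 n) + 1 bins."""
    return math.ceil(math.log2(n)) + 1


def sqrt_bins(n: int) -> int:
    """Square-root rule: ceil(sqrt n) bins."""
    return math.ceil(math.sqrt(n))


def fd_bins(values) -> int:
    """Freedman-Diaconis: width = 2 * IQR * n^(-1/3), then bins = range / width."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) * len(values) ** (-1 / 3)
    if width == 0:  # no spread in the middle half of the data
        return 1
    return math.ceil((max(values) - min(values)) / width)


# Placeholder data only: the integers 1..1000.
values = [float(v) for v in range(1, 1001)]
print(sturges_bins(len(values)), sqrt_bins(len(values)), fd_bins(values))
# prints: 11 32 10
```

The spread between 10 and 32 bins on the same data is the reason to treat these rules as starting points and then adjust by eye.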

Should the y-axis show counts or density?

Use counts when the absolute number of observations matters or you are comparing two histograms with the same sample size. Use density (frequency divided by bin width and total count) when bin widths vary or when you want the histogram to overlay cleanly with a density curve — density makes the area under the bars sum to 1.
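
The density definition above (count divided by bin width and total count) is easy to check by hand. This sketch uses made-up counts and edges, and a hypothetical helper named to_density, to confirm that the bar areas add up to 1.

```python
def to_density(counts, edges):
    """Convert per-bin counts to densities: count / (total * bin width)."""
    total = sum(counts)
    return [
        c / (total * (hi - lo))
        for c, lo, hi in zip(counts, edges, edges[1:])
    ]


counts = [5, 15, 20, 10]               # made-up counts for four bins
edges = [0.0, 10.0, 20.0, 30.0, 40.0]  # five edges define four bins

density = to_density(counts, edges)

# Area of each bar is height * width; the areas should sum to 1.
area = sum(d * (hi - lo) for d, lo, hi in zip(density, edges, edges[1:]))
print(density, area)
```

Because the total area is fixed at 1, two density histograms stay comparable even when their sample sizes differ, which raw counts cannot offer.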

Can a histogram have unequal bin widths?

Yes, but you must plot density on the y-axis, not raw counts. With variable bin widths, raw counts make the visual area lie about the underlying frequency. Many tools default to counts, so if you customize bin edges, switch the axis explicitly.
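
A small worked case shows the distortion. The data here is deliberately uniform (the integers 0 to 99) and the bin edges are an arbitrary choice with one bin eight times wider than the others; the helper bin_counts is written for this sketch only.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    return counts


values = list(range(100))   # uniform placeholder data: 0..99
edges = [0, 10, 20, 100]    # the last bin is 8x wider than the first two

counts = bin_counts(values, edges)
n = sum(counts)
density = [c / (n * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]

print("counts: ", counts)   # the wide bin towers over the others
print("density:", density)  # all three bins come out equal
```

Plotted as counts, the wide bin would look like a peak even though the data is flat; plotted as density, all three bars have the same height.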

What is the difference between a histogram and a box plot?

A box plot summarises a distribution with five numbers (minimum, Q1, median, Q3, maximum), usually with outliers drawn as separate points; a histogram shows the full shape. Box plots scale to many groups in one chart but hide multimodality; histograms reveal multimodality but only show one (or a few overlaid) groups at a time.

How do I show two distributions on the same histogram?

Overlay two histograms with transparent fills (“alpha” around 0.5) and a different color or pattern per group, or use small multiples — one mini-histogram per group, sharing axes. Overlay works for two or three groups; beyond that, use a ridgeline plot, violin plot, or faceted small multiples.
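
Whichever layout you pick, the groups need identical bin edges, otherwise bars at the same x position cover different ranges. The sketch below computes one shared set of edges and bins each group against it; the two groups, the 10-year bin width, and both helper names are illustrative assumptions.

```python
import math


def shared_edges(groups, bin_width):
    """Equal-width edges that cover every group's values, so bars line up."""
    lo = min(min(g) for g in groups)
    hi = max(max(g) for g in groups)
    n_bins = max(1, math.ceil((hi - lo) / bin_width))
    return [lo + i * bin_width for i in range(n_bins + 1)]


def count_in_bins(values, edges):
    """Counts per [lo, hi) bin; the last bin also includes its right edge."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(int((v - edges[0]) // width), len(counts) - 1)
        counts[i] += 1
    return counts


group_a = [18, 22, 25, 31, 34]  # made-up ages, younger group
group_b = [40, 44, 47, 52, 59]  # made-up ages, older group

edges = shared_edges([group_a, group_b], bin_width=10)
counts_a = count_in_bins(group_a, edges)
counts_b = count_in_bins(group_b, edges)
print(edges)
print(counts_a)
print(counts_b)
```

The two count lists have the same length and refer to the same intervals, so they can be drawn as overlaid bars or passed to separate small-multiple panels with a shared x-axis.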

What category of chart is a histogram?

Histograms belong to the Distribution family of charts. Charts in that family share the same goal — revealing the shape, spread, and centre of one variable — so density plots, box plots, violin plots, and beeswarm plots are common alternatives when a histogram doesn’t quite fit.

How do you read a histogram?

Start with the title and axis labels, especially the bin width if it is stated. Look at the overall shape (symmetric, skewed, bimodal, uniform), then locate the centre and the spread. Finally, scan for unusually tall bars (modes), unusually short bars (gaps), and bars far from the bulk (outliers).

Why do histogram bars touch?

The bars touch because the underlying scale is continuous. There is no gap between the bin that ends at 30 and the bin that starts at 30 — they are adjacent intervals on the same number line. Drawing them with gaps would falsely suggest the variable jumps in steps.

// 18References

References and further reading

Primary sources, reference texts, and the official documentation for the libraries and tools referenced throughout this guide.

  • Encyclopedia entry covering the history, terminology, bin-width rules, and visual conventions of histograms. A solid neutral starting point with citations to primary sources.
    https://en.wikipedia.org/wiki/Histogram
  • Pearson coined the term “histogram” in this paper, in the context of fitting frequency curves to biological data. Hosted by the Royal Society publishing archive.
    https://royalsocietypublishing.org/doi/10.1098/rsta.1895.0010
  • Tukey’s foundational EDA text. Treats the histogram as a working tool for understanding the shape of a single variable and discusses bin choices a few years before the Freedman–Diaconis rule was published.
    https://archive.org/details/exploratorydataa0000tuke_b8s3
  • The original paper deriving the Freedman–Diaconis bin-width rule (2 · IQR · n^(−1/3)). Useful when justifying a bin choice for a publication or report.
    https://link.springer.com/article/10.1007/BF01025868
  • Hands-on tutorial with real published examples. Especially useful for bin-width tradeoffs, log axes for skewed data, and overlaying two distributions.
    https://academy.datawrapper.de/article/231-what-to-consider-when-creating-histograms
  • Open-source poster that organises chart types by intent. Histograms sit firmly in the Distribution family alongside density plots, box plots, and violin plots.
    https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary
  • Web Accessibility Initiative guidance on making charts accessible: text alternatives, long descriptions, and data tables. Use this when building the bin-edge data table for a screen-reader fallback.
    https://www.w3.org/WAI/tutorials/images/complex/
  • Official API reference for the Python histogram helper used in this guide’s code sample, including the ‘fd’, ‘sturges’, and ‘scott’ bin rules.
    https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html
  • Tidyverse documentation for the ggplot2 histogram geometry. Explains the binwidth, bins, breaks, and aes(y = after_stat(density)) options.
    https://ggplot2.tidyverse.org/reference/geom_histogram.html
  • Official documentation for d3-array’s d3.bin() helper used in the JavaScript code sample. Covers thresholds, domain, and value accessors.
    https://d3js.org/d3-array/bin
  • Maintained Observable notebook from the D3 team that mirrors the JavaScript code sample in this guide and includes interactive bin-count controls.
    https://observablehq.com/@d3/histogram/2