Correlation · Intermediate

Scatter Plot

A chart that plots individual data points on two axes to reveal relationships, correlations, and clusters between two continuous variables — the workhorse of exploratory analysis since Herschel sketched the first one in 1833.

// 01 The chart

What it looks like

Example: Study hours vs. exam score (n = 24 students)
[Figure: scatter plot. X-axis: Study hours (2h to 10h). Y-axis: Exam score (0 to 100). One point is labeled as an outlier.]

A scatter plot showing a positive correlation between study hours and exam scores. The dashed line shows the linear trend, and one outlier is highlighted.

// 02 Definition

What is a scatter plot?

A scatter plot (also called a scatter chart, scattergram, or XY plot) displays individual data points on a two-dimensional plane, where one variable is plotted on the horizontal axis and another on the vertical axis. Each dot represents a single observation — one student, one country, one transaction — positioned by its X and Y values. The cloud of dots, taken together, is what answers the analyst’s question: do these two variables move together, and if so how tightly?

The primary purpose of a scatter plot is to reveal the relationship between two continuous variables. When the dots form a pattern — sloping upward, sloping downward, curving, clustering, or scattering randomly — the shape tells you how one variable behaves as the other changes. A tight upward cloud means strong positive correlation; a wide, shapeless cloud means little or no correlation. The slope of any fitted trend line is the average effect; the spread around the line is the noise.

Unlike bar charts (which compare categories) or line charts (which show change over time), scatter plots are uniquely designed to answer: “Does Y change as X changes, and how reliably?” They are the working tool of regression analysis, the screening device for outlier detection, and the visual anchor for almost every introductory statistics course because position on a common axis is the encoding humans read most accurately. Where they break down is at extremes of size: under ten points the pattern is noise, and past several thousand the markers overlap into a blob and you need a hex-bin or density variant.

The price of that simplicity is a single, persistent trap: correlation is not causation. A scatter plot can show that ice cream sales and drowning rates rise together, but it cannot say that ice cream causes drowning. Both rise with summer heat. Every scatter plot is a question (“why do these move together?”) rather than an answer, and the rest of this guide is about building scatters that reward careful questioning instead of inviting the wrong one.

Origin: The scatter plot was first used by English scientist John Frederick W. Herschel in 1833 to analyze the orbits of double stars. Francis Galton popularized it in the 1880s for studying human traits, plotting parents’ against children’s heights — the chart that gave us the word “regression.” Frank Anscombe’s 1973 quartet later cemented the scatter plot as the chart you must look at before trusting any summary statistic.

// 03 When to use

When a scatter plot is the right call

Reach for a scatter plot whenever the question is about how two continuous variables move together and you have enough observations for a pattern to emerge. Below are the situations where it consistently wins against the alternatives.

✓ Use a scatter plot when…
  • You want to explore the relationship between two continuous variables
  • You’re looking for correlations — do values of X and Y tend to move together?
  • You need to identify outliers or unusual observations that deserve investigation
  • You have at least ~20 observations so a pattern can emerge above the noise
  • You’re checking whether a linear regression model fits the data before running one
  • You want to compare clusters or subgroups by coloring or shaping the markers
  • You’re building exploratory visuals where individual rows still need to be visible

// 04 When not to use

When a scatter plot is the wrong call

Scatter plots can technically display many kinds of data, but “technically possible” is not the same as “good idea.” Below are the cases where the scatter plot actively hides information you need to communicate.

× Avoid a scatter plot when…
  • One axis is categorical (countries, products) — use a bar chart or strip plot instead
  • You want to show change over time — use a line chart so the temporal sequence is visible
  • You have very few data points (fewer than ~10) — patterns won’t be statistically meaningful
  • Both variables are categorical — use a mosaic plot or contingency table
  • Your data is so dense that points overlap into a single blob — use a hex bin or 2D density plot
  • You need exact values rather than patterns — a small data table communicates better
  • You want to show parts of a whole — use a pie, donut, or stacked bar instead
  • You have one continuous variable and want to show its distribution — use a histogram or density plot

// 05 Data requirements

What your data needs to look like

Before building the chart, your dataset needs to fit a specific shape. Use this checklist to confirm yours does.

Shape

One row per observation, with two numeric columns for the X and Y coordinates. Optional columns for group (color/shape), size (bubble), and label (annotation).

Minimum rows

10 observations to start, ~20+ for a credible pattern. With fewer, the cloud is just noise.

Maximum rows

~5,000 observations. Alpha blending helps from a few hundred points upward; past roughly 5,000, overplotting forces a switch to a hex bin / density variant.

Required fields
x (required)
number (continuous)

The horizontal coordinate of each marker. Typically the explanatory or predictor variable — the thing you suspect drives the outcome. Must be numeric; categorical X belongs on a strip plot or bar chart instead.

y (required)
number (continuous)

The vertical coordinate of each marker. Typically the outcome or response variable. Must be numeric and ideally measured on a scale where small differences are meaningful.

group
string (optional)

Optional categorical column used to color or shape markers so subgroups can be compared on the same axes. Keep groups to roughly 2–6; past that, marker styles become indistinguishable.

size
number (optional)

Optional positive quantitative column mapped to the marker area. Mapping a third variable to size promotes the chart from scatter plot to bubble chart — use only when the third variable answers a question.

label
string (optional)

Optional row label used to annotate notable points (e.g., country names on a Gapminder-style chart). Annotate two or three points by hand rather than every marker.

Example data
student  hours  score  group
S01      1.5    42     A
S02      2.0    48     A
S03      3.7    62     B
S04      5.5    70     B
S05      7.0    84     C
S06      9.2    95     C

Tip: if your data arrives pre-aggregated (means per group), go back to the one-row-per-observation source before plotting; averages cannot be un-aggregated after the fact. Tools like Tableau and Power BI silently aggregate by default, so use the Detail well or disaggregate measures so each row becomes its own marker.
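The checklist above can be turned into a quick pre-flight check. This is a minimal sketch, not a definitive validator: the function name `check_scatter_data` is hypothetical, the data is assumed to be a list of dicts (one per observation), and the thresholds are the rough guideline numbers from this section.

```python
from numbers import Real


def check_scatter_data(rows, x, y, group=None):
    """Return a list of warnings for a list-of-dicts dataset (one dict per observation)."""
    warnings = []

    # x and y must be numeric (booleans are rejected even though Python treats them as ints).
    for column in (x, y):
        non_numeric = sum(
            1 for row in rows
            if isinstance(row.get(column), bool) or not isinstance(row.get(column), Real)
        )
        if non_numeric:
            warnings.append(f"{non_numeric} row(s) have a non-numeric '{column}' value")

    # Row-count guidelines: ~10 minimum, ~20+ credible, ~5,000 before overplotting.
    n = len(rows)
    if n < 10:
        warnings.append(f"only {n} observations: below the ~10 minimum")
    elif n < 20:
        warnings.append(f"only {n} observations: ~20+ gives a more credible pattern")
    elif n > 5000:
        warnings.append(f"{n} observations: consider alpha blending or a hex bin / density plot")

    # Optional group column: roughly 2 to 6 groups stay distinguishable.
    if group is not None:
        n_groups = len({row.get(group) for row in rows})
        if n_groups > 6:
            warnings.append(f"{n_groups} groups: more than ~6 become hard to tell apart")

    return warnings


# The six-row example table from this section.
rows = [
    {"student": "S01", "hours": 1.5, "score": 42, "group": "A"},
    {"student": "S02", "hours": 2.0, "score": 48, "group": "A"},
    {"student": "S03", "hours": 3.7, "score": 62, "group": "B"},
    {"student": "S04", "hours": 5.5, "score": 70, "group": "B"},
    {"student": "S05", "hours": 7.0, "score": 84, "group": "C"},
    {"student": "S06", "hours": 9.2, "score": 95, "group": "C"},
]
issues = check_scatter_data(rows, x="hours", y="score", group="group")
print(issues)
```

The six-row sample is flagged as too small, which matches the minimum-rows guidance: it illustrates the shape of the data, not a credible sample size.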

// 06 Anatomy

Parts of a scatter plot

Every scatter plot is built from the same handful of parts. Knowing the names makes it easier to talk about what to keep, what to drop, and what most templates are getting wrong.

A — Y-axis: The vertical axis representing the dependent (outcome) variable
B — X-axis: The horizontal axis representing the independent (predictor) variable
C — Data point: Each dot represents one observation, positioned by its X and Y values
D — Trend line: A fitted line (often linear regression) showing the general direction of the relationship
E — Outlier: A point far from the cloud — may indicate an error, a special case, or an interesting anomaly

// 07 Step-by-step

Step-by-step: how to build a good scatter plot

A nine-step recipe that works regardless of the tool. Walk through it the first few times and the moves become automatic; skip steps and the chart usually shows it.

  1. Pick the question you want the chart to answer

    A scatter plot answers “does Y change as X changes, and how tightly?” Write that question down before drawing anything. If your question is about ranking categories, change over time, or share of a whole, switch to a bar, line, or part-to-whole chart now — not after you have built it.
  2. Get one row per observation

    Each row in your data must represent one observation: one student, one country, one transaction. If your data arrives pre-aggregated (averages per group), go back to the row-level source, or accept that the chart will show group means, not the underlying spread.
  3. Decide which variable goes on which axis

    By convention, X is the explanatory variable and Y is the outcome. If you would describe the relationship as “Y depends on X,” put the candidate cause on X. If neither variable obviously depends on the other, the assignment is yours — just stay consistent across related charts.
  4. Choose the axis ranges

    Unlike a bar chart, scatter plots do not need to start at zero. Crop both axes to the data range plus a small margin so the cloud fills the chart and small differences become visible. Always show the actual numbers on the axis so the reader knows they are looking at a zoomed view.
  5. Draw the markers and tame overplotting

    If your dataset has fewer than ~500 points, draw plain filled circles. Past 500, lower marker alpha to roughly 0.2–0.4 so overlapping points darken into local density. Past several thousand, switch to a hex bin or 2D density plot — the scatter has stopped being legible.
  6. Add a trend line if a relationship is plausible

    A linear regression line summarizes the average direction of the cloud. Print the R² value next to it so the reader can judge fit quality. If the cloud curves, fit a LOESS or polynomial smoother instead — a straight line through curved data hides the real pattern.
  7. Encode a third variable only when it earns its place

    Color, shape, and size each let you add a third variable. Use color for nominal groups (with a colorblind-safe palette), shape for two or three groups, and size for a positive quantitative variable. If the third variable doesn’t change what the reader concludes, leave it out.
  8. Annotate the points that tell the story

    Label two or three notable observations with their row identifier (a country name, a player, a date). Don’t label everything — a sea of text removes the focus you just created. Outliers, the headline observation, and one or two extremes are enough.
  9. Write a takeaway title and ship

    “Study hours vs exam score” is a label. “Each extra hour of study correlated with about 6 more points on the exam” is a takeaway. Put the takeaway on the chart and the descriptive label as a subtitle, then verify the chart still works at the size your readers will see it.
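Two of the steps above (choosing axis ranges and taming overplotting) are mechanical enough to sketch in code. The helper names `padded_limits` and `overplotting_strategy` are hypothetical, the 5% margin is an assumed default, and the 500 / 5,000 cut-offs are the rough guideline numbers used in this guide rather than hard rules.

```python
def padded_limits(values, margin=0.05):
    """Axis limits cropped to the data range plus a small margin on each side."""
    lo, hi = min(values), max(values)
    # Fall back to a pad of 1.0 when every value is identical (zero range).
    pad = (hi - lo) * margin or 1.0
    return lo - pad, hi + pad


def overplotting_strategy(n_points):
    """Pick a rendering approach and marker alpha from the number of points."""
    if n_points < 500:
        return ("scatter", 1.0)   # plain filled circles
    if n_points <= 5000:
        return ("scatter", 0.3)   # alpha in the 0.2 to 0.4 range
    return ("hexbin", None)       # aggregate instead of drawing every marker


x_limits = padded_limits([1.5, 2.0, 3.7, 5.5, 7.0, 9.2])
print(x_limits)
print(overplotting_strategy(24), overplotting_strategy(2000), overplotting_strategy(20000))
```

In Matplotlib, the limits would typically be passed to `ax.set_xlim(*x_limits)` and the alpha to `ax.scatter(..., alpha=...)`.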

// 08 Real-world examples

Where you’ll see scatter plots used

Scatter plots show up in three places more than anywhere else: scientific papers, public-policy storytelling, and business analytics. Each context has its own conventions, and they all reward the same fundamentals.

01

Medicine: Drug dosage vs patient response

Pharmaceutical researchers use scatter plots to visualize the relationship between dosage levels and a measured outcome — blood pressure drop, antibody titer, time to remission. Points cluster tightly when the dose-response relationship is strong; outliers are flagged for chart review because they may be adverse reactions or non-responders. Trial reports almost always include the underlying scatter as well as the fitted curve.

Medical Research
02

Economics: GDP per capita vs life expectancy

The famous Gapminder visualization, popularized by Hans Rosling, shows countries as dots with GDP per capita on the X-axis and life expectancy on the Y-axis. Marker area encodes population and color encodes continent — a scatter plot that has graduated to a bubble chart. Two centuries of data played as an animation rewrote how lay audiences think about global development.

Economics
03

Sports: Player salary vs performance

Sports analysts use scatter plots to identify overpaid and underpaid players by plotting salary against a key performance metric (WAR for baseball, expected goals for soccer). Players above the trend line are outperforming their pay; those below are underperforming. The same chart drives front-office trade decisions and the analytics blogs that scrutinize them.

Sports Analytics
04

Business: Customer acquisition cost vs lifetime value

A SaaS dashboard shows one dot per acquisition channel, with cost per customer on X and lifetime value on Y. A 1:1 reference line splits the chart into healthy channels (above the line) and unprofitable ones (below). Operators glance at the chart, identify the channel that needs attention, and reallocate budget without ever opening the underlying spreadsheet.

Business Analytics

// 09 Variations

Types of scatter plots

The basic scatter plot has several important variants, each suited to a slightly different data situation. The headline rule is the same as ever: pick the variant whose strengths match your question.

Bubble chart

Encodes a third variable through the area of each marker. Use only when the third variable is meaningful and positive.

Connected scatter

Lines connect points in time order, showing how the relationship between two variables evolves over time.

Hex bin / density scatter

Aggregates points into hexagonal cells whose color encodes count. Solves overplotting past several thousand points.

Jitter / strip plot

Adds small random offsets to points to prevent overlap when one axis is categorical or has only a few unique values.
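Jitter is simple enough to sketch by hand. The helper below is a hypothetical illustration (the name `jitter` and the default width of 0.2 are assumptions): it adds uniform noise for display purposes only, so the original values should still be used for any statistics.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Return a copy of values with uniform noise in [-width/2, +width/2] added."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    return [v + rng.uniform(-width / 2, width / 2) for v in values]


# Integer scores stack on top of each other; jittered copies spread out slightly.
scores = [3, 3, 3, 4, 4]
display_scores = jitter(scores, width=0.2, seed=1)
print(display_scores)
```

Keeping the width well below the gap between adjacent values (here, 1) avoids points appearing to belong to the neighboring value.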

// 10 Comparisons

Scatter plot vs other chart types

Scatter plots get confused with several other chart types because they all live in or near the Correlation family. The differences matter — picking the wrong one changes what your reader is allowed to conclude.

Scatter plot vs bubble chart

Both place markers in a continuous X–Y plane. A scatter plot encodes two variables; a bubble chart encodes a third by mapping it to marker area. Use the bubble version only when the extra variable answers a question — otherwise the size channel becomes decorative noise.

Scatter plot

One marker per observation. Position encodes two variables; markers are typically the same size. The cleanest way to show a two-variable relationship.

  • Two variables: X and Y
  • All markers same size
  • Easiest to read at a glance

Bubble chart

One marker per observation, but marker area encodes a third quantitative variable. Color often encodes a fourth, categorical variable. Powerful but easy to overload.

  • Three or four variables encoded at once
  • Map size to area, never radius
  • Best with a strong takeaway and few markers
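The "area, never radius" rule comes down to a square root. The following sketch uses hypothetical helper names and assumed maximum sizes: tools that take an area (such as Matplotlib's `s`, in points²) get a linear scale, while tools that take a radius (such as an SVG circle's `r`) need the square root so that the visible area stays proportional to the value.

```python
import math


def bubble_areas(values, max_area=400.0):
    """Scale positive values linearly to marker AREA."""
    if any(v <= 0 for v in values):
        raise ValueError("bubble sizes need positive values")
    largest = max(values)
    return [max_area * v / largest for v in values]


def bubble_radii(values, max_radius=20.0):
    """Scale positive values to marker RADIUS so that area stays proportional to value."""
    if any(v <= 0 for v in values):
        raise ValueError("bubble sizes need positive values")
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


# A value 4x larger gets 4x the area but only 2x the radius.
print(bubble_areas([1, 4]))
print(bubble_radii([1, 4]))
```

Mapping the value straight to the radius would make the larger bubble look 16 times bigger for a 4-times-larger value.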

Scatter plot vs hex bin plot

Scatter plots show every observation; hex bin plots aggregate observations into hexagonal cells whose color encodes the count. Use a scatter for hundreds to a few thousand markers; switch to hex bins past that, where overplotting hides the underlying density.

Scatter plot

One mark per row. Outliers and individual points stay visible. Breaks down when markers overlap into a single solid blob.

  • Best for ~50 to ~5,000 observations
  • Outliers are obvious
  • Use alpha blending past ~500 points

Hex bin plot

Markers aggregated into hexagonal cells; cell color encodes the count. Density patterns become legible even with millions of points, but individual rows disappear.

  • Best past ~5,000 observations
  • Density patterns become readable
  • Outliers are smoothed away

Scatter plot vs line chart (correlation vs trend)

A scatter plot answers “how are X and Y related?”; a line chart answers “how does Y change over time (or another ordered variable)?” The visual cue is whether marks are connected. Connect only when the X-axis has a meaningful order.

Scatter plot

Unconnected markers. The X axis is any continuous variable. The reader’s eye searches for a cloud shape: rising, falling, curved, or random.

  • X is any continuous variable
  • No implied order between points
  • Highlights correlation and outliers

Line chart

Markers (or just a polyline) connected in X order. The X axis almost always represents time. The reader’s eye follows trend, slope, and inflection points.

  • X is ordered (usually time)
  • Connecting line implies sequence
  • Highlights trend and change

Scatter plot vs scatter plot matrix

A scatter plot shows one X–Y pair; a scatter plot matrix (also called a SPLOM or pair plot) shows every pairwise combination of several variables in a small-multiples grid. Reach for a SPLOM when you have three or more numeric variables and want to compare relationships at once.

Scatter plot

Single panel. One pair of variables. Best when you already know which two variables you want to compare.

  • One X, one Y
  • Easiest to read at large sizes
  • Best for a single hypothesis

Scatter plot matrix

Grid of small scatter plots, one per pair of variables. Useful for exploratory analysis when you don’t yet know which pairs are interesting.

  • Compares 3–10 variables at once
  • Diagonal often shows distributions
  • Each panel is small — use for screening

// 11 Common mistakes

Mistakes to watch out for

Almost every misleading scatter plot in the wild fails the same handful of ways. If you only memorize seven rules, make them these.

Overplotting (too many overlapping points)

When thousands of markers pile on top of each other, the chart becomes a solid blob and you cannot see the underlying density. The fix is the cheapest one in visualization: drop marker alpha to roughly 0.2–0.4. Past several thousand points, switch to a hex-bin or 2D density plot, or aggregate to one marker per group with whiskers.

Assuming correlation means causation

This is the most dangerous misinterpretation a scatter plot invites. A chart showing that ice cream sales and drownings rise together does not mean ice cream causes drowning — both rise with summer heat. Always look for confounding variables, write “associated with” instead of “causes” in your title, and resist the temptation to sell a story the chart cannot support.

Ignoring scale and aspect ratio

Stretching one axis can make a weak correlation look strong, and compressing it can make a strong one look weak. Keep aspect ratios proportional to the data range, never crop one axis to exaggerate a trend, and never start one axis at zero just to make the points look closer together. If the relationship is real it will survive an honest aspect ratio.

Fitting a straight line to curved data

If the cloud of points curves — say, in a U or an exponential — a linear regression line is the wrong summary. The reader will see a slope and conclude “small positive effect” when the truth is “strong non-linear effect.” Always look at the cloud shape first, and reach for a LOESS smoother or a polynomial fit when the data is curved.

Using too few data points

With fewer than ten observations, almost any pattern you see could be random noise. Even a beautiful upward trend through five points is one data collection bug away from disappearing. Wait for at least ~20 observations before drawing conclusions, and report sample size in the title or caption so the reader can calibrate trust.

Hiding subgroups inside one cloud

If the data contains two distinct groups (men/women, treatment/control, before/after), a single-color scatter can hide a Simpson’s paradox where the overall trend goes one way and each subgroup goes the other. Color or shape by the subgroup variable whenever there is a meaningful split, and always check at least one breakdown before trusting an aggregate trend line.

Encoding too many variables at once

A scatter that uses position for X and Y, color for one variable, shape for a second, and size for a third asks the reader to juggle five channels at once. The chart becomes a puzzle. If you find yourself reaching for the fourth channel, split the data into small multiples instead — the reader’s working memory will thank you.

// 12 Accessibility

Accessibility checklist

Run through this list before publishing. The chart should still communicate its message to readers using assistive technology, color-blind users, keyboard navigation, and reduced-motion settings.

  • ✓

    Provide a text alternative for the chart

    WCAG 1.1.1
    Add an accessible name (alt text or aria-label) that summarizes the takeaway, not the chart type. “Scatter plot of two variables” is weak; “Study hours and exam scores show a positive correlation: each extra hour of study lined up with about six more exam points (R² = 0.71, n = 24)” is strong.
  • ✓

    Do not rely on color alone to encode groups

    WCAG 1.4.1
    If markers are color-coded by category, also vary their shape (circle, triangle, square, plus) so colorblind readers and grayscale printers can still tell groups apart. Roughly 1 in 12 men and 1 in 200 women have some form of color-vision deficiency.
  • ✓

    Marker contrast meets WCAG AA

    WCAG 1.4.3, 1.4.11
    Marker fill against the chart background should reach at least 3:1 contrast, the WCAG 1.4.11 threshold for graphical objects, and any text labels (titles, axes, point annotations) should reach the WCAG 1.4.3 thresholds of 4.5:1 for body text or 3:1 for large text.
  • ✓

    Expose the underlying data

    WCAG 1.3.1
    Place the raw X, Y (and group, if present) values in a screen-reader-friendly table next to or beneath the chart. For dense scatters, also expose a per-group summary table so screen-reader users have a navigable equivalent of what sighted readers see at a glance.
  • ✓

    Provide a trend line and R² text alternative

    WCAG 1.1.1
    When a regression or smoother is drawn, always print the slope and R² (or the correlation coefficient) in plain text near the chart. A reader who can’t see the line still needs to know how strong the relationship is.
  • ✓

    Focusable points with visible focus rings

    WCAG 2.4.7
    If the scatter is interactive, every marker should be reachable with the Tab key in a sensible order, expose its X, Y, and identifier through aria-label, and gain a visible focus ring (outline, glow, or enlarged stroke) when the keyboard lands on it. Tooltips must appear on focus, not only on hover.
  • ✓

    Respect prefers-reduced-motion

    WCAG 2.3.3
    If markers fade or fly into position, gate the animation behind a prefers-reduced-motion: no-preference media query so motion-sensitive readers see the final state immediately. Zoom and pan transitions should also be skippable.
  • ✓

    Make the chart resizable and zoomable

    WCAG 1.4.4
    Let the chart container scale with the viewport and stay legible at 200% browser zoom. Avoid baking the SVG to a fixed pixel size; use a responsive viewBox so axis ticks stay readable on narrow screens.
  • ✓

    Label both axes with units

    WCAG 3.3.2
    “$1.2k” is fine in display, but the axis title or a nearby caption must state the unit (“customer lifetime value, USD”) so a reader who can’t see the chart still understands what is being measured on each axis.
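The contrast thresholds in the checklist can be verified numerically. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas for six-digit hex colors; the function names are hypothetical, and the white background is an assumption, since this guide does not state the chart's background color. The marker color is the one used in this guide's Matplotlib example.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color given as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve. 0.04045 is the sRGB threshold; older WCAG text
    # prints 0.03928, which gives the same result for 8-bit colors.
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Assumed white background; marker color taken from this guide's Matplotlib example.
marker_ratio = contrast_ratio("#c94a2e", "#ffffff")
print(f"marker contrast = {marker_ratio:.2f}:1, passes 3:1 = {marker_ratio >= 3.0}")
```

Note that lowering marker alpha blends the fill toward the background, so a semi-transparent marker has a lower effective contrast than its opaque color suggests.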

// 13 Best practices

Design and craft tips

The mistakes section above tells you what to avoid. The list below is the positive version: the small set of habits that separate a good scatter plot from a passable one.

Do

Use alpha blending when points overlap

Drop marker alpha to roughly 0.2–0.4 so overlapping markers darken into local density. The eye recovers the underlying distribution that opaque markers would hide.
× Don’t

Connect the dots with a line

If you join scatter points with a line, you have implied a temporal or sequential order that doesn’t exist. Save line segments for connected scatter plots where the points are ordered in time.
Do

Show a trend line and its R²

A regression line summarizes the average relationship and the R² tells the reader how tightly the cloud follows it. Always show both, and use a smoother instead of a straight line when the cloud curves.
× Don’t

Force the axes to start at zero

Bar charts need a zero baseline; scatter plots don’t. Forcing both axes to zero often shoves the data into a corner and hides the relationship. Crop to the data range with a small visual margin.
Do

Encode subgroups with both color and shape

When markers are color-coded by category, give each group a distinct shape too (circle, triangle, square). Colorblind readers and grayscale prints can still tell groups apart.
× Don’t

Imply causation in the title

A scatter plot can show that two variables move together, never that one causes the other. Write “associated with” or “correlates with,” not “causes” or “drives,” unless you have an experimental result to back it up.
Do

Annotate two or three notable points

Label the headline observation, the most extreme outlier, and maybe one comparison case. A handful of pointed annotations directs attention; labelling every marker erases the signal.
× Don’t

Use a bubble chart by accident

Mapping marker size to a third variable is fine if you mean it. If marker size is decorative or maps to nothing meaningful, set every marker to the same size; otherwise readers will read magnitude into the area.

// 15 Tool instructions

How to build it in your tool of choice

Every major analysis tool ships a scatter plot. The recipes below get you to a clean, alpha-blended scatter plot with a sensible trend line in each of the most common platforms.

Microsoft Excel

Spreadsheet — ~3 min
  1. Place X values in column A and Y values in column B, with a header row. Both columns must be numeric.
  2. Highlight both columns including the headers.
  3. Open the Insert tab, choose Charts, then XY (Scatter), and pick the first preset (markers only, no connecting line).
  4. Right-click any marker and choose Format Data Series; under Marker Options set the size to ~5 and the fill transparency to 60–80% if points overlap.
  5. Right-click a marker again and choose Add Trendline → Linear; tick ‘Display Equation on chart’ and ‘Display R-squared value on chart’.
  6. Edit each axis title to include units (‘Study hours’, ‘Exam score / 100’) and replace the default chart title with the takeaway sentence.
  7. If you need a third variable, color markers by category by adding one data series per category so each series gets its own color. ‘Vary colors by point’ colors every marker individually rather than by category.

Tip: avoid the ‘Scatter with Smooth Lines’ preset. Connecting the markers implies a temporal order that doesn’t exist for most scatter data.

Google Sheets

Spreadsheet — ~3 min
  1. Lay out your data with X values in the first column and Y values in the second column, with headers.
  2. Select the range, then choose Insert → Chart.
  3. In the Chart editor on the right, set Chart type to Scatter chart.
  4. Open Customize → Series and set the Point size to 5 and the Point opacity to 0.4 if your markers overlap.
  5. Under Customize → Series, tick the Trendline checkbox, choose Linear, and select ‘Use equation’ in the Label dropdown.
  6. Edit the chart title under Customize → Chart & axis titles → Chart title and write a takeaway, not just a label.
  7. For a third variable, switch to a Bubble chart type and map the third numeric column to bubble size.

Sheets has no built-in jitter. If many of your X values share the same value (e.g., integer scores), add tiny random noise (=A2 + RAND()*0.2 - 0.1) to a helper column and plot that.

Python (Matplotlib)

Code — ~5 min
  1. Install Matplotlib with pip install matplotlib (and numpy if it isn’t already in your environment).
  2. Import matplotlib.pyplot as plt and numpy as np, and load your X and Y values as NumPy arrays.
  3. Call plt.scatter(x, y, alpha=0.4, s=30); alpha tames overplotting and s sets marker area in points².
  4. Add a regression line by computing slope, intercept = np.polyfit(x, y, 1) and plotting plt.plot(x, slope*x + intercept).
  5. Print the R² value with plt.text() so the reader can judge how tightly the cloud follows the line.
  6. Add plt.xlabel(), plt.ylabel() with units, and plt.title() with the takeaway sentence; call plt.tight_layout() before plt.show() or plt.savefig().
  7. For a third variable, pass c= with numeric group codes and a colormap to color by group. To vary shapes, call plt.scatter() once per group with a different marker=, since marker= accepts a single marker style per call.

Use ax.spines[['top','right']].set_visible(False) on the current Axes (ax = plt.gca(), Matplotlib 3.4+) to drop the top and right borders for a cleaner, publication-ready look.

R (ggplot2)

Code — ~5 min
  1. Install ggplot2 with install.packages('ggplot2') and load it with library(ggplot2).
  2. Build a data frame (called d here) with at least an x and a y column. Add a group column if you want to color or shape by category.
  3. Pass the data frame to ggplot(d, aes(x = x, y = y)) and add geom_point(alpha = 0.4) for the markers.
  4. Add geom_smooth(method = 'lm', se = TRUE) for a linear trend line with a confidence ribbon, or method = 'loess' for a smoother.
  5. Annotate the R² by computing it with summary(lm(y ~ x, data = d))$r.squared and adding annotate('text', …).
  6. Apply labs() with title, x, and y arguments (include units), then theme_minimal(base_size = 12) for a clean default.
  7. For a third variable, map color or shape inside aes(): aes(x = x, y = y, color = group, shape = group). Use scale_color_brewer() with a colorblind-safe palette such as palette = 'Dark2'.

ggplot2’s geom_jitter() is the easiest way to recover stacked points when the X axis has only a few unique values — swap it in for geom_point().

JavaScript (D3.js)

Code — ~10 min
  1. Install D3 (npm i d3) or include the CDN script tag in your HTML.
  2. Create an SVG container and set its viewBox, plus a margin object.
  3. Build linear scales for both axes with d3.scaleLinear().domain(d3.extent(data, d => d.x)).nice(); don’t force the domain to start at zero.
  4. Bind your data with selectAll('circle').data(data).join('circle') and set cx, cy from the scales, with a small radius (3–5 px) and fill-opacity ~0.5.
  5. Render the axes with d3.axisBottom() and d3.axisLeft(); add label <text> elements that include units.
  6. Compute and draw a regression line with d3.regressionLinear() from the d3-regression plugin, and print the resulting R² in a label.
  7. Make markers focusable: set tabindex=0, give each circle an aria-label like ‘x: 4 hours, y: 72 points’, and add a focus ring via :focus { stroke: black; stroke-width: 2px; }.

If you don’t need full control, Observable Plot, Plotly, ECharts, or Vega-Lite all give you a working scatter plot with tooltips and zoom in fewer lines.

Tableau

BI — ~4 min
  1. Connect to your data and drag the first numeric measure to the Columns shelf and the second to the Rows shelf. Tableau will aggregate by default.
  2. Disaggregate by toggling Analysis → Aggregate Measures off, so each row in the data becomes its own marker rather than a single mean.
  3. Change the Marks card to Circle and reduce opacity to ~50% under the Color property to handle overplotting.
  4. Drag a third dimension to the Color property of the Marks card to color markers by category, and to Shape if you want shapes too.
  5. Add a trend line by going to Analytics → Trend Line → Linear; right-click the line and choose Describe Trend Line to see R² and p-values.
  6. Right-click each axis and choose Edit Axis to crop the range to the data and add a unit-aware title.
  7. Use the Highlight tool or annotations to label two or three notable observations rather than every marker.

Tableau’s default ‘SUM aggregation’ is the single most common scatter mistake — it collapses every row into a single point. Always disaggregate first.

Power BI

BI — ~4 min
  1. In Power BI Desktop, open the Visualizations pane and choose the Scatter chart visual.
  2. Drag your X numeric measure to the X Axis well and your Y numeric measure to the Y Axis well.
  3. Drag a row identifier (an ID, country, or product key) to the Details well so each marker represents one observation rather than the aggregated total.
  4. Open the Format pane, expand Markers, and lower the marker fill transparency to ~50% to handle overplotting.
  5. Under Format → Analytics, add a Trend line and tick the option to display the equation and R² on the chart.
  6. Drag a categorical column to the Legend well to color markers by category, and to the Size well only if a meaningful third quantitative variable exists.
  7. Edit each axis under Format → X axis / Y axis to set Start and End to the data range plus a small margin, and label both with units.

If you forget the Details well, Power BI silently aggregates every row into one marker per legend group — just like Tableau. Always check the marker count.
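The aggregation trap described for both tools can be sketched in a few lines of Python. This is only a model of the behavior, not either product's actual engine: the rows, the group labels, and the helper names `aggregated_marks` and `disaggregated_marks` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical rows: (legend group, x measure, y measure).
rows = [
    ("A", 2.0, 48), ("A", 4.1, 60), ("A", 6.4, 79),
    ("B", 3.0, 53), ("B", 5.5, 70), ("B", 8.0, 89),
]

def aggregated_marks(rows):
    """One mark per group: roughly what a default SUM per group would draw."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for group, x, y in rows:
        totals[group][0] += x
        totals[group][1] += y
    return {group: (sx, sy) for group, (sx, sy) in totals.items()}

def disaggregated_marks(rows):
    """One mark per row: what you get after disaggregating or adding a row ID."""
    return [(x, y) for _, x, y in rows]

agg = aggregated_marks(rows)
raw = disaggregated_marks(rows)
print(f"aggregated: {len(agg)} marks, disaggregated: {len(raw)} marks")
```

If the marker count on screen matches the number of groups rather than the number of rows, the chart is still aggregated.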

// 16Code examples

Working code in the most common stacks

Three runnable snippets that produce the same chart — a scatter of 24 students’ study hours vs exam scores, with a linear trend line and the R² printed in the corner. Copy, paste, and replace the data with yours.

scatter_plot.py
import matplotlib.pyplot as plt
import numpy as np

# Study hours vs exam score for 24 students.
hours  = np.array([1.5, 2.0, 2.2, 2.8, 3.0, 3.3, 3.7, 4.1, 4.4, 4.6,
                   5.0, 5.2, 5.5, 5.8, 6.1, 6.4, 6.7, 7.0, 7.3, 7.6,
                   8.0, 8.4, 8.8, 9.2])
scores = np.array([42, 48, 51, 55, 53, 58, 62, 60, 66, 64,
                   68, 71, 70, 74, 76, 79, 81, 84, 82, 87,
                   89, 92, 91, 95])

fig, ax = plt.subplots(figsize=(8, 4.8))
ax.scatter(hours, scores, s=42, alpha=0.55,
           color="#c94a2e", edgecolor="#c94a2e", linewidth=0.8)

# Linear regression line + R-squared.
slope, intercept = np.polyfit(hours, scores, 1)
xs = np.linspace(hours.min(), hours.max(), 100)
ax.plot(xs, slope * xs + intercept,
        color="#1a1a18", linewidth=1.2, linestyle="--", alpha=0.7)

r2 = np.corrcoef(hours, scores)[0, 1] ** 2
ax.text(0.04, 0.92, f"R² = {r2:.2f}", transform=ax.transAxes,
        fontsize=11, color="#1a1a18")

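# Build the headline from the fitted slope (about 6.7 for this data)
# so the claim stays accurate when you swap in your own data.
ax.set_title(f"Each extra hour of study lined up with about {slope:.1f} more exam points",
             loc="left", fontsize=13)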
ax.set_xlabel("Study hours per week")
ax.set_ylabel("Exam score (out of 100)")
ax.spines[["top", "right"]].set_visible(False)
ax.set_xlim(hours.min() - 0.5, hours.max() + 0.5)
ax.set_ylim(scores.min() - 5, scores.max() + 5)

plt.tight_layout()
plt.savefig("scatter_plot.png", dpi=200)
plt.show()
$ python scatter_plot.py

// 17 — FAQs

Frequently asked questions

What is a scatter plot?+

A scatter plot (also called a scatter chart, scattergram, or XY plot) displays individual data points on a two-dimensional plane, where one variable is plotted on the horizontal axis and another on the vertical axis. Each dot represents a single observation or measurement, and the cloud of dots reveals whether the two variables are related, how strong the relationship is, and where the unusual cases sit.

When should you use a scatter plot?+

Use a scatter plot when you want to explore the relationship between two continuous variables — for example, study hours versus exam score, or advertising spend versus sales. Scatter plots are also the right choice when you need to spot outliers, check whether a linear regression model is appropriate, or compare clusters across colored or shaped subgroups.

When should you avoid a scatter plot?+

Avoid a scatter plot when one axis is categorical (countries, products, teams); use a bar chart, dot plot, or strip plot instead. Scatter plots are also a poor fit when you have fewer than ten observations (any apparent pattern could easily be noise), when both variables are categorical (use a mosaic plot), or when the data is so dense that points overlap into a single blob, in which case reach for a hex bin or 2D density plot.

How is a scatter plot different from a line chart?+

A line chart connects observations in time order to emphasize trend; a scatter plot leaves the points unconnected to emphasize relationship. Use a line chart when the X-axis is time and the order matters; use a scatter plot when X is any continuous variable and you care about whether Y goes up, down, or stays flat as X changes.

How is a scatter plot different from a bubble chart?+

A scatter plot encodes two variables — one per axis. A bubble chart adds a third variable encoded as the area (not the radius) of each marker, and often a fourth variable as color. Use a scatter plot when you only have two continuous variables to compare; switch to a bubble chart when a meaningful third quantitative variable would otherwise be hidden.
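Encoding the third variable as area means the radius has to grow with the square root of the value. A minimal sketch of that scaling, assuming a hypothetical helper `bubble_areas`, an arbitrary `max_area` of 600, and made-up values; Matplotlib's `scatter()` interprets its `s` argument as marker area in points², so the result could be passed to it directly.

```python
import math

def bubble_areas(values, max_area=600.0):
    """Scale non-negative values so marker AREA is proportional to value.

    The largest value maps to max_area (points^2, the unit Matplotlib's
    scatter() uses for its `s` argument).
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    return [max_area * v / largest for v in values]

def radius_from_area(area):
    """Radius of a circle with the given area."""
    return math.sqrt(area / math.pi)

third_variable = [1, 4, 16]            # hypothetical quantities
areas = bubble_areas(third_variable)
radii = [radius_from_area(a) for a in areas]

# A value 4x larger gets 4x the area but only 2x the radius.
print("areas:", areas)
print("radius relative to smallest:", [round(r / radii[0], 2) for r in radii])
```

Scaling the radius linearly instead would make the largest bubble look far more important than its value justifies.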

How is a scatter plot different from a hex bin plot?+

Both show the relationship between two continuous variables, but a scatter plot draws one mark per row while a hex bin plot draws one hexagonal cell per local group of rows and color-codes the count. When you have more than a few thousand points and the marks overlap into a solid blob, swap the scatter for a hex bin so density becomes legible again.

What does correlation mean in a scatter plot?+

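Correlation is a number between −1 and +1 that summarizes how tightly the two variables move together. A value near +1 means that as X rises, Y reliably rises; a value near −1 means that as X rises, Y reliably falls; a value near 0 means there is no consistent linear relationship, although a curved pattern can still be present. Most scatter plots are read with the Pearson correlation, which measures linear association; Spearman’s rank correlation is more robust to outliers and also captures monotonic relationships that are not straight lines.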
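To make the Pearson versus Spearman distinction concrete, here is a small standard-library sketch. The data is invented for illustration (a steady rise with one extreme outlier), and the helper names `pearson`, `ranks`, and `spearman` are this example's own; in practice a statistics library would normally do this for you.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Illustrative data: Y rises with every step in X, but the last point is extreme.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 5, 7, 8, 10, 11, 100]

print(f"Pearson  r = {pearson(x, y):.2f}")
print(f"Spearman r = {spearman(x, y):.2f}")
```

Because Y never decreases as X increases, the rank-based measure reports a perfect relationship, while the single outlier pulls the Pearson value well below 1.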

Does correlation prove causation?+

No. A scatter plot can only show that two variables move together; it cannot show why. Ice cream sales and drownings rise together every summer, but ice cream doesn’t cause drowning — hot weather drives both. Treat every scatter plot as a question (“why do these move together?”) rather than an answer.

How many data points do you need for a scatter plot?+

About 20 is a sensible minimum for visually trusting a pattern. With fewer than 10, almost any pattern you see could be random noise. Past several thousand, individual marks stop being legible and you should switch to alpha blending, jittering, or a density-based variant such as hex bin or 2D KDE.
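A quick simulation shows why small samples are risky. The sketch below draws X and Y independently (so any correlation is pure chance) and counts how often the sample still shows |r| above 0.5. The threshold, trial count, seed, and function name `share_of_spurious_patterns` are arbitrary choices made for this illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def share_of_spurious_patterns(n, trials=500, threshold=0.5, seed=0):
    """Fraction of trials where unrelated X and Y show |r| above threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials

for n in (8, 20, 200):
    share = share_of_spurious_patterns(n)
    print(f"n = {n:>3}: {share:.1%} of random samples show |r| > 0.5")
```

The share of chance "patterns" drops sharply as n grows, which is the practical argument for the rough minimum above.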

Should the axes start at zero in a scatter plot?+

Not necessarily. Unlike a bar chart, a scatter plot encodes value with position, not length, so a non-zero baseline does not exaggerate ratios. Crop the axes to the data range so the cloud fills the chart — but always keep the visible range honest and label it clearly so readers know they are looking at a zoomed view.
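Cropping to the data range is easy to automate. A minimal sketch, assuming a hypothetical helper `padded_limits` and a 5% margin (both are this example's choices, not a standard); the returned pair can be passed to Matplotlib's `ax.set_xlim()` or `ax.set_ylim()`.

```python
def padded_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits: the data range plus a small margin."""
    low, high = min(values), max(values)
    span = high - low
    # If every value is identical, fall back to a fixed margin of 1 unit.
    pad = span * pad_fraction if span > 0 else 1.0
    return low - pad, high + pad

scores = [42, 55, 68, 79, 95]          # illustrative exam scores
y_low, y_high = padded_limits(scores)
print(f"y-axis from {y_low:.2f} to {y_high:.2f}")
```

Stating the visible range in the axis title or a caption keeps the zoomed view honest.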

What is Anscombe’s quartet?+

Anscombe’s quartet is a set of four datasets, published by statistician Frank Anscombe in 1973, that share nearly identical means, variances, correlation, and regression line — yet look completely different when plotted. It is the canonical reason why every analyst should plot the data, not just compute the summary statistics, before reporting a result.
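You can check the claim yourself. The sketch below hard-codes the four datasets as published by Anscombe and prints their summary statistics using only the standard library; the `pearson` helper and the dictionary layout are this example's own.

```python
import math
from statistics import mean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print("set  mean(x)  mean(y)  var(x)  var(y)      r")
for name, (xs, ys) in quartet.items():
    print(f"{name:<4} {mean(xs):7.2f}  {mean(ys):7.2f}  "
          f"{variance(xs):6.2f}  {variance(ys):6.2f}  {pearson(xs, ys):5.3f}")
```

The rows of the printed table are almost indistinguishable; only plotting each set reveals the line, the curve, the outlier, and the single high-leverage point.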

How do you handle overplotting in a scatter plot?+

Three approaches solve overplotting in order of increasing effort: lower the marker alpha (e.g., 0.2) so density emerges from overlap, jitter near-identical values to reveal stacked points, or aggregate into a hex bin or 2D density plot when marker count exceeds ~5,000. Sampling a representative subset is a fourth option for exploratory work.
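The first three approaches can be compared side by side in Matplotlib. This is a sketch on synthetic data (6,000 points with integer X values, so many markers stack exactly); the jitter width of 0.3, the grid size of 40, and the output filename are arbitrary assumptions to adjust for your own data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 6_000
# Synthetic data: integer-valued x (heavily stacked) and a noisy linear y.
x = rng.integers(1, 11, size=n).astype(float)
y = 5 * x + rng.normal(0, 6, size=n)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

# 1. Lower the alpha so density emerges from overlapping markers.
axes[0].scatter(x, y, s=8, alpha=0.2, color="#c94a2e", linewidths=0)
axes[0].set_title("Low alpha")

# 2. Jitter the stacked x values by a small random offset.
x_jittered = x + rng.uniform(-0.3, 0.3, size=n)
axes[1].scatter(x_jittered, y, s=8, alpha=0.2, color="#c94a2e", linewidths=0)
axes[1].set_title("Low alpha + jitter")

# 3. Aggregate into hexagonal bins and color each cell by its point count.
hb = axes[2].hexbin(x_jittered, y, gridsize=40, cmap="viridis", mincnt=1)
axes[2].set_title("Hex bin")
fig.colorbar(hb, ax=axes[2], label="Points per cell")

fig.tight_layout()
fig.savefig("overplotting.png", dpi=150)
```

Jitter changes the plotted positions, so mention it in the caption whenever you use it.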

Can a scatter plot encode a third variable?+

Yes — by mapping a third variable to the marker color, shape, or size. Color works for categorical or sequential variables, shape works for small numbers of categorical groups, and size (i.e., a bubble chart) works for a positive quantitative variable. Only encode a third variable when its presence answers a question your reader actually has.

What category of chart is a scatter plot?+

Scatter Plot belongs to the Correlation family of charts. Charts in that family — bubble chart, hex bin plot, 2D density plot, scatter plot matrix — are all designed to answer how two or more continuous variables relate, so they often work as alternatives when one doesn’t quite fit your data.

What’s the best library for building scatter plots in code?+

For static, publication-quality scatter plots, Matplotlib (Python) and ggplot2 (R) are the standard choices. For interactive web scatter plots, D3.js gives the most control, while Plotly, Observable Plot, ECharts, and Vega-Lite get you to a working chart in fewer lines and ship reasonable defaults for tooltips and zoom.

// 18References

References and further reading

Primary sources, reference texts, and the official documentation for the libraries and tools referenced throughout this guide.

  • Encyclopedia entry covering the history, variants, and visual encoding of scatter plots, including the Herschel and Galton origin stories. A solid neutral starting point with citations.
    https://en.wikipedia.org/wiki/Scatter_plot
  • Galton’s original paper plotting parents’ against children’s heights — the first published scatter plot used for regression analysis. Hosted by galton.org.
    https://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf
  • The paper that introduced Anscombe’s quartet: four datasets with identical summary statistics but radically different scatter plots. The canonical reason to plot before you summarize.
    https://www.jstor.org/stable/2682899
  • Tufte’s foundational text on data graphics. The chapters on data-ink ratio and small multiples explain why plain scatters beat decorated ones, and motivate scatter plot matrices.
    https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/
  • Hands-on tutorial with real published examples. Especially useful for handling overplotting, choosing axis ranges, and adding meaningful annotations.
    https://academy.datawrapper.de/article/255-what-to-consider-when-creating-a-scatterplot
  • Open-source poster categorizing chart types by intent. Scatter plots, bubble charts, and connected scatters all sit in the Correlation family alongside hex bins and density plots.
    https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary
  • Web Accessibility Initiative guidance on making charts accessible: text alternatives, long descriptions, and data tables. Use this when building the accessibility checklist for a scatter plot.
    https://www.w3.org/WAI/tutorials/images/complex/
  • Official API reference for the Python scatter plot helper used in this guide’s code sample, including marker, alpha, and colormap arguments.
    https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
  • Tidyverse documentation for the ggplot2 point geometry. Covers aesthetics, jittering, and combining with geom_smooth() for trend lines.
    https://ggplot2.tidyverse.org/reference/geom_point.html
  • Maintained Observable notebook from the D3 team that mirrors the JavaScript code sample in this guide, with worked examples for axes, voronoi tooltips, and zoom.
    https://observablehq.com/@d3/scatterplot
  • Landmark experimental study on which visual encodings let humans estimate quantities most accurately. Position along a common axis (the scatter plot’s native encoding) ranks at the top.
    https://www.jstor.org/stable/2288400