Correlation · Intermediate

Pair Plot

A grid of scatter plots for every variable combination, with distributions on the diagonal — the ultimate EDA tool for multivariate datasets.

// 01 — The chart

What it looks like

[Figure: Example, Iris dataset, 3 variables × 3. Panels: Sepal len, Petal len, Petal wid]

A 3×3 pair plot of Iris measurements. Diagonal cells show distributions; off-diagonal cells show scatter plots for each pair.

// 02 — Definition

What is a pair plot?

A pair plot (also called a scatterplot matrix or SPLOM) arranges scatter plots for every combination of variables in a grid. For n variables, it creates an n × n matrix where each off-diagonal cell shows a scatter plot of two variables, and each diagonal cell shows a univariate distribution (typically a histogram or KDE).

The pair plot is the Swiss Army knife of exploratory data analysis (EDA). In a single view, it reveals correlations, clusters, outliers, and distribution shapes across all variables simultaneously.

Unlike the correlogram (which reduces each relationship to a single number), the pair plot shows the actual data — non-linear relationships, clusters, and outliers are all visible.
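
To illustrate how a single number can hide structure, here is a minimal sketch in plain Python. The helper `pearson_r` and the data are made up for this illustration; they are not part of any library named in this guide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative data: x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
parabola = [x ** 2 for x in xs]    # perfectly determined by x, but not linear
line = [2 * x + 1 for x in xs]     # perfectly linear in x

r_parabola = pearson_r(xs, parabola)  # zero: no linear trend to detect
r_line = pearson_r(xs, line)          # one, up to floating-point rounding
```

A correlogram cell for the parabola would look like "no relationship", while the matching scatter panel in a pair plot would show the curve plainly.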

Python users: seaborn.pairplot() creates a complete SPLOM with one line of code. R users know it as pairs() or GGally::ggpairs().

// 03 — Anatomy

Parts of a pair plot

A — Variable labels: Each row and column corresponds to one variable in the dataset
B — Diagonal cell: Shows the univariate distribution (histogram or KDE) of each variable
C — Off-diagonal cell: A scatter plot showing the relationship between two variables
D — Lower triangle: Mirror of the upper triangle — some pair plots show different views (e.g., scatter below, correlation above)

// 04 — Usage

When to use it — and when not to

✓ Use a pair plot when…
  • You want to see all pairwise relationships in a multivariate dataset
  • Performing EDA before building a model
  • Looking for non-linear relationships that a correlogram would miss
  • Color-coding by a categorical variable to find cluster separability
  • You have 3–10 numeric variables to compare
  • Checking for outliers across multiple dimensions simultaneously
× Avoid a pair plot when…
  • You have more than 10–12 variables — the grid becomes impossibly small
  • Your dataset has millions of rows — each scatter plot will be overplotted
  • You only need one specific pair — make a full-size scatter plot instead
  • The audience isn’t familiar with interpreting multiple small charts
  • Your variables are mostly categorical — use contingency tables
  • You need precise numeric summaries — a correlogram is more compact

// 05 — Reading guide

How to read a pair plot

Follow these steps whenever you encounter a pair plot.

1. Read the diagonal first

Each diagonal cell shows the distribution of one variable. Are they symmetric? Skewed? Bimodal? This tells you about each variable independently.

2. Scan for strong correlations

Look at off-diagonal cells for tight, elongated point clouds. A thin, diagonal band of points means strong correlation; a round cloud means weak or no correlation.

3. Look for clusters

If points naturally group into separate clouds (especially when color-coded), it suggests natural groupings in the data.

4. Check for non-linear patterns

Curved scatter plots reveal relationships that correlation coefficients miss: parabolas, thresholds, or logarithmic curves.

5. Identify outliers

Points far from the main cloud in any cell are potential outliers. Check if they’re outliers in multiple variable pairs.

// 06 — Data format

What your data should look like

// Tidy table — each column is a variable

| sepal_len | petal_len | petal_wid | species    |
|-----------|-----------|-----------|------------|
| 5.1       | 1.4       | 0.2       | setosa     |
| 7.0       | 4.7       | 1.4       | versicolor |
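
As a rough sketch of how such a table could be checked before plotting, the snippet below reads the two example rows as CSV text and separates numeric columns (candidates for the grid) from categorical ones (candidates for color-coding). The helper names `is_number` and `split_columns` are hypothetical and exist only for this example.

```python
import csv
import io

# The two example rows from the table above, written as CSV text.
CSV_TEXT = """sepal_len,petal_len,petal_wid,species
5.1,1.4,0.2,setosa
7.0,4.7,1.4,versicolor
"""

def is_number(text):
    """True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def split_columns(rows):
    """Return (numeric, categorical) column names for a list of dict rows."""
    numeric, categorical = [], []
    for name in rows[0]:
        if all(is_number(row[name]) for row in rows):
            numeric.append(name)
        else:
            categorical.append(name)
    return numeric, categorical

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
numeric_cols, categorical_cols = split_columns(rows)
# numeric_cols     -> ['sepal_len', 'petal_len', 'petal_wid']
# categorical_cols -> ['species']
```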

// 07 — Construction

How to build one

1. Select 3–10 numeric variables from your dataset.

2. Create an n×n grid of subplot panels.

3. For each diagonal cell (i, i), plot the distribution of variable i (histogram, KDE, or rug plot).

4. For each off-diagonal cell (i, j), plot a scatter plot with variable j on X and variable i on Y.

5. Optionally color-code points by a categorical variable and add a shared legend.
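
The five steps above can be sketched by hand with matplotlib and NumPy (both assumed to be installed). The data are random numbers generated for the example, and the variable names and group labels are reused from this guide purely as labels. Treat this as one possible layout, not a reference implementation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# Step 1: pick the numeric variables (synthetic data, two shifted groups).
names = ["sepal_len", "petal_len", "petal_wid"]
rng = np.random.default_rng(42)
data = rng.normal(size=(100, len(names)))
data[50:] += 2.0
groups = ["setosa"] * 50 + ["versicolor"] * 50
palette = {"setosa": "tab:blue", "versicolor": "tab:orange"}
colors = [palette[g] for g in groups]

# Step 2: create the n x n grid of panels.
n = len(names)
fig, axes = plt.subplots(n, n, figsize=(2.5 * n, 2.5 * n))

for i in range(n):          # i indexes rows
    for j in range(n):      # j indexes columns
        ax = axes[i, j]
        if i == j:
            # Step 3: diagonal cell shows the distribution of variable i.
            ax.hist(data[:, i], bins=15, color="gray")
        else:
            # Step 4: variable j on X, variable i on Y.
            # Step 5: color each point by its group.
            ax.scatter(data[:, j], data[:, i], c=colors, s=8, alpha=0.6)
        if i == n - 1:
            ax.set_xlabel(names[j])   # label only the bottom row
        if j == 0:
            ax.set_ylabel(names[i])   # label only the left column

# Step 5 (continued): one shared legend built from proxy markers.
handles = [Line2D([], [], marker="o", linestyle="", color=c, label=g)
           for g, c in palette.items()]
fig.legend(handles=handles, title="species", loc="upper right")
fig.tight_layout()
```

Note that in this simple version the y-axis of each diagonal panel shows histogram counts, even though the left-hand label names the variable; dedicated pair plot functions usually handle that detail for you.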

// 08 — Pitfalls

Common mistakes

Too many variables

A 15×15 pair plot has 225 panels — each one is too small to read. Limit to 8–10 variables maximum.
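
The arithmetic behind that warning is easy to check. The function below is a small illustrative helper (the name `panel_counts` is made up here) that counts the panels for a given number of variables.

```python
def panel_counts(n_vars):
    """Count the panels in an n x n pair plot."""
    total = n_vars * n_vars
    diagonal = n_vars
    return {
        "total": total,
        "diagonal": diagonal,
        "off_diagonal": total - diagonal,
        # Each pair appears twice (above and below the diagonal).
        "unique_pairs": n_vars * (n_vars - 1) // 2,
    }

small = panel_counts(3)    # the 3 x 3 example: 9 panels, 3 unique pairs
large = panel_counts(15)   # 225 panels, 105 unique pairs

# On a square figure 10 inches wide, each of the 15 columns gets
# well under an inch, before margins and tick labels are subtracted.
panel_side_inches = 10 / 15
```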

Overplotted scatter panels

With large datasets, individual points overlap into solid blobs. Use transparency, subsampling, or switch to hexbin/contour in each panel.
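
Of the remedies just listed, subsampling is the simplest to sketch. The helper below (a hypothetical `subsample_rows`, using only the standard library) draws whole rows at random, so the same observations appear in every panel of the grid. The seed and row limit are arbitrary choices for the example.

```python
import random

def subsample_rows(rows, max_rows, seed=0):
    """Return at most max_rows rows, drawn without replacement.

    Sampling whole rows (not each column separately) keeps every
    observation consistent across all panels of the pair plot.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(rows, max_rows)

# Illustrative data: 100,000 made-up (x, y, z) observations.
all_rows = [(i, i * 2, i % 7) for i in range(100_000)]
plot_rows = subsample_rows(all_rows, max_rows=2_000)
```

A random sample can drop rare outliers, so it may be worth plotting the full data once with transparency or a density view before relying on the subsample.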

Ignoring the diagonal

The distributions along the diagonal contain crucial information about skewness, bimodality, and outliers. Don’t skip them.

Missing group coloring

If your data has a categorical variable (species, class, group), color-coding points reveals cluster separation across all pairs.

Inconsistent axis scales

Panels in the same column should share an x-axis range, and panels in the same row should share a y-axis range, so that positions can be compared across the grid. Some tools handle this automatically; check yours.
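
If your tool does not do this for you, one way to get consistent scales is to compute a single range per variable and reuse it in every panel that shows that variable. The sketch below uses plain Python; the function names and the 5% padding are assumptions made for the example.

```python
def shared_limits(columns, pad=0.05):
    """Map each variable name to one (low, high) range, padded by `pad` of the span."""
    limits = {}
    for name, values in columns.items():
        low, high = min(values), max(values)
        span = high - low
        # A constant column has no span, so fall back to an arbitrary margin.
        margin = span * pad if span > 0 else 1.0
        limits[name] = (low - margin, high + margin)
    return limits

# Illustrative values taken from the two example rows in this guide.
columns = {"sepal_len": [5.1, 7.0], "petal_len": [1.4, 4.7]}
limits = shared_limits(columns)

def panel_ranges(row_var, col_var):
    """x-range comes from the column variable, y-range from the row variable."""
    return limits[col_var], limits[row_var]

x_range, y_range = panel_ranges(row_var="sepal_len", col_var="petal_len")
```

Because every panel in a column looks up the same entry, all of them end up with an identical x-axis, and likewise for the y-axis along a row.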

// 09 — In the wild

Real-world examples

Machine learning

Data scientists use pair plots as the first step in feature engineering — identifying which features correlate, which separate classes, and which need transformation.

Biomedical research

Clinicians use pair plots to visualize patient biomarkers (blood pressure, cholesterol, BMI, glucose) and identify risk profiles across multiple health indicators.

Quality control

Manufacturing engineers plot multiple process parameters simultaneously to detect when two variables drift together — a sign of systemic process changes.

// 10 — At a glance

Quick reference

Also known as

Scatterplot matrix, SPLOM

Category

Correlation / EDA

Typical data

3–10 numeric variables

Best for

Multivariate exploration

Difficulty

Intermediate

Tools

seaborn.pairplot(), GGally::ggpairs()

// 11 — Accessibility

Making it accessible

Use colorblind-safe palettes when coloring by group

Ensure axis labels are readable even at small panel sizes

Provide a text summary of key relationships for screen readers

Use shapes (circle, triangle, square) in addition to color for group coding

Consider large-format printing or zoomed panels for presentations

// 12 — Variations

Common variations

Generalized pair plot

Shows different views in upper vs lower triangle — e.g., scatter below, correlation value above, KDE on diagonal.

Hexbin pair plot

Replaces scatter plots with hexbin plots in each panel for large datasets.

Regression pair plot

Adds fitted regression lines to each scatter panel for quick trend assessment.

Interactive pair plot

Brushing a region in one panel highlights the same observations in all other panels.

// 13 — FAQs

Frequently asked questions

What is a pair plot?

A pair plot (also called a scatterplot matrix or SPLOM) arranges scatter plots for every combination of variables in a grid. For n variables, it creates an n × n matrix where each off-diagonal cell shows a scatter plot of two variables, and each diagonal cell shows a univariate distribution (typically a histogram or KDE).

When should you use a pair plot?

Use a pair plot when you want to see all pairwise relationships in a multivariate dataset. It also works well when performing EDA before building a model, and when looking for non-linear relationships that a correlogram would miss.

When should you avoid a pair plot?

Avoid a pair plot when you have more than 10–12 variables, because the grid becomes impossibly small. It is also a poor fit when your dataset has millions of rows, since each scatter plot will be overplotted, or when you only need one specific pair, in which case a full-size scatter plot is the better choice.

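What data do you need to make a pair plot?

A tidy table in which each column is a variable: several numeric columns (for example sepal_len, petal_len, and petal_wid) and, optionally, a categorical column such as species for color-coding.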

How is a pair plot different from a correlogram?

A correlogram reduces each relationship to a single number, while a pair plot shows the actual data, so non-linear relationships, clusters, and outliers stay visible. Choose a pair plot for exploration, and a correlogram when you need a more compact numeric summary.

What is another name for a pair plot?

A pair plot is also known as a scatterplot matrix or SPLOM. The name varies between fields, but the visualisation technique is the same.

What size of dataset works best for a pair plot?

A pair plot works best with 3–10 numeric variables. With more than 10–12 variables the grid becomes too small to read, and with millions of rows each scatter panel becomes overplotted.

Are pair plots accessible to screen readers?

Yes — a pair plot can be made accessible to screen readers by pairing it with a clear text summary of the key insight, ensuring color choices meet WCAG contrast guidelines, adding descriptive alt text or aria-label to the SVG, and offering the underlying data as an HTML table fallback for assistive technologies.