Correlation · Intermediate

Correlogram

A matrix of pairwise correlations that reveals how every variable in a dataset relates to every other — at a glance.

// 01 — The chart

What it looks like

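Example: Iris flower measurements (4 variables)

|    | SL    | SW    | PL    | PW    |
|----|-------|-------|-------|-------|
| SL | 1.00  | -0.12 | 0.87  | 0.82  |
| SW | -0.12 | 1.00  | -0.43 | -0.37 |
| PL | 0.87  | -0.43 | 1.00  | 0.96  |
| PW | 0.82  | -0.37 | 0.96  | 1.00  |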

A correlogram of four Iris dataset measurements. Strong positive correlations appear in deep red; weak or negative correlations in light tones.

// 02 — Definition

What is a correlogram?

A correlogram (also called a correlation matrix chart) is a visualization that displays the pairwise correlation coefficients between all variables in a dataset as a color-coded matrix. Each cell in the grid represents how strongly two variables are related, with colors and/or sizes encoding the strength and direction of the relationship.

The correlation coefficient ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 meaning no linear relationship. By arranging all variables along both axes, the correlogram lets analysts quickly scan dozens of relationships simultaneously.
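As a rough illustration of the coefficient described above, the sketch below computes Pearson's r from its textbook definition: the sum of cross-products of deviations divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the sample lists are invented for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Sum of cross-products of deviations (proportional to the covariance).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

# Illustrative data (not from any real dataset).
hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))   # perfectly increasing line: r = +1
print(pearson_r(hours, [10, 8, 6, 4, 2]))   # perfectly decreasing line: r = -1
```

The explicit check for a constant variable matters in practice: a column with no variation has no defined correlation, and such cells are usually left blank in a correlogram.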

Correlograms are a cornerstone of exploratory data analysis (EDA). They help data scientists decide which variables to investigate further, which predictors to include in models, and where multicollinearity might be a concern.

Key insight: A correlogram is symmetric around its diagonal — the cell at (A, B) always equals (B, A). The diagonal itself always shows perfect correlation (1.00) because every variable is perfectly correlated with itself.

// 03 — Anatomy

Parts of a correlogram

A — Row/column labels: Variable names along both axes of the symmetric matrix
B — Diagonal cells: Always 1.00 — each variable correlates perfectly with itself
C — Color intensity: Encodes correlation strength — deep color = strong, light = weak
D — Mirror symmetry: Upper and lower triangles are identical — some designs show only half
E — Coefficient label: Optional numeric value inside each cell for precise reading

// 04 — Usage

When to use it — and when not to

✓ Use a correlogram when…
  • You have many numeric variables and want to scan all pairwise relationships at once
  • Performing exploratory data analysis (EDA) before modeling
  • Checking for multicollinearity among predictor variables
  • Deciding which variables to include in a regression or classification model
  • Communicating a high-level overview of variable relationships to stakeholders
  • Your audience understands correlation coefficients
× Avoid a correlogram when…
  • You have only 2–3 variables — a simple scatter plot is more informative
  • Relationships are non-linear: Pearson correlation can understate them or miss them altogether
  • You need to show causation, not just correlation
  • Your audience is non-technical — the matrix format can be intimidating
  • You have too many variables (50+) — the matrix becomes unreadable
  • Data contains mostly categorical variables — use a mosaic plot instead

// 05 — Reading guide

How to read a correlogram

Follow these steps whenever you encounter a correlogram.

1. Read the variable labels

Identify the variables along both axes. Since the matrix is symmetric, each variable appears on both the row and the column.

2. Locate the diagonal

The diagonal always shows 1.00, because every variable is perfectly correlated with itself. This serves as a visual anchor and color reference for the maximum value.

3. Scan for intense colors

The deepest-colored cells indicate the strongest correlations (positive or negative). These are the relationships worth investigating further.

4. Check the sign

If the chart uses a diverging color scale (e.g., red for positive, blue for negative), distinguish between variables that move together and those that move in opposite directions.

5. Note the coefficient values

If numeric values are displayed, use them for precision. A correlation of 0.85 is quite different from 0.50, even if the colors look similar on a saturated scale.
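The scanning steps above can also be done in code. The sketch below takes the coefficients from the Iris example chart and lists each pair once, strongest first. The helper name `strongest_pairs` is hypothetical, and the label key (SL/SW for sepal length/width, PL/PW for petal length/width) is an assumption based on the usual Iris column names.

```python
# Correlation values read off the Iris example chart.
# Assumed key: SL/SW = sepal length/width, PL/PW = petal length/width.
labels = ["SL", "SW", "PL", "PW"]
matrix = [
    [1.00, -0.12, 0.87, 0.82],
    [-0.12, 1.00, -0.43, -0.37],
    [0.87, -0.43, 1.00, 0.96],
    [0.82, -0.37, 0.96, 1.00],
]

def strongest_pairs(labels, matrix):
    """List each unordered pair once, sorted by absolute correlation."""
    pairs = []
    for i in range(len(labels)):
        # Upper triangle only: skips the diagonal and the mirrored half.
        for j in range(i + 1, len(labels)):
            pairs.append((labels[i], labels[j], matrix[i][j]))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)

for a, b, r in strongest_pairs(labels, matrix):
    direction = "move together" if r > 0 else "move in opposite directions"
    print(f"{a}-{b}: r = {r:+.2f} ({direction})")
```

Sorting by absolute value mirrors step 3 (strength regardless of sign), while the printed sign corresponds to step 4.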

// 06 — Data format

What your data should look like

The input is typically a table where each column is a numeric variable and each row is an observation. The correlogram is computed from the correlation matrix (e.g., using Pearson, Spearman, or Kendall methods).

// Raw data table

| sepal_len | sepal_wid | petal_len | petal_wid |
|-----------|-----------|-----------|-----------|
| 5.1       | 3.5       | 1.4       | 0.2       |
| 7.0       | 3.2       | 4.7       | 1.4       |
| 6.3       | 3.3       | 6.0       | 2.5       |
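To make the step from raw table to correlation matrix concrete, here is a minimal sketch that stores the three sample rows column by column and computes every pairwise Pearson coefficient. The names `pearson_r` and `correlation_matrix` are made up for this example, and three observations are far too few for a meaningful analysis; they only show the shape of the computation.

```python
from math import sqrt

# The three sample rows from the table, stored as one list per column.
columns = {
    "sepal_len": [5.1, 7.0, 6.3],
    "sepal_wid": [3.5, 3.2, 3.3],
    "petal_len": [1.4, 4.7, 6.0],
    "petal_wid": [0.2, 1.4, 2.5],
}

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict: corr[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

corr = correlation_matrix(columns)
for name, row in corr.items():
    print(f"{name:>10}", "  ".join(f"{r:+.2f}" for r in row.values()))
```

In everyday work this step is usually delegated to a library; pandas, for instance, offers `DataFrame.corr` with a `method` argument for Pearson, Spearman, or Kendall.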

// 07 — Construction

How to build one

1. Compute the correlation matrix: calculate pairwise correlation coefficients (Pearson, Spearman, or Kendall) for all numeric variables.

2. Choose a color scale: use a sequential scale for all-positive correlations, or a diverging scale (e.g., blue-white-red) if negative correlations matter.

3. Map each cell: assign the correlation value to the color (and optionally the size) of each matrix cell.

4. Label axes: place variable names along both rows and columns for reference.

5. Optionally show only half: since the matrix is symmetric, you can display only the upper or lower triangle to reduce redundancy.
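Steps 2, 3, and 5 can be sketched without any plotting library. The example below maps a coefficient to a hex color on a simple blue-white-red scale and blanks the upper triangle. Both function names are hypothetical, and the linear RGB blend is a simplifying assumption; a real chart would normally use a tested palette from a plotting library.

```python
def diverging_color(r):
    """Map r in [-1, 1] to a hex color: blue (-1) -> white (0) -> red (+1)."""
    r = max(-1.0, min(1.0, r))          # clamp against rounding noise
    fade = round(255 * (1 - abs(r)))    # 255 at r = 0 (white), 0 at |r| = 1
    if r >= 0:
        red, green, blue = 255, fade, fade
    else:
        red, green, blue = fade, fade, 255
    return f"#{red:02x}{green:02x}{blue:02x}"

def lower_triangle(matrix):
    """Keep cells on or below the diagonal; replace the mirrored half with None."""
    return [
        [value if j <= i else None for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]

# Small illustrative matrix (three variables).
matrix = [
    [1.00, -0.12, 0.87],
    [-0.12, 1.00, -0.43],
    [0.87, -0.43, 1.00],
]
for row in lower_triangle(matrix):
    print([diverging_color(r) if r is not None else "" for r in row])
```

Anchoring white at exactly 0 is the point of a diverging scale: the sign of a coefficient is readable from hue alone, and its strength from saturation.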

// 08 — Pitfalls

Common mistakes

Ignoring non-linear relationships

Pearson correlation only captures linear associations. A U-shaped or circular relationship can have r ≈ 0 while being strongly related.
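A small sketch of this pitfall, using made-up data: y is completely determined by x through y = x², yet the Pearson coefficient comes out at zero because the downward and upward halves of the U cancel. The helper `pearson_r` is written out here only to keep the example self-contained.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]   # U-shaped: fully dependent on x, but not linearly

r = pearson_r(x, y)
print(f"r = {r:.2f}")             # no linear trend despite the exact relationship
```

A cell like this would look blank in a correlogram, which is why a scatter plot of the pair is the safer check before ruling out a relationship.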

Overloading with too many variables

A 50×50 matrix becomes a sea of colors with no actionable insight. Group or pre-filter variables first.

Using a poor color scale

A rainbow color scale makes it nearly impossible to compare magnitudes. Use a perceptually uniform sequential or diverging palette.

Confusing correlation with causation

A strong r-value between ice cream sales and drowning deaths doesn't mean one causes the other — both are driven by summer heat.

Not checking for outliers

A single extreme value can inflate or deflate Pearson r dramatically. Always pair correlograms with scatter plots for suspicious pairs.
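The effect is easy to reproduce with invented numbers. In the sketch below, five points with a weak negative trend are joined by one extreme observation at (20, 20); all values and the helper `pearson_r` are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up values with a weak negative trend.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 3, 1]
r_clean = pearson_r(x, y)

# The same data plus one extreme observation at (20, 20).
r_outlier = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_clean:+.2f}")
print(f"with outlier:    r = {r_outlier:+.2f}")
```

One added point flips the coefficient from negative to strongly positive, so the cell's color would change completely even though five of the six observations are unchanged.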

// 09 — In the wild

Real-world examples

Finance

Correlograms of stock returns show which assets move together — essential for building diversified portfolios and understanding sector contagion during market downturns.

Genomics

Researchers use correlograms to identify co-expressed genes across thousands of samples, revealing regulatory networks and potential drug targets.

Marketing

Campaign analysts use correlograms to see which customer behavior metrics (page views, time on site, clicks, purchases) are correlated, helping them identify leading indicators of conversion.

// 10 — At a glance

Quick reference

Also known as

Correlation matrix chart, correlation heatmap

Category

Correlation

Typical data

Multiple numeric variables

Best for

Scanning all pairwise relationships

Difficulty

Intermediate

Key stat

Pearson r, Spearman ρ, or Kendall τ

// 11 — Accessibility

Making it accessible

Include numeric labels in each cell — color alone is insufficient for colorblind readers

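Use a colorblind-safe palette instead of a red-green diverging scale: a blue-red diverging palette when correlations can be negative, or a sequential one such as viridis or cividis when all values are positive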

Provide a text summary or table alternative for screen readers

Add a clear color legend showing the correlation range

Use sufficient contrast between cell border and background for clarity
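One way to produce the text alternative mentioned above is to generate a sentence per variable pair from the correlation matrix. This is a minimal sketch: the function names are hypothetical, the full variable names are assumed from the Iris example, and the 0.3 and 0.7 cut-offs for "weak", "moderate", and "strong" are illustrative conventions rather than fixed rules.

```python
def describe(r):
    """Turn a coefficient into words. The cut-offs are illustrative assumptions."""
    if r == 0:
        return "no linear"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

def text_summary(labels, matrix):
    """One sentence per unordered pair, suitable for alt text or a caption."""
    lines = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            r = matrix[i][j]
            lines.append(
                f"{labels[i]} and {labels[j]}: {describe(r)} correlation (r = {r:.2f})"
            )
    return lines

labels = ["sepal length", "sepal width", "petal length", "petal width"]
matrix = [
    [1.00, -0.12, 0.87, 0.82],
    [-0.12, 1.00, -0.43, -0.37],
    [0.87, -0.43, 1.00, 0.96],
    [0.82, -0.37, 0.96, 1.00],
]
for line in text_summary(labels, matrix):
    print(line)
```

The resulting lines can be placed next to the chart or in its description so that the key relationships do not depend on color perception.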

// 12 — Variations

Common variations

Half-matrix correlogram

Shows only the upper or lower triangle, removing redundant duplicate information.

Circle-size correlogram

Uses circle radius in each cell to encode correlation magnitude alongside color.

Clustered correlogram

Reorders rows and columns using hierarchical clustering to group similar variables together.

Ellipse correlogram

Uses oriented ellipses — the more elongated and tilted, the stronger the correlation.

// 13 — FAQs

Frequently asked questions

When should you use a correlogram?

Use a correlogram when you have many numeric variables and want to scan all pairwise relationships at once. It also works well when performing exploratory data analysis (EDA) before modeling, and when checking for multicollinearity among predictor variables.

When should you avoid a correlogram?

Avoid a correlogram when you have only 2–3 variables; a simple scatter plot is more informative. It is also a poor fit when relationships are non-linear, because Pearson correlation can understate or miss them, and when you need to show causation rather than correlation.

How is a correlogram different from a scatter plot?

A scatter plot shows the individual observations for one pair of variables, so you can see the shape of the relationship, clusters, and outliers. A correlogram reduces each pair to a single coefficient and shows all pairs at once. Use a correlogram to scan many relationships quickly, then use scatter plots to inspect the pairs that stand out.

What is another name for a correlogram?

A correlogram is also known as a correlation matrix chart or a correlation heatmap. The name varies between fields, but the visualization technique is the same.

What size of dataset works best for a correlogram?

A correlogram works best with more than a handful of numeric variables and fewer than about 50. With only 2–3 variables a scatter plot is more informative, and with 50 or more the matrix becomes too cluttered to read.

Are correlograms accessible to screen readers?

Yes — a correlogram can be made accessible to screen readers by pairing it with a clear text summary of the key insight, ensuring color choices meet WCAG contrast guidelines, adding descriptive alt text or aria-label to the SVG, and offering the underlying data as an HTML table fallback for assistive technologies.