Distribution · Intermediate

P-P Plot

The probability-probability plot compares cumulative probabilities rather than quantiles, making it more sensitive to deviations in the center of a distribution.

// 01 — The chart

What it looks like

[Figure: normality check for regression residuals, n = 200. x-axis: Theoretical CDF; y-axis: Empirical CDF; both from 0 to 1.]

A P-P plot of regression residuals against a normal distribution. Points close to the diagonal indicate a good fit. Deviations near the center are amplified compared to a Q-Q plot.

// 02 — Definition

What is a P-P plot?

A probability-probability (P-P) plot compares the cumulative distribution function (CDF) of your observed data against the CDF of a theoretical reference distribution. For each data point, it plots the empirical CDF value (proportion of data ≤ that value) on one axis against the theoretical CDF value on the other.

If your data matches the reference distribution, all points fall on the 45° diagonal line from (0,0) to (1,1). Unlike a Q-Q plot — which compares quantiles (values) — a P-P plot compares probabilities (proportions). This makes P-P plots more sensitive to deviations in the center of the distribution, while Q-Q plots are more sensitive to deviations in the tails.

Both axes of a P-P plot always range from 0 to 1, making it easy to see how much cumulative probability differs between the observed and expected distributions at any point.

P-P vs Q-Q: Use a P-P plot when you care about fit in the bulk of the data (center). Use a Q-Q plot when tail behavior matters most (e.g., risk analysis, outlier detection).

// 03 — Anatomy

Parts of a P-P plot

A — Data point: Each dot plots (theoretical CDF, empirical CDF) for one observed value
B — Reference line: The 45° diagonal from (0,0) to (1,1) — perfect distributional match
C — Probability axes: Both axes range from 0 to 1, showing cumulative probabilities

// 04 — Usage

When to use it — and when not to

✓ Use a P-P plot when…
  • You want to assess fit in the center of the distribution, not the tails
  • Checking whether data follows a specific theoretical distribution
  • Comparing location and scale differences between observed and expected
  • You’re already familiar with Q-Q plots and want a complementary view
  • Your application cares about the bulk of the data (e.g., average performance, not extremes)
× Avoid a P-P plot when…
  • Tail behavior is your primary concern — use a Q-Q plot instead
  • Your audience is non-technical — P-P plots are even more abstract than Q-Q plots
  • You need to detect heavy tails or outliers — these are compressed near 0 and 1
  • You have very few data points — the plot will look sparse and inconclusive
  • A simple histogram or density plot would communicate the distribution better

// 05 — Reading guide

How to read a P-P plot

Like a Q-Q plot, the diagonal is your reference — but now you’re comparing probabilities, not values.

1

Points on the diagonal = good fit

If the empirical CDF matches the theoretical CDF at every point, all dots land on the 45° line.

2

Points above the line

The empirical CDF exceeds the theoretical — your data reaches higher cumulative probabilities faster, meaning more of your data falls in the lower range.

3

Points below the line

The theoretical CDF exceeds the empirical — data accumulates more slowly, meaning values tend to be larger than expected.

4

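S-shaped curve = scale difference

If the spread of your data differs from the reference, points trace an S-curve that crosses the diagonal once, near the center. A pure location shift instead bows every point to one side of the diagonal: below it when values run larger than expected, above it when they run smaller.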

5

Spread around the center

Unlike Q-Q plots, P-P plots compress tail deviations. Focus your attention on the middle of the plot (around 0.5, 0.5) for the most readable diagnostic information.
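To check how a particular mismatch shows up, it can help to compute the signed departure (empirical minus theoretical) from exact CDFs instead of a noisy sample. The sketch below assumes a standard normal reference and two hypothetical "data" distributions, one with a higher mean and one with a larger spread; the names `departure`, `shifted`, and `widened` are illustrative only.

```python
import numpy as np
from scipy import stats

# Even number of grid points, so x = 0 itself is not on the grid.
x = np.linspace(-4, 4, 400)
theoretical = stats.norm.cdf(x)      # reference: standard normal

def departure(data_cdf):
    """Signed vertical distance from the diagonal: empirical - theoretical."""
    return data_cdf - theoretical

# Hypothetical data whose mean is 0.5 higher than the reference.
shifted = departure(stats.norm.cdf(x, loc=0.5))
# Hypothetical data with twice the reference's standard deviation.
widened = departure(stats.norm.cdf(x, scale=2.0))

print("shifted: every point on or below the diagonal:", bool(np.all(shifted <= 0)))
print("widened: above the diagonal left of center:  ", bool(np.all(widened[x < 0] > 0)))
print("widened: below the diagonal right of center: ", bool(np.all(widened[x > 0] < 0)))
```

Under these assumptions, larger-than-expected values keep every point on one side of the line, while a spread mismatch produces a single crossing at the center.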

// 06 — Data format

What your data should look like

Same as a Q-Q plot: a single column of continuous numeric values.

residual
-1.82
-0.94
-0.31
0.12
0.55
0.98
1.67
2.41

Code sketch — Python

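import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import ProbPlot

# Sample residuals from the table above
residuals = np.array([-1.82, -0.94, -0.31, 0.12, 0.55, 0.98, 1.67, 2.41])

# Note: scipy.stats.probplot draws a quantile-based probability plot,
# not a P-P plot. For a true P-P plot, use statsmodels.
# fit=True estimates the normal's location and scale from the data.
pp = ProbPlot(residuals, fit=True)
pp.ppplot(line="45")   # 45-degree reference line
plt.show()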

// 07 — Construction

How to build one, step by step

01.

Sort your data from smallest to largest.

02.

For each data point, compute its empirical CDF value: i/n, where i is the point's rank in the sorted data (or (i - 0.5)/n, which keeps every value strictly between 0 and 1).

03.

For each data point, compute its theoretical CDF value using the reference distribution’s CDF function.

04.

Plot each pair: (theoretical CDF, empirical CDF) as a point.

05.

Draw the 45° reference line from (0,0) to (1,1).

06.

Both axes should range from 0 to 1 with equal scaling.
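The six steps above can be sketched by hand with NumPy and SciPy. This is a minimal illustration, not a reference implementation: the helper names `pp_points` and `plot_pp` are made up for this example, and the reference distribution is assumed to be a standard normal unless another `scipy.stats` distribution is passed in.

```python
import numpy as np
from scipy import stats

def pp_points(data, dist=stats.norm):
    """Return (theoretical, empirical) P-P coordinates for a 1-D sample.

    `dist` is any scipy.stats distribution (frozen or not) with a .cdf method.
    """
    x = np.sort(np.asarray(data, dtype=float))   # step 01: sort
    n = x.size
    i = np.arange(1, n + 1)
    empirical = (i - 0.5) / n                    # step 02: plotting positions
    theoretical = dist.cdf(x)                    # step 03: reference CDF
    return theoretical, empirical

def plot_pp(theoretical, empirical, ax=None):
    """Steps 04-06: scatter the pairs, add the diagonal, equalize the axes."""
    import matplotlib.pyplot as plt              # imported lazily; only needed for drawing
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(theoretical, empirical, s=15)
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_aspect("equal")
    ax.set_xlabel("Theoretical CDF")
    ax.set_ylabel("Empirical CDF")
    return ax

# Example with the residuals from the data-format section.
residuals = [-1.82, -0.94, -0.31, 0.12, 0.55, 0.98, 1.67, 2.41]
theo, emp = pp_points(residuals)
for t, e in zip(theo, emp):
    print(f"theoretical={t:.3f}  empirical={e:.3f}")
```

Calling `plot_pp(theo, emp)` followed by `matplotlib.pyplot.show()` draws the chart. Keeping the computation separate from the drawing makes the coordinates easy to inspect or export.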

// 08 — Common mistakes

Mistakes to avoid

Confusing P-P with Q-Q

P-P plots compare probabilities (both axes 0–1); Q-Q plots compare quantiles (axes in data units). They reveal different kinds of departures.

Expecting tail sensitivity

P-P plots compress tail behavior near the corners (0,0) and (1,1). If tails matter, use a Q-Q plot instead.
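A small numeric comparison makes the compression concrete. The sketch assumes a hypothetical heavy-tailed data distribution (Student's t with 3 degrees of freedom) checked against a standard normal reference; the probability level and the tail value are arbitrary illustrative choices.

```python
from scipy import stats

heavy = stats.t(df=3)     # hypothetical heavy-tailed "data" distribution
ref = stats.norm()        # reference: standard normal

# Q-Q view: gap between the 99.9th-percentile values, in data units.
p = 0.999
qq_gap = heavy.ppf(p) - ref.ppf(p)

# P-P view: gap between cumulative probabilities at the same far-tail value.
x_tail = 4.0
pp_gap = abs(heavy.cdf(x_tail) - ref.cdf(x_tail))

print(f"Q-Q gap at p={p}: {qq_gap:.2f} data units")
print(f"P-P gap at x={x_tail}: {pp_gap:.4f} on a 0-1 axis")
```

The same tail mismatch spans several data units on a Q-Q plot but only a few hundredths of the axis on a P-P plot.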

Not standardizing data

P-P plots are sensitive to location and scale. If you’re testing distributional shape only, standardize your data first (subtract mean, divide by SD).
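The effect of skipping standardization can be sketched with a simulated sample. Everything here is an assumption for illustration: the data are drawn from a normal distribution with mean 50 and SD 10, the reference is a standard normal, and `max_departure` is a hypothetical helper that reports the largest gap from the diagonal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical sample: normal in shape, but with mean 50 and SD 10.
data = rng.normal(loc=50.0, scale=10.0, size=200)

def max_departure(values):
    """Largest |empirical - theoretical| against a standard normal reference."""
    x = np.sort(values)
    empirical = (np.arange(1, x.size + 1) - 0.5) / x.size
    return np.max(np.abs(empirical - stats.norm.cdf(x)))

z = (data - data.mean()) / data.std(ddof=1)   # subtract mean, divide by SD

print(f"raw data:     max departure = {max_departure(data):.3f}")
print(f"standardized: max departure = {max_departure(z):.3f}")
```

Without standardization nearly every theoretical CDF value sits at 1, so the plot reflects the location and scale mismatch instead of the shape.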

Unequal axis scaling

Both axes must have the same range (0 to 1) and the same physical length. Otherwise the reference line won’t appear at 45°, and departures from it become hard to judge by eye.

// 09 — In the wild

Real-world examples

01

Regression analysis

SPSS's linear regression procedure can output a normal P-P plot of standardized residuals, which makes P-P plots a familiar diagnostic in many social science workflows.

02

Insurance

Actuaries use P-P plots to validate that claim amount models fit well in the bulk of the distribution where most policies fall.

03

Environmental science

Comparing observed rainfall distributions against theoretical models (gamma, Weibull) to calibrate climate predictions.

// 10 — At a glance

Quick reference

Category

Distribution

Data type

Continuous numeric

Best for

Center-of-distribution fit

Axes range

0 to 1 (both)

Difficulty

Intermediate

Complement

Q-Q plot (tail focus)

// 11 — Accessibility

Accessibility notes

✓

Ensure the reference line is visually distinct from the data points (dashed line, different color or width)

✓

Add a text summary describing the overall fit quality and any systematic departures

✓

Provide a companion table of (empirical CDF, theoretical CDF) pairs for screen readers

✓

Use equal-length axes to preserve the 45° reference — unequal scaling misleads all readers

✓

Include tooltips showing each point’s empirical and theoretical probability values
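One way to produce the companion table mentioned above is to render the plotted pairs as plain HTML. This is only a sketch: `pp_table_html` is a hypothetical helper, and the pairs passed to it are made-up values for illustration.

```python
from html import escape

def pp_table_html(pairs, caption="P-P plot data"):
    """Render (empirical, theoretical) pairs as a plain HTML table."""
    rows = "\n".join(
        f"    <tr><td>{e:.3f}</td><td>{t:.3f}</td></tr>" for e, t in pairs
    )
    return (
        "<table>\n"
        f"  <caption>{escape(caption)}</caption>\n"
        "  <thead>\n"
        '    <tr><th scope="col">Empirical CDF</th>'
        '<th scope="col">Theoretical CDF</th></tr>\n'
        "  </thead>\n"
        "  <tbody>\n"
        f"{rows}\n"
        "  </tbody>\n"
        "</table>"
    )

# Hypothetical pairs, for illustration only.
print(pp_table_html([(0.125, 0.101), (0.375, 0.402), (0.625, 0.611)]))
```

The caption and the `scope="col"` header cells give assistive technologies a label for the table and for each column.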

// 12 — Variations

Variations

Detrended P-P plot

Subtracts the reference diagonal so departures are plotted as vertical distances from zero — amplifies subtle fit issues.

P-P plot with bands

Adds confidence bands around the diagonal to indicate acceptable random variation — points outside suggest significant departure.

Two-sample P-P plot

Compares empirical CDFs of two datasets against each other — no theoretical distribution needed.

Stabilized P-P plot

Uses a variance-stabilizing transformation to make the variance of departures constant across the plot — improves readability.
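The four variations above can be sketched as small helpers. These are illustrative choices, not the only definitions in use: the band uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which assumes a fully specified reference distribution, and the stabilizing step uses the arcsine square-root transform usually associated with stabilized probability plots. All function names are hypothetical.

```python
import numpy as np

def detrended(theoretical, empirical):
    """Detrended P-P: vertical departure from the diagonal at each point."""
    return np.asarray(empirical) - np.asarray(theoretical)

def dkw_half_width(n, alpha=0.05):
    """Half-width of a simultaneous (1 - alpha) band from the DKW inequality.

    Assumption: the reference distribution is fully specified, not fitted
    to the same data; fitted parameters call for simulation-based bands.
    """
    return np.sqrt(np.log(2.0 / alpha) / (2.0 * n))

def two_sample_pp(a, b):
    """Empirical CDFs of samples a and b evaluated on their pooled values."""
    a, b = np.sort(a), np.sort(b)
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return cdf_a, cdf_b

def stabilize(p):
    """Arcsine square-root transform, mapping [0, 1] onto [0, 1]."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(p))

print(f"95% DKW half-width for n=200: {dkw_half_width(200):.3f}")
```

For a banded plot, the diagonal plus and minus the half-width gives the two band edges. For a stabilized plot, `stabilize` is applied to both coordinates before plotting.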

// 13 — FAQs

Frequently asked questions

What is a P-P plot?

A probability-probability (P-P) plot compares the cumulative distribution function (CDF) of your observed data against the CDF of a theoretical reference distribution. For each data point, it plots the empirical CDF value (proportion of data ≤ that value) on one axis against the theoretical CDF value on the other.

When should you use a P-P plot?

Use a P-P plot when you want to assess fit in the center of the distribution, not the tails. It also works well when checking whether data follows a specific theoretical distribution, and when comparing location and scale differences between observed and expected.

When should you avoid a P-P plot?

Avoid a P-P plot when tail behavior is your primary concern; use a Q-Q plot instead. It is also a poor fit when your audience is non-technical, since P-P plots are even more abstract than Q-Q plots, or when you need to detect heavy tails or outliers, which are compressed near 0 and 1.

What data do you need to make a P-P plot?

Same as a Q-Q plot: a single column of continuous numeric values.

What size of dataset works best for a P-P plot?

A P-P plot needs enough points to trace the CDF: with very few data points the plot looks sparse and inconclusive. The example on this page uses n = 200.

Are P-P plots accessible to screen readers?

Yes. A P-P plot can be made accessible to screen readers by pairing it with a clear text summary of the key insight, ensuring color choices meet WCAG contrast guidelines, adding descriptive alt text or an aria-label to the SVG, and offering the underlying data as an HTML table fallback for assistive technologies.
