P-P Plot
The probability-probability plot compares cumulative probabilities rather than quantiles, making it more sensitive to deviations in the center of a distribution.
// 01 — The chart
What it looks like
A P-P plot of regression residuals against a normal distribution. Points close to the diagonal indicate a good fit. Deviations near the center are amplified compared to a Q-Q plot.
// 02 — Definition
What is a P-P plot?
A probability-probability (P-P) plot compares the cumulative distribution function (CDF) of your observed data against the CDF of a theoretical reference distribution. For each data point, it plots the empirical CDF value (proportion of data ≤ that value) on one axis against the theoretical CDF value on the other.
If your data matches the reference distribution, all points fall on the 45° diagonal line from (0,0) to (1,1). Unlike a Q-Q plot — which compares quantiles (values) — a P-P plot compares probabilities (proportions). This makes P-P plots more sensitive to deviations in the center of the distribution, while Q-Q plots are more sensitive to deviations in the tails.
Both axes of a P-P plot always range from 0 to 1, making it easy to see how much cumulative probability differs between the observed and expected distributions at any point.
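In symbols, the definition above can be sketched as follows. The notation is introduced here for illustration only: $x_{(1)} \le \dots \le x_{(n)}$ are the sorted observations, $\hat F_n$ is the empirical CDF, and $F_0$ is the reference CDF. The theoretical value is placed on the horizontal axis, matching the construction steps later on this page.

```latex
\[
\hat F_n(x) = \frac{1}{n}\sum_{j=1}^{n} \mathbf{1}\{x_j \le x\},
\qquad
\text{P-P points: } \bigl(F_0(x_{(i)}),\ \hat F_n(x_{(i)})\bigr), \quad i = 1, \dots, n .
\]
```

With no tied values, $\hat F_n(x_{(i)}) = i/n$, so a perfect fit means $F_0(x_{(i)}) \approx i/n$ for every $i$.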
P-P vs Q-Q: Use a P-P plot when you care about fit in the bulk of the data (center). Use a Q-Q plot when tail behavior matters most (e.g., risk analysis, outlier detection).
// 03 — Anatomy
Parts of a P-P plot
// 04 — Usage
When to use it — and when not to
Use it when:
- You want to assess fit in the center of the distribution, not the tails
- Checking whether data follows a specific theoretical distribution
- Comparing location and scale differences between observed and expected
- You’re already familiar with Q-Q plots and want a complementary view
- Your application cares about the bulk of the data (e.g., average performance, not extremes)
Avoid it when:
- Tail behavior is your primary concern (use a Q-Q plot instead)
- Your audience is non-technical, since P-P plots are even more abstract than Q-Q plots
- You need to detect heavy tails or outliers, which are compressed near 0 and 1
- You have very few data points, so the plot will look sparse and inconclusive
- A simple histogram or density plot would communicate the distribution better
// 05 — Reading guide
How to read a P-P plot
As with a Q-Q plot, the diagonal is your reference, but now you’re comparing probabilities, not values.
Points on the diagonal = good fit
If the empirical CDF matches the theoretical CDF at every point, all dots land on the 45° line.
Points above the line
The empirical CDF exceeds the theoretical — your data reaches higher cumulative probabilities faster, meaning more of your data falls in the lower range.
Points below the line
The theoretical CDF exceeds the empirical — data accumulates more slowly, meaning values tend to be larger than expected.
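Bowed curve = location shift
If the center of your data differs from the reference, the points bow to one side of the diagonal: above it when your data sits lower than expected, below it when it sits higher. An S-shaped curve that crosses the diagonal near (0.5, 0.5) points to a difference in spread instead.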
Spread around the center
Unlike Q-Q plots, P-P plots compress tail deviations. Focus your attention on the middle of the plot (around 0.5, 0.5) for the most readable diagnostic information.
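The "points above the line" and "points below the line" cases can be checked numerically. The sketch below is illustrative only: the helper name `pp_points`, the simulated samples, and the choice of a standard normal reference are assumptions, not part of the original page. It draws one sample that sits lower than the reference and one that sits higher, then reports what share of points lands above the diagonal.

```python
import numpy as np
from scipy import stats


def pp_points(data, dist=stats.norm):
    """Return (theoretical, empirical) cumulative probabilities for a P-P plot."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    empirical = (np.arange(1, n + 1) - 0.5) / n  # plotting positions in (0, 1)
    theoretical = dist.cdf(x)                     # reference CDF at each point
    return theoretical, empirical


rng = np.random.default_rng(0)
low = rng.normal(loc=-0.5, scale=1.0, size=2000)   # values smaller than N(0, 1) expects
high = rng.normal(loc=0.5, scale=1.0, size=2000)   # values larger than N(0, 1) expects

t_low, e_low = pp_points(low)
t_high, e_high = pp_points(high)

# Share of points above the diagonal (empirical CDF > theoretical CDF)
frac_above_low = float(np.mean(e_low > t_low))
frac_above_high = float(np.mean(e_high > t_high))

print(f"data sits lower : {frac_above_low:.0%} of points above the diagonal")
print(f"data sits higher: {frac_above_high:.0%} of points above the diagonal")
```

When the data sits lower than the reference, nearly all points should fall above the diagonal, and the reverse should hold when it sits higher.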
// 06 — Data format
What your data should look like
Same as a Q-Q plot: a single column of continuous numeric values.
| residual |
|---|
| -1.82 |
| -0.94 |
| -0.31 |
| 0.12 |
| 0.55 |
| 0.98 |
| 1.67 |
| 2.41 |
Code sketch — Python
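import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import ProbPlot

# Example residuals taken from the table above; substitute your own data
residuals = np.array([-1.82, -0.94, -0.31, 0.12, 0.55, 0.98, 1.67, 2.41])

# scipy's probplot is quantile-based (closer to a Q-Q plot), kept here for comparison
stats.probplot(residuals, dist="norm", plot=plt, fit=True, rvalue=True)

# For a true P-P plot, use statsmodels; fit=True estimates the normal's location and scale
pp = ProbPlot(residuals, fit=True)
pp.ppplot(line="45")
plt.show()
// 07 — Construction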
How to build one, step by step
Sort your data from smallest to largest.
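For each data point, compute its empirical CDF value: i/n, where i is the point's rank in the sorted data and n is the sample size (or (i - 0.5)/n, a common adjustment that keeps every value strictly between 0 and 1).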
For each data point, compute its theoretical CDF value using the reference distribution’s CDF function.
Plot each pair: (theoretical CDF, empirical CDF) as a point.
Draw the 45° reference line from (0,0) to (1,1).
Both axes should range from 0 to 1 with equal scaling.
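The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pp_pairs` is hypothetical, the data is the example column from the data format section, and the reference distribution (a normal with the sample's own mean and standard deviation) is an assumption made for the example.

```python
import numpy as np
from scipy import stats


def pp_pairs(data, cdf):
    """Build the (theoretical, empirical) pairs of a P-P plot."""
    x = np.sort(np.asarray(data, dtype=float))  # step 1: sort the data
    n = x.size
    i = np.arange(1, n + 1)                     # ranks 1..n
    empirical = (i - 0.5) / n                   # step 2: empirical CDF values
    theoretical = cdf(x)                        # step 3: reference CDF values
    return theoretical, empirical               # step 4: one pair per point


residuals = [-1.82, -0.94, -0.31, 0.12, 0.55, 0.98, 1.67, 2.41]

# Assumed reference: normal distribution with the sample mean and SD
reference = stats.norm(loc=np.mean(residuals), scale=np.std(residuals, ddof=1))
theoretical, empirical = pp_pairs(residuals, reference.cdf)

print("theoretical  empirical")
for t, e in zip(theoretical, empirical):
    print(f"{t:11.3f}  {e:9.3f}")
```

Plotting `theoretical` on the x-axis against `empirical` on the y-axis, together with a line from (0,0) to (1,1), completes the remaining steps.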
// 08 — Common mistakes
Mistakes to avoid
Confusing P-P with Q-Q
P-P plots compare probabilities (both axes 0–1); Q-Q plots compare quantiles (axes in data units). They reveal different kinds of departures.
Expecting tail sensitivity
P-P plots compress tail behavior near the corners (0,0) and (1,1). If tails matter, use a Q-Q plot instead.
Not standardizing data
P-P plots are sensitive to location and scale. If you’re testing distributional shape only, standardize your data first (subtract mean, divide by SD).
Unequal axis scaling
Both axes must have the same range (0 to 1) and the same physical length. Otherwise the reference line won’t appear at 45°, which makes departures from it much harder to judge.
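Two of the mistakes above, skipping standardization and unequal axis scaling, can be avoided with a couple of lines. The sketch below is one possible approach using NumPy, SciPy and Matplotlib; the example data, the output filename `pp_plot.png`, and the styling choices are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.array([-1.82, -0.94, -0.31, 0.12, 0.55, 0.98, 1.67, 2.41])

# Standardize first so the comparison is about shape, not location or scale
z = (residuals - residuals.mean()) / residuals.std(ddof=1)

z_sorted = np.sort(z)
n = z_sorted.size
empirical = (np.arange(1, n + 1) - 0.5) / n
theoretical = stats.norm.cdf(z_sorted)  # standard normal reference

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(theoretical, empirical, zorder=3)
ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # reference diagonal

# Same range and same physical length on both axes keeps the diagonal at 45 degrees
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal", adjustable="box")

ax.set_xlabel("Theoretical cumulative probability")
ax.set_ylabel("Empirical cumulative probability")
fig.savefig("pp_plot.png", dpi=150)
```

Setting the aspect ratio explicitly matters because a default figure is usually wider than it is tall, which would flatten the reference line.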
// 09 — In the wild
Real-world examples
Regression analysis
SPSS’s linear regression procedure can output a normal P-P plot of standardized residuals, which has helped make P-P plots a standard diagnostic in many social science workflows.
Insurance
Actuaries use P-P plots to validate that claim amount models fit well in the bulk of the distribution where most policies fall.
Environmental science
Comparing observed rainfall distributions against theoretical models (gamma, Weibull) to calibrate climate predictions.
// 10 — At a glance
Quick reference
Category
Distribution
Data type
Continuous numeric
Best for
Center-of-distribution fit
Axes range
0 to 1 (both)
Difficulty
Intermediate
Complement
Q-Q plot (tail focus)
// 11 — Accessibility
Accessibility notes
Ensure the reference line is visually distinct from the data points (dashed line, different color or width)
Add a text summary describing the overall fit quality and any systematic departures
Provide a companion table of (empirical CDF, theoretical CDF) pairs for screen readers
Use equal-length axes to preserve the 45° reference — unequal scaling misleads all readers
Include tooltips showing each point’s empirical and theoretical probability values
// 12 — Variations
Variations
Detrended P-P plot
Subtracts the reference diagonal so departures are plotted as vertical distances from zero — amplifies subtle fit issues.
P-P plot with bands
Adds confidence bands around the diagonal to indicate acceptable random variation — points outside suggest significant departure.
Two-sample P-P plot
Compares empirical CDFs of two datasets against each other — no theoretical distribution needed.
Stabilized P-P plot
Uses a variance-stabilizing transformation to make the variance of departures constant across the plot — improves readability.
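The four variations above can each be sketched as a small transformation of the basic P-P pairs. Everything here is an illustrative assumption rather than a method stated on this page: the function names are hypothetical, the band uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality as one simple option (it only applies when the reference distribution is fully specified in advance, not fitted to the data), and the stabilizing transform shown is the arcsine square-root transform that is commonly used for this purpose.

```python
import numpy as np
from scipy import stats


def pp_pairs(data, cdf):
    """Basic (theoretical, empirical) pairs, using (i - 0.5)/n plotting positions."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    return cdf(x), (np.arange(1, n + 1) - 0.5) / n


def detrended(theoretical, empirical):
    """Detrended P-P: vertical distance from the diagonal at each point."""
    return empirical - theoretical


def dkw_halfwidth(n, alpha=0.05):
    """Half-width of a simultaneous (1 - alpha) band around the diagonal (DKW bound)."""
    return np.sqrt(np.log(2.0 / alpha) / (2.0 * n))


def two_sample_pp(a, b):
    """Two-sample P-P: both empirical CDFs evaluated on the pooled values."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return cdf_a, cdf_b


def stabilize(p):
    """Arcsine square-root transform; maps [0, 1] onto [0, 1]."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(p))


rng = np.random.default_rng(1)
sample = rng.normal(size=200)

theoretical, empirical = pp_pairs(sample, stats.norm.cdf)
departures = detrended(theoretical, empirical)
band = dkw_halfwidth(sample.size)

print(f"largest departure from the diagonal: {np.abs(departures).max():.3f}")
print(f"95% band half-width for n=200:        {band:.3f}")

# Stabilized version: apply the same transform to both coordinates before plotting
stab_x, stab_y = stabilize(theoretical), stabilize(empirical)
```

For the detrended variant, plot `departures` against `theoretical` with a horizontal line at zero; for the banded variant, draw the diagonal shifted up and down by `band`.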
// 13 — FAQs
Frequently asked questions
What is a P-P plot?
A probability-probability (P-P) plot compares the cumulative distribution function (CDF) of your observed data against the CDF of a theoretical reference distribution. For each data point, it plots the empirical CDF value (proportion of data ≤ that value) on one axis against the theoretical CDF value on the other.
When should you use a P-P plot?
Use a P-P plot when you want to assess fit in the center of the distribution, not the tails. It also works well when checking whether data follows a specific theoretical distribution, and when comparing location and scale differences between observed and expected.
When should you avoid a P-P plot?
Avoid a P-P plot when tail behavior is your primary concern; use a Q-Q plot instead. It is also a poor fit when your audience is non-technical, since P-P plots are even more abstract than Q-Q plots, or when you need to detect heavy tails or outliers, which are compressed near 0 and 1.
What data do you need to make a P-P plot?
Same as a Q-Q plot: a single column of continuous numeric values.
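What size of dataset works best for a P-P plot?
There is no fixed threshold. With very few data points the plot looks sparse and inconclusive, while with a very large number of points the markers can overlap and become cluttered. In either case, the plot is best suited to judging fit in the center of the distribution.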
Are P-P plots accessible to screen readers?
Yes. A P-P plot can be made accessible to screen readers by pairing it with a clear text summary of the key insight, ensuring color choices meet WCAG contrast guidelines, adding descriptive alt text or an aria-label to the SVG, and offering the underlying data as an HTML table fallback for assistive technologies.