ECDF Plot
The empirical cumulative distribution function — a step-function chart that answers “what percentage of data falls below this value?” without any binning.
// 01 — The chart
What it looks like
An ECDF of page load times. The curve rises from 0% to 100%, and the median (50th percentile) can be read directly from the chart at 1.5 seconds.
// 02 — Definition
What is an ECDF plot?
The empirical cumulative distribution function (ECDF) plot shows, for every value on the x-axis, the proportion of data points that are less than or equal to that value. The result is a monotonically increasing step function that starts near 0% on the left and reaches 100% on the right.
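For reference, the definition above can be written as a formula. The notation is introduced here for illustration and is not taken from the original: $x_1, \dots, x_n$ are the observed values and $\mathbf{1}\{\cdot\}$ is the indicator function (1 when the condition holds, 0 otherwise).

```latex
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}
```

Each observation contributes a jump of $1/n$ at its own value, which is why the plot is a step function.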
Unlike histograms and density plots, an ECDF requires no binning decisions. There is no bandwidth parameter or bin width to tune — the ECDF is a direct, unsmoothed representation of the data. This makes it uniquely objective: two analysts looking at the same dataset will always produce the identical ECDF.
ECDFs are the backbone of many statistical tests (Kolmogorov-Smirnov, Anderson-Darling) that compare observed data to theoretical distributions or to each other. They’re also the most direct way to read percentiles: draw a horizontal line at any proportion, and where it hits the curve tells you the corresponding value.
Key advantage: No binning means no arbitrary choices. The ECDF shows every data point exactly as observed, with no smoothing, grouping, or approximation.
// 03 — Anatomy
Parts of an ECDF plot
// 04 — Usage
When to use it — and when not to
Use it when:
- You need to read exact percentiles from the chart (median, P95, P99)
- You want to avoid binning decisions: ECDFs require no parameters
- You are comparing two or more distributions: overlaid ECDFs make differences obvious
- You are performing goodness-of-fit tests: the K-S statistic is the largest vertical gap between the ECDF and the reference CDF
- You are monitoring SLAs: ‘what percentage of requests complete under 200ms?’ is a direct read
Avoid it when:
- Your audience is unfamiliar with cumulative functions: a histogram is more intuitive
- You want to see the distribution shape (peaks, skew): density plots are better for that
- You have only a handful of data points: the step function looks jagged and sparse
- You need to see the mode: ECDFs don’t show peaks, only cumulative proportions
- Visual impact matters more than precision: ECDFs look dry compared to filled charts
// 05 — Reading guide
How to read an ECDF plot
ECDFs answer percentile questions directly — no calculation needed.
Pick a value on the x-axis
Draw a vertical line up to the curve. Where it meets the curve, read across to the y-axis — that’s the proportion of data at or below that value.
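The same read can be done numerically. The following is a minimal sketch using only the standard library; the function name `proportion_at_or_below` and the sample load times are illustrative assumptions, not part of any particular plotting library.

```python
from bisect import bisect_right

def proportion_at_or_below(values, threshold):
    """Evaluate the ECDF at threshold: the share of values <= threshold."""
    ordered = sorted(values)
    # bisect_right returns how many sorted values are <= threshold
    return bisect_right(ordered, threshold) / len(ordered)

# Illustrative page load times in milliseconds
load_times_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
print(proportion_at_or_below(load_times_ms, 1000))  # 0.5
```

Four of the eight values are at or below 1000 ms, so the ECDF at 1000 is 0.5.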
Pick a proportion on the y-axis
Draw a horizontal line from (say) 0.5 across to the curve. Where it hits, read down to the x-axis — that’s the median. Any proportion gives you the corresponding percentile.
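Reading across from a proportion can be sketched in code as well. This example assumes the "smallest value whose ECDF reaches p" convention, which matches where a horizontal line first hits the step function; other percentile conventions interpolate between neighbours and may give a different number (for instance, averaging the two middle values for the median). The function name `value_at_proportion` is hypothetical.

```python
import math

def value_at_proportion(values, p):
    """Smallest data value whose ECDF is at least p, for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(values)
    # The i-th sorted value (1-based) has an ECDF of at least i/n,
    # so we need the smallest rank i with i/n >= p.
    # Floating-point rounding in p * n is ignored in this sketch.
    rank = math.ceil(p * len(ordered))
    return ordered[rank - 1]

# Illustrative page load times in milliseconds
load_times_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
print(value_at_proportion(load_times_ms, 0.5))   # 750
print(value_at_proportion(load_times_ms, 0.95))  # 3200
```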
Read the steepness
Where the curve rises steeply, data is densely packed. Where it flattens, data is sparse. A sudden vertical jump = many tied values.
Compare overlaid ECDFs
The curve that is consistently to the left has lower values. The vertical gap between two curves at any x-value shows how much their cumulative proportions differ.
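The vertical gap can also be computed directly. The sketch below finds the largest gap between two ECDFs, which is the two-sample Kolmogorov-Smirnov statistic. Both samples are made-up illustrative data, and the helper names are assumptions.

```python
from bisect import bisect_right

def ecdf_at(ordered, x):
    """ECDF of an already sorted sample, evaluated at x."""
    return bisect_right(ordered, x) / len(ordered)

def max_vertical_gap(sample_a, sample_b):
    """Largest absolute difference between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs only change at observed values, so the largest gap
    # must occur at one of those values.
    return max(abs(ecdf_at(a, x) - ecdf_at(b, x)) for x in a + b)

# Hypothetical load times before and after an optimisation
before_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
after_ms = [250, 300, 410, 520, 700, 900, 1300, 1900]
print(max_vertical_gap(before_ms, after_ms))  # 0.25
```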
Check for stochastic dominance
If one ECDF lies at or above another across the whole x-range, its distribution is systematically smaller (shifted toward lower values); if it lies at or below, it is systematically larger. This is useful for ranking treatment effects.
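A simple sample-level check for this pattern is sketched below. It only compares the two empirical curves and is not a statistical test; the function names and data are illustrative assumptions.

```python
from bisect import bisect_right

def ecdf_at(ordered, x):
    """ECDF of an already sorted sample, evaluated at x."""
    return bisect_right(ordered, x) / len(ordered)

def ecdf_never_below(sample_a, sample_b):
    """True if the ECDF of sample_a is >= the ECDF of sample_b everywhere."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Step functions only change at observed values, so checking those suffices.
    return all(ecdf_at(a, x) >= ecdf_at(b, x) for x in a + b)

# Hypothetical load times before and after an optimisation
before_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
after_ms = [250, 300, 410, 520, 700, 900, 1300, 1900]

print(ecdf_never_below(after_ms, before_ms))  # True
print(ecdf_never_below(before_ms, after_ms))  # False
```

Here the "after" curve never drops below the "before" curve, so in this sample the "after" values are systematically smaller.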
// 06 — Data format
What your data should look like
A single column of numeric values. No pre-aggregation needed — the ECDF is computed from raw data.
| load_time_ms |
|---|
| 320 |
| 480 |
| 510 |
| 750 |
| 1020 |
| 1480 |
| 2100 |
| 3200 |
Code sketch — Python
import seaborn as sns

# df is assumed to be a pandas DataFrame with a numeric load_time_ms column
sns.ecdfplot(data=df, x="load_time_ms")
// 07 — Construction
How to build one, step by step
Sort the data values from smallest to largest.
Assign each value its cumulative proportion: the i-th value out of n gets proportion i/n.
Plot each (value, proportion) pair as a point.
Connect the points with horizontal-then-vertical step segments. With a large sample the steps become small enough that the line looks smooth without any smoothing applied.
Set the y-axis from 0 to 1 (or 0% to 100%). The x-axis spans the data range.
Optionally overlay a theoretical CDF (e.g., normal) for comparison.
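The sort-and-assign steps above can be sketched in a few lines of plain Python. The function name `ecdf_points` is an assumption made for this example.

```python
def ecdf_points(values):
    """Return (sorted_values, proportions); the i-th sorted value gets i/n."""
    ordered = sorted(values)
    n = len(ordered)
    proportions = [i / n for i in range(1, n + 1)]
    return ordered, proportions

# Illustrative page load times in milliseconds
load_times_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
xs, ps = ecdf_points(load_times_ms)
for x, p in zip(xs, ps):
    print(x, p)
```

If matplotlib is available, `plt.step(xs, ps, where="post")` is one way to draw the resulting step function. When values are tied, several points share the same x and the highest proportion among them is the ECDF value at that x.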
// 08 — Common mistakes
Mistakes to avoid
Confusing ECDF with density
The ECDF shows cumulative proportion, not frequency density. A steep section means high density: density corresponds to the slope of the curve, not its height.
Not labeling the y-axis
Always label the y-axis as 'Proportion' or 'Cumulative %'. Without it, readers may misinterpret the chart as a time series.
Overplotting many ECDFs
Beyond 4–5 overlaid ECDFs, lines tangle. Use small multiples or color-code with a clear legend.
Interpolating between steps
With small samples, each step matters. Don’t smooth or interpolate — the step function IS the ECDF.
// 09 — In the wild
Real-world examples
Site reliability engineering
P50, P95, P99 latency SLAs are read directly from ECDF plots of request response times.
Clinical trials
Survival curves are the complement of a cumulative distribution function. The Kaplan-Meier estimator, used to estimate patient survival probabilities, reduces to 1 - ECDF when no observations are censored.
Quality assurance
Manufacturing uses ECDFs to check what proportion of products fall within tolerance limits.
// 10 — At a glance
Quick reference
| Attribute | Value |
|---|---|
| Category | Distribution |
| Data type | Continuous numeric |
| Best for | Percentile reading |
| Parameters | None (bin-free) |
| Difficulty | Intermediate |
| Related test | K-S test |
// 11 — Accessibility
Accessibility notes
Use a thick line (2–3px) for the ECDF step function so it remains visible at small sizes
When overlaying groups, use dash patterns in addition to color for colorblind accessibility
Add interactive tooltips showing exact (value, percentile) pairs on hover
Provide a companion table of key percentiles (P10, P25, P50, P75, P90, P95, P99)
Label the y-axis clearly as proportion or percentage to avoid misinterpretation
// 12 — Variations
Variations
Complementary ECDF (CCDF)
Plots 1 - ECDF, answering ‘what proportion exceeds this value?’ Common in survival analysis and tail-risk assessment.
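As a minimal sketch, the complementary value at a threshold is one minus the ECDF at that threshold. The function name and data are illustrative assumptions.

```python
from bisect import bisect_right

def proportion_exceeding(values, threshold):
    """Complementary ECDF: the share of values strictly greater than threshold."""
    ordered = sorted(values)
    return 1 - bisect_right(ordered, threshold) / len(ordered)

# Illustrative page load times in milliseconds
load_times_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
print(proportion_exceeding(load_times_ms, 2000))  # 0.25
```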
ECDF with confidence band
Adds a shaded confidence region (e.g., Dvoretzky-Kiefer-Wolfowitz band) showing uncertainty around the empirical estimate.
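A rough sketch of such a band, assuming the two-sided Dvoretzky-Kiefer-Wolfowitz inequality with half-width sqrt(ln(2/alpha) / (2n)) and a hypothetical helper name `dkw_band`:

```python
import math

def dkw_band(values, alpha=0.05):
    """Return (half_width, [(x, lower, upper), ...]) for a 1 - alpha DKW band."""
    ordered = sorted(values)
    n = len(ordered)
    # Half-width of the simultaneous band around the ECDF
    eps = math.sqrt(math.log(2 / alpha) / (2 * n))
    band = []
    for i, x in enumerate(ordered, start=1):
        p = i / n  # ECDF at the i-th sorted value (assumes no ties)
        # Clip to [0, 1] because proportions cannot leave that range
        band.append((x, max(p - eps, 0.0), min(p + eps, 1.0)))
    return eps, band

# Illustrative page load times in milliseconds
load_times_ms = [320, 480, 510, 750, 1020, 1480, 2100, 3200]
eps, band = dkw_band(load_times_ms)
for x, lower, upper in band:
    print(x, round(lower, 3), round(upper, 3))
```

With only eight points the band is very wide, which illustrates why small samples give uncertain percentile readings.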
ECDF vs. theoretical CDF
Overlays the theoretical CDF on the ECDF; departures show where the data diverges from the model. A Q-Q plot makes the same comparison on the quantile scale.
Weighted ECDF
Each data point contributes a different weight to the cumulative sum — used in survey data and importance sampling.
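One way to sketch the weighted version is to accumulate weights instead of counts. The function name, values, and weights below are made up for illustration.

```python
def weighted_ecdf_points(values, weights):
    """Return (sorted_values, cumulative weight shares) for a weighted ECDF."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    xs, ps = [], []
    for x, w in pairs:
        running += w  # each point adds its own weight instead of 1
        xs.append(x)
        ps.append(running / total)
    return xs, ps

# Hypothetical values with survey-style weights
xs, ps = weighted_ecdf_points([320, 480, 510, 750], [1, 1, 2, 4])
print(ps)  # [0.125, 0.25, 0.5, 1.0]
```

With equal weights this reduces to the ordinary i/n proportions.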
// 13 — FAQs
Frequently asked questions
What is an ECDF plot?
The empirical cumulative distribution function (ECDF) plot shows, for every value on the x-axis, the proportion of data points that are less than or equal to that value. The result is a monotonically increasing step function that starts near 0% on the left and reaches 100% on the right.
When should you use an ECDF plot?
Use an ECDF plot when you need to read exact percentiles from the chart (median, P95, P99). It also works well when you want to avoid binning decisions, since ECDFs require no parameters, and when comparing two or more distributions, since overlaid ECDFs make differences obvious.
When should you avoid an ECDF plot?
Avoid an ECDF plot when your audience is unfamiliar with cumulative functions; a histogram is more intuitive. It is also a poor fit when you want to see the distribution shape (peaks, skew), where density plots are better, or when you have only a handful of data points, where the step function looks jagged and sparse.
What data do you need to make an ECDF plot?
A single column of numeric values. No pre-aggregation is needed: the ECDF is computed from raw data.
What size of dataset works best for an ECDF plot?
An ECDF can be computed for any sample size, but with only a handful of data points the step function looks jagged and sparse. Larger samples produce finer steps and a smoother-looking curve.
Are ECDF plots accessible to screen readers?
Yes. An ECDF plot can be made accessible to screen readers by pairing it with a clear text summary of the key insight, ensuring color choices meet WCAG contrast guidelines, adding descriptive alt text or aria-label to the SVG, and offering the underlying data as an HTML table fallback for assistive technologies.