Statistical · Intermediate

Precision-Recall Curve

A curve plotting Precision against Recall at various classification thresholds — the go-to evaluation tool when your positive class is rare and every detection matters.

// 01 — The chart

What it looks like

Example: Fraud detection model (AP = 0.84)

A Precision-Recall curve for a fraud detection model. The curve starts high (high precision at low recall) and descends as recall increases. The shaded area represents Average Precision (AP = 0.84). The horizontal dashed line shows the class prevalence baseline.

// 02 — Definition

What is a Precision-Recall curve?

A Precision-Recall (PR) curve plots Precision (the fraction of predicted positives that are truly positive) on the y-axis against Recall (the fraction of actual positives that are correctly detected) on the x-axis, as the classification threshold varies.

At a high threshold, the model is conservative: it predicts “positive” only when very confident, yielding high precision but low recall. As the threshold decreases, the model predicts more positives, increasing recall but typically reducing precision as more false positives creep in. The PR curve captures this fundamental trade-off.
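
The trade-off can be made concrete with a minimal Python sketch. The labels and scores below are invented for illustration, `precision_recall_at` is a hypothetical helper, and treating precision as 1.0 when nothing is predicted positive is an assumed convention rather than a fixed rule.

```python
def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    # Assumed convention: precision is 1.0 when no case is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy data (hypothetical): 1 = positive, 0 = negative.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]

# Lowering the threshold raises recall and, here, lowers precision.
for t in (0.9, 0.5, 0.2):
    p, r = precision_recall_at(y_true, scores, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

In practice, scikit-learn's `precision_recall_curve` performs this sweep over every distinct score in one call.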

The key summary metric is Average Precision (AP), which is the area under the PR curve. Unlike AUC-ROC, AP is highly sensitive to the minority class, making it the preferred metric for imbalanced datasets where positive examples are rare — fraud detection, disease screening, object detection in computer vision.
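
One common definition of AP, and the one documented for scikit-learn's `average_precision_score`, is the step-wise sum AP = Σ (Rₙ − Rₙ₋₁) · Pₙ over the ranked predictions. The sketch below assumes that definition, distinct scores (no ties), and toy data invented for illustration; `average_precision` is a hypothetical helper name.

```python
def average_precision(y_true, scores):
    """Step-wise AP: precision at each positive hit, weighted by the recall it adds."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            # Recall rises by 1/n_pos here; precision at this rank is tp/rank.
            ap += (tp / rank) / n_pos
    return ap

# Toy data (hypothetical): 4 positives among 8 cases.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]

ap = average_precision(y_true, scores)
print(f"AP = {ap:.3f}")
```

Only the ranks where a positive is found contribute, which is why AP rewards models that place positives near the top of the ranking.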

Key insight: ROC curves can paint an overly optimistic picture on imbalanced data because the False Positive Rate denominator (total negatives) is huge, masking even large numbers of false positives. PR curves use the number of predicted positives as the denominator for precision, making them far more sensitive to the minority class.
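
A short worked example shows the effect of the two denominators. The confusion-matrix counts are hypothetical, chosen to give 1% prevalence.

```python
# Hypothetical confusion matrix at one threshold: 1,000 positives, 99,000 negatives.
tp, fn = 800, 200
fp, tn = 2_000, 97_000

fpr = fp / (fp + tn)         # ROC x-axis: false positives over all negatives
recall = tp / (tp + fn)      # shared by ROC (as TPR) and PR curves
precision = tp / (tp + fp)   # PR y-axis: true positives over predicted positives

print(f"FPR = {fpr:.4f}, recall = {recall:.2f}, precision = {precision:.4f}")
```

The 2,000 false positives barely move the false positive rate (about 0.02), yet they outnumber the true positives more than two to one, which precision (about 0.29) exposes directly.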

// 03 — Anatomy

Parts of a Precision-Recall curve

A — Y-axis (Precision): The fraction of predicted positives that are actually positive — also called positive predictive value

B — X-axis (Recall): The fraction of actual positives that were correctly detected — also called sensitivity or true positive rate

C — PR curve: The curve traced by plotting precision vs recall at every possible classification threshold

D — Baseline (class prevalence): A horizontal line at y = prevalence, the expected precision of a random classifier regardless of what fraction of cases it labels positive

E — Perfect classifier point: The top-right corner (1,1) where both precision and recall equal 1.0 — the ideal operating point
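
Why the baseline (D) sits at the class prevalence can be checked with a little arithmetic. The sketch assumes a random classifier that flags a fixed fraction of cases independently of the true label; the function name and the counts are illustrative.

```python
def expected_random_precision(n_pos, n_neg, flag_rate):
    """Expected precision when a fraction flag_rate of all cases is flagged at random."""
    expected_tp = flag_rate * n_pos   # flagged cases that happen to be positive
    expected_fp = flag_rate * n_neg   # flagged cases that happen to be negative
    return expected_tp / (expected_tp + expected_fp)

# Hypothetical dataset: 100 positives among 10,000 cases (1% prevalence).
for rate in (0.1, 0.5, 0.9):
    print(rate, round(expected_random_precision(100, 9_900, rate), 4))
```

The flag rate cancels out of the ratio, so the expected precision equals n_pos / (n_pos + n_neg) at every recall level, which is why the baseline is a flat line.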

// 04 — Usage

When to use it — and when not to

✓ Use a PR curve when…

Your classes are highly imbalanced (e.g., 1% fraud, 99% legitimate)
You care more about correctly identifying the positive class than the negative class
Comparing models on tasks like fraud detection, disease screening, or object detection
You want a metric (AP) that is sensitive to minority-class performance
Standard ROC-AUC looks misleadingly high due to many true negatives
You need to choose an operating point that balances precision and recall

× Avoid a PR curve when…

Classes are roughly balanced — ROC curves are simpler and equally informative
You care about true negative rate (specificity) — PR curves ignore true negatives entirely
You need a threshold-independent comparison and class balance is fine — use AUC-ROC
Your problem is multi-class without a clear positive class
You want to evaluate ranking quality rather than classification — use NDCG or MAP
Stakeholders are more familiar with ROC curves and the class balance allows it
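
Choosing an operating point, one of the use cases listed above, can be sketched as a search over thresholds. The rule used here (maximize recall subject to a minimum precision) is only one possible policy and is an assumption, as are the helper names and the toy data.

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        predicted = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(predicted)
        points.append((t, tp / len(predicted), tp / n_pos))
    return points

def best_threshold(points, min_precision):
    """Highest-recall point whose precision meets the floor, or None if none does."""
    ok = [p for p in points if p[1] >= min_precision]
    return max(ok, key=lambda p: p[2]) if ok else None

# Toy data (hypothetical): 1 = positive, 0 = negative.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]

points = pr_points(y_true, scores)
print(best_threshold(points, min_precision=0.7))
```

A fraud team that can only review a limited number of alerts might set a high precision floor, while a screening program might instead fix a minimum recall and maximize precision.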

// 05 — Reading guide

How to read a Precision-Recall curve

Follow these steps whenever you encounter a Precision-Recall curve in the wild.

Note the axes

Recall (x-axis) goes from 0 to 1 and measures completeness — what fraction of all true positives did the model find? Precision (y-axis) measures exactness — of all items the model called positive, what fraction actually were?

Check the baseline

A horizontal line at y = class prevalence (e.g., 0.01 for 1% positive rate) represents a random classifier. Any curve above this line is doing better than random. The further above, the better.

Look at the top-left region

High precision at low recall (top-left) means the model’s most confident predictions are very accurate. This is critical for applications where false positives are costly (e.g., spam filtering).

Follow the curve to the right

As recall increases, precision typically drops. How quickly it drops reveals the model’s quality: a slow decline means the model maintains accuracy even as it casts a wider net. A steep drop means the model struggles to find more positives without generating many false positives.

Compare the Average Precision (AP)

AP summarizes the entire curve as a single number (the area under the PR curve). Higher is better. When comparing models, the one with the highest AP is generally preferred, but also check whether the curves cross — one model may excel at high precision while another excels at high recall.

// 06 — Pitfalls

Common mistakes

Using linear interpolation between operating points

Fix: Interpolate PR curves with a step function or the all-points interpolation method; linear interpolation between operating points overestimates performance. scikit-learn’s average_precision_score uses a step-wise sum and avoids the problem.
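
The reason linear interpolation is optimistic is that precision is not linear in recall between two operating points. The sketch below assumes that moving from point A to point B adds false positives at a constant rate per extra true positive; the counts and the function name are hypothetical.

```python
def interpolated_precision(tp_a, fp_a, tp_b, fp_b, tp_x):
    """Precision at an intermediate true-positive count tp_x between points A and B,
    assuming false positives accrue at a constant rate per extra true positive."""
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    fp_x = fp_a + fp_per_tp * (tp_x - tp_a)
    return tp_x / (tp_x + fp_x)

# Two hypothetical operating points.
tp_a, fp_a = 5, 5      # precision 0.50
tp_b, fp_b = 10, 30    # precision 0.25

tp_mid = 7.5                            # halfway between A and B in recall
linear_guess = (0.50 + 0.25) / 2        # straight line between the two precisions
count_based = interpolated_precision(tp_a, fp_a, tp_b, fp_b, tp_mid)

print(f"linear: {linear_guess:.3f}, count-based: {count_based:.3f}")
```

Halfway between the two points, the straight line promises a precision of 0.375 while interpolating the underlying counts gives 0.30, so the area under the straight line is larger than what the model can actually deliver.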

Comparing AP across different prevalence rates

Fix: The baseline of a PR curve depends on class prevalence. A model with AP = 0.5 on 1% prevalence data is much better than AP = 0.5 on 50% prevalence data. Always report the positive class rate alongside AP.
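
One rough way to put the two numbers on a common footing is to divide AP by the prevalence baseline. This ratio is an illustrative heuristic rather than a standard metric, and `lift_over_baseline` is a hypothetical name.

```python
def lift_over_baseline(ap, prevalence):
    """How many times better than the random baseline (y = prevalence) the AP is."""
    return ap / prevalence

rare = lift_over_baseline(0.5, 0.01)       # AP = 0.5 at 1% prevalence
balanced = lift_over_baseline(0.5, 0.50)   # AP = 0.5 at 50% prevalence

print(f"1% prevalence: {rare:.0f}x baseline, 50% prevalence: {balanced:.0f}x baseline")
```

The same AP of 0.5 is fifty times the random baseline in the first case and no better than random in the second.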

Ignoring the curve shape and relying only on AP

Fix: Two models can have similar AP but very different curve shapes. One might have high precision at low recall (good for conservative systems) while another has moderate precision across all recall levels. Inspect the curve, not just the number.

Confusing PR curves with ROC curves

Fix: In PR space, the ideal point is the top-right corner (1,1), not the top-left. The random baseline is a horizontal line (not a diagonal). The axes measure different things. Don’t apply ROC intuitions to PR curves.

Not reporting confidence intervals

Fix: AP estimates vary with the test set. Use bootstrapped confidence intervals or cross-validation to show uncertainty. Report AP ± standard error rather than a single number.
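
A percentile bootstrap is one simple way to attach an interval to AP. The sketch below is a minimal version under stated assumptions: synthetic data with distinct scores, a step-wise AP, 1,000 resamples, and resamples without any positives skipped. All names and parameters are illustrative.

```python
import random

def average_precision(y_true, scores):
    """Step-wise AP over predictions ranked by descending score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += (tp / rank) / n_pos
    return ap

def bootstrap_ap_interval(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AP, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if sum(ys) == 0:
            continue  # AP is undefined without positives, so skip this resample
        stats.append(average_precision(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper

# Synthetic data (hypothetical): 20 positives, 180 negatives, noisy scores.
data_rng = random.Random(42)
y_true = [1] * 20 + [0] * 180
scores = [data_rng.gauss(1.5 if y else 0.0, 1.0) for y in y_true]

low, high = bootstrap_ap_interval(y_true, scores)
print(f"AP = {average_precision(y_true, scores):.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

With only 20 positives the interval is typically wide, which is the point: a single AP value on a small test set can hide a lot of uncertainty.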

// 07 — In the wild

Real-world examples

Object detection in computer vision

The COCO and PASCAL VOC benchmarks evaluate object detectors using mean Average Precision (mAP), the mean of AP scores across object classes (COCO additionally averages over a range of IoU thresholds). PR curves reveal how well a detector balances finding all objects (recall) against avoiding false detections (precision) at various confidence thresholds.

Medical screening for rare diseases

When screening for a disease with 0.1% prevalence, ROC curves can show AUC > 0.99 even for mediocre tests. PR curves reveal the true picture: precision may drop to 10% at high recall, meaning 90% of positive results are false alarms — critical information for clinicians.
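
The collapse in precision follows directly from the arithmetic of rare events. The sensitivity and specificity below are hypothetical values for a test that looks strong on paper, and the function name is illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Precision (PPV) implied by prevalence, sensitivity (recall), and specificity."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 95% sensitivity, 99% specificity, 0.1% prevalence.
ppv = positive_predictive_value(0.001, 0.95, 0.99)
print(f"precision = {ppv:.3f}")
```

Even with 99% specificity, the 1% of healthy people who test positive vastly outnumber the true cases, so fewer than one in ten positive results is correct.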

Information retrieval and search engines

Search engines evaluate ranking quality using precision at different recall levels. The PR curve shows how many relevant documents the system retrieves (recall) and what fraction of retrieved documents are actually relevant (precision). The standard 11-point interpolated PR curve has been used in TREC evaluations since the 1990s.
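
The 11-point interpolated measure can be sketched as follows: at each recall level 0.0, 0.1, …, 1.0, take the highest precision achieved at that recall or beyond, then average the eleven values. The operating points below are hypothetical and `eleven_point_ap` is an illustrative name, not a library function.

```python
def eleven_point_ap(points):
    """Average of interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    points: list of (recall, precision) operating points.
    """
    levels = [i / 10 for i in range(11)]
    total = 0.0
    for level in levels:
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for r, p in points if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / len(levels)

# Hypothetical (recall, precision) points from sweeping a threshold.
points = [
    (0.25, 1.0), (0.50, 1.0), (0.50, 2 / 3), (0.75, 0.75),
    (0.75, 0.60), (0.75, 0.50), (1.00, 4 / 7), (1.00, 0.50),
]

print(f"11-point interpolated AP = {eleven_point_ap(points):.3f}")
```

Taking the maximum to the right smooths out the sawtooth shape of the raw curve, so this value is generally not identical to the step-wise AP computed from the same points.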

// 08 — Quick reference

Key facts

Also known as: PR curve, Precision vs Recall plot

Best for: Evaluating classifiers on imbalanced datasets

Data types: Predicted scores and binary ground-truth labels

Key metric: AP (Average Precision), the area under the PR curve

Perfect score: AP = 1.0 (precision stays at 1.0 at every recall level, so the curve reaches the top-right corner)

Random baseline: Horizontal line at y = class prevalence

Common tools: scikit-learn, R (PRROC), matplotlib, TensorBoard

Common mistakes: Linear interpolation, ignoring prevalence, no confidence intervals

// 09 — Variations

Types of Precision-Recall curves

PR analysis extends beyond the basic single-model curve in several important ways.

Multi-model comparison

Multiple PR curves on the same axes to compare classifiers. The model with the highest AP (curve closest to top-right) is generally preferred.

Step-function interpolation

The correct interpolation for PR curves uses horizontal and vertical steps rather than linear segments. This avoids overestimating the area under the curve.

PR curve with confidence band

A shaded band around the PR curve showing variability from bootstrapping. Essential for assessing whether differences in AP are statistically significant.

Per-class PR (multi-label)

In multi-label settings, a separate PR curve is computed for each class. Mean AP (mAP) across all classes gives an overall performance summary, standard in object detection benchmarks.
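
Once each class has its own AP, the mAP summary is a plain average. The per-class values below are hypothetical placeholders; in a real object detection benchmark each AP would first come from matching predicted boxes to ground truth, which this sketch skips.

```python
from statistics import mean

# Hypothetical per-class AP values, assumed to be computed elsewhere.
per_class_ap = {"car": 0.80, "person": 0.70, "bicycle": 0.60}

map_score = mean(per_class_ap.values())
weakest = min(per_class_ap, key=per_class_ap.get)

print(f"mAP = {map_score:.2f}, weakest class = {weakest}")
```

Because mAP weights every class equally, it is worth reporting the weakest class alongside it so a poor per-class curve is not hidden by the average.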

// 10 — FAQs

Frequently asked questions

What is a precision-recall curve?

A Precision-Recall (PR) curve plots Precision (the fraction of predicted positives that are truly positive) on the y-axis against Recall (the fraction of actual positives that are correctly detected) on the x-axis, as the classification threshold varies.

When should you use a precision-recall curve?

Use a precision-recall curve when your classes are highly imbalanced (e.g., 1% fraud, 99% legitimate). It also works well when you care more about correctly identifying the positive class than the negative class, and when comparing models on tasks like fraud detection, disease screening, or object detection.

When should you avoid a precision-recall curve?

Avoid a precision-recall curve when classes are roughly balanced, because ROC curves are simpler and equally informative. It is also a poor fit when you care about the true negative rate (specificity), since PR curves ignore true negatives entirely, or when you need a threshold-independent comparison and class balance is fine; in that case, use AUC-ROC.

Is a precision-recall curve suitable for dashboards?

Yes — a precision-recall curve can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.

What category of chart is a precision-recall curve?

Precision-Recall Curve belongs to the Statistical family of charts. Charts in that family are designed to answer the same kind of question, so they often work as alternatives when one doesn't quite fit your data.

How do you read a precision-recall curve?

Start with the axes: recall on the x-axis and precision on the y-axis. Check the class prevalence baseline, then follow the curve from the top-left to the right to see how quickly precision falls as recall rises. Finally, compare Average Precision (AP) across models and check whether the curves cross.
