Precision-Recall Curve
A curve plotting Precision against Recall at various classification thresholds — the go-to evaluation tool when your positive class is rare and every detection matters.
// 01 — The chart
What it looks like
A Precision-Recall curve for a fraud detection model. The curve starts high (high precision at low recall) and descends as recall increases. The shaded area represents Average Precision (AP = 0.84). The horizontal dashed line shows the class prevalence baseline.
// 02 — Definition
What is a Precision-Recall curve?
A Precision-Recall (PR) curve plots Precision (the fraction of predicted positives that are truly positive) on the y-axis against Recall (the fraction of actual positives that are correctly detected) on the x-axis, as the classification threshold varies.
At a high threshold, the model is conservative: it predicts “positive” only when very confident, yielding high precision but low recall. As the threshold decreases, the model predicts more positives, increasing recall but typically reducing precision as more false positives creep in. The PR curve captures this fundamental trade-off.
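The trade-off can be illustrated with a small sketch. The labels and scores below are hypothetical toy data, and `precision_recall_at` is an illustrative helper, not a library function. Treating precision as 1.0 when nothing is predicted positive is an assumed convention.

```python
def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    # Assumed convention: precision is 1.0 when no item is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical toy data: 1 = positive, 0 = negative.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10]

# Lowering the threshold raises recall and, here, lowers precision.
for t in (0.9, 0.75, 0.5, 0.2):
    p, r = precision_recall_at(y_true, scores, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Each printed row is one point on the PR curve; sweeping the threshold over every distinct score traces the full curve.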
The key summary metric is Average Precision (AP), which summarizes the area under the PR curve as a weighted mean of the precision at each threshold, with the increase in recall as the weight. Unlike AUC-ROC, AP is highly sensitive to performance on the minority class, which makes it the preferred metric for imbalanced datasets where positive examples are rare: fraud detection, disease screening, object detection in computer vision.
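As a rough sketch, AP can be computed as the sum of (R_n - R_{n-1}) * P_n over the ranked predictions, which is the summation that scikit-learn's average_precision_score is documented to use. The version below is a simplified illustration that assumes no tied scores; the data is hypothetical.

```python
def average_precision(y_true, scores):
    """AP = sum over ranked items of (R_n - R_{n-1}) * P_n, assuming no tied scores."""
    ranked = sorted(zip(scores, y_true), reverse=True)  # highest score first
    total_pos = sum(y_true)
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    for n, (_, label) in enumerate(ranked, start=1):
        tp += label
        precision = tp / n
        recall = tp / total_pos
        # The recall increment is zero for negatives, so only positives contribute.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Hypothetical toy data: 1 = positive, 0 = negative.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10]

ap = average_precision(y_true, scores)
print(f"AP = {ap:.3f}")
```

Because only positives add to the sum, AP is equivalently the mean of the precision values measured at the rank of each true positive.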
Key insight: ROC curves can paint an overly optimistic picture on imbalanced data because the False Positive Rate denominator (total negatives) is huge, masking even large numbers of false positives. PR curves use the number of predicted positives as the denominator for precision, making them far more sensitive to the minority class.
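The arithmetic behind this can be sketched with made-up confusion-matrix counts for a 1%-prevalence problem. All numbers are hypothetical and chosen only to show the effect of the two denominators.

```python
# Hypothetical counts for a dataset with 1% positives.
positives, negatives = 1_000, 99_000
tp = 800     # 80% of the positives are found
fp = 1_980   # 2% of the negatives are flagged by mistake

recall = tp / positives      # also the true positive rate (ROC y-axis)
fpr = fp / negatives         # ROC x-axis: divided by the huge negative count
precision = tp / (tp + fp)   # PR y-axis: divided by the predicted positives

print(f"recall={recall:.2f}  FPR={fpr:.3f}  precision={precision:.3f}")
```

The same operating point sits at an FPR of 0.02, which looks excellent in ROC space, while fewer than three in ten flagged cases are true positives.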
// 03 — Anatomy
Parts of a Precision-Recall curve
// 04 — Usage
When to use it — and when not to
Use it when:
- Your classes are highly imbalanced (e.g., 1% fraud, 99% legitimate)
- You care more about correctly identifying the positive class than the negative class
- Comparing models on tasks like fraud detection, disease screening, or object detection
- You want a metric (AP) that is sensitive to minority-class performance
- Standard ROC-AUC looks misleadingly high due to many true negatives
- You need to choose an operating point that balances precision and recall
Avoid it when:
- Classes are roughly balanced; ROC curves are simpler and equally informative
- You care about true negative rate (specificity); PR curves ignore true negatives entirely
- You need a threshold-independent comparison and class balance is fine; use AUC-ROC
- Your problem is multi-class without a clear positive class
- You want to evaluate ranking quality rather than classification; use NDCG or MAP
- Stakeholders are more familiar with ROC curves and the class balance allows it
// 05 — Reading guide
How to read a Precision-Recall curve
Follow these steps whenever you encounter a Precision-Recall curve in the wild.
Note the axes
Recall (x-axis) goes from 0 to 1 and measures completeness — what fraction of all true positives did the model find? Precision (y-axis) measures exactness — of all items the model called positive, what fraction actually were?
Check the baseline
A horizontal line at y = class prevalence (e.g., 0.01 for 1% positive rate) represents a random classifier. Any curve above this line is doing better than random. The further above, the better.
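One way to sanity-check the baseline is to score a synthetic dataset with random numbers and confirm that AP lands near the prevalence. This is a sketch under assumptions: the dataset, the seed, and the `average_precision` helper are illustrative, and the helper assumes no tied scores.

```python
import random

def average_precision(y_true, scores):
    """Mean of the precision at each positive's rank (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    tp, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            total += tp / k
    return total / tp

rng = random.Random(0)                    # fixed seed so the run is repeatable
y_true = [1] * 1_000 + [0] * 19_000       # synthetic data with exactly 5% positives
scores = [rng.random() for _ in y_true]   # scores that carry no information

prevalence = sum(y_true) / len(y_true)
random_ap = average_precision(y_true, scores)
print(f"prevalence={prevalence:.3f}  AP of random scores={random_ap:.3f}")
```

A real model's AP is worth reading relative to this number rather than relative to 0.5.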
Look at the top-left region
High precision at low recall (top-left) means the model’s most confident predictions are very accurate. This is critical for applications where false positives are costly (e.g., spam filtering).
Follow the curve to the right
As recall increases, precision typically drops. How quickly it drops reveals the model’s quality: a slow decline means the model maintains accuracy even as it casts a wider net. A steep drop means the model struggles to find more positives without generating many false positives.
Compare the Average Precision (AP)
AP summarizes the entire curve as a single number (the area under the PR curve). Higher is better. When comparing models, the one with the highest AP is generally preferred, but also check whether the curves cross — one model may excel at high precision while another excels at high recall.
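When curves cross, the comparison usually comes down to the operating region that matters for the application. The sketch below compares two hypothetical models by the best recall each reaches while keeping precision above a floor. The helper names and data are illustrative, and tied scores are assumed not to occur.

```python
def pr_points(y_true, scores):
    """(precision, recall) after each ranked item; assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    total_pos = sum(y_true)
    tp = 0
    points = []
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        points.append((tp / k, tp / total_pos))
    return points

def best_recall_at_precision(y_true, scores, min_precision):
    """Highest recall among operating points whose precision meets the floor."""
    return max((r for p, r in pr_points(y_true, scores) if p >= min_precision),
               default=0.0)

# Hypothetical labels and scores from two models on the same test set.
y_true   = [1, 1, 0, 1, 0, 0, 1, 0]
scores_a = [0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10]
scores_b = [0.90, 0.70, 0.80, 0.60, 0.40, 0.30, 0.50, 0.20]

for floor in (0.95, 0.75):
    ra = best_recall_at_precision(y_true, scores_a, floor)
    rb = best_recall_at_precision(y_true, scores_b, floor)
    print(f"precision >= {floor:.2f}: model A recall={ra:.2f}, model B recall={rb:.2f}")
```

In this toy case model A wins under the strict precision floor and model B wins under the looser one, which is exactly the situation a single AP number can hide.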
// 06 — Pitfalls
Common mistakes
Using linear interpolation between operating points
Fix: Summarize PR curves with step-function or all-points interpolation. Linear interpolation between operating points overestimates performance because precision does not change linearly with recall. scikit-learn’s average_precision_score uses a step-wise sum and avoids this problem.
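The size of the overestimate can be seen on a handful of operating points. The points below are hypothetical, and both functions are illustrative helpers rather than library calls.

```python
# Hypothetical operating points, ordered by increasing recall.
recalls = [0.0, 0.25, 0.5, 0.75, 1.0]
precisions = [1.0, 1.0, 0.8, 0.5, 0.2]

def step_area(recalls, precisions):
    """Step-wise sum: each recall increment is weighted by the precision at its end."""
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))

def trapezoid_area(recalls, precisions):
    """Linear interpolation between neighbouring points (the approach to avoid)."""
    return sum((recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
               for i in range(1, len(recalls)))

step = step_area(recalls, precisions)
trapezoid = trapezoid_area(recalls, precisions)
print(f"step-wise={step:.3f}  trapezoid={trapezoid:.3f}")
```

On these points the trapezoid rule reports 0.725 against a step-wise value of 0.625, so the gap is large enough to change a model ranking.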
Comparing AP across different prevalence rates
Fix: The baseline of a PR curve equals the class prevalence, so AP values are only comparable at similar prevalence. AP = 0.5 on data with 1% prevalence is 50 times the random baseline, while AP = 0.5 on data with 50% prevalence is no better than random. Always report the positive class rate alongside AP.
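A simple way to put AP values from different datasets on a comparable footing is to express each as a multiple of its own baseline. The `ap_lift` helper below is an illustrative name, not a standard metric from any library.

```python
def ap_lift(ap, prevalence):
    """AP expressed as a multiple of the random baseline (AP of roughly the prevalence)."""
    return ap / prevalence

# Same AP, very different meaning (hypothetical datasets).
rare = ap_lift(ap=0.5, prevalence=0.01)
balanced = ap_lift(ap=0.5, prevalence=0.5)
print(f"1% prevalence: {rare:.0f}x baseline, 50% prevalence: {balanced:.0f}x baseline")
```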
Ignoring the curve shape and relying only on AP
Fix: Two models can have similar AP but very different curve shapes. One might have high precision at low recall (good for conservative systems) while another has moderate precision across all recall levels. Inspect the curve, not just the number.
Confusing PR curves with ROC curves
Fix: In PR space, the ideal point is the top-right corner (1,1), not the top-left. The random baseline is a horizontal line (not a diagonal). The axes measure different things. Don’t apply ROC intuitions to PR curves.
Not reporting confidence intervals
Fix: AP estimates vary with the test set. Use bootstrapped confidence intervals or cross-validation to show uncertainty. Report AP ± standard error rather than a single number.
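A percentile bootstrap is one common way to get such an interval. The sketch below uses only the standard library; the synthetic data, the seed, the number of resamples, and the function names are all assumptions made for illustration. Resampling creates duplicate scores, so this AP helper groups tied scores into a single threshold.

```python
import random

def average_precision(y_true, scores):
    """Step-wise AP; items with equal scores share one threshold."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    total_pos = sum(y_true)
    tp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / j)  # j items predicted positive so far
        prev_recall = recall
        i = j
    return ap

def bootstrap_ap_interval(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AP, resampling test examples with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if sum(ys) == 0:
            continue  # AP is undefined without positives, so draw again
        stats.append(average_precision(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Synthetic test set: positives tend to score higher than negatives.
rng = random.Random(42)
y_true = [1] * 40 + [0] * 160
scores = [rng.gauss(1.0, 1.0) if y else rng.gauss(0.0, 1.0) for y in y_true]

point = average_precision(y_true, scores)
low, high = bootstrap_ap_interval(y_true, scores)
print(f"AP = {point:.3f}, 95% bootstrap interval = [{low:.3f}, {high:.3f}]")
```

With only 40 positives the interval is typically wide, which is the point of reporting it.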
// 07 — In the wild
Real-world examples
Object detection in computer vision
The COCO and PASCAL VOC benchmarks evaluate object detectors using mean Average Precision (mAP), the mean of AP scores across object classes (COCO’s primary metric additionally averages over a range of IoU thresholds). PR curves reveal how well a detector balances finding all objects (recall) against avoiding false detections (precision) at various confidence thresholds.
Medical screening for rare diseases
When screening for a disease with 0.1% prevalence, ROC curves can show AUC > 0.99 for a test whose positive results are still mostly false alarms. PR curves reveal the true picture: precision may drop to 10% at high recall, meaning 90% of positive results are false alarms, which is critical information for clinicians.
Information retrieval and search engines
Search engines evaluate ranking quality using precision at different recall levels. The PR curve shows how many relevant documents the system retrieves (recall) and what fraction of retrieved documents are actually relevant (precision). The standard 11-point interpolated PR curve has been used in TREC evaluations since the 1990s.
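The 11-point scheme can be sketched as follows: at each recall level 0.0, 0.1, ..., 1.0, the interpolated precision is the highest precision observed at any recall at or above that level. The operating points below are hypothetical and the function name is illustrative.

```python
def eleven_point_interpolated(recalls, precisions):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0.

    For each level, take the maximum precision at any recall >= that level
    (0.0 if the system never reaches that recall).
    """
    levels = [i / 10 for i in range(11)]
    interpolated = []
    for level in levels:
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return levels, interpolated

# Hypothetical operating points from a ranked list with four relevant documents.
recalls    = [0.25, 0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.0]
precisions = [1.0, 1.0, 2 / 3, 3 / 4, 3 / 5, 1 / 2, 4 / 7, 1 / 2]

levels, interpolated = eleven_point_interpolated(recalls, precisions)
eleven_point_ap = sum(interpolated) / len(interpolated)
for level, p in zip(levels, interpolated):
    print(f"recall >= {level:.1f}: interpolated precision = {p:.3f}")
print(f"11-point average = {eleven_point_ap:.3f}")
```

Taking the maximum to the right removes the sawtooth dips of the raw curve, so the interpolated curve is non-increasing.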
// 08 — Quick reference
Key facts
// 09 — Variations
Types of Precision-Recall curves
PR analysis extends beyond the basic single-model curve in several important ways.
Multi-model comparison
Multiple PR curves on the same axes to compare classifiers. The model with the highest AP (curve closest to top-right) is generally preferred.
Step-function interpolation
The correct interpolation for PR curves uses horizontal and vertical steps rather than linear segments. This avoids overestimating the area under the curve.
PR curve with confidence band
A shaded band around the PR curve showing variability from bootstrapping. Essential for assessing whether differences in AP are statistically significant.
Per-class PR (multi-label)
In multi-label settings, a separate PR curve is computed for each class. Mean AP (mAP) across all classes gives an overall performance summary, standard in object detection benchmarks.
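In its simplest form, mAP is the unweighted mean of per-class AP values. The sketch below uses hypothetical class names, labels, and scores, and an AP helper that assumes no tied scores. Detection benchmarks add matching rules (such as IoU thresholds) that are not modelled here.

```python
def average_precision(y_true, scores):
    """Mean of the precision at each positive's rank (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    tp, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            tp += 1
            total += tp / k
    return total / tp

# Hypothetical multi-label data: one label vector and one score vector per class.
labels = {"cat": [1, 0, 1, 0], "dog": [0, 1, 1, 0]}
scores = {"cat": [0.9, 0.8, 0.7, 0.1], "dog": [0.2, 0.9, 0.8, 0.6]}

per_class_ap = {name: average_precision(labels[name], scores[name]) for name in labels}
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)

for name, ap in per_class_ap.items():
    print(f"{name}: AP = {ap:.3f}")
print(f"mAP = {mean_ap:.3f}")
```

Reporting the per-class values next to mAP shows whether a good average hides one weak class.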
// 10 — FAQs
Frequently asked questions
What is a precision-recall curve?
A Precision-Recall (PR) curve plots Precision (the fraction of predicted positives that are truly positive) on the y-axis against Recall (the fraction of actual positives that are correctly detected) on the x-axis, as the classification threshold varies.
When should you use a precision-recall curve?
Use a precision-recall curve when your classes are highly imbalanced (e.g., 1% fraud, 99% legitimate). It also works well when you care more about correctly identifying the positive class than the negative class, and when comparing models on tasks like fraud detection, disease screening, or object detection.
When should you avoid a precision-recall curve?
Avoid a precision-recall curve when classes are roughly balanced, since ROC curves are simpler and equally informative. It is also a poor fit when you care about the true negative rate (specificity), because PR curves ignore true negatives entirely, or when you need a threshold-independent comparison and class balance is fine, in which case AUC-ROC is the better choice.
Is a precision-recall curve suitable for dashboards?
Yes — a precision-recall curve can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.
What category of chart is a precision-recall curve?
Precision-Recall Curve belongs to the Statistical family of charts. Charts in that family are designed to answer the same kind of question, so they often work as alternatives when one doesn't quite fit your data.
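How do you read a precision-recall curve?
Start with the axes: recall on the x-axis and precision on the y-axis. Check the horizontal baseline at the class prevalence, look at the top-left region to see how accurate the model’s most confident predictions are, then follow the curve to the right to see how quickly precision drops as recall increases. Finally, compare Average Precision (AP) across models and check whether the curves cross.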