Scientific · Advanced

Manhattan Plot

A chart that displays the statistical significance (−log₁₀ p-values) of genetic variants across chromosomal positions. Its peaks are the skyscrapers of the genome, revealing which regions harbor disease-associated signals.

// 01 — The chart

What it looks like

[Figure: Example GWAS for type 2 diabetes, 22 chromosomes. Y-axis −log₁₀(p) from 0 to 12; X-axis chromosomes in order; dashed threshold at p = 5×10⁻⁸; labeled peak at the TCF7L2 locus]

A Manhattan plot from a genome-wide association study for type 2 diabetes. Each dot represents a genetic variant; dots above the red significance threshold line indicate genome-wide significant associations.

// 02 — Definition

What is a Manhattan plot?

A Manhattan plot is a specialized scatter plot used primarily in genome-wide association studies (GWAS) to display the statistical significance of genetic variants across the entire genome. The name comes from the plot’s resemblance to the Manhattan skyline — most points cluster near the ground (non-significant), while a few peaks rise dramatically like skyscrapers.

The X-axis represents genomic position, organized by chromosome from left to right. The Y-axis shows the −log₁₀(p-value) of each tested variant. Taking the negative logarithm pushes small p-values (highly significant results) upward and makes them visually prominent. For example, p = 0.001 plots at 3 and p = 10⁻⁸ plots at 8.
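Here is a minimal sketch of how the two axes can be computed. It is plain Python and assumes each variant arrives as an integer chromosome code, a base-pair position and a p-value. The function names and the `chrom_lengths` input are hypothetical. A plotting library such as matplotlib would then draw the resulting points.

```python
import math


def neg_log10(p):
    """Transform a p-value so that smaller p-values plot higher on the Y-axis."""
    if not 0 < p <= 1:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p)


def genome_x(variants, chrom_lengths):
    """Convert (chromosome, base-pair position) pairs into one genome-wide X coordinate.

    chrom_lengths maps each integer chromosome code to its length in base pairs
    (an assumed input); chromosomes are laid end to end in numeric order.
    """
    offsets, running = {}, 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = running          # where this chromosome starts on the X-axis
        running += chrom_lengths[chrom]
    return [offsets[chrom] + pos for chrom, pos in variants]


print(genome_x([(1, 10), (2, 20)], {1: 1000, 2: 500}))  # [10, 1020]
```

Laying chromosomes end to end is what produces the left-to-right "skyline". Each variant keeps its position within its chromosome, shifted by the combined length of all chromosomes before it.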

A horizontal threshold line, typically at p = 5 × 10⁻⁸, marks the genome-wide significance level. This value is a Bonferroni correction of α = 0.05 for roughly one million independent tests. Variants whose points rise above the line are therefore considered statistically significant even after accounting for the enormous number of tests performed. These “peaks” point to genomic regions that may contain disease-associated genes.
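The 5 × 10⁻⁸ figure can be reproduced in a few lines. This sketch assumes the conventional α = 0.05 and one million independent tests. Both constants are the assumptions behind the threshold, not values measured from any particular study.

```python
import math

ALPHA = 0.05                      # desired family-wise error rate
N_INDEPENDENT_TESTS = 1_000_000   # approximate number of independent tests (assumed)

# Bonferroni correction: divide the error rate by the number of tests.
threshold = ALPHA / N_INDEPENDENT_TESTS
threshold_height = -math.log10(threshold)  # where the line sits on the Y-axis

print(f"p threshold: {threshold:.0e}")              # p threshold: 5e-08
print(f"-log10 threshold: {threshold_height:.2f}")  # -log10 threshold: 7.30
```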

Origin: The Manhattan plot became widely used in the mid-2000s as genome-wide association studies emerged. The first major GWAS was published in 2005 for age-related macular degeneration. The plot quickly became the standard way to present GWAS results, named for its skyline-like appearance.

// 03 — Anatomy

Parts of a Manhattan plot

[Figure: annotated Manhattan plot with parts labeled A to E]
A — Y-axis (−log₁₀ p-value): The vertical axis showing transformed p-values; higher values indicate greater statistical significance
B — X-axis (genomic position): The horizontal axis showing chromosomal position, organized sequentially from chromosome 1 to 22 (plus X)
C — Significant peaks: Variants exceeding the genome-wide significance threshold, indicating potential disease-associated loci
D — Significance threshold: Horizontal dashed line at p = 5×10⁻⁸, the standard threshold for genome-wide significance
E — Non-significant variants: The bulk of tested variants that fall below the significance threshold, forming the baseline scatter

// 04 — Usage

When to use it — and when not to

✓ Use a Manhattan plot when…
  • Presenting results from a genome-wide association study (GWAS)
  • You need to show statistical significance across the entire genome simultaneously
  • Identifying which chromosomal regions contain significant genetic associations
  • Comparing signal strength across thousands to millions of tested variants
  • Your audience is familiar with genomics and statistical testing
  • You want to visually highlight loci that pass genome-wide significance
× Avoid a Manhattan plot when…
  • Your data is not organized by genomic position — use a standard scatter plot
  • You have only a small number of tested variants — a forest plot or table may be clearer
  • Your audience is not familiar with genomics or p-value transformations
  • You want to show effect sizes — use a volcano plot or forest plot instead
  • You need to compare results between two GWAS — consider a Miami plot
  • The data lacks chromosomal position information

// 05 — Reading guide

How to read a Manhattan plot

Follow these steps to interpret a Manhattan plot from a GWAS.

1

Identify the significance threshold

Look for a horizontal dashed line, usually at −log₁₀(p) ≈ 7.3, which corresponds to p = 5×10⁻⁸. This is the genome-wide significance threshold that accounts for testing millions of variants.

2

Scan for peaks above the threshold

Points rising above the significance line represent genomic loci with strong statistical evidence of association. These are your primary findings — count how many independent peaks exist.
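Counting independent peaks usually means merging nearby significant variants into loci. The sketch below uses a simple distance window as a stand-in for that step. The 500 kb window is an arbitrary illustrative choice, and published analyses typically use linkage-disequilibrium-based clumping instead. The input format and function name are hypothetical.

```python
def group_loci(hits, window_bp=500_000):
    """Merge genome-wide significant variants into loci by distance.

    hits: list of (chromosome, position, p_value) tuples that already pass
    p < 5e-8 (assumed input). Hits on the same chromosome within window_bp
    of the previous hit join the same locus; the smallest p is the lead variant.
    """
    loci = []
    for chrom, pos, p in sorted(hits):
        current = loci[-1] if loci else None
        if current and current["chrom"] == chrom and pos - current["end"] <= window_bp:
            current["end"] = pos                      # extend the current locus
            if p < current["lead_p"]:
                current["lead_pos"], current["lead_p"] = pos, p
        else:                                         # start a new locus
            loci.append({"chrom": chrom, "start": pos, "end": pos,
                         "lead_pos": pos, "lead_p": p})
    return loci


hits = [(10, 1_000_000, 1e-30), (10, 1_200_000, 1e-12),
        (10, 5_000_000, 1e-9), (3, 2_000_000, 2e-8)]
loci = group_loci(hits)
print(len(loci))  # 3
```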

3

Note chromosome locations

Check which chromosomes harbor significant signals. Alternating colors help distinguish adjacent chromosomes. The chromosomal location tells you where to look for candidate genes.
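Here is a hypothetical helper for the alternating colors. It assigns one of two colors by chromosome order and computes each chromosome's midpoint on the cumulative X-axis, which is where its label would go. It assumes integer chromosome codes (for example, X coded as 23). The colors and names are illustrative.

```python
def chromosome_styling(chrom_lengths, colors=("#1f77b4", "#7f7f7f")):
    """Assign alternating colors and label positions to chromosomes.

    chrom_lengths maps integer chromosome codes to lengths in base pairs
    (an assumed input). Returns {chrom: {"color": ..., "tick": ...}}, where
    "tick" is the chromosome's midpoint on the cumulative X-axis.
    """
    styling, offset = {}, 0
    for i, chrom in enumerate(sorted(chrom_lengths)):
        length = chrom_lengths[chrom]
        styling[chrom] = {
            "color": colors[i % len(colors)],  # alternate between the two colors
            "tick": offset + length / 2,       # midpoint, where the "ChrN" label goes
        }
        offset += length
    return styling
```

In matplotlib, the per-point colors could be passed to `Axes.scatter` through its `c` argument. The midpoints could go to `Axes.set_xticks`.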

4

Assess signal strength by peak height

Taller peaks indicate stronger statistical significance. Because the scale is logarithmic, a peak at −log₁₀(p) = 12 (p = 10⁻¹²) has a p-value 10,000 times smaller than a peak at 8 (p = 10⁻⁸). Very tall peaks often replicate across independent studies.

5

Look for suggestive associations

Variants just below the genome-wide threshold (often above a suggestive line at p = 1×10⁻⁵) may represent true associations that need larger sample sizes to confirm. These are worth noting for follow-up studies.
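The two lines split variants into three bands. Here is a minimal sketch using the thresholds named above. Studies differ on whether a p-value exactly at a threshold counts, so the strict inequalities are an assumption.

```python
GENOME_WIDE = 5e-8   # genome-wide significance threshold
SUGGESTIVE = 1e-5    # suggestive threshold


def classify(p):
    """Place a p-value into one of the bands drawn on a Manhattan plot."""
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


for p in (3e-9, 4e-6, 0.02):
    print(p, classify(p))
```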

// 06 — Common mistakes

Mistakes to watch out for

Using the wrong significance threshold

The standard genome-wide significance threshold is p = 5×10⁻⁸, a Bonferroni correction for approximately one million independent tests across the genome. A far less stringent threshold (e.g., p = 0.05) would flag tens of thousands of variants by chance alone.
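The size of the problem follows from the fact that null p-values are uniformly distributed. This sketch assumes one million independent tests with no true associations. Real GWAS variants are correlated, so treat the result as an order-of-magnitude estimate.

```python
def expected_false_positives(alpha, n_tests):
    """Expected count of p < alpha among n_tests truly null tests.

    Under the null hypothesis p-values are uniform on [0, 1], so each test
    falls below alpha with probability alpha.
    """
    return alpha * n_tests


n = 1_000_000  # roughly the number of independent tests assumed by 5e-8
print(f"{expected_false_positives(0.05, n):,.0f}")  # 50,000
print(f"{expected_false_positives(5e-8, n):.2f}")   # 0.05
```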

Ignoring population stratification

If cases and controls come from different ancestral backgrounds, systematic allele frequency differences can inflate test statistics across the genome. This produces a uniformly elevated baseline rather than distinct peaks. Always check the QQ plot for inflation.
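A QQ plot is often summarized by the genomic inflation factor λ (lambda GC). It is the median observed 1-df chi-square statistic divided by that statistic's expected median under the null, about 0.455. The code below is a minimal standard-library sketch. It assumes two-sided p-values from 1-degree-of-freedom tests, and the function name is hypothetical. Values close to 1 suggest little inflation, while values noticeably above 1 are a warning sign worth investigating.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549):
# the square of the standard normal 75th percentile.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Estimate lambda_GC from two-sided p-values (each in (0, 1]) of 1-df tests."""
    std_normal = NormalDist()
    # Convert each p-value back to a chi-square statistic: chi2 = z**2,
    # where z is the normal quantile at p/2 (two-sided test).
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

With evenly spread (null-like) p-values the estimate comes out very close to 1.0. Shrinking every p-value by a constant factor pushes it above 1.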

Overplotting without transparency

With millions of data points, opaque dots create a solid mass that obscures the true density pattern. Use transparency (alpha blending) and consider thinning non-significant variants to improve readability.
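Thinning can be as simple as keeping every interesting variant and a random sample of the rest. In this sketch the 1×10⁻⁵ cutoff and the 10% kept fraction are illustrative defaults, not established standards. The input format is assumed.

```python
import random


def thin_variants(variants, keep_below=1e-5, keep_fraction=0.1, seed=0):
    """Keep every variant with p < keep_below and a random sample of the rest.

    variants: list of (x_position, p_value) pairs (assumed input).
    """
    rng = random.Random(seed)  # fixed seed so the thinned plot is reproducible
    return [
        (x, p) for x, p in variants
        if p < keep_below or rng.random() < keep_fraction
    ]
```

The thinned points can then be drawn with partial transparency, for example through the `alpha` argument of matplotlib's `Axes.scatter`. Peaks stay intact because no variant below the cutoff is ever dropped.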

Truncating the Y-axis too aggressively

Capping the Y-axis at a low value can hide the true strength of your most significant signals. While some truncation is acceptable for very extreme p-values, always note when values exceed the displayed range.
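One way to truncate honestly is to clip the heights but keep track of which points were clipped, so they can be drawn with a distinct marker and annotated with their true value. This is a minimal sketch with a hypothetical function name.

```python
def cap_heights(heights, y_max):
    """Clip -log10(p) heights at y_max and return the indices that were capped."""
    capped = [min(h, y_max) for h in heights]
    exceeded = [i for i, h in enumerate(heights) if h > y_max]
    return capped, exceeded


capped, exceeded = cap_heights([2.1, 48.0, 8.4], y_max=30)
print(capped, exceeded)  # [2.1, 30, 8.4] [1]
```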

Not labeling significant loci

A Manhattan plot without gene annotations at significant peaks forces readers to cross-reference chromosomal positions manually. Always label the nearest gene(s) at each significant locus for interpretability.
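Here is a hypothetical lookup for the label at each lead variant. It assumes a gene annotation table already loaded as (name, chromosome, start, end) tuples, and the gene names in the test are made up. Distance is zero when the variant falls inside a gene.

```python
def nearest_gene(chrom, pos, genes):
    """Return the name of the gene closest to (chrom, pos), or None if none on that chromosome.

    genes: list of (name, chromosome, start, end) tuples (assumed input).
    """
    best_name, best_dist = None, float("inf")
    for name, g_chrom, start, end in genes:
        if g_chrom != chrom:
            continue
        dist = max(start - pos, 0, pos - end)  # 0 if pos lies within [start, end]
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```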

// 07 — Real-world examples

Where you’ll see Manhattan plots used

01

GWAS for complex diseases (diabetes, heart disease)

Large-scale genetic studies testing millions of variants across hundreds of thousands of individuals. Manhattan plots reveal which genomic regions harbor disease-risk variants, guiding drug target discovery.

Medical Genetics
02

Pharmacogenomics drug response studies

Identifying genetic variants that influence how patients respond to specific medications. Manhattan plots help visualize which genes affect drug metabolism, efficacy, or adverse reactions.

Pharmacology
03

Agricultural trait mapping in crops and livestock

Plant and animal breeders use GWAS to find genetic markers linked to yield, disease resistance, or quality traits. Manhattan plots guide marker-assisted selection programs.

Agricultural Science

// 08 — At a glance

Quick reference

Also known as: GWAS plot, genome-wide significance plot
First used: Mid-2000s, with the rise of genome-wide association studies
Best for: Displaying statistical significance of genetic variants across the genome
Data types: Genomic position on X-axis, −log₁₀(p-value) on Y-axis
Significance threshold: p = 5×10⁻⁸ (genome-wide), p = 1×10⁻⁵ (suggestive)
Typical data size: 500,000 to 10+ million variants
Common tools: R (qqman, CMplot), Python (matplotlib, gwaspy), PLINK, LocusZoom
Common mistakes: Wrong threshold, ignoring stratification, overplotting, missing labels

// 09 — Variations

Types of Manhattan plots

Several variants of the Manhattan plot exist for specific analytical needs.

Miami plot

Two Manhattan plots mirrored vertically, comparing two GWAS results (e.g., two traits or ancestries) in a single display.

Circular Manhattan plot

Arranges chromosomes in a circle rather than a line. Useful for multi-trait comparisons and publications with limited horizontal space.

LocusZoom plot

A zoomed-in view of a single significant locus, showing linkage disequilibrium (LD) patterns and nearby genes in high resolution.

Multi-trait Manhattan plot

Overlays results from multiple GWAS on the same plot using color coding or layered tracks to compare signals across traits.

// 10 — FAQs

Frequently asked questions

What is a Manhattan plot?

A Manhattan plot is a scatter plot of −log₁₀(p) for every variant tested in a genome-wide association study (GWAS), ordered by chromosome and position. Most points cluster near the bottom, while a few significant loci rise like skyscrapers in the Manhattan skyline.

When should you use a Manhattan plot?

Use a Manhattan plot to present results from a genome-wide association study (GWAS). It is especially useful when you need to show statistical significance across the entire genome at once and to pinpoint which chromosomal regions contain significant associations.

When should you avoid a Manhattan plot?

Avoid a Manhattan plot in three cases:
- Your data is not organized by genomic position (a standard scatter plot fits better).
- You have only a small number of tested variants (a forest plot or table may be clearer).
- Your audience is not familiar with genomics or p-value transformations.

Is a Manhattan plot suitable for dashboards?

Yes, with caveats. A Manhattan plot can work in a dashboard for a genomics audience if the panel is wide enough to separate all chromosomes, has a clear title, and labels both axes and the significance threshold. With millions of points, thinning non-significant variants keeps the panel readable.

What category of chart is a Manhattan plot?

A Manhattan plot belongs to the Scientific family of charts. Charts in that family address similar analytical questions, so one can often stand in for another when it does not quite fit your data.

How do you read a Manhattan plot?

1. Find the significance threshold line, usually at −log₁₀(p) ≈ 7.3.
2. Scan for peaks that rise above the line and note which chromosomes they sit on.
3. Compare peak heights to judge the strength of each signal.
4. Check points just below the line for suggestive associations.
5. When precision matters, confirm any conclusion against the underlying summary statistics.