Scientific · Advanced

Sequence Logo

A chart of stacked letters showing the information content at each position of a set of aligned biological sequences. Each letter's height equals its frequency times the position's information content, which reveals conserved motifs in DNA or protein alignments.

// 01 — The chart

What it looks like

[Figure: Example DNA transcription factor binding motif, 8-position consensus. Y-axis: information content in bits (0.0 to 2.0). X-axis: position (1 to 8).]

A DNA sequence logo showing an 8-position binding motif. Positions 4, 5, and 6 are highly conserved (tall letters near 2 bits), while positions 1 and 3 show more variability. Colors follow one common nucleotide scheme: A (red), T (green), C (blue), G (gold). Other tools assign the colors differently, so always check the legend.

// 02 — Definition

What is a sequence logo?

A sequence logo is a graphical representation of a multiple sequence alignment of nucleotides (DNA/RNA) or amino acids (proteins). At each position in the alignment, letters are stacked on top of each other. The total height of the stack represents the information content at that position, measured in bits, while each letter’s height within the stack is proportional to its frequency.

For DNA, the maximum information content at any position is 2 bits (log₂ 4, since there are 4 possible nucleotides: A, T, C, G). A position where every sequence has the same nucleotide reaches the full 2 bits: it is perfectly conserved. A position where all four nucleotides appear equally often has 0 bits: it carries no information. Protein logos have a maximum of about 4.32 bits (log₂ 20, for the 20 standard amino acids).
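
The two extremes described above can be checked with a few lines of Python. This is a minimal sketch, assuming a uniform background and no small-sample correction; the function name `column_information` and the example columns are made up for illustration.

```python
import math
from collections import Counter

def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column.

    Assumes a uniform background: info = log2(alphabet size) - Shannon entropy.
    """
    counts = Counter(column)
    n = sum(counts[s] for s in alphabet)
    entropy = 0.0
    for s in alphabet:
        p = counts[s] / n
        if p > 0:  # treat 0 * log2(0) as 0
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy

# One column per string: each character is the base seen in one sequence.
conserved = column_information("AAAAAAAA")  # every sequence has A
uniform = column_information("ACGTACGT")    # all four bases equally frequent
protein_max = math.log2(20)                 # upper bound for a protein column

print(conserved, uniform, round(protein_max, 2))
```

The conserved column evaluates to the 2-bit maximum and the evenly mixed column to 0 bits, matching the text.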

This design is remarkably efficient: you can see both which residues are important at each position and how important that position is, all in a single compact graphic. Sequence logos are the standard way to represent transcription factor binding sites, splice sites, and other short functional motifs in molecular biology.

Origin: Sequence logos were introduced by Tom Schneider and Michael Stephens in their 1990 paper “Sequence logos: a new way to display consensus sequences” published in Nucleic Acids Research. The concept built on Claude Shannon’s information theory to quantify the conservation of biological sequences.

// 03 — Anatomy

Parts of a sequence logo

[Figure: annotated sequence logo with callouts A to E, explained below.]
A — Y-axis (information content): Vertical axis measured in bits (0–2 for DNA, 0–4.32 for protein), representing the total information at each position
B — X-axis (position): Each column corresponds to a position in the aligned sequences, numbered sequentially
C — Letter height: The height of each letter is proportional to frequency × information — taller letters are more conserved at that position
D — Low-information position: Short stacks indicate positions with little conservation, where multiple residues appear with similar frequency
E — Stacking order: Within each column, letters are stacked from most to least frequent (most frequent on top), allowing quick identification of the dominant residue
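
Items C and E can be sketched in code: each letter's height is its frequency times the column's information content, and the stack is drawn with the most frequent letter on top. This is an illustrative sketch, not any particular tool's implementation; it assumes a uniform background and that `freqs` lists every symbol of the alphabet (including zeros). The name `letter_heights` is hypothetical.

```python
import math

def letter_heights(freqs):
    """Return [(letter, height_in_bits), ...] ordered from bottom to top.

    freqs maps every alphabet symbol to its frequency in one column.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    info = math.log2(len(freqs)) - entropy  # total stack height in bits
    heights = {s: p * info for s, p in freqs.items()}
    # Sort ascending so the most frequent letter is drawn last, i.e. on top.
    return sorted(heights.items(), key=lambda item: item[1])

stack = letter_heights({"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1})
total_height = sum(height for _, height in stack)

for letter, height in stack:
    print(f"{letter}: {height:.3f} bits")
```

The heights sum to the column's information content, so a short stack (item D) follows directly from a high-entropy column.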

// 04 — Usage

When to use it — and when not to

✓ Use a sequence logo when…
  • Visualizing transcription factor binding site motifs from ChIP-seq or SELEX data
  • Showing conservation patterns across a multiple sequence alignment of DNA, RNA, or protein
  • You need to communicate both the dominant residue and the information content at each position
  • Comparing binding specificity of different transcription factors side by side
  • Publishing in molecular biology, genomics, or bioinformatics journals where logos are standard
  • Your alignment is short (5–30 positions) and you want a compact, information-dense visualization
× Avoid a sequence logo when…
  • Your alignment is very long (hundreds of positions) — the logo becomes unreadable at that scale
  • You need to show gaps or insertions in the alignment — standard logos don’t handle indels
  • Your audience is not familiar with molecular biology or information theory
  • You want to show individual sequences rather than a consensus — use a dot plot or alignment viewer
  • You have very few sequences in your alignment, making the frequency estimates unreliable
  • You need to show correlations between positions — logos assume positions are independent

// 05 — Reading guide

How to read a sequence logo

Follow these steps whenever you encounter a sequence logo in a paper or bioinformatics report.

1

Check the Y-axis scale

The Y-axis measures information content in bits. For DNA logos, the maximum is 2 bits (one of 4 equally likely bases). For protein logos, it’s about 4.32 bits (one of 20 amino acids). Positions near the maximum are highly conserved; positions near zero are variable.

2

Read the tallest letters at each position

The letter at the top of each stack is the most frequent residue at that position. If the stack is tall and dominated by a single large letter, that position is highly conserved and likely functionally important.

3

Identify the consensus sequence

Read across the top letters from left to right to extract the consensus motif. For example, if positions 4–6 show tall T, C, and A respectively, the core motif is “TCA.” This is the most probable binding sequence.
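
Reading the top letters corresponds to taking the most frequent residue in each alignment column. A minimal sketch, using four made-up binding sites whose positions 4 to 6 are T, C, A as in the example above (the sequences and the `consensus` helper are illustrative, not real data):

```python
from collections import Counter

def consensus(sequences):
    """Most frequent residue at each position of equal-length aligned sequences.

    Ties are broken by first occurrence in the column, which is arbitrary.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )

# Hypothetical aligned sites, 8 positions each.
sites = ["TAATCAGT", "AACTCAAT", "TATTCACG", "TACTCAGT"]
motif = consensus(sites)
core = motif[3:6]  # positions 4 to 6 (1-based)

print(motif, core)
```

A consensus string discards the information content, so two positions with the same top letter can differ greatly in how strongly they are conserved.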

4

Assess position variability

Short stacks with multiple small letters indicate degenerate positions where several residues are tolerated. These positions contribute less to binding specificity and may be flexible in the biological context.

5

Note the color scheme

DNA color schemes vary by tool: A is usually green or red, T red or green, C blue, and G gold, orange, or black. Protein logos often use chemistry-based coloring: hydrophobic (black), polar (green), acidic (red), basic (blue). Check the legend or paper conventions.

// 06 — Common mistakes

Mistakes to watch out for

Not applying small-sample correction

With few sequences, the observed frequencies are noisy estimates of the true distribution. The information content will be overestimated unless a small-sample correction (such as the one proposed by Schneider) is applied. Without it, positions may appear more conserved than they actually are.
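
To make the effect concrete, the sketch below uses a commonly cited approximate form of the correction, e(n) = (s − 1) / (2 · ln 2 · n), where s is the alphabet size and n the number of sequences. Treat it as an illustration under that assumption: the function names are hypothetical, and clamping negative values to zero is a choice made here, not part of the formula.

```python
import math

def small_sample_correction(n_sequences, alphabet_size=4):
    """Approximate correction in bits: (s - 1) / (2 * ln(2) * n)."""
    return (alphabet_size - 1) / (2 * math.log(2) * n_sequences)

def corrected_information(freqs, n_sequences):
    """Column information (uniform background) minus the correction."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    correction = small_sample_correction(n_sequences, len(freqs))
    info = math.log2(len(freqs)) - entropy - correction
    return max(info, 0.0)  # assumption: clamp so stacks never go negative

conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
few = corrected_information(conserved, n_sequences=10)
many = corrected_information(conserved, n_sequences=1000)

print(round(few, 3), round(many, 3))
```

With only 10 sequences, a perfectly conserved DNA column is drawn noticeably below 2 bits; with 1000 sequences the correction is negligible.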

Confusing frequency logos with information logos

A frequency logo shows letter heights proportional only to frequency (each column sums to 1), while a true sequence logo weights by information content. The two look similar but convey different things — a frequency logo can make variable positions look just as important as conserved ones.

Ignoring background composition

The information content calculation assumes a uniform background distribution by default. If the genomic background is AT-rich or GC-rich, this assumption inflates the apparent information at positions matching the background bias. Use a background-corrected model when appropriate.
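
One common background-corrected measure is the relative entropy (Kullback-Leibler divergence) between the column frequencies and the background frequencies. The sketch below is illustrative: the AT-rich background values are invented, and `relative_entropy` is a hypothetical helper name.

```python
import math

def relative_entropy(freqs, background):
    """Sum of p * log2(p / q) over symbols, in bits (q = background frequency)."""
    return sum(
        p * math.log2(p / background[s]) for s, p in freqs.items() if p > 0
    )

uniform_bg = {s: 0.25 for s in "ACGT"}
at_rich_bg = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # made-up genome bias

# A column that merely mirrors the AT-rich background.
column = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}

vs_uniform = relative_entropy(column, uniform_bg)
vs_at_rich = relative_entropy(column, at_rich_bg)

print(round(vs_uniform, 3), round(vs_at_rich, 3))
```

Against the uniform background the column appears to carry some information; against the matching AT-rich background it carries none, which is the inflation described above.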

Using sequence logos for very long alignments

Logos are designed for short motifs (5–30 positions). Applying them to entire gene alignments produces unreadable graphics that require horizontal scrolling. For long alignments, use conservation score line plots or specialized alignment viewers instead.

Assuming positions are independent

Standard sequence logos treat each position independently, but biological sequences often have correlated positions (e.g., base-pairing in RNA). If inter-position dependencies are important, consider using mutual information plots or covariance models.
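
A small sketch of the mutual information between two alignment columns shows what a logo cannot display. The columns are made up: the first pair mimics perfect RNA base-pairing, the second pair varies independently. The helper name `mutual_information` is hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (bits) between two equal-length alignment columns."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_pairs = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_pairs.items():
        p_ab = count / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Each string is one column; character k comes from sequence k.
paired = mutual_information("ACGU", "UGCA")       # A-U, C-G, G-C, U-A pairs
independent = mutual_information("AACC", "GUGU")  # no relationship

print(paired, independent)
```

Both columns of the paired example have 0 bits of information in a standard logo, because every base is equally frequent, yet they share 2 bits of mutual information.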

// 07 — Real-world examples

Where you’ll see sequence logos used

01

Genomics: Transcription factor binding motifs

Databases like JASPAR and TRANSFAC display sequence logos for thousands of known transcription factor binding sites. Researchers use them to identify which motif a ChIP-seq peak matches and predict which genes a transcription factor regulates.

02

Virology: Viral mutation hotspots

Researchers align hundreds of SARS-CoV-2 spike protein sequences and generate sequence logos to identify conserved regions (potential vaccine targets) and variable regions (sites of immune escape mutations).

03

Structural biology: Protein domain signatures

The Pfam database uses sequence logos to represent the conserved residues in protein domain families. Structural biologists use these logos to identify which amino acids are essential for protein folding and function.

// 08 — At a glance

Quick reference

Also known as: Logo plot, motif logo, WebLogo
Invented by: Tom Schneider & Michael Stephens, 1990
Best for: Visualizing conservation patterns in short biological sequence alignments
Data types: Position on X-axis, information content (bits) on Y-axis
Max info (DNA): 2 bits per position (log₂ 4)
Max info (protein): ≈4.32 bits per position (log₂ 20)
Common tools: WebLogo, Logomaker, ggseqlogo, MEME Suite, Seq2Logo
Common mistakes: No small-sample correction, confusing frequency vs. information logos, ignoring background composition

// 09 — Variations

Types of sequence logos

The original sequence logo has inspired several important variants for different analytical needs.

Frequency logo

Each column sums to 1.0, showing raw frequency rather than information content. Useful when you want to see the full distribution at every position.

Two-sided (differential) logo

Shows enrichment above the baseline and depletion below, comparing two sets of sequences to highlight position-specific differences.

Protein sequence logo

Uses 20 amino acid letters with chemistry-based coloring. Maximum information content is ~4.32 bits. Common for enzyme active sites and protein domain families.

Position weight matrix heatmap

An alternative to logos that shows the same position-specific frequency data as a color-coded grid. Easier to read for very long motifs.

// 10 — FAQs

Frequently asked questions

What is a sequence logo?

A sequence logo is a graphical representation of a multiple sequence alignment of nucleotides (DNA/RNA) or amino acids (proteins). At each position in the alignment, letters are stacked on top of each other. The total height of the stack represents the information content at that position, measured in bits, while each letter's height within the stack is proportional to its frequency.

When should you use a sequence logo?

Use a sequence logo when visualizing transcription factor binding site motifs from ChIP-seq or SELEX data. It also works well when showing conservation patterns across a multiple sequence alignment of DNA, RNA, or protein, and when you need to communicate both the dominant residue and the information content at each position.

When should you avoid a sequence logo?

Avoid a sequence logo when your alignment is very long (hundreds of positions), because the logo becomes unreadable at that scale. It is also a poor fit when you need to show gaps or insertions, since standard logos don’t handle indels, or when your audience is not familiar with molecular biology or information theory.

Is a sequence logo suitable for dashboards?

Yes — a sequence logo can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.

What category of chart is a sequence logo?

Sequence Logo belongs to the Scientific family of charts. Charts in that family are designed to answer the same kind of question, so they often work as alternatives when one doesn't quite fit your data.

How do you read a sequence logo?

Check the Y-axis scale first: information content is measured in bits, with a maximum of 2 for DNA and about 4.32 for protein. Then read the top letter of each stack to find the most frequent residue, read across the top letters to get the consensus motif, and treat short stacks of several small letters as variable positions. Finally, check the color scheme against the legend.