Sequence Logo
A stacked bar chart of letter heights showing information content at each position in a biological sequence — letter height equals frequency times information, revealing conserved motifs in DNA or protein alignments.
// 01 — The chart
What it looks like
A DNA sequence logo showing an 8-position binding motif. Positions 4, 5, and 6 are highly conserved (tall letters near 2 bits), while positions 1 and 3 show more variability. Colors follow the standard nucleotide scheme: A (red), T (green), C (blue), G (gold).
// 02 — Definition
What is a sequence logo?
A sequence logo is a graphical representation of a multiple sequence alignment of nucleotides (DNA/RNA) or amino acids (proteins). At each position in the alignment, letters are stacked on top of each other. The total height of the stack represents the information content at that position, measured in bits, while each letter’s height within the stack is proportional to its frequency.
For DNA, the maximum information content at any position is 2 bits, because log2(4) = 2 for the four possible nucleotides (A, T, C, G). A position where all sequences have the same nucleotide reaches 2 bits and is perfectly conserved. A position where all four nucleotides appear equally often has 0 bits and carries no information. Protein logos have a maximum of about 4.32 bits (log2(20) for the 20 amino acids).
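The arithmetic behind those limits can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular logo tool: the function name `column_information` is hypothetical, and the sketch assumes a gap-free column and a uniform background with no small-sample correction.

```python
import math
from collections import Counter

def column_information(column, alphabet="ACGT"):
    """Return (stack_height_bits, {letter: letter_height_bits}) for one column.

    Assumes the column contains only letters from `alphabet` (no gaps).
    """
    counts = Counter(column)
    n = len(column)
    freqs = {b: counts.get(b, 0) / n for b in alphabet}
    # Shannon entropy of the observed letter distribution, in bits.
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    # Information content: maximum possible entropy minus observed entropy.
    info = math.log2(len(alphabet)) - entropy
    # Each letter's height is its frequency times the column's information.
    heights = {b: f * info for b, f in freqs.items()}
    return info, heights

conserved_bits, conserved_heights = column_information("AAAAAAAA")
uniform_bits, uniform_heights = column_information("ACGTACGT")
protein_max_bits = math.log2(20)  # upper bound for a 20-letter alphabet
```

Here `conserved_bits` comes out at 2 bits and `uniform_bits` at 0 bits, matching the two extremes described above.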
This design is remarkably efficient: you can see both which residues are important at each position and how important that position is, all in a single compact graphic. Sequence logos are the standard way to represent transcription factor binding sites, splice sites, and other short functional motifs in molecular biology.
Origin: Sequence logos were introduced by Tom Schneider and Michael Stephens in their 1990 paper “Sequence logos: a new way to display consensus sequences” published in Nucleic Acids Research. The concept built on Claude Shannon’s information theory to quantify the conservation of biological sequences.
// 03 — Anatomy
Parts of a sequence logo
// 04 — Usage
When to use it — and when not to
Use it when:
- Visualizing transcription factor binding site motifs from ChIP-seq or SELEX data
- Showing conservation patterns across a multiple sequence alignment of DNA, RNA, or protein
- You need to communicate both the dominant residue and the information content at each position
- Comparing binding specificity of different transcription factors side by side
- Publishing in molecular biology, genomics, or bioinformatics journals where logos are standard
- Your alignment is short (5–30 positions) and you want a compact, information-dense visualization
Avoid it when:
- Your alignment is very long (hundreds of positions), because the logo becomes unreadable at that scale
- You need to show gaps or insertions in the alignment, since standard logos don’t handle indels
- Your audience is not familiar with molecular biology or information theory
- You want to show individual sequences rather than a consensus (use a dot plot or alignment viewer instead)
- You have very few sequences in your alignment, making the frequency estimates unreliable
- You need to show correlations between positions, since logos assume positions are independent
// 05 — Reading guide
How to read a sequence logo
Follow these steps whenever you encounter a sequence logo in a paper or bioinformatics report.
Check the Y-axis scale
The Y-axis measures information content in bits. For DNA logos, the maximum is 2 bits (one of 4 equally likely bases). For protein logos, it’s about 4.32 bits (one of 20 amino acids). Positions near the maximum are highly conserved; positions near zero are variable.
Read the tallest letters at each position
The letter at the top of each stack is the most frequent residue at that position. If the stack is tall and dominated by a single large letter, that position is highly conserved and likely functionally important.
Identify the consensus sequence
Read across the top letters from left to right to extract the consensus motif. For example, if positions 4–6 show tall T, C, and A respectively, the core motif is “TCA.” This is the most probable binding sequence.
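Reading off the consensus amounts to taking the most frequent letter in each column. A minimal sketch, assuming the motif is already available as per-position frequency dictionaries; the `consensus` helper and the three-position `motif` values are made up for illustration:

```python
def consensus(freq_columns):
    """Join the most frequent letter from each position's frequency table."""
    return "".join(max(col, key=col.get) for col in freq_columns)

# Hypothetical frequencies for three core positions (illustrative only).
motif = [
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.90, "C": 0.04, "G": 0.03, "T": 0.03},
]

core = consensus(motif)
print(core)  # TCA
```

Note that the consensus string discards how strongly each position is conserved, which is exactly the information the logo's stack heights add back.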
Assess position variability
Short stacks with multiple small letters indicate degenerate positions where several residues are tolerated. These positions contribute less to binding specificity and may be flexible in the biological context.
Note the color scheme
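Color schemes vary between tools, so always check the legend or the paper's conventions. A widely used DNA convention is A = green, C = blue, G = orange or gold, T = red, although some renderings (including the example above) swap the colors for A and T. Protein logos often use chemistry-based coloring: hydrophobic (black), polar (green), acidic (red), basic (blue).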
// 06 — Common mistakes
Mistakes to watch out for
Not applying small-sample correction
With few sequences, the observed frequencies are noisy estimates of the true distribution. The information content will be overestimated unless a small-sample correction (such as the one proposed by Schneider) is applied. Without it, positions may appear more conserved than they actually are.
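As a rough sketch of what such a correction looks like, the commonly cited approximation subtracts e(n) = (s − 1) / (2 · ln 2 · n) from the information content, where s is the alphabet size and n is the number of sequences. The function name `corrected_information` is hypothetical, and clamping the result at zero is a choice made for this sketch rather than part of the correction itself.

```python
import math
from collections import Counter

def corrected_information(column, alphabet_size=4):
    """Information content in bits with an approximate small-sample correction."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Approximate correction term e(n); it shrinks as n grows.
    correction = (alphabet_size - 1) / (2 * math.log(2) * n)
    # Clamp at zero (sketch-specific choice) so stacks never go negative.
    return max(0.0, math.log2(alphabet_size) - entropy - correction)

few = corrected_information("A" * 5)      # 5 identical sequences
many = corrected_information("A" * 500)   # 500 identical sequences
```

With only 5 sequences, a perfectly conserved column is drawn noticeably below 2 bits, while with 500 sequences the correction is nearly invisible.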
Confusing frequency logos with information logos
A frequency logo shows letter heights proportional only to frequency (each column sums to 1), while a true sequence logo weights by information content. The two look similar but convey different things — a frequency logo can make variable positions look just as important as conserved ones.
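The difference is easy to see numerically. In the sketch below (the helper name `stack_heights` and both example columns are hypothetical), a strongly conserved column and a nearly uniform one get the same stack height in a frequency logo but very different heights in an information logo. It assumes a uniform background and no small-sample correction.

```python
import math

def stack_heights(freqs):
    """Return (frequency_logo_height, information_logo_height_bits) for a column."""
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    info = math.log2(len(freqs)) - entropy
    # A frequency logo stacks the raw frequencies, so the column sums to 1.
    return sum(freqs.values()), info

conserved = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}
variable = {"A": 0.40, "C": 0.20, "G": 0.20, "T": 0.20}

conserved_freq_h, conserved_info_h = stack_heights(conserved)
variable_freq_h, variable_info_h = stack_heights(variable)
```

Both frequency-logo stacks have height 1, while the information-logo stack for the conserved column is more than an order of magnitude taller than the one for the variable column.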
Ignoring background composition
The information content calculation assumes a uniform background distribution by default. If the genomic background is AT-rich or GC-rich, this assumption inflates the apparent information at positions matching the background bias. Use a background-corrected model when appropriate.
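One common way to account for background composition is to replace the uniform-background formula with the relative entropy (Kullback-Leibler divergence) between the column and the background. The sketch below assumes that approach; the function name and the AT-rich background values are illustrative, not taken from any real genome.

```python
import math

def relative_entropy(freqs, background):
    """Relative entropy in bits of a column's frequencies against a background."""
    return sum(
        f * math.log2(f / background[b]) for b, f in freqs.items() if f > 0
    )

uniform = {b: 0.25 for b in "ACGT"}
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # hypothetical background

# A column that simply mirrors the AT-rich background.
column = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}

vs_uniform = relative_entropy(column, uniform)
vs_at_rich = relative_entropy(column, at_rich)
```

Against the uniform background the column appears to carry some information, but against the matching AT-rich background it carries none, which is the inflation described above.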
Using sequence logos for very long alignments
Logos are designed for short motifs (5–30 positions). Applying them to entire gene alignments produces unreadable, horizontally scrolling graphics. For long alignments, use conservation score line plots or specialized alignment viewers instead.
Assuming positions are independent
Standard sequence logos treat each position independently, but biological sequences often have correlated positions (e.g., base-pairing in RNA). If inter-position dependencies are important, consider using mutual information plots or covariance models.
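To check whether two positions are correlated, mutual information between their columns is a simple starting point. The sketch below is a plain plug-in estimate with no correction for small samples; the function name and the toy RNA columns are hypothetical.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information in bits between two equally long alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

# Toy columns: the first pair always forms a Watson-Crick base pair,
# the second pair varies independently.
paired = mutual_information("ACGU", "UGCA")
unpaired = mutual_information("AACC", "GUGU")
```

Taken alone, each of the paired columns is maximally variable and would show an empty stack in a logo, yet together they share 2 bits of mutual information. The unpaired columns share none.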
// 07 — Real-world examples
Where you’ll see sequence logos used
Genomics: Transcription factor binding motifs
Databases like JASPAR and TRANSFAC display sequence logos for thousands of known transcription factor binding sites. Researchers use them to identify which motif a ChIP-seq peak matches and predict which genes a transcription factor regulates.
Virology: Viral mutation hotspots
Researchers align hundreds of SARS-CoV-2 spike protein sequences and generate sequence logos to identify conserved regions (potential vaccine targets) and variable regions (sites of immune escape mutations).
Structural biology: Protein domain signatures
The Pfam database uses sequence logos to represent the conserved residues in protein domain families. Structural biologists use these logos to identify which amino acids are essential for protein folding and function.
// 08 — At a glance
Quick reference
// 09 — Variations
Types of sequence logos
The original sequence logo has inspired several important variants for different analytical needs.
Frequency logo
Each column sums to 1.0, showing raw frequency rather than information content. Useful when you want to see the full distribution at every position.
Two-sided (differential) logo
Shows enrichment above the baseline and depletion below, comparing two sets of sequences to highlight position-specific differences.
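As a simplified illustration of the idea, the signed height of each letter can be taken as the difference between its frequency in the two sequence sets. This is an assumption made for the sketch: published differential logo tools typically add statistical testing or information-based scaling, and the function name and example frequencies here are hypothetical.

```python
def differential_heights(foreground, background):
    """Signed per-letter heights: positive = enriched, negative = depleted."""
    letters = sorted(set(foreground) | set(background))
    return {b: foreground.get(b, 0.0) - background.get(b, 0.0) for b in letters}

# Hypothetical frequencies at one position in two sets of sequences.
bound = {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10}
unbound = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

diff = differential_heights(bound, unbound)
enriched = [b for b, h in diff.items() if h > 0]   # drawn above the baseline
depleted = [b for b, h in diff.items() if h < 0]   # drawn below the baseline
```

In this example A is drawn above the baseline and C, G, and T below it.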
Protein sequence logo
Uses 20 amino acid letters with chemistry-based coloring. Maximum information content is ~4.32 bits. Common for enzyme active sites and protein domain families.
Position weight matrix heatmap
An alternative to logos that shows the same position-specific frequency data as a color-coded grid. Easier to read for very long motifs.
// 10 — FAQs
Frequently asked questions
What is a sequence logo?
A sequence logo is a graphical representation of a multiple sequence alignment of nucleotides (DNA/RNA) or amino acids (proteins). At each position in the alignment, letters are stacked on top of each other. The total height of the stack represents the information content at that position, measured in bits, while each letter's height within the stack is proportional to its frequency.
When should you use a sequence logo?
Use a sequence logo when visualizing transcription factor binding site motifs from ChIP-seq or SELEX data. It also works well when showing conservation patterns across a multiple sequence alignment of DNA, RNA, or protein, and when you need to communicate both the dominant residue and the information content at each position.
When should you avoid a sequence logo?
Avoid a sequence logo when your alignment is very long (hundreds of positions), because the logo becomes unreadable at that scale. It is also a poor fit when you need to show gaps or insertions in the alignment, since standard logos don’t handle indels, or when your audience is not familiar with molecular biology or information theory.
Is a sequence logo suitable for dashboards?
Yes — a sequence logo can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.
What category of chart is a sequence logo?
Sequence Logo belongs to the Scientific family of charts. Charts in that family are designed to answer the same kind of question, so they often work as alternatives when one doesn't quite fit your data.
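How do you read a sequence logo?
Start with the Y-axis, which gives information content in bits (up to 2 bits for DNA, about 4.32 bits for protein). Then read the top letter of each stack from left to right to get the consensus motif: tall stacks dominated by one letter mark conserved positions, while short stacks of several small letters mark variable ones. Finally, check the legend for the color scheme, and verify any conclusion against the underlying frequencies when precision matters.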