
Flow & Process · Intermediate

Sankey Diagram

A flow diagram where the width of each band is proportional to the quantity it represents — thicker means more. Used to trace how energy, money, materials, or users move through the stages of a process and where the largest transfers happen.

// 01 — The chart

What it looks like

Example: Website traffic source → landing page → outcome (May 2025)

A Sankey diagram tracing website visitors from traffic source through landing page to final outcome. Band width encodes visitor volume; the highlighted Organic-to-Homepage band is the dominant flow.

// 02 — Definition

What is a Sankey diagram?

A Sankey diagram is a flow visualization where bands or arrows connect nodes, and the width of each band is proportional to the quantity it represents. Thicker bands carry larger flows; thinner bands carry smaller ones. The eye can instantly compare which paths are dominant, where flows split or merge, and where the largest transfers happen as a quantity moves through a system. The encoding is direct: the width of the band is the value.

The diagram reads left to right (or sometimes top to bottom). On the left side you see the sources — where things originate. On the right you see the destinations — where things end up. In between, each node is a stage of the process, and every band traces a single flow from one stage to the next. By convention, every node obeys a conservation principle: the total width entering a node should equal the total width leaving it. If quantity is genuinely lost between stages — wasted heat, dropped-off users, taxed dollars — that loss appears as a separate destination so the diagram remains balanced and the loss is honest.

Sankey diagrams excel at one specific job: showing how a total quantity distributes and redistributes across the stages of a process. They make it obvious where the biggest transfers happen, where losses occur, and how flows split at each stage. This is why they are beloved in energy analysis (Lawrence Livermore National Laboratory has published an annual U.S. energy Sankey for decades), in manufacturing, in supply-chain logistics, in budget visualization, and in web analytics. They reward dense, multi-source data — a problem that would need three or four separate bar charts can often collapse into one Sankey.

The price of that power is rigidity: Sankey diagrams only work when your data fits a small set of nodes (typically fewer than fifteen) and when the flows are strictly one-directional. Stretch them outside that shape and they get worse fast — bands tangle, labels collide, and the conservation principle quietly breaks. The rest of this guide is about how to live inside the Sankey’s sweet spot — and how to recognize when you have walked outside it.

Origin: The Sankey diagram is named after Irish-born British engineer Captain Matthew Henry Phineas Riall Sankey, who used it in 1898 to show the energy efficiency of a steam engine in the Minutes of Proceedings of the Institution of Civil Engineers. Flow diagrams of this type are older, however: French engineer Charles Joseph Minard drew them decades earlier, most famously in his 1869 Carte figurative tracing Napoleon’s 1812 Russian campaign, a graphic Edward Tufte said may well be the best statistical graphic ever drawn.

// 03 — When to use

When a Sankey diagram is the right call

Reach for a Sankey diagram whenever the question is about how a quantity moves through a process and you want a single visual that shows both the totals and the individual transfers. Below are the situations where it consistently wins against the alternatives.

✓ Use a Sankey diagram when…

Showing how a total quantity distributes and redistributes across two or more stages
Tracing energy, material, money, or user flows through a system end to end
Identifying the largest transfers and where losses or drop-offs occur
Visualizing website user journeys, conversion funnels with multiple paths, or budget allocations
Comparing multiple paths from source to destination in one chart instead of many
Your audience needs to see both the overall pattern and the size of individual flows
The data obeys a conservation principle (inflow equals outflow at each node)

// 04 — When not to use

When a Sankey diagram is the wrong call

A Sankey diagram can technically render almost any flow data, but “technically possible” is not the same as “good idea.” Below are the cases where the Sankey actively hides information you need to communicate.

× Avoid a Sankey diagram when…

You have more than ~15 nodes — the bands tangle and the diagram becomes unreadable
Flows loop back on themselves (cycles) — Sankey assumes one-directional flow; use a chord or network graph instead
You only need to compare totals across categories — a bar chart reads faster
Your data has only two stages and one source per destination — a simple stacked bar is clearer
Exact values matter more than relative magnitudes — individual band widths are hard to read precisely
Your audience is unfamiliar with the format — a simpler chart often communicates the same insight
The flows are bidirectional or symmetric (trade matrices, migration both directions) — use a chord diagram
Your data lacks a single shared unit — width comparisons require everything be in the same currency, joules, or count

// 05 — Data requirements

What your data needs to look like

Before building the chart, your dataset needs to fit a specific shape. Use this checklist to confirm yours does.

Shape

A long-format link table with one row per band: source label, target label, numeric value. Optionally a category column to group nodes by type.

Minimum rows

3 links across at least 3 nodes. With one or two links there is nothing to flow through.

Maximum rows

~40 links across ~15 nodes. Past that, the diagram tangles and labels overlap.

Required fields

source (required)

string / categorical

The label of the node where a flow originates. Source labels appear in the leftmost (or earlier) columns of the diagram. Every source must also exist in your node list — don’t reference an undefined source.

target (required)

string / categorical

The label of the node where a flow terminates. Target labels appear in the rightmost (or later) columns. The same label can appear as both a target and a source if it sits at an intermediate stage.

value (required)

number (continuous, ≥ 0)

The quantity carried by this band. Must share a single unit across rows (kilowatt-hours, dollars, users, kilograms). Negative values are not allowed — if you need to show a loss, model it as a flow into a dedicated “waste” or “drop-off” node.

source	target	value	category
Organic	Homepage	4,200	search
Paid	Homepage	2,400	ads
Social	Blog	1,900	social
Homepage	Signup	2,100	outcome
Homepage	Bounce	2,800	outcome
Blog	Browse	2,200	outcome
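
The checklist above could be enforced in code before any chart is drawn. What follows is a minimal sketch, assuming the link table arrives as tab-separated text with thousands separators, as in the sample. The helper name parse_link_table and its error messages are hypothetical and not part of any library.

```python
import csv
import io

# The sample link table from the text: tab-separated, with thousands separators.
RAW_TABLE = (
    "source\ttarget\tvalue\tcategory\n"
    "Organic\tHomepage\t4,200\tsearch\n"
    "Paid\tHomepage\t2,400\tads\n"
    "Social\tBlog\t1,900\tsocial\n"
    "Homepage\tSignup\t2,100\toutcome\n"
    "Homepage\tBounce\t2,800\toutcome\n"
    "Blog\tBrowse\t2,200\toutcome\n"
)


def parse_link_table(text):
    """Parse a tab-separated link table into (source, target, value) tuples.

    Hypothetical helper: raises ValueError on empty labels, negative values,
    or fewer than three links.
    """
    links = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        source = row["source"].strip()
        target = row["target"].strip()
        if not source or not target:
            raise ValueError(f"row is missing a source or target label: {row}")
        value = float(row["value"].replace(",", ""))  # "4,200" -> 4200.0
        if value < 0:
            raise ValueError(f"negative value on {source} -> {target}: {value}")
        links.append((source, target, value))
    if len(links) < 3:
        raise ValueError("a Sankey needs at least 3 links")
    return links


links = parse_link_table(RAW_TABLE)
# The node list is derived from the links, so no link can reference an undefined node.
nodes = sorted({label for source, target, _ in links for label in (source, target)})
print(f"{len(links)} links across {len(nodes)} nodes")
```

Deriving the node list from the links is one way to guarantee that every source and target exists; a stricter pipeline might instead compare against a hand-maintained node list.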

Parts of a Sankey diagram

Every Sankey diagram is built from the same handful of parts. Knowing the names makes it easier to talk about what to keep, what to drop, and what most templates are getting wrong.

A — Source nodes: Vertical bars on the leftmost column representing where flows originate

B — Flow bands: Curved bands connecting nodes — width encodes the quantity transferred

C — Intermediate nodes: Middle-column nodes where flows split, merge, or pass through

D — Flow direction: Left-to-right reading order shows the progression through process stages

E — Destination nodes: Vertical bars on the rightmost column representing final outcomes

// 07 — Step-by-step

Step-by-step: how to build a good Sankey diagram

A nine-step recipe that works regardless of the tool. Walk through it the first few times and the moves become automatic; skip steps and the diagram usually shows it.

1
Pick the question your diagram answers
A Sankey diagram answers “how does this quantity distribute across the stages of a process, and where do the largest transfers happen?” Write that question down before you draw anything. If the question is about ranking categories or change over time, switch to a different chart now — not after you build it.
2
Reshape the data into a link table
Aggregate the raw events into a tidy table with three columns: source, target, value. Every row is one band. Group small flows below a threshold (say, less than 1% of the total) into an “Other” node so the diagram doesn’t collapse under thread-thin bands.
3
Decide on the number of stages
Two stages (source → destination) is the simplest case. Three or more stages let you show how flows redistribute through intermediate nodes. Resist the urge to add stages just because your data has them: every added column leaves each band less horizontal room to curve, so crossings get harder to follow.
4
Verify conservation at every node
For every node, the sum of inflows should equal the sum of outflows. If your data shows a real loss — wasted heat, dropped-off users, taxed dollars — model it as an explicit “loss” destination so the diagram stays balanced and the loss is honest.
5
Order the nodes within each column
Most layout algorithms minimize band crossings automatically; if yours doesn’t, sort nodes by total flow (descending) within each column. Then nudge a few by hand to reduce overlap. The diagram should look more like a fan than a tangle.
6
Choose a color encoding with intent
The simplest scheme paints all bands the same neutral and uses one accent color to highlight a specific flow your headline is about. If you need to encode a category (e.g. renewable vs fossil), assign a small palette to nodes and let bands inherit the source color.
7
Label nodes directly, not in a legend
Every node should carry its name and (if there’s room) its total quantity. Avoid an off-chart legend — readers shouldn’t have to bounce between the diagram and a key to interpret what they are seeing.
8
Annotate the largest flows
Pick the two or three biggest bands and overlay their numeric values. The eye reads width well at the band level and badly at the absolute level — give it a few anchors to calibrate against.
9
Add a takeaway title and ship
“U.S. energy flow, 2024” is a label. “Two-thirds of U.S. energy is wasted as rejected heat” is a takeaway. Lead with the takeaway, put the descriptive label as a subtitle, and verify the diagram is still readable at the size your reader will see it.
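
Step 2 of the recipe can be sketched in a few lines. This is an illustrative sketch, assuming the raw data is one (source, target) pair per observed visitor and that only small source categories get folded into “Other”. The function name build_link_table, the 1% default, and the sample events are all made up for the example.

```python
from collections import Counter


def build_link_table(events, min_share=0.01, other_label="Other"):
    """Aggregate (source, target) events into (source, target, value) links.

    Sources whose total is below `min_share` of all events are renamed to
    `other_label` before aggregation, so thread-thin bands merge into one.
    """
    counts = Counter(events)  # (source, target) -> number of events
    total = sum(counts.values())

    # Total volume per source, used to decide which sources are "small".
    source_totals = Counter()
    for (source, _target), n in counts.items():
        source_totals[source] += n
    small = {s for s, n in source_totals.items() if n / total < min_share}

    links = Counter()
    for (source, target), n in counts.items():
        links[(other_label if source in small else source, target)] += n

    # Largest flows first, which is also a reasonable default node order.
    return sorted(((s, t, n) for (s, t), n in links.items()), key=lambda r: -r[2])


# Made-up events: 1,000 visits, two of the four sources are below 1%.
events = (
    [("Organic", "Homepage")] * 600
    + [("Paid", "Homepage")] * 390
    + [("Forum", "Homepage")] * 6
    + [("Podcast", "Homepage")] * 4
)
for row in build_link_table(events):
    print(row)
```

Renaming the small sources before aggregating keeps the grand total unchanged, so the grouping does not disturb conservation in this two-stage case. A multi-stage table would need one “Other” node per column.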

// 08 — Real-world examples

Where you’ll see Sankey diagrams used

Sankey diagrams show up in four places more than anywhere else: energy and engineering, web analytics, government finance, and supply-chain logistics. Each context has its own conventions, and they all reward the same fundamentals.

Energy: U.S. national energy flow

Lawrence Livermore National Laboratory publishes an annual Sankey showing how U.S. primary energy flows from sources (petroleum, natural gas, coal, nuclear, renewables) through sectors (residential, commercial, industrial, transportation) to end uses and rejected energy. The diagram makes a striking point: roughly two-thirds of all energy consumed in the U.S. is wasted as heat. It is the canonical real-world Sankey.

Energy & Engineering

Web analytics: Conversion funnel paths

A growth team uses a Sankey to trace visitors from acquisition channel (organic, paid, social, email) through landing page (homepage, blog, product, pricing) to outcome (signup, browse, bounce). The widest bands reveal which channels actually drive conversions and which look big but bounce — information that is invisible in a stack of conversion-rate bar charts.

Web Analytics

Government: Federal budget allocation

A news outlet publishes a Sankey of federal tax revenue flowing into departments and programs. Readers can instantly see that defense, social security, and Medicare dominate, and they can trace each tax type (income, payroll, corporate) to its destination. The diagram answers “where does my tax dollar go?” in one glance.

Public Finance

Supply chain: Global commodity flows

A logistics analyst maps how a commodity (say, soybeans) flows from producing countries (Brazil, U.S., Argentina) through ports and shipping lanes to consuming countries (China, EU, Mexico). The widths reveal trade dependencies that bar charts of imports and exports alone never quite capture.

Supply Chain

// 09 — Variations

Types of Sankey diagrams

The basic Sankey diagram has several important variants, each suited to slightly different data situations. The headline rule stays the same: pick the variant whose strengths match your question.

Two-stage Sankey

The simplest form — direct flows from one set of categories to another without intermediate stages. Often the best place to start.

Multi-stage Sankey

Multiple intermediate columns show flows through several process stages — common in manufacturing, energy, and supply-chain analysis.

Energy / loss Sankey

Adds an explicit “rejected energy” or “loss” destination so waste is shown rather than implied. The classic engineering use case Sankey himself drew.

Alluvial diagram

A specialized Sankey for tracking how categorical group memberships shift between stages — e.g., survey respondents moving between income brackets over time.

// 10 — Comparisons

Sankey diagram vs other chart types

Sankey diagrams get confused with several other flow-style charts because they all use bands, ribbons, or arrows. The differences matter — picking the wrong one changes what your reader is allowed to conclude.

Sankey vs alluvial diagram

Alluvial diagrams are a special case of Sankey designed for showing how categorical group memberships shift between stages. Sankey is the broader family and accepts any quantity flow, including non-categorical transfers like energy or money.

Sankey diagram

Generic flow visualization. Nodes can differ at every stage, and the data may be physical units like kilowatt-hours or kilograms rather than counts of items.

Bands carry any quantity (energy, money, users)
Stages can use different node sets
Best for energy, supply-chain, and budget flows

Alluvial diagram

Specialized Sankey for categorical group changes between stages. The same node set typically reappears at each stage, and the bands track group membership across time.

Bands carry counts of items moving between groups
Same node set reappears at each stage
Best for survey responses and cohort transitions

Sankey vs chord diagram

Both encode flow magnitude with band width, but Sankeys arrange nodes in columns and assume one-directional flow, while chord diagrams arrange nodes in a circle and accept bidirectional flow. Choose based on whether your flows are directed or symmetric.

Sankey diagram

Linear, left-to-right (or top-to-bottom) layout. Sources sit on one side, destinations on the other, and bands always flow forward. Cycles are not supported.

Linear, columnar layout
Strictly directed, one-way flow
Easy to add intermediate process stages

Chord diagram

Circular layout where every node sits on the perimeter and bands run inside the circle. A flow from A to B and a return flow from B to A both appear as separate ribbons.

Circular, polar layout
Bidirectional flow is natural
Best for trade matrices and migration

Sankey vs flow chart (process)

A flow chart explains the steps of a process; a Sankey diagram explains the magnitudes that flow through one. They look similar at a glance and answer entirely different questions — picking the wrong one wastes the page.

Sankey diagram

Quantitative. Every band has a width that encodes a number. Use when the question is “how much flows where?”

Width = quantity (kilowatt-hours, dollars, users)
Shows where the largest transfers happen
Conservation: inflow equals outflow at each node

Flow chart

Qualitative. Boxes and arrows show the order of steps and decisions. Use when the question is “what happens, in what order?”

Boxes = steps; diamonds = decisions
Arrow direction shows sequence, not magnitude
Best for procedures, algorithms, swim lanes

Sankey vs stacked bar over time

When the question is “how has the composition of a total changed across a few stages?” both work. Pick a stacked bar when stages are time periods and you want to compare totals; pick a Sankey when individual flows between specific source/target pairs are the story.

Sankey diagram

Best when the reader needs to trace a specific source to a specific destination, or when the flows redistribute heavily between stages.

Bands trace individual source→target flows
Easy to spot redistribution and crossover
Heavy when there are more than ~4 time stages

Stacked bar over time

Best when the reader needs to compare totals between time periods and just wants a sense of compositional shift.

Bars = totals; segments = composition
Easy to compare totals between periods
Cannot trace which source went to which target

// 11 — Common mistakes

Mistakes to watch out for

Almost every broken Sankey diagram in the wild fails the same handful of ways. If you only memorize seven rules, make them these.

Too many nodes and bands

Adding every possible category creates a tangled mess of overlapping bands. Past about fifteen nodes, individual flows become impossible to trace and labels collide. Keep only the largest nodes in each stage and group minor categories into a single “Other” node so the total stays near or below fifteen and the chart stays legible.

Inconsistent or aesthetic widths

If band widths don’t accurately represent quantities, the diagram becomes misleading. Some templates auto-shrink bands when labels collide or auto-balance them for visual symmetry. Always verify that width is strictly proportional to value — if a band looks wrong, fix the data or layout, never the width.
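
One way to audit a template is to compare the widths it drew against a single linear scale. The sketch below assumes you can read the rendered band widths (in pixels) back from the tool. The function names and the 340-pixel column height are hypothetical.

```python
def band_widths(values, available_px):
    """Return pixel widths that share one linear scale: width = value * k."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    scale = available_px / total  # pixels per unit of value
    return [v * scale for v in values]


def is_proportional(values, widths, rel_tol=1e-6):
    """True when every width/value ratio matches the first ratio."""
    ratios = [w / v for v, w in zip(values, widths) if v > 0]
    if not ratios:
        return True
    reference = ratios[0]
    return all(abs(r - reference) <= rel_tol * abs(reference) for r in ratios)


values = [4200, 2400, 1900]            # visitors per band
widths = band_widths(values, 340)      # one shared scale
print([round(w, 1) for w in widths])
print(is_proportional(values, widths))            # widths follow the data
print(is_proportional(values, [168, 96, 90]))     # one band was "tidied" by hand
```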

Breaking the conservation principle

By convention, the total inflow at each node should equal the total outflow. If your data shows a real loss between stages — wasted heat, dropped-off users, taxed dollars — model that loss as an explicit destination node so the diagram balances and the loss is honest. Ignoring conservation creates confusion about where quantity went.
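
The balance check is mechanical, so it can be automated. A minimal sketch follows, assuming links are (source, target, value) tuples. The helper names, the “Loss” label, and the steam-engine numbers are illustrative only.

```python
from collections import defaultdict


def conservation_gaps(links, tol=1e-9):
    """Return {node: inflow - outflow} for intermediate nodes that do not balance.

    Pure sources (no inflow) and pure destinations (no outflow) are skipped,
    because conservation only applies to nodes that flows pass through.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value

    gaps = {}
    for node in inflow.keys() & outflow.keys():
        gap = inflow[node] - outflow[node]
        if abs(gap) > tol:
            gaps[node] = gap
    return gaps


def add_loss_links(links, loss_label="Loss"):
    """Append an explicit loss link wherever inflow exceeds outflow."""
    balanced = list(links)
    for node, gap in conservation_gaps(links).items():
        if gap > 0:
            balanced.append((node, loss_label, gap))
    return balanced


# Made-up numbers: 100 units of fuel in, 35 units of useful work out.
links = [("Fuel", "Engine", 100.0), ("Engine", "Work", 35.0)]
print(conservation_gaps(links))                  # the Engine node is short by 65
print(conservation_gaps(add_loss_links(links)))  # balanced once the loss is explicit
```

A negative gap (more leaving a node than entering it) is left untouched on purpose: that usually signals a data error rather than a loss, and it deserves a look at the source data.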

Using Sankey for cyclic flows

Sankey diagrams are designed for directed, acyclic flows. If your process has feedback loops (returning customers, recycling, conversational turn-taking), the layout will not converge and the bands will overlap nonsensically. Use a chord diagram, network graph, or dedicated cycle visualization instead.
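
Cycles are cheap to detect before you commit to a Sankey. The sketch below uses Kahn's topological-sort algorithm, a standard graph technique chosen here for illustration. The function name has_cycle and the sample links are assumptions.

```python
from collections import defaultdict, deque


def has_cycle(links):
    """Return True if the directed graph described by the links contains a cycle.

    Kahn's algorithm: repeatedly remove nodes with no remaining inbound edges.
    If some nodes can never be removed, they sit on or behind a cycle.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for source, target, _value in links:
        nodes.update((source, target))
        if target not in successors[source]:  # count parallel links once
            successors[source].add(target)
            indegree[target] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed < len(nodes)


one_way = [("Organic", "Homepage", 4200), ("Homepage", "Signup", 2100)]
with_loop = one_way + [("Signup", "Homepage", 300)]  # returning customers

print(has_cycle(one_way))
print(has_cycle(with_loop))
```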

Hiding quantities behind hover-only tooltips

An interactive Sankey that only shows numeric values on hover excludes touch users, screen-reader users, and printed copies. Bake the most important values directly into the SVG (node totals, top-three flow widths) and let interactivity reveal the rest. The chart should still tell its story when paused.

Missing labels or off-chart legends

Without clear node labels and color-key context, readers cannot interpret the diagram. Direct labeling beats off-chart legends every time — every node should carry its name and, where space allows, its total quantity. Off-chart legends double the eye travel and wreck the reading flow.
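
Direct labels with totals can be generated from the link table itself. This is a small sketch, assuming a node's total is the larger of its inflow and outflow. The helper names are hypothetical, and the sample links are the Homepage flows from the website-traffic example.

```python
from collections import defaultdict


def node_totals(links):
    """Return {node: total}, where total is the larger of inflow and outflow."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value
    return {n: max(inflow[n], outflow[n]) for n in set(inflow) | set(outflow)}


def direct_labels(links, unit):
    """Return {node: 'Name (1,234 unit)'} strings ready to place beside each node."""
    return {n: f"{n} ({total:,.0f} {unit})" for n, total in node_totals(links).items()}


links = [
    ("Organic", "Homepage", 4200), ("Paid", "Homepage", 2400),
    ("Homepage", "Signup", 2100), ("Homepage", "Browse", 2800),
    ("Homepage", "Bounce", 1700),
]
labels = direct_labels(links, "visitors")
print(labels["Homepage"])
```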

Color choices that fail color-blind readers

Default Sankey palettes (especially Plotly’s) often pair reds and greens that are indistinguishable for the most common color-vision deficiencies. Pick a colorblind-safe palette like ColorBrewer or Tableau 10, and reinforce category encoding with patterns, label position, or distinct strokes.

// 12 — Accessibility

Accessibility checklist

Run through this list before publishing. The chart should still communicate its message to readers using assistive technology, color-blind users, keyboard navigation, and reduced-motion settings.

✓
Color contrast meets WCAG AA
WCAG 1.4.3, 1.4.11
Band fills and node fills against the chart background should reach at least 3:1 contrast for graphical objects. Direct labels and titles should reach 4.5:1 for body text and 3:1 for large text. Plotly’s default Sankey palette fails on light backgrounds — audit it before you ship.
✓
Do not rely on color alone
WCAG 1.4.1
When color encodes node category (renewable vs fossil, organic vs paid), reinforce it with a pattern fill, a textured stroke, or a clearly different label position. Roughly 1 in 12 men and 1 in 200 women have some form of color-vision deficiency, and Sankey palettes often pick reds and greens together.
✓
Provide a text alternative listing the top flows
WCAG 1.1.1
Add an accessible name (alt text or aria-label) that summarizes the diagram’s key flows by volume, not its chart type. “Sankey of energy flow” is weak; “Top three flows: petroleum to transportation 24 quads, natural gas to industrial 15 quads, coal to electricity 9 quads. Total rejected energy: 67 percent.” is strong.
✓
Expose the underlying data
WCAG 1.3.1
Place the source/target/value link table beneath or next to the diagram, or expose it via a hidden table that screen readers can navigate row by row. Many readers will copy this data rather than re-key it from the visual.
✓
Make every node focusable with the keyboard
WCAG 2.1.1
If the Sankey is interactive, every node and every link should be reachable with the Tab key. The focused element should announce its name and connected flow values via aria-label, and tooltips should appear on focus, not only on hover.
✓
Provide aria-labels on nodes and links
WCAG 4.1.2
Each node’s SVG element should expose an aria-label like “Node: Petroleum, total outflow 33.0 quads”. Each link’s aria-label should read “Flow from Petroleum to Transportation, 24.0 quads” so a screen-reader user can traverse the graph.
✓
Respect prefers-reduced-motion
WCAG 2.3.3
If bands fade or grow on load, gate the animation behind a prefers-reduced-motion: no-preference media query so motion-sensitive readers see the final state immediately. Skip drag-to-reorder animations entirely when the user prefers reduced motion.
✓
Make the diagram resizable and zoomable
WCAG 1.4.4
Use a responsive viewBox so the diagram scales with the viewport. Verify it remains legible at 200% browser zoom — thin bands tend to disappear when scaled down. On mobile, consider a horizontal scroll container instead of cramming a 12-node Sankey into 360 pixels.
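
The text-alternative and aria-label items in the checklist lend themselves to generation from the link table, so the strings never drift from the data. Here is a sketch under that assumption. The function names are hypothetical, and the three flows reuse the illustrative values quoted in the checklist.

```python
def link_aria_label(source, target, value, unit):
    """Build the per-link label in the form the checklist recommends."""
    return f"Flow from {source} to {target}, {value:,.1f} {unit}"


def alt_text(links, unit, top_n=3):
    """Summarize the largest flows by volume for use as alt text or aria-label."""
    top = sorted(links, key=lambda link: link[2], reverse=True)[:top_n]
    parts = [f"{s} to {t} {v:,.1f} {unit}" for s, t, v in top]
    return f"Top {len(top)} flows: " + ", ".join(parts) + "."


# Illustrative flows, deliberately listed out of order.
links = [
    ("Coal", "Electricity", 9.0),
    ("Petroleum", "Transportation", 24.0),
    ("Natural gas", "Industrial", 15.0),
]

print(alt_text(links, "quads"))
for source, target, value in links:
    print(link_aria_label(source, target, value, "quads"))
```

The resulting strings would be attached to the chart container and to each link element by whatever rendering layer you use; that wiring is tool-specific and not shown here.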

Microsoft Excel

Spreadsheet — ~15 min (with add-in)

01 Lay out your data as a link table with three columns: Source, Target, and Value, with one row per band.
02 Excel does not ship a native Sankey type, so install a third-party add-in: open Insert → Get Add-ins and search for Sankey (Peltier Tech Charts, ChartExpo, and Vizzlo are common).
03 Highlight the link table including headers, then choose the Sankey option from the add-in’s gallery.
04 Map the Source, Target, and Value columns to the add-in’s expected fields and accept the default layout.
05 Adjust node colors so categories (renewable vs fossil, paid vs organic) get distinct hues, and check that band widths are still proportional to your values.
06 Edit the chart title to a takeaway sentence, verify conservation (inflow equals outflow at every node), and resize the visual so labels don’t overlap.

Tip: if you can’t install add-ins, build the Sankey in Power BI Desktop instead and paste the published image into your Excel report.

Google Sheets

Spreadsheet — ~10 min (with add-on)

01 Lay out a link table with three columns: Source, Target, Value, with one row per band and headers in row 1.
02 Google Sheets has no native Sankey chart type. Open Extensions → Add-ons → Get add-ons and install ChartExpo, Power Tools, or Vizzlo.
03 Launch the add-on, choose Sankey diagram from its gallery, and select your link table as the data source.
04 Map Source, Target, and Value columns to the add-on’s prompts and click Create chart.
05 Customize node colors and labels in the add-on’s settings panel, and stick to a small palette so the diagram stays calm.
06 Use Insert → Image to embed the rendered Sankey back into your Sheet, or export it as PNG/SVG for a slide deck.

The closest built-in options in Sheets’ chart picker, Org chart and Geo chart, are not real Sankeys, so reach for an add-on.

Python (Plotly)

Code — ~8 min

01 Install Plotly with pip install plotly. Matplotlib ships only a basic matplotlib.sankey module aimed at simple single-process diagrams, so Plotly is the easiest path for multi-stage flows.
02 Create a list of unique node labels, plus three parallel lists (source index, target index, and value) describing every band.
03 Build the trace with go.Sankey(node=dict(label=labels, color=colors), link=dict(source=src, target=tgt, value=val)).
04 Wrap the trace in a go.Figure(), add a title with fig.update_layout(title_text=...), and call fig.show() in a notebook or fig.write_html() to save.
05 Color nodes by category by passing a colors list parallel to labels, and color bands by source by passing [colors[s] for s in src] as the link color.
06 For accessibility, set a meaningful figure title and supply alt text via the embedding HTML, for example by putting role="img" and an aria-label on the element that wraps the chart.

Tip: if you prefer a static figure, plotly.io.write_image() can export the Sankey as PNG/SVG. Install kaleido (pip install -U kaleido) first.
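
Step 02 is where most first attempts go wrong, because the link table uses labels while the trace wants indices. The sketch below shows one way to do the conversion. The helper name to_plotly_lists is hypothetical, and the block deliberately stops short of importing Plotly so it runs anywhere.

```python
def to_plotly_lists(link_rows):
    """Convert (source_label, target_label, value) rows into Plotly-style lists.

    Returns (labels, src, tgt, val): labels is the unique node list in order of
    first appearance, and src/tgt hold each link's positions in that list.
    """
    labels = []
    index = {}  # label -> position in `labels`
    src, tgt, val = [], [], []
    for source, target, value in link_rows:
        for name in (source, target):
            if name not in index:
                index[name] = len(labels)
                labels.append(name)
        src.append(index[source])
        tgt.append(index[target])
        val.append(value)
    return labels, src, tgt, val


rows = [
    ("Organic", "Homepage", 4200),
    ("Paid", "Homepage", 2400),
    ("Homepage", "Signup", 2100),
]
labels, src, tgt, val = to_plotly_lists(rows)
print(labels)
print(src, tgt, val)
```

The four lists are shaped for the go.Sankey call in step 03: labels goes to the node label, and src, tgt, and val go to the link's source, target, and value.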

R (networkD3 / ggalluvial)

Code — ~7 min

01 For interactive Sankeys, install networkD3 with install.packages('networkD3') and load it with library(networkD3).
02 For static, publication-ready Sankeys (alluvial flavor), install ggalluvial: install.packages('ggalluvial') and library(ggalluvial).
03 With networkD3, build a nodes data frame with a name column and a links data frame with source, target, and value columns, where source and target are zero-based row indices into nodes rather than labels. Then call sankeyNetwork(Links=links, Nodes=nodes, Source='source', Target='target', Value='value', NodeID='name').
04 With ggalluvial, pivot your data to long format and pass it to ggplot() with geom_alluvium() and geom_stratum() layered on top.
05 Color nodes or strata by category with scale_fill_brewer() to pick a colorblind-safe ColorBrewer palette.
06 For networkD3, save the widget with htmlwidgets::saveWidget() to ship as a standalone HTML file. For ggalluvial, ggsave() exports to PNG, PDF, or SVG.

Tip: ggalluvial is best when stages share the same node set (true alluvial); networkD3 is best when stages have different node sets (general Sankey).

JavaScript (D3 + d3-sankey)

Code — ~15 min

01 Install D3 and the d3-sankey plugin: npm i d3 d3-sankey, or include the CDN script tags in your HTML.
02 Build a graph object with two arrays: nodes (each {name}) and links (each {source, target, value}). By default, source and target are indices into the nodes array; call .nodeId(d => d.name) on the layout if your links reference names instead.
03 Configure the layout with const sankey = d3.sankey().nodeWidth(15).nodePadding(12).extent([[1,1],[width-1,height-1]]); then run const graph = sankey({nodes: ..., links: ...}).
04 Render nodes as <rect> elements positioned with x0, y0, x1, y1 from the layout, and render links with d3.sankeyLinkHorizontal() as the path generator.
05 Color nodes by category with d3.scaleOrdinal(d3.schemeTableau10) and let links inherit the source node color with reduced opacity (~0.3 to 0.5).
06 For accessibility, set role="img" on the SVG, give it an aria-label that lists the top three flows, and add tabindex="0" to nodes with their own aria-labels.

Tip: Observable’s @d3/sankey-diagram notebook is the canonical reference. Fork it and replace the data to bootstrap a new chart in five minutes.

Tableau

BI — ~25 min (polygon hack)

01 Tableau does not have a native Sankey mark, so the standard trick is the “polygon Sankey” hack: reshape your link table so each band can be drawn as a curved polygon.
02 In Tableau Prep or with a calculation, generate a long densified data set: each link gets ~50 evaluation points along its curve, with t in [0,1].
03 Drop the Path field on the Marks card with mark type Polygon, put t on Columns, and y(t) on Rows, where y(t) is the curve interpolation between source and target row positions.
04 Map source position and target position into Detail and use an S-curve calculation: y(t) = source + (target - source) * (3*t^2 - 2*t^3).
05 Encode value with size by mapping link value to the polygon’s vertical extent at every t, and color polygons by the source dimension.
06 Verify on the Marks card that the mark type is Polygon, not Line or Bar; for production work, the Tableau Public gallery has reusable templates.

Tip: if the polygon hack is too painful, render the Sankey in d3-sankey and embed it as a Web Page object inside a Tableau dashboard — it scrolls and filters cleanly.
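
The densification in steps 02 to 04 can be prototyped outside Tableau to sanity-check the curve before building calculated fields. This is a sketch only: the function name densify_link and the row positions are made up, and the formula is the one quoted in step 04.

```python
def densify_link(source_y, target_y, points=50):
    """Sample y(t) = source + (target - source) * (3t^2 - 2t^3) for t in [0, 1].

    Returns a list of (t, y) rows, one per evaluation point, which is the
    long "densified" shape the polygon approach expects for each link.
    """
    if points < 2:
        raise ValueError("need at least two points to draw a curve")
    rows = []
    for i in range(points):
        t = i / (points - 1)
        ease = 3 * t**2 - 2 * t**3  # 0 at t=0, 1 at t=1, flat at both ends
        rows.append((t, source_y + (target_y - source_y) * ease))
    return rows


# A link that starts at row position 2.0 and ends at row position 8.0.
curve = densify_link(2.0, 8.0, points=51)
print(curve[0], curve[25], curve[-1])
```

The curve leaves the source and arrives at the target with zero slope, which is what makes the bands meet the node bars cleanly instead of at an angle.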

Power BI

BI — ~6 min (custom visual)

01 In Power BI Desktop, click the … (more options) icon in the Visualizations pane and choose Get more visuals from AppSource.
02 Search for Sankey. Microsoft publishes a free Sankey visual; alternatives like Sankey Chart by ChartExpo and MAQ Software’s Sankey Bar Chart are also popular.
03 Drop the imported visual onto your report canvas and drag your Source field to the Source well, your Target field to the Destination well, and your Value field to the Weight well.
04 Open the Format pane, expand Data colors, and assign distinct colors per source category so bands inherit the source color.
05 Under Format → Data labels, toggle them on and choose to display node totals at each node so readers don’t need to hover.
06 Edit the title under Format → Title and write a takeaway sentence rather than the default field name.

Tip: Microsoft’s Sankey custom visual is certified, so it works in Publish to Web and embedded scenarios. ChartExpo offers more polish but requires a free account.

// 16 — Code examples

Working code in the most common stacks

A runnable Python snippet that produces the website-traffic Sankey from the hero, with Organic colored as the dominant source. Copy, paste, and replace the data with yours.

sankey_diagram.py

import plotly.graph_objects as go

# A small website-traffic flow: source -> landing page -> outcome.
nodes = [
    "Organic", "Paid", "Social", "Email",          # 0..3 sources
    "Homepage", "Blog", "Product", "Pricing",       # 4..7 landing pages
    "Signup", "Browse", "Bounce",                   # 8..10 outcomes
]

# (source_index, target_index, value)
links = [
    (0, 4, 4200), (0, 5, 1800), (0, 6, 1100),
    (1, 4, 2400), (1, 5,  600), (1, 7, 1500),
    (2, 5, 1900), (2, 6,  700),
    (3, 6,  900), (3, 7,  300),
    (4, 8, 2100), (4, 9, 2800), (4, 10, 1700),
    (5, 8,  900), (5, 9, 2200), (5, 10, 1200),
    (6, 8, 1400), (6, 9, 1000), (6, 10,  300),   # Product: 2,700 in, 2,700 out
    (7, 8,  700), (7, 9,  900), (7, 10,  200),
]

# Brand palette: accent for the dominant source, neutrals for the rest.
node_colors = [
    "#c94a2e", "#e8c4b8", "#f5ede9", "#f0e4e0",
    "#1a1a18", "#6b6b67", "#6b6b67", "#6b6b67",
    "#2e7d52", "#b0b0aa", "#c94a2e",
]
link_colors = ["rgba(201,74,46,0.25)" if s == 0 else "rgba(180,170,160,0.18)"
               for s, _, _ in links]

fig = go.Figure(go.Sankey(
    arrangement="snap",
    node=dict(
        pad=14, thickness=14,
        line=dict(color="#1a1a18", width=0.5),
        label=nodes, color=node_colors,
    ),
    link=dict(
        source=[s for s, _, _ in links],
        target=[t for _, t, _ in links],
        value =[v for _, _, v in links],
        color=link_colors,
        hovertemplate="%{source.label} → %{target.label}<br>%{value:,} visitors<extra></extra>",
    ),
))

fig.update_layout(
    title_text="Organic search is the largest traffic source (May 2025)",
    font_family="Inter, system-ui, sans-serif",
    font_size=12,
    margin=dict(l=10, r=10, t=60, b=10),
)
fig.write_html("sankey_diagram.html")
fig.show()

$ python sankey_diagram.py

// 17 — FAQs

Frequently asked questions

What is a Sankey diagram?

A Sankey diagram is a flow visualization where bands or arrows connect nodes and the width of each band is proportional to the quantity it represents. Thicker bands carry larger flows; thinner bands carry smaller ones. The diagram lets the eye instantly compare which paths are dominant and where flows split, merge, or get lost between stages.

When should you use a Sankey diagram?

Use a Sankey diagram when you want to show how a total quantity distributes and redistributes across the stages of a process. Good fits include energy and material flows, web analytics user journeys, conversion funnels with multiple paths, government budget allocations, immigration and migration flows, and supply-chain transfers between facilities.

When should you avoid a Sankey diagram?

Avoid a Sankey diagram when you have more than ~15 nodes (the bands tangle and become unreadable), when flows loop back on themselves (Sankey assumes one-directional flow — use a chord diagram instead), when you only need to compare totals (a bar chart is faster), or when your audience is unfamiliar with the format and a simpler chart would communicate the same insight.
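One way to catch the looping-flow case before drawing anything is to test the link table for cycles. The sketch below is a minimal check, assuming links are (source, target, value) tuples like those in the Python example; the function name has_cycle is hypothetical and not part of any library.

```python
from collections import defaultdict, deque

def has_cycle(links):
    """Return True if the directed graph formed by the links contains a cycle."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for source, target, _ in links:
        nodes.update((source, target))
        # Count each distinct edge once, even if the link table repeats it.
        if target not in successors[source]:
            successors[source].add(target)
            indegree[target] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no remaining inflow.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes that were never freed up must sit on (or behind) a cycle.
    return visited < len(nodes)

# Hypothetical example data.
one_way = [(0, 1, 5), (1, 2, 5)]
looping = [(0, 1, 5), (1, 2, 5), (2, 0, 1)]
print(has_cycle(one_way), has_cycle(looping))
```

If the check returns True, the data is probably a better fit for one of the alternatives mentioned above than for a column-based Sankey layout.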

How is a Sankey diagram different from an alluvial diagram?

Alluvial diagrams are a special case of Sankey designed to show how categorical group memberships change between two or more stages, for example how survey respondents move between income brackets from 2010 to 2020. Sankey diagrams are the broader family and can show any quantity flow, including non-categorical transfers such as energy or money. Visually they look almost identical, but alluvial diagrams typically share node sets across stages while Sankeys often have different node sets at each stage.

How is a Sankey diagram different from a chord diagram?

A Sankey diagram shows directed, one-way flows between source and destination nodes arranged in columns. A chord diagram shows bidirectional flows between nodes arranged around a circle, so a flow from A to B and a return flow from B to A both appear inside the same circle. Use Sankey for one-directional process flows, and chord for square origin-destination matrices in which every node can be both a source and a destination, such as trade between countries or migration between regions.

How is a Sankey diagram different from a flow chart?

A flow chart visualizes a process as a sequence of decisions and actions — it tells you what happens but says nothing about quantities. A Sankey diagram visualizes the magnitude of flow through a process — every band has a width that encodes a number. Use a flow chart when explaining the steps of a procedure; use Sankey when the question is 'how much flows where?'

How many nodes can a Sankey diagram hold?

A Sankey diagram is comfortable with five to twelve nodes per stage and starts to break down past about fifteen nodes total. Past that point, the bands begin to overlap, the labels collide, and individual flows become impossible to trace. If you have more nodes, group small categories into an 'Other' bucket, drop nodes below a threshold, or split the diagram into separate small multiples.
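The "Other" bucket can be built in a few lines before the data reaches the plotting library. This is only a sketch under stated assumptions: links are (source, target, value) tuples with string labels, only the destination column is being simplified, and the names group_small_targets and min_total are hypothetical.

```python
from collections import defaultdict

def group_small_targets(links, min_total, other="Other"):
    """Merge destination nodes whose total inflow is below min_total into one bucket."""
    # First pass: total inflow per destination node.
    totals = defaultdict(float)
    for _, target, value in links:
        totals[target] += value

    # Second pass: relabel small destinations and sum the links that now coincide.
    merged = defaultdict(float)
    for source, target, value in links:
        label = target if totals[target] >= min_total else other
        merged[(source, label)] += value
    return [(s, t, v) for (s, t), v in merged.items()]

# Hypothetical example data: "Blog" (40) and "Docs" (20) fall below the threshold of 50.
raw_links = [
    ("Organic", "Home", 500), ("Organic", "Blog", 30),
    ("Paid", "Home", 200), ("Paid", "Docs", 20), ("Paid", "Blog", 10),
]
for link in group_small_targets(raw_links, min_total=50):
    print(link)
```

Because the bucketed links are summed rather than dropped, the total leaving each source stays the same. A node in a middle column would also need its outgoing links relabeled, which this sketch does not attempt.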

Should the inflow and outflow at each node match?

Yes — by convention every Sankey node should obey the conservation principle: the total width entering a node should equal the total width leaving it. If quantity is genuinely lost between stages (waste heat in an engine, drop-offs in a funnel), show that loss as a separate destination node with its own band so the diagram remains balanced and the loss is explicit.
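The conservation principle is easy to verify mechanically before plotting. The following is a minimal sketch, assuming the same (source, target, value) link tuples used in the Python example; conservation_gaps and the tolerance parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def conservation_gaps(links, tolerance=1e-9):
    """Return {node: (inflow, outflow)} for intermediate nodes whose totals differ."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value

    gaps = {}
    # Only nodes with both incoming and outgoing links can be checked;
    # pure sources and pure destinations are skipped.
    for node in inflow.keys() & outflow.keys():
        if abs(inflow[node] - outflow[node]) > tolerance:
            gaps[node] = (inflow[node], outflow[node])
    return gaps

# Hypothetical example data: node 2 receives 100 but passes on only 80.
example_links = [(0, 2, 60), (1, 2, 40), (2, 3, 80)]
print(conservation_gaps(example_links))   # {2: (100.0, 80.0)}

# Adding an explicit loss node (4) restores the balance.
balanced_links = example_links + [(2, 4, 20)]
print(conservation_gaps(balanced_links))  # {}
```

An empty result means every intermediate node balances; any entry points at a node where a loss or a missing link still needs its own band.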

Why are Sankey diagrams used so much in energy analysis?

Captain Matthew Sankey invented the modern form of the diagram in 1898 specifically to visualize the energy efficiency of a steam engine. The format is a near-perfect fit for energy: inputs (fuel sources) on the left, useful work and rejected heat on the right, and the width of every band shows where each unit of energy goes. Lawrence Livermore National Laboratory still publishes annual U.S. energy-flow Sankeys — they are the canonical example of the form.

Can a Sankey diagram be interactive?

Yes — interactive Sankey diagrams support hovering or clicking on a node to highlight its inflows and outflows, dragging nodes to reorder them, and tooltips that show the exact quantity for each band. Libraries like d3-sankey, Plotly, and Highcharts all ship interactive variants. Make sure the interactions are keyboard accessible: every node should be focusable with Tab and the highlight should fire on focus, not only on hover.

What category of chart is a Sankey diagram?

Sankey diagrams belong to the Flow & Process family of charts. Charts in that family — alluvial, chord, network, parallel sets — are designed to answer the same kind of 'how does quantity move through a system' question, and they often work as alternatives when one doesn't quite fit your data shape.

What's the best library for building Sankey diagrams in code?

For Python, plotly.graph_objects.Sankey is the easiest path; Matplotlib does ship a matplotlib.sankey module, but it draws arrow-style diagrams rather than the multi-column node-and-link layout shown in this guide. For R, networkD3::sankeyNetwork() and ggalluvial both work, with ggalluvial fitting neatly into the tidyverse. For the web, d3-sankey is the dominant low-level library and Plotly.js is the easiest high-level option. Highcharts and ECharts both ship Sankey types as well.

How do you read a Sankey diagram?

Start by reading the labels on the leftmost (source) and rightmost (destination) columns. Then trace the widest bands first to find the dominant flows. Look at each node to see whether it splits flows outward or merges them inward, and check whether the right-hand side is thinner than the left to spot losses. Finally, hover or zoom in to read individual flow quantities when precision matters.
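The "widest bands first" reading order can be mirrored in code by ranking the link table by value, which also yields a plain-text summary of the dominant flows. This is a sketch only, assuming integer-indexed (source, target, value) links and a parallel list of node labels; top_flows is a hypothetical helper name.

```python
def top_flows(links, labels, n=3):
    """Return the n largest flows as readable strings, widest band first."""
    ranked = sorted(links, key=lambda link: link[2], reverse=True)[:n]
    return [f"{labels[s]} to {labels[t]}: {v:,}" for s, t, v in ranked]

# Hypothetical example data.
labels = ["Organic", "Paid", "Homepage", "Pricing"]
links = [(0, 2, 4200), (0, 3, 1800), (1, 2, 900), (1, 3, 2100)]
for line in top_flows(links, labels):
    print(line)
```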

// 18 — References

References and further reading

Primary sources, reference texts, and the official documentation for the libraries and tools referenced throughout this guide.

Wikipedia: Sankey diagram [Reference]
Encyclopedia entry covering history, naming, and notable examples — a solid neutral starting point with citations to primary sources including Sankey’s original 1898 paper.
https://en.wikipedia.org/wiki/Sankey_diagram
Matthew Henry Phineas Riall Sankey: The Thermal Efficiency of Steam Engines (1898) [Primary source]
Captain Sankey’s original paper, published in the Minutes of Proceedings of the Institution of Civil Engineers, where the eponymous diagram first appeared as a way to show steam-engine energy losses.
https://archive.org/details/minutesofproceed12701898inst
Charles Joseph Minard: Carte figurative (1869) [Primary source]
Minard’s 1869 map of Napoleon’s 1812 Russian campaign, widely cited as a Sankey-style flow diagram that predates Sankey’s own use of the form by nearly thirty years. Tufte wrote that it may well be the best statistical graphic ever drawn.
https://en.wikipedia.org/wiki/Charles_Joseph_Minard#Napoleon%27s_Russian_campaign
Lawrence Livermore National Laboratory: Energy Flow Charts [Reference]
Annual U.S. energy-flow Sankey diagrams covering primary energy sources, sectors, and end uses. The canonical real-world example of the format and a great template to study.
https://flowcharts.llnl.gov/
Financial Times: Visual Vocabulary [Reference]
Open-source poster categorizing chart types by intent. Sankey diagrams sit firmly in the Flow family alongside chord, network, and waterfall charts.
https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary
Mike Bostock: d3-sankey [Docs]
The official d3-sankey plugin and Observable notebooks. The reference implementation for any web Sankey diagram, with examples for both static and interactive flavors.
https://github.com/d3/d3-sankey
Plotly: Sankey Diagrams in Python [Docs]
Official Plotly documentation for the Sankey trace. Covers node/link configuration, color, hover templates, and exporting to HTML or PNG.
https://plotly.com/python/sankey-diagram/
ggalluvial: Alluvial plots in ggplot2 [Docs]
Documentation for the ggalluvial R package. The clearest reference for the alluvial flavor of Sankey, with a tutorial on long-format data and stratum aesthetics.
https://corybrunson.github.io/ggalluvial/
WAI: Complex Images: Charts and Graphs [Accessibility]
Web Accessibility Initiative guidance on making complex visuals accessible. The patterns for long descriptions, data tables, and aria-labels apply directly to Sankey diagrams.
https://www.w3.org/WAI/tutorials/images/complex/
Datawrapper Academy: What to consider when creating Sankey diagrams [Tutorial]
Hands-on tutorial with real published examples. Especially useful for choosing the right level of granularity and avoiding band-tangle messes.
https://academy.datawrapper.de/article/258-how-to-create-a-sankey-diagram
Edward Tufte: The Visual Display of Quantitative Information [Book]
Tufte’s foundational text on data graphics. Its discussion of Minard’s Napoleon march set the standard for how flow visualizations should respect data integrity.
https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/
