Flow & Process · Intermediate

Sankey Diagram

A flow diagram where the width of each band is proportional to the quantity it represents — thicker means more. Used to trace how energy, money, materials, or users move through the stages of a process and where the largest transfers happen.

// 01 The chart

What it looks like

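Example: Website traffic source → landing page → outcome (May 2025)
[Figure: Sankey diagram with three columns. Sources: Organic, Paid, Social, Email. Landing pages: Homepage, Blog, Product, Pricing. Outcomes: Signup, Bounce, Browse.]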

A Sankey diagram tracing website visitors from traffic source through landing page to final outcome. Band width encodes visitor volume; the highlighted Organic-to-Homepage band is the dominant flow.

// 02 Definition

What is a Sankey diagram?

A Sankey diagram is a flow visualization where bands or arrows connect nodes, and the width of each band is proportional to the quantity it represents. Thicker bands carry larger flows; thinner bands carry smaller ones. The eye can instantly compare which paths are dominant, where flows split or merge, and where the largest transfers happen as a quantity moves through a system. The encoding is direct: the width of the band is the value.

The diagram reads left to right (or sometimes top to bottom). On the left side you see the sources — where things originate. On the right you see the destinations — where things end up. In between, each node is a stage of the process, and every band traces a single flow from one stage to the next. By convention, every node obeys a conservation principle: the total width entering a node should equal the total width leaving it. If quantity is genuinely lost between stages — wasted heat, dropped-off users, taxed dollars — that loss appears as a separate destination so the diagram remains balanced and the loss is honest.
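The conservation rule can be checked mechanically before anything is drawn. The sketch below is one possible way to do it, assuming links arrive as (source, target, value) tuples; the function name conservation_gaps and the sample numbers are illustrative, not taken from any library or dataset.

```python
from collections import defaultdict

def conservation_gaps(links, tolerance=1e-9):
    """Return {node: inflow - outflow} for intermediate nodes that do not balance.

    Pure sources (no inflow) and pure destinations (no outflow) are skipped,
    because the rule only applies to nodes that flows pass through.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value

    gaps = {}
    for node in inflow.keys() & outflow.keys():  # nodes with both sides
        gap = inflow[node] - outflow[node]
        if abs(gap) > tolerance:
            gaps[node] = gap
    return gaps

# Illustrative numbers: 100 units enter "Boiler" but only 90 leave.
unbalanced = [("Fuel", "Boiler", 100), ("Boiler", "Work", 90)]
# Adding an explicit loss destination restores the balance.
balanced = unbalanced + [("Boiler", "Waste heat", 10)]

print(conservation_gaps(unbalanced))  # {'Boiler': 10.0}
print(conservation_gaps(balanced))    # {}
```

A positive gap means quantity disappears at that node, which is the cue to add a loss destination rather than to adjust any widths.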

Sankey diagrams excel at one specific job: showing how a total quantity distributes and redistributes across the stages of a process. They make it obvious where the biggest transfers happen, where losses occur, and how flows split at each stage. This is why they are beloved in energy analysis (Lawrence Livermore National Laboratory has published an annual U.S. energy Sankey for decades), in manufacturing, in supply-chain logistics, in budget visualization, and in web analytics. They reward dense, multi-source data — a problem that would need three or four separate bar charts can often collapse into one Sankey.

The price of that power is rigidity: Sankey diagrams only work when your data fits a small set of nodes (typically fewer than fifteen) and when the flows are strictly one-directional. Stretch them outside that shape and they get worse fast — bands tangle, labels collide, and the conservation principle quietly breaks. The rest of this guide is about how to live inside the Sankey’s sweet spot — and how to recognize when you have walked outside it.

Origin: The Sankey diagram is named after Irish-born British engineer Captain Matthew Henry Phineas Riall Sankey, who used it in 1898 to show the energy efficiency of a steam engine in the Minutes of Proceedings of the Institution of Civil Engineers. Flow diagrams of this type predate him, however: French engineer Charles Joseph Minard drew them decades earlier, most famously in his 1869 Carte figurative tracing Napoleon’s 1812 Russian campaign, a graphic Edward Tufte said may well be the best statistical graphic ever drawn.

// 03 When to use

When a Sankey diagram is the right call

Reach for a Sankey diagram whenever the question is about how a quantity moves through a process and you want a single visual that shows both the totals and the individual transfers. Below are the situations where it consistently wins against the alternatives.

✓ Use a Sankey diagram when…
  • Showing how a total quantity distributes and redistributes across two or more stages
  • Tracing energy, material, money, or user flows through a system end to end
  • Identifying the largest transfers and where losses or drop-offs occur
  • Visualizing website user journeys, conversion funnels with multiple paths, or budget allocations
  • Comparing multiple paths from source to destination in one chart instead of many
  • Your audience needs to see both the overall pattern and the size of individual flows
  • The data obeys a conservation principle (inflow equals outflow at each node)

// 04 When not to use

When a Sankey diagram is the wrong call

A Sankey diagram can technically render almost any flow data, but “technically possible” is not the same as “good idea.” Below are the cases where the Sankey actively hides information you need to communicate.

× Avoid a Sankey diagram when…
  • You have more than ~15 nodes — the bands tangle and the diagram becomes unreadable
  • Flows loop back on themselves (cycles) — Sankey assumes one-directional flow; use a chord or network graph instead
  • You only need to compare totals across categories — a bar chart reads faster
  • Your data has only two stages and one source per destination — a simple stacked bar is clearer
  • Exact values matter more than relative magnitudes — individual band widths are hard to read precisely
  • Your audience is unfamiliar with the format — a simpler chart often communicates the same insight
  • The flows are bidirectional or symmetric (trade matrices, migration both directions) — use a chord diagram
  • Your data lacks a single shared unit — width comparisons require everything be in the same currency, joules, or count

// 05 Data requirements

What your data needs to look like

Before building the chart, your dataset needs to fit a specific shape. Use this checklist to confirm yours does.

Shape

A long-format link table with one row per band: source label, target label, numeric value. Optionally a category column to group nodes by type.

Minimum rows

3 links across at least 3 nodes. With one or two links there is nothing to flow through.

Maximum rows

~40 links across ~15 nodes. Past that, the diagram tangles and labels overlap.

Required fields
source (required)
string / categorical

The label of the node where a flow originates. Source labels appear in the leftmost (or earlier) columns of the diagram. Every source must also exist in your node list — don’t reference an undefined source.

target (required)
string / categorical

The label of the node where a flow terminates. Target labels appear in the rightmost (or later) columns. The same label can appear as both a target and a source if it sits at an intermediate stage.

value (required)
number (continuous, ≥ 0)

The quantity carried by this band. Must share a single unit across rows (kilowatt-hours, dollars, users, kilograms). Negative values are not allowed — if you need to show a loss, model it as a flow into a dedicated “waste” or “drop-off” node.

category (optional)
string

Optional grouping used to color nodes or flows by type (for example “renewable” vs “fossil” in an energy Sankey). Keep the number of categories to 2–5 to stay readable.
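The field rules above lend themselves to a quick pre-flight check. What follows is a minimal sketch, assuming each row is a dict with source, target, and value keys; validate_links is a hypothetical helper, and the sample rows (including the "Checkout" label) are made up to trigger the checks.

```python
import math

def validate_links(rows, known_nodes=None):
    """Return a list of human-readable problems found in a link table.

    rows:        iterable of dicts with "source", "target", "value" keys.
    known_nodes: optional set of allowed node labels.
    """
    problems = []
    for i, row in enumerate(rows):
        missing = [key for key in ("source", "target", "value") if key not in row]
        if missing:
            problems.append(f"row {i}: missing field(s) {missing}")
            continue
        value = row["value"]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or math.isnan(value):
            problems.append(f"row {i}: value {value!r} is not a number")
        elif value < 0:
            problems.append(f"row {i}: negative value {value}; model losses as a flow into a loss node")
        if known_nodes is not None:
            for key in ("source", "target"):
                if row[key] not in known_nodes:
                    problems.append(f"row {i}: {key} {row[key]!r} is not in the node list")
    return problems

# Made-up rows: one valid, one negative, one pointing at an undefined node.
rows = [
    {"source": "Organic", "target": "Homepage", "value": 4200},
    {"source": "Homepage", "target": "Signup", "value": -50},
    {"source": "Homepage", "target": "Checkout", "value": 10},
]
nodes = {"Organic", "Homepage", "Signup"}

for problem in validate_links(rows, nodes):
    print(problem)
```

The single-unit requirement cannot be verified from the numbers alone, so that check stays with whoever assembles the table.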

Example data
source     target     value   category
Organic    Homepage   4,200   search
Paid       Homepage   2,400   ads
Social     Blog       1,900   social
Homepage   Signup     2,100   outcome
Homepage   Bounce     2,800   outcome
Blog       Browse     2,200   outcome

Tip: if your raw data is a long list of events, aggregate it into a link table first. Sankey diagrams plot summarized flows, not raw rows. SQL’s GROUP BY source, target or pandas’ df.groupby(['source', 'target'])['value'].sum() gets you from rows to chart-ready data.
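For readers without SQL or pandas at hand, the same aggregation can be sketched with the standard library alone. The event log below is hypothetical (one row per session, each counting as one visitor), and aggregate_links is an illustrative name.

```python
from collections import Counter

def aggregate_links(events):
    """Sum raw (source, target, value) events into one row per source/target pair."""
    totals = Counter()
    for source, target, value in events:
        totals[(source, target)] += value
    # Largest flows first, so the table reads in the order the chart will emphasize.
    return [
        {"source": s, "target": t, "value": v}
        for (s, t), v in sorted(totals.items(), key=lambda item: -item[1])
    ]

# Hypothetical event log: five sessions across three source/target pairs.
events = [
    ("Organic", "Homepage", 1),
    ("Paid", "Homepage", 1),
    ("Organic", "Homepage", 1),
    ("Social", "Blog", 1),
    ("Organic", "Homepage", 1),
]

link_table = aggregate_links(events)
for row in link_table:
    print(row["source"], row["target"], row["value"])
```

Each output row is one band, which is the shape the charting tools later in this guide expect.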

// 06 Anatomy

Parts of a Sankey diagram

Every Sankey diagram is built from the same handful of parts. Knowing the names makes it easier to talk about what to keep, what to drop, and what most templates are getting wrong.

A — Source nodes: Vertical bars on the leftmost column representing where flows originate
B — Flow bands: Curved bands connecting nodes — width encodes the quantity transferred
C — Intermediate nodes: Middle-column nodes where flows split, merge, or pass through
D — Flow direction: Left-to-right reading order shows the progression through process stages
E — Destination nodes: Vertical bars on the rightmost column representing final outcomes

// 07 Step-by-step

Step-by-step: how to build a good Sankey diagram

A nine-step recipe that works regardless of the tool. Walk through it the first few times and the moves become automatic; skip steps and the diagram usually shows it.

  1. Pick the question your diagram answers

    A Sankey diagram answers “how does this quantity distribute across the stages of a process, and where do the largest transfers happen?” Write that question down before you draw anything. If the question is about ranking categories or change over time, switch to a different chart now — not after you build it.
  2. Reshape the data into a link table

    Aggregate the raw events into a tidy table with three columns: source, target, value. Every row is one band. Group small flows below a threshold (say, less than 1% of the total) into an “Other” node so the diagram doesn’t collapse under thread-thin bands.
  3. Decide on the number of stages

    Two stages (source → destination) is the simplest case. Three or more stages let you show how flows redistribute through intermediate nodes. Resist the urge to add stages just because your data has them: every extra stage leaves less horizontal room for each band and adds another set of crossings to untangle.
  4. Verify conservation at every node

    For every node, the sum of inflows should equal the sum of outflows. If your data shows a real loss — wasted heat, dropped-off users, taxed dollars — model it as an explicit “loss” destination so the diagram stays balanced and the loss is honest.
  5. Order the nodes within each column

    Most layout algorithms minimize band crossings automatically; if yours doesn’t, sort nodes by total flow (descending) within each column. Then nudge a few by hand to reduce overlap. The diagram should look more like a fan than a tangle.
  6. Choose a color encoding with intent

    The simplest scheme paints all bands the same neutral and uses one accent color to highlight a specific flow your headline is about. If you need to encode a category (e.g. renewable vs fossil), assign a small palette to nodes and let bands inherit the source color.
  7. Label nodes directly, not in a legend

    Every node should carry its name and (if there’s room) its total quantity. Avoid an off-chart legend — readers shouldn’t have to bounce between the diagram and a key to interpret what they are seeing.
  8. Annotate the largest flows

    Pick the two or three biggest bands and overlay their numeric values. The eye compares band widths well but estimates absolute values from them badly, so give it a few anchors to calibrate against.
  9. Add a takeaway title and ship

    “U.S. energy flow, 2024” is a label. “Two-thirds of U.S. energy is wasted as rejected heat” is a takeaway. Lead with the takeaway, put the descriptive label as a subtitle, and verify the diagram is still readable at the size your reader will see it.
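Two of the steps above, folding thread-thin flows into an “Other” node and ordering nodes by total flow, are easy to sketch in code. This is one possible reading, not a prescribed method: it assumes a two-stage table (so rerouting a small link to “Other” cannot unbalance an intermediate node), and the helper names, threshold default, and sample numbers are all illustrative.

```python
from collections import defaultdict

def fold_small_flows(links, min_share=0.01, other="Other"):
    """Reroute links smaller than min_share of the grand total to an 'Other' target."""
    total = sum(value for _, _, value in links)
    folded = defaultdict(float)
    for source, target, value in links:
        if value < min_share * total:
            target = other
        folded[(source, target)] += value
    return [(s, t, v) for (s, t), v in folded.items()]

def order_columns(links, columns):
    """Sort the nodes of each column by total flow, largest first."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value

    def total(node):
        # Rank a node by the larger of its two sides.
        return max(inflow[node], outflow[node])

    return [sorted(column, key=total, reverse=True) for column in columns]

# Made-up two-stage data: two flows fall under 1% of the 1,000-unit total.
links = [("A", "X", 600), ("A", "Y", 5), ("B", "X", 390), ("B", "Z", 5)]

folded = fold_small_flows(links)
ordered = order_columns(folded, [["B", "A"], ["Other", "X"]])
print(folded)
print(ordered)  # [['A', 'B'], ['X', 'Other']]
```

For a multi-stage table, folding would need to rename whole nodes rather than reroute single links, otherwise the conservation check in step 4 would start failing.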

// 08 Real-world examples

Where you’ll see Sankey diagrams used

Sankey diagrams show up in four places more than anywhere else: energy and engineering, web analytics, government finance, and supply-chain logistics. Each context has its own conventions, and they all reward the same fundamentals.

01

Energy: U.S. national energy flow

Lawrence Livermore National Laboratory publishes an annual Sankey showing how U.S. primary energy flows from sources (petroleum, natural gas, coal, nuclear, renewables) through sectors (residential, commercial, industrial, transportation) to end uses and rejected energy. The diagram makes a striking point: roughly two-thirds of all energy consumed in the U.S. is wasted as heat. It is the canonical real-world Sankey.

Energy & Engineering
02

Web analytics: Conversion funnel paths

A growth team uses a Sankey to trace visitors from acquisition channel (organic, paid, social, email) through landing page (homepage, blog, product, pricing) to outcome (signup, browse, bounce). The widest bands reveal which channels actually drive conversions and which look big but bounce — information that is invisible in a stack of conversion-rate bar charts.

Web Analytics
03

Government: Federal budget allocation

A news outlet publishes a Sankey of federal tax revenue flowing into departments and programs. Readers can instantly see that defense, social security, and Medicare dominate, and they can trace each tax type (income, payroll, corporate) to its destination. The diagram answers “where does my tax dollar go?” in one glance.

Public Finance
04

Supply chain: Global commodity flows

A logistics analyst maps how a commodity (say, soybeans) flows from producing countries (Brazil, U.S., Argentina) through ports and shipping lanes to consuming countries (China, EU, Mexico). The widths reveal trade dependencies that bar charts of imports and exports alone never quite capture.

Supply Chain

// 09 Variations

Types of Sankey diagrams

The basic Sankey diagram has several important variants, each suited to slightly different data situations. The headline rule stays the same: pick the variant whose strengths match your question.

Two-stage Sankey

The simplest form — direct flows from one set of categories to another without intermediate stages. Often the best place to start.

Multi-stage Sankey

Multiple intermediate columns show flows through several process stages — common in manufacturing, energy, and supply-chain analysis.

Energy / loss Sankey

Adds an explicit “rejected energy” or “loss” destination so waste is shown rather than implied. The classic engineering use case Sankey himself drew.

Alluvial diagram

A specialized Sankey for tracking how categorical group memberships shift between stages — e.g., survey respondents moving between income brackets over time.

// 10 Comparisons

Sankey diagram vs other chart types

Sankey diagrams get confused with several other flow-style charts because they all use bands, ribbons, or arrows. The differences matter — picking the wrong one changes what your reader is allowed to conclude.

Sankey vs alluvial diagram

Alluvial diagrams are a special case of Sankey designed for showing how categorical group memberships shift between stages. Sankey is the broader family and accepts any quantity flow, including non-categorical transfers like energy or money.

Sankey diagram

Generic flow visualization. Nodes can differ at every stage, and the data may be physical units like kilowatt-hours or kilograms rather than counts of items.

  • Bands carry any quantity (energy, money, users)
  • Stages can use different node sets
  • Best for energy, supply-chain, and budget flows

Alluvial diagram

Specialized Sankey for categorical group changes between stages. The same node set typically reappears at each stage, and the bands track group membership across time.

  • Bands carry counts of items moving between groups
  • Same node set reappears at each stage
  • Best for survey responses and cohort transitions

Sankey vs chord diagram

Both encode flow magnitude with band width, but Sankeys arrange nodes in columns and assume one-directional flow, while chord diagrams arrange nodes in a circle and accept bidirectional flow. Choose based on whether your flows are directed or symmetric.

Sankey diagram

Linear, left-to-right (or top-to-bottom) layout. Sources sit on one side, destinations on the other, and bands always flow forward. Cycles are not supported.

  • Linear, columnar layout
  • Strictly directed, one-way flow
  • Easy to add intermediate process stages

Chord diagram

Circular layout where every node sits on the perimeter and bands run inside the circle. A flow from A to B and a return flow from B to A both appear as separate ribbons.

  • Circular, polar layout
  • Bidirectional flow is natural
  • Best for trade matrices and migration

Sankey vs flow chart (process)

A flow chart explains the steps of a process; a Sankey diagram explains the magnitudes that flow through one. They look similar at a glance and answer entirely different questions — picking the wrong one wastes the page.

Sankey diagram

Quantitative. Every band has a width that encodes a number. Use when the question is “how much flows where?”

  • Width = quantity (kilowatt-hours, dollars, users)
  • Shows where the largest transfers happen
  • Conservation: inflow equals outflow at each node

Flow chart

Qualitative. Boxes and arrows show the order of steps and decisions. Use when the question is “what happens, in what order?”

  • Boxes = steps; diamonds = decisions
  • Arrow direction shows sequence, not magnitude
  • Best for procedures, algorithms, swim lanes

Sankey vs stacked bar over time

When the question is “how has the composition of a total changed across a few stages?” both work. Pick a stacked bar when stages are time periods and you want to compare totals; pick a Sankey when individual flows between specific source/target pairs are the story.

Sankey diagram

Best when the reader needs to trace a specific source to a specific destination, or when the flows redistribute heavily between stages.

  • Bands trace individual source→target flows
  • Easy to spot redistribution and crossover
  • Heavy when there are more than ~4 time stages

Stacked bar over time

Best when the reader needs to compare totals between time periods and just wants a sense of compositional shift.

  • Bars = totals; segments = composition
  • Easy to compare totals between periods
  • Cannot trace which source went to which target

// 11 Common mistakes

Mistakes to watch out for

Almost every broken Sankey diagram in the wild fails the same handful of ways. If you only memorize seven rules, make them these.

Too many nodes and bands

Adding every possible category creates a tangled mess of overlapping bands. Past about fifteen nodes, individual flows become impossible to trace and labels collide. Keep only the largest nodes at each stage and group minor categories into a single “Other” node so the chart stays legible.

Inconsistent or aesthetic widths

If band widths don’t accurately represent quantities, the diagram becomes misleading. Some templates auto-shrink bands when labels collide or auto-balance them for visual symmetry. Always verify that width is strictly proportional to value — if a band looks wrong, fix the data or layout, never the width.
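Strict proportionality means a single scale factor for every band. The sketch below shows that idea and a check for widths reported by a template; the function names and the 340-pixel column height are assumptions for illustration.

```python
import math

def band_widths(values, available_px):
    """Map values to pixel widths using one shared scale factor."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    scale = available_px / total
    return [value * scale for value in values]

def is_proportional(values, widths, rel_tol=1e-6):
    """True when every positive value has the same width-to-value ratio."""
    pairs = [(v, w) for v, w in zip(values, widths) if v > 0]
    if not pairs:
        return True
    v0, w0 = pairs[0]
    # Cross-multiply instead of dividing to compare the ratios.
    return all(math.isclose(w * v0, w0 * v, rel_tol=rel_tol) for v, w in pairs)

values = [4200, 2400, 1900]
widths = band_widths(values, available_px=340)  # roughly 168, 96, and 76 px

print(is_proportional(values, widths))          # True
# A template that fattens the thinnest band to fit its label breaks the encoding.
print(is_proportional(values, [168, 96, 90]))   # False
```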

Breaking the conservation principle

By convention, the total inflow at each node should equal the total outflow. If your data shows a real loss between stages — wasted heat, dropped-off users, taxed dollars — model that loss as an explicit destination node so the diagram balances and the loss is honest. Ignoring conservation creates confusion about where quantity went.

Using Sankey for cyclic flows

Sankey diagrams are designed for directed, acyclic flows. If your process has feedback loops (returning customers, recycling, conversational turn-taking), the layout will not converge and the bands will overlap nonsensically. Use a chord diagram, network graph, or dedicated cycle visualization instead.
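A link table can be screened for feedback loops before it reaches a layout engine. The sketch below uses Kahn's topological-sort algorithm as one way to do that; unplaceable_nodes is an illustrative name and the funnel data is invented.

```python
from collections import defaultdict, deque

def unplaceable_nodes(links):
    """Return nodes that cannot be put in a left-to-right order.

    An empty set means the flows are acyclic. A non-empty set contains the
    nodes on a cycle plus any nodes reachable only through one.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for source, target, _ in links:
        nodes.update((source, target))
        if target not in successors[source]:  # count parallel links once
            successors[source].add(target)
            indegree[target] += 1

    # Kahn's algorithm: repeatedly place nodes that have no unplaced inputs.
    queue = deque(node for node in nodes if indegree[node] == 0)
    placed = set()
    while queue:
        node = queue.popleft()
        placed.add(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return nodes - placed

# Invented funnel, then the same funnel with returning customers looping back.
funnel = [("Visit", "Signup", 100), ("Signup", "Purchase", 40)]
with_loop = funnel + [("Purchase", "Visit", 15)]

print(unplaceable_nodes(funnel))     # set()
print(sorted(unplaceable_nodes(with_loop)))  # ['Purchase', 'Signup', 'Visit']
```

If the result is non-empty, either switch chart types or break the loop by modeling the return flow as a separate destination such as “Returned”.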

Hiding quantities behind hover-only tooltips

An interactive Sankey that only shows numeric values on hover excludes touch users, screen-reader users, and printed copies. Bake the most important values directly into the SVG (node totals, top-three flow widths) and let interactivity reveal the rest. The chart should still tell its story when paused.

Missing labels or off-chart legends

Without clear node labels and color-key context, readers cannot interpret the diagram. Direct labeling beats off-chart legends every time — every node should carry its name and, where space allows, its total quantity. Off-chart legends double the eye travel and wreck the reading flow.

Color choices that fail color-blind readers

Default Sankey palettes (especially Plotly’s) often pair reds and greens that are indistinguishable for the most common color-vision deficiencies. Pick a colorblind-safe palette like ColorBrewer or Tableau 10, and reinforce category encoding with patterns, label position, or distinct strokes.

// 12 Accessibility

Accessibility checklist

Run through this list before publishing. The chart should still communicate its message to readers using assistive technology, color-blind users, keyboard navigation, and reduced-motion settings.

  • ✓

    Color contrast meets WCAG AA

    WCAG 1.4.3, 1.4.11
    Band fills and node fills against the chart background should reach at least 3:1 contrast for graphical objects (1.4.11). Direct labels and titles should reach 4.5:1 for body text and 3:1 for large text (1.4.3). Default palettes, Plotly’s included, can fall short on light backgrounds, so audit yours before you ship.
  • ✓

    Do not rely on color alone

    WCAG 1.4.1
    When color encodes node category (renewable vs fossil, organic vs paid), reinforce it with a pattern fill, a textured stroke, or a clearly different label position. Roughly 1 in 12 men and 1 in 200 women have some form of color-vision deficiency, and Sankey palettes often pick reds and greens together.
  • ✓

    Provide a text alternative listing the top flows

    WCAG 1.1.1
    Add an accessible name (alt text or aria-label) that summarizes the diagram’s key flows by volume, not its chart type. “Sankey of energy flow” is weak; “Top three flows: petroleum to transportation 24 quads, natural gas to industrial 15 quads, coal to electricity 9 quads. Total rejected energy: 67 percent.” is strong.
  • ✓

    Expose the underlying data

    WCAG 1.3.1
    Place the source/target/value link table beneath or next to the diagram, or expose it via a hidden table that screen readers can navigate row by row. Many readers will copy this data rather than re-key it from the visual.
  • ✓

    Make every node focusable with the keyboard

    WCAG 2.1.1
    If the Sankey is interactive, every node and every link should be reachable with the Tab key. The focused element should announce its name and connected flow values via aria-label, and tooltips should appear on focus, not only on hover.
  • ✓

    Provide aria-labels on nodes and links

    WCAG 4.1.2
    Each node’s SVG element should expose an aria-label like “Node: Petroleum, total outflow 33.0 quads”. Each link’s aria-label should read “Flow from Petroleum to Transportation, 24.0 quads” so a screen-reader user can traverse the graph.
  • ✓

    Respect prefers-reduced-motion

    WCAG 2.3.3
    If bands fade or grow on load, gate the animation behind a prefers-reduced-motion: no-preference media query so motion-sensitive readers see the final state immediately. Skip drag-to-reorder animations entirely when the user prefers reduced motion.
  • ✓

    Make the diagram resizable and zoomable

    WCAG 1.4.4
    Use a responsive viewBox so the diagram scales with the viewport. Verify it remains legible at 200% browser zoom — thin bands tend to disappear when scaled down. On mobile, consider a horizontal scroll container instead of cramming a 12-node Sankey into 360 pixels.
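The text-alternative and aria-label items in the checklist can be generated from the link table rather than written by hand. A minimal sketch follows, using the example rows from the data section; the helper names are hypothetical and the wording of the strings simply follows the patterns quoted above.

```python
def describe_top_flows(links, unit, top_n=3):
    """Build a one-sentence summary of the largest flows for alt text."""
    top = sorted(links, key=lambda link: link[2], reverse=True)[:top_n]
    parts = [f"{source} to {target} {value:g} {unit}" for source, target, value in top]
    return f"Top {len(top)} flows: " + ", ".join(parts) + "."

def link_aria_label(source, target, value, unit):
    """Label for a single band, readable on its own by a screen reader."""
    return f"Flow from {source} to {target}, {value:g} {unit}"

def node_aria_label(name, total_outflow, unit):
    """Label for a node, stating its name and total outflow."""
    return f"Node: {name}, total outflow {total_outflow:g} {unit}"

# Rows from the example data table in the data requirements section.
links = [
    ("Organic", "Homepage", 4200),
    ("Paid", "Homepage", 2400),
    ("Social", "Blog", 1900),
    ("Homepage", "Signup", 2100),
    ("Homepage", "Bounce", 2800),
    ("Blog", "Browse", 2200),
]

alt_text = describe_top_flows(links, "visitors")
print(alt_text)
print(link_aria_label("Organic", "Homepage", 4200, "visitors"))
```

Generating the strings from data keeps them in sync when the numbers change, which hand-written alt text rarely manages.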

// 13 Best practices

Design and craft tips

The mistakes section above tells you what to avoid. The list below is the positive version: the small set of habits that separate a good Sankey diagram from a passable one.

Do

Limit the diagram to ~15 nodes

Past about 15 nodes, bands collapse into a tangle. Group small flows into an “Other” node and ship a clean diagram instead of an exhaustive one.
× Don’t

Adjust band widths for aesthetics

Width is the encoding. If you fudge widths to make the diagram look balanced, you have broken the only thing the chart was doing. Always render width strictly proportional to value.
Do

Show losses as explicit destination nodes

If quantity is genuinely lost between stages (rejected energy, dropped-off users, taxed dollars), draw a dedicated “loss” destination so the diagram remains balanced and the loss is honest.
× Don’t

Use a Sankey for cyclic flows

Sankey diagrams assume one-directional flow. If your process has feedback loops (carbon recycling, chat conversations, returning customers), use a chord diagram or network graph instead.
Do

Label nodes directly with totals

Every node should carry its name and, where space allows, its total quantity. Direct labels remove the bounce between the diagram and an off-chart legend.
× Don’t

Hide quantities behind hover tooltips

Hover-only quantities exclude touch users, screen-reader users, and printed copies. Bake the most important values directly into the SVG and let interactivity reveal the rest.
Do

Lead with a takeaway title

Use the chart title to state the conclusion (“Two-thirds of U.S. energy is wasted as rejected heat”) and a smaller subtitle for the descriptive label (“U.S. primary energy flow, 2024”).
× Don’t

Animate every band on load

Default-on motion exhausts attention and breaks for prefers-reduced-motion users. Render the final state immediately and reserve animation for state transitions (filtering, drilldown).

// 15 Tool instructions

How to build it in your tool of choice

Sankey is one of the few common chart types that no spreadsheet ships natively. The recipes below get you to a clean, balanced, label-direct Sankey diagram in each of the most common platforms — with workarounds for tools that don’t have a built-in Sankey type.

Microsoft Excel

Spreadsheet — ~15 min (with add-in)
  1. Lay out your data as a link table with three columns: Source, Target, and Value, with one row per band.
  2. Excel does not ship a native Sankey type, so install a third-party add-in: open Insert → Get Add-ins and search for Sankey (Peltier Tech Charts, ChartExpo, and Vizzlo are common).
  3. Highlight the link table including headers, then choose the Sankey option from the add-in’s gallery.
  4. Map the Source, Target, and Value columns to the add-in’s expected fields and accept the default layout.
  5. Adjust node colors so categories (renewable vs fossil, paid vs organic) get distinct hues, and check that band widths are still proportional to your values.
  6. Edit the chart title to a takeaway sentence, verify conservation (inflow equals outflow at every node), and resize the visual so labels don’t overlap.

Tip: if you can’t install add-ins, build the Sankey in Power BI Desktop instead and paste the published image into your Excel report.

Google Sheets

Spreadsheet — ~10 min (with add-on)
  1. Lay out a link table with three columns (Source, Target, Value), one row per band, headers in row 1.
  2. Google Sheets has no native Sankey chart type. Open Extensions → Add-ons → Get add-ons and install ChartExpo, Power Tools, or Vizzlo.
  3. Launch the add-on, choose Sankey diagram from its gallery, and select your link table as the data source.
  4. Map Source, Target, and Value columns to the add-on’s prompts and click Create chart.
  5. Customize node colors and labels in the add-on’s settings panel; stick to a small palette so the diagram stays calm.
  6. Use Insert → Image to embed the rendered Sankey back into your Sheet, or export it as PNG/SVG for a slide deck.

Sheets’ native chart picker only ships rough “flow”-style options through Org chart and Geo chart — neither is a real Sankey, so reach for an add-on.

Python (Plotly)

Code — ~8 min
  1. Install Plotly with pip install plotly. Matplotlib’s built-in matplotlib.sankey module draws arrow-style flow diagrams rather than multi-stage node-and-link layouts, so Plotly is the easiest path.
  2. Create a list of unique node labels, plus three parallel lists (source index, target index, value) describing every band.
  3. Build the trace with go.Sankey(node=dict(label=labels, color=colors), link=dict(source=src, target=tgt, value=val)).
  4. Wrap the trace in a go.Figure(), add a title with fig.update_layout(title_text=...), and call fig.show() in a notebook or fig.write_html() to save.
  5. Color nodes by category by passing a colors list parallel to labels, and color bands by source by setting link.color = [node_colors[s] for s in src].
  6. For accessibility, set a meaningful figure title and supply alt text via the embedding HTML, for example role="img" plus an aria-label on the wrapping element.

Tip: if you prefer a static figure, plotly.io.write_image() can export the Sankey as PNG/SVG. Install kaleido (pip install -U kaleido) first.
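Because go.Sankey takes integer indices rather than labels, a label-based link table needs one conversion pass first. The sketch below is a plain-Python way to do that; to_index_lists is a hypothetical helper, and the rows are a three-link excerpt in the style of the example data.

```python
def to_index_lists(link_rows):
    """Convert (source_label, target_label, value) rows to index-based lists.

    Returns (labels, src, tgt, val), where src and tgt hold positions in labels.
    Labels are numbered in order of first appearance.
    """
    labels = []
    index_of = {}
    src, tgt, val = [], [], []
    for source, target, value in link_rows:
        for label in (source, target):
            if label not in index_of:
                index_of[label] = len(labels)
                labels.append(label)
        src.append(index_of[source])
        tgt.append(index_of[target])
        val.append(value)
    return labels, src, tgt, val

rows = [
    ("Organic", "Homepage", 4200),
    ("Paid", "Homepage", 2400),
    ("Homepage", "Signup", 2100),
]

labels, src, tgt, val = to_index_lists(rows)
print(labels)  # ['Organic', 'Homepage', 'Paid', 'Signup']
print(src)     # [0, 2, 1]
print(tgt)     # [1, 1, 3]
```

The four lists slot directly into the node=dict(label=labels) and link=dict(source=src, target=tgt, value=val) arguments described in the steps above.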

R (networkD3 / ggalluvial)

Code — ~7 min
  1. For interactive Sankeys, install networkD3 with install.packages('networkD3') and load it with library(networkD3).
  2. For static, publication-ready Sankeys (alluvial flavor), install ggalluvial: install.packages('ggalluvial') and library(ggalluvial).
  3. With networkD3, build a nodes data frame with a name column and a links data frame with source, target, and value columns, where source and target are zero-based row indices into nodes, then call sankeyNetwork(Links=links, Nodes=nodes, Source='source', Target='target', Value='value', NodeID='name').
  4. With ggalluvial, pivot your data to long format and pass it to ggplot() with geom_alluvium() and geom_stratum() layered on top.
  5. Color nodes or strata by category with scale_fill_brewer() to pick a colorblind-safe ColorBrewer palette.
  6. For networkD3, save the widget with htmlwidgets::saveWidget() to ship as a standalone HTML file. For ggalluvial, ggsave() exports to PNG, PDF, or SVG.

Tip: ggalluvial is best when stages share the same node set (true alluvial); networkD3 is best when stages have different node sets (general Sankey).

JavaScript (D3 + d3-sankey)

Code — ~15 min
  1. Install D3 and the d3-sankey plugin: npm i d3 d3-sankey, or include the CDN script tags in your HTML.
  2. Build a graph object with two arrays: nodes (each {name}) and links (each {source, target, value}).
  3. Configure the layout with const sankey = d3.sankey().nodeWidth(15).nodePadding(12).extent([[1,1],[width-1,height-1]]); then run const graph = sankey({nodes: ..., links: ...}).
  4. Render nodes as <rect> elements positioned with x0, y0, x1, y1 from the layout, and render links with d3.sankeyLinkHorizontal() as the path generator.
  5. Color nodes by category with d3.scaleOrdinal(d3.schemeTableau10) and let links inherit the source node color with reduced opacity (~0.3 to 0.5).
  6. For accessibility, set role="img" on the SVG, give it an aria-label that lists the top three flows, and add tabindex="0" to nodes with their own aria-labels.

Tip: Observable’s @d3/sankey-diagram notebook is the canonical reference. Fork it and replace the data to bootstrap a new chart in five minutes.

Tableau

BI — ~25 min (polygon hack)
  1. Tableau does not have a native Sankey mark, so the standard trick is the “polygon Sankey” hack: reshape your link table to plot bezier curves as polygons.
  2. In Tableau Prep or with a calculation, generate a long densified data set: each link gets ~50 evaluation points along its curve, with t in [0,1].
  3. Drop the Path field on the Marks card with mark type Polygon, put t on Columns, and y(t) on Rows where y(t) is the bezier interpolation between source and target row positions.
  4. Map source position and target position into Detail and use a curve calculation: y(t) = source + (target - source) * (3*t^2 - 2*t^3).
  5. Encode value with size by mapping link value to the polygon’s vertical extent at every t, and color polygons by the source dimension.
  6. Verify in Show Me that you have polygons, not lines or bars; for production work, the Tableau Public gallery has reusable templates.

Tip: if the polygon hack is too painful, render the Sankey in d3-sankey and embed it as a Web Page object inside a Tableau dashboard — it scrolls and filters cleanly.
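The densification and curve calculation from the steps above can be prototyped outside Tableau to see what the reshaped rows should look like. This is a sketch under the stated formula only; densify_link and the sample positions are illustrative, and the output would still need to be joined back to the link table before Tableau could plot it.

```python
def densify_link(source_y, target_y, points=50):
    """Return (t, y) pairs along y(t) = source + (target - source) * (3t^2 - 2t^3)."""
    if points < 2:
        raise ValueError("need at least two points to span t = 0 to t = 1")
    rows = []
    for i in range(points):
        t = i / (points - 1)              # evenly spaced t in [0, 1]
        ease = 3 * t**2 - 2 * t**3        # 0 at t=0, 1 at t=1, flat at both ends
        rows.append((t, source_y + (target_y - source_y) * ease))
    return rows

# Illustrative positions: a band leaving row position 10 and arriving at 30.
curve = densify_link(10, 30, points=5)
for t, y in curve:
    print(f"t={t:.2f}  y={y:.3f}")
```

The curve starts and ends with zero slope, which is what lets each band leave and enter its nodes horizontally.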

Power BI

BI — ~6 min (custom visual)
  1. In Power BI Desktop, click the … (more options) icon in the Visualizations pane and choose Get more visuals from AppSource.
  2. Search for Sankey. Microsoft publishes a free Sankey visual; alternatives like Sankey Chart by ChartExpo and MAQ Software’s Sankey Bar Chart are also popular.
  3. Drop the imported visual onto your report canvas and drag your Source field to the Source well, your Target field to the Destination well, and your Value field to the Weight well.
  4. Open the Format pane, expand Data colors, and assign distinct colors per source category so bands inherit the source color.
  5. Under Format → Data labels, toggle them on and choose to display node totals at each node so readers don’t need to hover.
  6. Edit the title under Format → Title and write a takeaway sentence rather than the default field name.

Tip: Microsoft’s Sankey custom visual is certified, so it works in Publish to Web and embedded scenarios. ChartExpo offers more polish but requires a free account.
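The visual expects one row per link with a source, a destination, and a weight. As a minimal sketch, an indexed link list like the one used in the Python example can be flattened into that shape with the standard library; the column names here simply mirror the wells named above and are an assumption, since any field names can be dragged into them.

```python
import csv
import io

# Illustrative node and link data (hypothetical subset).
nodes = ["Organic", "Paid", "Homepage", "Blog"]
links = [(0, 2, 4200), (0, 3, 1800), (1, 2, 2400), (1, 3, 600)]


def links_to_csv(nodes, links):
    """Return CSV text with one Source, Destination, Weight row per link."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Destination", "Weight"])
    for s, t, v in links:
        # Replace node indices with their labels so the table is readable.
        writer.writerow([nodes[s], nodes[t], v])
    return buffer.getvalue()


table = links_to_csv(nodes, links)
print(table)
```

Saving that text to a .csv file gives a table that can be loaded through Get data and mapped to the three wells.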

// 16Code examples

Working code in the most common stacks

A runnable snippet that reproduces the website-traffic Sankey from the hero, with Organic colored as the dominant source. Copy, paste, and replace the data with yours.

sankey_diagram.py
import plotly.graph_objects as go

# A small website-traffic flow: source -> landing page -> outcome.
nodes = [
    "Organic", "Paid", "Social", "Email",          # 0..3 sources
    "Homepage", "Blog", "Product", "Pricing",       # 4..7 landing pages
    "Signup", "Browse", "Bounce",                   # 8..10 outcomes
]

# (source_index, target_index, value)
links = [
    (0, 4, 4200), (0, 5, 1800), (0, 6, 1100),
    (1, 4, 2400), (1, 5,  600), (1, 7, 1500),
    (2, 5, 1900), (2, 6,  700),
    (3, 6,  900), (3, 7,  300),
    (4, 8, 2100), (4, 9, 2800), (4, 10, 1700),
    (5, 8,  900), (5, 9, 2200), (5, 10, 1200),
    (6, 8, 1400), (6, 9, 1000), (6, 10,  300),   # Product: 2700 in, 2700 out
    (7, 8,  700), (7, 9,  900), (7, 10,  200),
]

# Brand palette: accent for the dominant source, neutrals for the rest.
node_colors = [
    "#c94a2e", "#e8c4b8", "#f5ede9", "#f0e4e0",
    "#1a1a18", "#6b6b67", "#6b6b67", "#6b6b67",
    "#2e7d52", "#b0b0aa", "#c94a2e",
]
link_colors = ["rgba(201,74,46,0.25)" if s == 0 else "rgba(180,170,160,0.18)"
               for s, _, _ in links]

fig = go.Figure(go.Sankey(
    arrangement="snap",
    node=dict(
        pad=14, thickness=14,
        line=dict(color="#1a1a18", width=0.5),
        label=nodes, color=node_colors,
    ),
    link=dict(
        source=[s for s, _, _ in links],
        target=[t for _, t, _ in links],
        value=[v for _, _, v in links],
        color=link_colors,
        hovertemplate="%{source.label} → %{target.label}<br>%{value:,} visitors<extra></extra>",
    ),
))

fig.update_layout(
    title_text="Organic search is the largest traffic source (May 2025)",
    font_family="Inter, system-ui, sans-serif",
    font_size=12,
    margin=dict(l=10, r=10, t=60, b=10),
)
fig.write_html("sankey_diagram.html")
fig.show()
$ python sankey_diagram.py

// 17 — FAQs

Frequently asked questions

What is a Sankey diagram?

A Sankey diagram is a flow visualization where bands or arrows connect nodes and the width of each band is proportional to the quantity it represents. Thicker bands carry larger flows; thinner bands carry smaller ones. The diagram lets the eye instantly compare which paths are dominant and where flows split, merge, or get lost between stages.

When should you use a Sankey diagram?

Use a Sankey diagram when you want to show how a total quantity distributes and redistributes across the stages of a process. Good fits include energy and material flows, web analytics user journeys, conversion funnels with multiple paths, government budget allocations, migration flows, and supply-chain transfers between facilities.

When should you avoid a Sankey diagram?

Avoid a Sankey diagram when you have more than ~15 nodes (the bands tangle and become unreadable), when flows loop back on themselves (Sankey assumes one-directional flow — use a chord diagram instead), when you only need to compare totals (a bar chart is faster), or when your audience is unfamiliar with the format and a simpler chart would communicate the same insight.
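Whether a link table loops back on itself can be checked before choosing the chart. The sketch below is one possible approach, assuming links are (source, target, value) tuples; it uses a plain depth-first search, and the function name has_cycle and the sample data are illustrative.

```python
def has_cycle(links):
    """Return True if the directed graph formed by the links contains a loop."""
    graph = {}
    for source, target, _ in links:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:
                return True  # reached a node already on the current path
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


one_way = [("Organic", "Homepage", 4200), ("Homepage", "Signup", 2100)]
looped = one_way + [("Signup", "Homepage", 50)]
print(has_cycle(one_way), has_cycle(looped))
```

If the check returns True, either remove or re-route the returning flows or pick a layout that tolerates loops.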

How is a Sankey diagram different from an alluvial diagram?

Alluvial diagrams are a special case of Sankey designed to show how categorical group memberships change between two or more stages, for example how survey respondents moved across income brackets from 2010 to 2020. Sankey diagrams are the broader family and can show any quantity flow, including non-categorical transfers such as energy or money. Visually they look almost identical, but alluvial diagrams typically share node sets across stages while Sankeys often have different node sets at each stage.

How is a Sankey diagram different from a chord diagram?

A Sankey diagram shows directed, one-way flows between source and destination nodes arranged in columns. A chord diagram shows bidirectional flows between nodes arranged around a circle, so a flow from A to B and a return flow from B to A both appear as separate ribbons inside the same circle. Use Sankey for one-directional process flows, and chord for square matrices in which the same entities act as both sources and destinations, like trade between countries or migration between regions.

How is a Sankey diagram different from a flow chart?

A flow chart visualizes a process as a sequence of decisions and actions — it tells you what happens but says nothing about quantities. A Sankey diagram visualizes the magnitude of flow through a process — every band has a width that encodes a number. Use a flow chart when explaining the steps of a procedure; use Sankey when the question is 'how much flows where?'

How many nodes can a Sankey diagram hold?

A Sankey diagram is comfortable with five to twelve nodes in total and starts to break down past about fifteen. Past that point, the bands begin to overlap, the labels collide, and individual flows become impossible to trace. If you have more nodes, group small categories into an 'Other' bucket, drop nodes below a threshold, or split the diagram into separate small multiples.
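Grouping small categories into an 'Other' bucket can be done on the link table before plotting. This is a minimal sketch for a single stage, assuming (source, target, value) tuples; the function name group_small_targets, the threshold, and the sample data are hypothetical, and a node that sits in the middle of a multi-stage diagram would also need its outgoing links relabeled.

```python
from collections import defaultdict


def group_small_targets(links, threshold, other_label="Other"):
    """Merge targets whose total inflow is below `threshold` into one bucket."""
    inflow = defaultdict(int)
    for _, target, value in links:
        inflow[target] += value

    merged = defaultdict(int)
    for source, target, value in links:
        kept = target if inflow[target] >= threshold else other_label
        # Links that now share a (source, target) pair are summed together.
        merged[(source, kept)] += value
    return [(s, t, v) for (s, t), v in merged.items()]


links = [
    ("Organic", "Homepage", 4200),
    ("Organic", "Careers", 40),
    ("Organic", "Press", 25),
    ("Paid", "Homepage", 2400),
    ("Paid", "Press", 30),
]
grouped = group_small_targets(links, threshold=100)
print(grouped)
```

The total quantity is unchanged by the grouping, so the diagram still balances after small nodes are merged.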

Should the inflow and outflow at each node match?

Yes — by convention every Sankey node should obey the conservation principle: the total width entering a node should equal the total width leaving it. If quantity is genuinely lost between stages (waste heat in an engine, drop-offs in a funnel), show that loss as a separate destination node with its own band so the diagram remains balanced and the loss is explicit.
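The conservation principle is easy to check mechanically before drawing anything. The following sketch assumes links are (source, target, value) tuples and reports every intermediate node whose inflow and outflow differ; the function name unbalanced_nodes and the engine figures are illustrative only.

```python
from collections import defaultdict


def unbalanced_nodes(links):
    """Return {node: (inflow, outflow)} for intermediate nodes that do not balance."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value
    # Pure sources have no inflow and pure sinks have no outflow, so only
    # nodes that appear on both sides of a link can be checked.
    middle = set(inflow) & set(outflow)
    return {n: (inflow[n], outflow[n]) for n in sorted(middle) if inflow[n] != outflow[n]}


engine = [("Fuel", "Engine", 100), ("Engine", "Work", 30), ("Engine", "Waste heat", 60)]
print(unbalanced_nodes(engine))  # Engine receives 100 but passes on only 90

# Making the missing 10 units an explicit loss node restores the balance.
engine_fixed = engine + [("Engine", "Friction", 10)]
print(unbalanced_nodes(engine_fixed))
```

An empty result means every intermediate node balances; any entry points to a loss that should be drawn as its own destination.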

Why are Sankey diagrams used so much in energy analysis?

Captain Matthew Sankey invented the modern form of the diagram in 1898 specifically to visualize the energy efficiency of a steam engine. The format is a near-perfect fit for energy: inputs (fuel sources) on the left, useful work and rejected heat on the right, and the width of every band shows where each unit of energy goes. Lawrence Livermore National Laboratory still publishes annual U.S. energy-flow Sankeys — they are the canonical example of the form.

Can a Sankey diagram be interactive?

Yes. Interactive Sankey diagrams support hovering or clicking on a node to highlight its inflows and outflows, dragging nodes to reorder them, and tooltips that show the exact quantity for each band. Plotly and Highcharts ship interactive Sankey charts out of the box, while d3-sankey computes the layout and leaves the interaction handlers to you. Make sure the interactions are keyboard accessible: every node should be focusable with Tab and the highlight should fire on focus, not only on hover.

What category of chart is a Sankey diagram?

Sankey diagrams belong to the Flow & Process family of charts. Charts in that family — alluvial, chord, network, parallel sets — are designed to answer the same kind of 'how does quantity move through a system' question, and they often work as alternatives when one doesn't quite fit your data shape.

What's the best library for building Sankey diagrams in code?

For Python, plotly.graph_objects.Sankey is the easiest path; Matplotlib's built-in matplotlib.sankey module draws arrow-style diagrams and is cumbersome for multi-stage node-link flows. For R, networkD3::sankeyNetwork() and ggalluvial both work, with ggalluvial fitting neatly into the tidyverse. For the web, d3-sankey is the dominant low-level library and Plotly.js is the easiest high-level option. Highcharts and ECharts both ship Sankey types as well.

How do you read a Sankey diagram?

Start by reading the labels on the leftmost (source) and rightmost (destination) columns. Then trace the widest bands first to find the dominant flows. Look at each node to see whether it splits flows outward or merges them inward, and check whether the right-hand side is thinner than the left to spot losses. Finally, hover or zoom in to read individual flow quantities when precision matters.

// 18References

References and further reading

Primary sources, reference texts, and the official documentation for the libraries and tools referenced throughout this guide.

  • Encyclopedia entry covering history, naming, and notable examples — a solid neutral starting point with citations to primary sources including Sankey’s original 1898 paper.
    https://en.wikipedia.org/wiki/Sankey_diagram
  • Captain Sankey’s original paper, published in the Minutes of Proceedings of the Institution of Civil Engineers, where the eponymous diagram first appeared as a way to show steam-engine energy losses.
    https://archive.org/details/minutesofproceed12701898inst
  • Minard’s 1869 map of Napoleon’s 1812 Russian campaign, widely cited as a Sankey-style flow diagram that predates Sankey’s own use of the form by nearly thirty years. Tufte wrote that it may well be the best statistical graphic ever drawn.
    https://en.wikipedia.org/wiki/Charles_Joseph_Minard#Napoleon%27s_Russian_campaign
  • Annual U.S. energy-flow Sankey diagrams covering primary energy sources, sectors, and end uses. The canonical real-world example of the format and a great template to study.
    https://flowcharts.llnl.gov/
  • Open-source poster categorizing chart types by intent. Sankey diagrams sit firmly in the Flow family alongside chord, network, and waterfall charts.
    https://github.com/Financial-Times/chart-doctor/tree/main/visual-vocabulary
  • The official d3-sankey plugin and Observable notebooks. The reference implementation for any web Sankey diagram, with examples for both static and interactive flavors.
    https://github.com/d3/d3-sankey
  • Official Plotly documentation for the Sankey trace. Covers node/link configuration, color, hover templates, and exporting to HTML or PNG.
    https://plotly.com/python/sankey-diagram/
  • Documentation for the ggalluvial R package. The clearest reference for the alluvial flavor of Sankey, with a tutorial on long-format data and stratum aesthetics.
    https://corybrunson.github.io/ggalluvial/
  • Web Accessibility Initiative guidance on making complex visuals accessible. The patterns for long descriptions, data tables, and aria-labels apply directly to Sankey diagrams.
    https://www.w3.org/WAI/tutorials/images/complex/
  • Hands-on tutorial with real published examples. Especially useful for choosing the right level of granularity and avoiding band-tangle messes.
    https://academy.datawrapper.de/article/258-how-to-create-a-sankey-diagram
  • Tufte’s foundational text on data graphics. Its discussion of Minard’s Napoleon march set the standard for how flow visualizations should respect data integrity.
    https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/