Multi-dimensional · Intermediate

Mosaic Plot

A tiled area chart for categorical data where each tile’s width and height represent the proportions of two variables — revealing associations, independence, and deviations in contingency tables.

// 01 — The chart

What it looks like

Example: Titanic survival by class (2 × 3 contingency table)

[Figure: mosaic plot with one column per passenger class. 1st Class: 62% survived, 38% perished. 2nd Class: 41% survived, 59% perished. 3rd Class: 26% survived, 74% perished.]

A mosaic plot of Titanic survival data. Column width encodes class size; tile height encodes survival proportion. First-class passengers had much higher survival rates.

// 02 — Definition

What is a mosaic plot?

A mosaic plot (also called a Marimekko chart in some contexts) is a graphical display of data from a contingency table — a table showing the frequency distribution of two or more categorical variables. The total area of the chart represents the entire dataset, and this area is recursively subdivided into tiles.

The width of each column represents the proportion of one variable (e.g., passenger class), and the height of each tile within a column represents the conditional proportion of the other variable (e.g., survival rate within that class). The area of each tile is therefore proportional to the cell count in the contingency table.
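The width, height, and area relationship can be sketched in a few lines of Python. The counts below are hypothetical, chosen only to reproduce the percentages in the example figure; they are not the real passenger counts, and `mosaic_tiles` is an illustrative helper rather than part of any plotting library. The sketch assumes every column has at least one observation.

```python
import math

# Hypothetical counts, chosen to reproduce the percentages in the example
# figure; they are NOT the real Titanic passenger counts.
counts = {
    "1st Class": {"Survived": 186, "Perished": 114},
    "2nd Class": {"Survived": 82, "Perished": 118},
    "3rd Class": {"Survived": 130, "Perished": 370},
}

def mosaic_tiles(table):
    """Return {(column, row): (x, y, width, height)} inside a unit square."""
    total = sum(sum(cells.values()) for cells in table.values())
    tiles = {}
    x = 0.0
    for column, cells in table.items():
        column_total = sum(cells.values())
        width = column_total / total        # marginal proportion of the column
        y = 0.0
        for row, count in cells.items():
            height = count / column_total   # conditional proportion in the column
            tiles[(column, row)] = (x, y, width, height)
            y += height
        x += width
    return tiles

tiles = mosaic_tiles(counts)
for (column, row), (x, y, width, height) in tiles.items():
    area = width * height                   # equals count / total
    print(f"{column:9} {row:8} width={width:.2f} height={height:.2f} area={area:.3f}")
```

Because width is the column total divided by the grand total, and height is the cell count divided by the column total, their product reduces to the cell count divided by the grand total, which is why tile areas sum to 1.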

Mosaic plots are particularly good at revealing associations between categorical variables. If the variables are independent, every column splits into tiles of the same heights, so the horizontal divisions line up across columns. Deviations from that pattern indicate association, and the direction and magnitude of the deviation show the nature of that association.
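As a rough numerical counterpart to this visual check, the sketch below compares the conditional proportions (tile heights) across columns. The two small tables and the `tolerance` value are assumptions made for illustration, and this is an eyeball-style comparison, not a statistical test: real samples fluctuate, so small height differences can appear even when the variables are independent.

```python
import math

def column_heights(table):
    """Conditional proportion of each row category within each column."""
    heights = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        heights[column] = {row: n / column_total for row, n in cells.items()}
    return heights

def looks_independent(table, tolerance=0.01):
    """True if every column splits into (nearly) the same tile heights."""
    heights = list(column_heights(table).values())
    reference = heights[0]
    return all(
        math.isclose(h[row], reference[row], abs_tol=tolerance)
        for h in heights
        for row in reference
    )

# Hypothetical tables: the first has identical splits in both columns.
independent = {"A": {"yes": 20, "no": 80}, "B": {"yes": 10, "no": 40}}
associated = {"A": {"yes": 62, "no": 38}, "B": {"yes": 26, "no": 74}}

print(looks_independent(independent))  # True
print(looks_independent(associated))   # False
```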

Origin: Developed by John Hartigan and Beat Kleiner in 1981, later extended by Michael Friendly in 1994 with the addition of color shading to show deviations from expected values (extended mosaic plots).

// 03 — Anatomy

Parts of a mosaic plot

A — Column (primary variable): Width represents the marginal proportion of the first categorical variable
B — Column width: Wider columns represent categories with more observations in the dataset
C — Tile (cell): Each tile's area is proportional to the count in that cell of the contingency table

// 04 — Usage

When to use it — and when not to

✓Use a mosaic plot when…
  • Visualizing the relationship between two categorical variables
  • Checking visually for independence in contingency table data
  • Showing both marginal and conditional proportions in one chart
  • Comparing distributions across groups of unequal size
  • Communicating chi-squared test results visually
×Avoid a mosaic plot when…
  • You have more than 4–5 categories per variable — tiles become too small to read
  • Your audience isn't familiar with area-based encoding — bars are simpler
  • You need to show exact counts — the area metaphor makes precise reading hard
  • Variables have many levels, creating an overwhelming number of tiny tiles
  • Data is continuous, not categorical — use density or scatter plots instead

// 05 — Reading guide

How to read a mosaic plot

Follow these steps to interpret any mosaic plot.

1. Read the variable labels

The x-axis variable determines column widths, and the y-axis variable determines tile heights within each column.

2. Compare column widths

Wider columns represent more prevalent categories. This gives you the marginal distribution of the x-axis variable.

3. Compare tile heights across columns

If tile heights are the same across all columns, the variables are independent. If heights vary, there's an association.

4. Look for color shading (extended mosaic)

In extended mosaic plots, color intensity shows deviations from expected values. Blue means more than expected; red means fewer.
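The shading can be sketched by computing a Pearson residual for each cell, (observed − expected) / √expected, and mapping its sign to a color. The counts are hypothetical (picked to match the percentages in the example figure), and `shade` is a deliberately simplified stand-in: real implementations usually also vary color intensity with the size of the residual.

```python
import math

# Hypothetical counts matching the example figure's percentages.
counts = {
    "1st Class": {"Survived": 186, "Perished": 114},
    "2nd Class": {"Survived": 82, "Perished": 118},
    "3rd Class": {"Survived": 130, "Perished": 370},
}

def pearson_residuals(table):
    """Return {(column, row): (observed - expected) / sqrt(expected)}."""
    rows = list(next(iter(table.values())))
    total = sum(sum(cells.values()) for cells in table.values())
    column_totals = {c: sum(cells.values()) for c, cells in table.items()}
    row_totals = {r: sum(cells[r] for cells in table.values()) for r in rows}
    residuals = {}
    for column, cells in table.items():
        for row in rows:
            # Expected count if the two variables were independent.
            expected = column_totals[column] * row_totals[row] / total
            residuals[(column, row)] = (cells[row] - expected) / math.sqrt(expected)
    return residuals

def shade(residual):
    """Simplified color rule: sign only, no intensity scale."""
    if residual > 0:
        return "blue"   # more observations than expected
    if residual < 0:
        return "red"    # fewer observations than expected
    return "none"

residuals = pearson_residuals(counts)
for cell, value in residuals.items():
    print(cell, f"{value:+.2f}", shade(value))
```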

5. Interpret tile area

Each tile's area is proportional to the cell count. Large tiles represent common combinations; tiny tiles represent rare ones.

// 06 — Pitfalls

Common mistakes

Too many categories

Mosaic plots work best with 2–4 categories per variable. Beyond that, tiles become unreadably small and the visual becomes a patchwork of meaningless slivers.

Confusing area with height

Readers often compare tile heights without accounting for widths. A tall narrow tile may represent fewer cases than a short wide one. Always read both dimensions.
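A small worked example makes the point concrete. The two tiles and the dataset size below are made up for illustration; each tile is given as (column width, tile height), both as fractions of the plot.

```python
# Hypothetical tiles as (column width, tile height), both fractions of the plot.
tall_narrow = (0.10, 0.80)
short_wide = (0.60, 0.30)
total_observations = 1000  # assumed dataset size

def tile_count(tile, total):
    """Number of cases a tile represents: area of the tile times the total."""
    width, height = tile
    return round(width * height * total)

print(tile_count(tall_narrow, total_observations))  # 80
print(tile_count(short_wide, total_observations))   # 180
```

The taller tile covers 80% of its column but stands for fewer than half as many cases as the shorter one, because its column is six times narrower.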

Ignoring the independence baseline

Without understanding what the plot would look like under independence, you can't properly interpret deviations. Use extended mosaic plots with residual shading to make this explicit.
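One way to make the baseline concrete is to compute it. Under independence, every column would split by the overall row proportions, so the baseline plot has identical tile heights in every column. The sketch below derives those heights and the matching expected counts, then sums the squared deviations into a Pearson chi-squared statistic. The counts are hypothetical and the function names are illustrative.

```python
import math

# Hypothetical counts matching the example figure's percentages.
counts = {
    "1st Class": {"Survived": 186, "Perished": 114},
    "2nd Class": {"Survived": 82, "Perished": 118},
    "3rd Class": {"Survived": 130, "Perished": 370},
}

def independence_baseline(table):
    """Common tile heights and expected counts if the variables were independent."""
    rows = list(next(iter(table.values())))
    total = sum(sum(cells.values()) for cells in table.values())
    row_totals = {r: sum(cells[r] for cells in table.values()) for r in rows}
    # Under independence every column splits by the overall row proportions.
    baseline_heights = {r: row_totals[r] / total for r in rows}
    expected = {
        column: {r: sum(cells.values()) * baseline_heights[r] for r in rows}
        for column, cells in table.items()
    }
    return baseline_heights, expected

def chi_squared(table):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    _, expected = independence_baseline(table)
    return sum(
        (table[c][r] - expected[c][r]) ** 2 / expected[c][r]
        for c in table
        for r in table[c]
    )

heights, expected = independence_baseline(counts)
print("baseline heights:", heights)
print("chi-squared:", round(chi_squared(counts), 2))
```

For an r × c table the statistic is compared against a chi-squared distribution with (r − 1)(c − 1) degrees of freedom, which is 2 for this table; the p-value lookup is left out to keep the sketch dependency-free.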

Using it for continuous data

Mosaic plots are designed for categorical data. Discretizing continuous variables to fit a mosaic plot usually loses important information.

// 07 — In the wild

Real-world examples

Epidemiological studies

Public health researchers use mosaic plots to display the relationship between risk factors (smoking status, age group) and disease outcomes, showing both prevalence and association.

Market segmentation

Marketers visualize the relationship between customer demographics and product preferences, with tile area showing segment sizes and height differences revealing preference patterns.

Educational outcomes

Education researchers use mosaic plots to show how student demographics relate to achievement levels, revealing disparities that might be hidden in tabular data.

// 08 — Quick reference

Key facts

Also known as: Marimekko chart, Mekko chart
Best for: 2–3 categorical variables
Data type: Contingency tables / cross-tabs
Key encoding: Area = cell frequency
Origin: Hartigan & Kleiner, 1981
Difficulty: Intermediate

// 09 — Variations

Variations of the mosaic plot

Extended mosaic plot (Friendly)

Adds color shading to tiles based on Pearson residuals — showing whether each cell has more (blue) or fewer (red) observations than expected under independence.

Fluctuation diagram

Places one tile per category combination on a regular grid of equal-sized cells and scales each tile to its cell count, sacrificing the marginal proportion encoding for easier cell-to-cell comparison.

Nested mosaic plot

Extends to three or more variables by recursively subdividing tiles — first by variable A, then B within A, then C within B. Becomes complex with more than 3 variables.
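The recursive subdivision can be sketched as a function that splits a rectangle by one variable, then recurses into each piece with the next variable, alternating between horizontal and vertical cuts. Everything here is illustrative: the records, the `count` key, and the split order are assumptions, and the sketch requires every combination to have a positive count.

```python
import math

def nested_tiles(rows, variables, x=0.0, y=0.0, w=1.0, h=1.0, depth=0):
    """Recursively split the rectangle (x, y, w, h) by each variable in turn.

    rows is a list of dicts that each carry a "count" key; variables is the
    split order. Even depths cut along the x-axis, odd depths along the y-axis.
    Returns a list of (labels, x, y, w, h) leaf tiles.
    """
    if not variables:
        return [((), x, y, w, h)]
    variable, rest = variables[0], variables[1:]
    total = sum(r["count"] for r in rows)
    levels = list(dict.fromkeys(r[variable] for r in rows))  # first-seen order
    tiles = []
    offset = 0.0
    for level in levels:
        subset = [r for r in rows if r[variable] == level]
        share = sum(r["count"] for r in subset) / total
        if depth % 2 == 0:
            box = (x + offset * w, y, share * w, h)
        else:
            box = (x, y + offset * h, w, share * h)
        for labels, *rect in nested_tiles(subset, rest, *box, depth + 1):
            tiles.append(((level,) + labels, *rect))
        offset += share
    return tiles

# Hypothetical records for three variables A, B, and C.
records = [
    {"A": "a1", "B": "b1", "C": "c1", "count": 30},
    {"A": "a1", "B": "b1", "C": "c2", "count": 10},
    {"A": "a1", "B": "b2", "C": "c1", "count": 5},
    {"A": "a1", "B": "b2", "C": "c2", "count": 15},
    {"A": "a2", "B": "b1", "C": "c1", "count": 8},
    {"A": "a2", "B": "b1", "C": "c2", "count": 12},
    {"A": "a2", "B": "b2", "C": "c1", "count": 4},
    {"A": "a2", "B": "b2", "C": "c2", "count": 16},
]

leaves = nested_tiles(records, ["A", "B", "C"])
for labels, x, y, w, h in leaves:
    print(labels, f"area={w * h:.3f}")
```

Each leaf's area still equals its count divided by the grand total, but with three variables there are already eight tiles from two levels each, which illustrates why deeper nesting becomes hard to read.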

// 10 — FAQs

Frequently asked questions

What is a mosaic plot?

A mosaic plot (also called a Marimekko chart in some contexts) is a graphical display of data from a contingency table — a table showing the frequency distribution of two or more categorical variables. The total area of the chart represents the entire dataset, and this area is recursively subdivided into tiles.

When should you use a mosaic plot?

Use a mosaic plot when visualizing the relationship between two categorical variables. It also works well when testing for independence in contingency table data, and when showing both marginal and conditional proportions in one chart.

When should you avoid a mosaic plot?

Avoid a mosaic plot when you have more than 4–5 categories per variable, because tiles become too small to read. It is also a poor fit when your audience isn't familiar with area-based encoding (bars are simpler) or when you need to show exact counts, which the area metaphor makes hard to read precisely.

What is another name for a mosaic plot?

A mosaic plot is also known as a Marimekko chart or Mekko chart. The name varies between fields, but the visualisation technique is the same.

What size of dataset works best for a mosaic plot?

A mosaic plot works best with 2–3 categorical variables. With a single variable there is nothing to cross-tabulate, and with more than three the chart becomes too cluttered to read clearly.

Is a mosaic plot suitable for dashboards?

Yes — a mosaic plot can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.