Mosaic Plot
A tiled area chart for categorical data where each tile’s width and height represent the proportions of two variables — revealing associations, independence, and deviations in contingency tables.
// 01 — The chart
What it looks like
A mosaic plot of Titanic survival data. Column width encodes class size; tile height encodes survival proportion. First-class passengers had much higher survival rates.
// 02 — Definition
What is a mosaic plot?
A mosaic plot (also called a Marimekko chart in some contexts) is a graphical display of data from a contingency table — a table showing the frequency distribution of two or more categorical variables. The total area of the chart represents the entire dataset, and this area is recursively subdivided into tiles.
The width of each column represents the proportion of one variable (e.g., passenger class), and the height of each tile within a column represents the conditional proportion of the other variable (e.g., survival rate within that class). The area of each tile is therefore proportional to the cell count in the contingency table.
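The geometry described above can be sketched in a few lines of Python. This is a minimal illustration, not a plotting implementation: the function name `mosaic_tiles`, the nested-dict input format, and the counts are all assumptions made for this example, and it assumes every column has at least one observation.

```python
def mosaic_tiles(table):
    """Return (x, y, width, height) for each cell of a two-way table.

    `table` maps each x-axis category to a dict of y-axis category -> count.
    Coordinates are fractions of a unit square. Assumes no column is empty.
    """
    total = sum(sum(row.values()) for row in table.values())
    tiles = {}
    x = 0.0
    for x_cat, row in table.items():
        col_total = sum(row.values())
        width = col_total / total        # marginal proportion of x_cat
        y = 0.0
        for y_cat, count in row.items():
            height = count / col_total   # conditional proportion given x_cat
            tiles[(x_cat, y_cat)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Hypothetical counts, invented purely for illustration.
counts = {
    "A": {"yes": 30, "no": 10},
    "B": {"yes": 20, "no": 40},
}
tiles = mosaic_tiles(counts)
for cell, (x, y, w, h) in tiles.items():
    print(cell, f"width={w:.2f} height={h:.2f} area={w * h:.2f}")
```

Because width is the column total divided by the grand total, and height is the cell count divided by the column total, their product is the cell count divided by the grand total. That is why tile area tracks the cell count.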
Mosaic plots are particularly good at revealing associations between categorical variables. If the variables are independent, the tiles for each category line up at the same heights in every column. Differing heights indicate an association, and the direction and size of the differences show its nature.
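To make the independence baseline concrete, the sketch below computes the counts you would expect if the two variables were unrelated (row total times column total, divided by the grand total) and confirms that the resulting tile heights are identical in every column. The function name and the counts are hypothetical, chosen only for illustration.

```python
def expected_counts(table):
    """Expected cell counts if the two variables were independent.

    `table` is a list of rows (one per x-axis category), each a list of
    counts (one per y-axis category).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts, invented for illustration.
observed = [[30, 10], [20, 40]]
expected = expected_counts(observed)

# Tile heights (conditional proportions) under independence:
# every column of the mosaic gets the same split.
heights = [[cell / sum(row) for cell in row] for row in expected]
print(expected)
print(heights)
```

Comparing `observed` with `expected` cell by cell shows where the real data departs from the baseline, which is what the varying tile heights communicate visually.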
Origin: Developed by John Hartigan and Beat Kleiner in 1981, later extended by Michael Friendly in 1994 with the addition of color shading to show deviations from expected values (extended mosaic plots).
// 03 — Anatomy
Parts of a mosaic plot
// 04 — Usage
When to use it — and when not to
Use it for:
- Visualizing the relationship between two categorical variables
- Visually checking for independence in contingency table data
- Showing both marginal and conditional proportions in one chart
- Comparing distributions across groups of unequal size
- Communicating chi-squared test results visually
Avoid it when:
- You have more than 4–5 categories per variable, because tiles become too small to read
- Your audience isn't familiar with area-based encoding, in which case bars are simpler
- You need to show exact counts, since the area metaphor makes precise reading hard
- Variables have many levels, creating an overwhelming number of tiny tiles
- Data is continuous, not categorical (use density or scatter plots instead)
// 05 — Reading guide
How to read a mosaic plot
Follow these steps to interpret any mosaic plot.
Read the variable labels
The x-axis variable determines column widths, and the y-axis variable determines tile heights within each column.
Compare column widths
Wider columns represent more prevalent categories. This gives you the marginal distribution of the x-axis variable.
Compare tile heights across columns
If tile heights are roughly the same across all columns, the data are consistent with independence. If heights vary noticeably, there's an association.
Look for color shading (extended mosaic)
In extended mosaic plots, color intensity shows deviations from expected values. Blue means more than expected; red means fewer.
Interpret tile area
Each tile's area is proportional to the cell count. Large tiles represent common combinations; tiny tiles represent rare ones.
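The color-shading step above relies on residuals, which can be sketched as follows. For each cell, the Pearson residual is (observed − expected) / √expected, where expected is the count under independence. The function names, the counts, and the cutoff of 2 are assumptions for this example. A cutoff around 2 is a common convention for flagging notable cells, but tools differ.

```python
import math


def pearson_residuals(table):
    """Pearson residual (observed - expected) / sqrt(expected) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for row, r in zip(table, row_totals):
        out = []
        for observed, c in zip(row, col_totals):
            expected = r * c / n   # count expected under independence
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


def shade(residual, cutoff=2.0):
    """Map a residual to a shading color (cutoff is an assumed convention)."""
    if residual > cutoff:
        return "blue"   # more observations than expected
    if residual < -cutoff:
        return "red"    # fewer observations than expected
    return "none"


# Hypothetical counts, invented for illustration.
observed = [[30, 10], [20, 40]]
residuals = pearson_residuals(observed)
shades = [[shade(r) for r in row] for row in residuals]
chi_squared = sum(r * r for row in residuals for r in row)
print(shades, round(chi_squared, 2))
```

The squared residuals sum to the Pearson chi-squared statistic, so the shading shows which cells contribute most to a chi-squared test result.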
// 06 — Pitfalls
Common mistakes
Too many categories
Mosaic plots work best with 2–4 categories per variable. Beyond that, tiles become unreadably small and the visual becomes a patchwork of meaningless slivers.
Confusing area with height
Readers often compare tile heights without accounting for widths. A tall narrow tile may represent fewer cases than a short wide one. Always read both dimensions.
Ignoring the independence baseline
Without understanding what the plot would look like under independence, you can't properly interpret deviations. Use extended mosaic plots with residual shading to make this explicit.
Using it for continuous data
Mosaic plots are designed for categorical data. Discretizing continuous variables to fit a mosaic plot usually loses important information.
// 07 — In the wild
Real-world examples
Epidemiological studies
Public health researchers use mosaic plots to display the relationship between risk factors (smoking status, age group) and disease outcomes, showing both prevalence and association.
Market segmentation
Marketers visualize the relationship between customer demographics and product preferences, with tile area showing segment sizes and height differences revealing preference patterns.
Educational outcomes
Education researchers use mosaic plots to show how student demographics relate to achievement levels, revealing disparities that might be hidden in tabular data.
// 08 — Quick reference
Key facts
// 09 — Variations
Variations of the mosaic plot
Extended mosaic plot (Friendly)
Adds color shading to tiles based on Pearson residuals — showing whether each cell has more (blue) or fewer (red) observations than expected under independence.
Fluctuation diagram
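Arranges the cells in a regular grid of equal-sized slots and draws a rectangle in each slot with area proportional to the cell count. It gives up the marginal and conditional proportion encoding in exchange for a fixed layout that is easier to scan.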
Nested mosaic plot
Extends to three or more variables by recursively subdividing tiles — first by variable A, then B within A, then C within B. Becomes complex with more than 3 variables.
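The recursive subdivision behind a nested mosaic plot can be sketched as below. This is an illustrative outline under stated assumptions, not how any particular library does it: the function name `nested_tiles`, the list-of-dicts record format, and the sample records are hypothetical. Splits alternate between horizontal and vertical, and combinations with no observations get no tile.

```python
def nested_tiles(records, variables, rect=(0.0, 0.0, 1.0, 1.0),
                 horizontal=True, path=()):
    """Recursively split `rect` by each variable in turn.

    `records` is a list of dicts, `variables` the keys to split on, in order.
    Returns {tuple of levels: (x, y, width, height)} in unit-square fractions.
    """
    if not variables:
        return {path: rect}
    var, rest = variables[0], variables[1:]
    x, y, w, h = rect

    # Group the records by their level of the current variable.
    groups = {}
    for rec in records:
        groups.setdefault(rec[var], []).append(rec)

    tiles = {}
    offset = 0.0
    for level, group in groups.items():
        share = len(group) / len(records)   # proportion within the parent tile
        if horizontal:
            sub = (x + offset * w, y, share * w, h)
        else:
            sub = (x, y + offset * h, w, share * h)
        tiles.update(
            nested_tiles(group, rest, sub, not horizontal, path + (level,))
        )
        offset += share
    return tiles


# Hypothetical records, invented for illustration.
records = (
    [{"a": "x", "b": "p", "c": "u"}] * 2
    + [{"a": "x", "b": "q", "c": "u"}] * 2
    + [{"a": "y", "b": "p", "c": "u"}] * 3
    + [{"a": "y", "b": "p", "c": "v"}] * 1
)
tiles = nested_tiles(records, ["a", "b", "c"])
for levels, (x, y, w, h) in tiles.items():
    print(levels, f"area={w * h:.3f}")
```

Each split divides the parent tile in proportion to the group sizes, so a leaf tile's area still equals its share of all records. Every added variable multiplies the number of tiles, which is why readability drops quickly beyond three variables.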
// 10 — FAQs
Frequently asked questions
What is a mosaic plot?
A mosaic plot (also called a Marimekko chart in some contexts) is a graphical display of data from a contingency table — a table showing the frequency distribution of two or more categorical variables. The total area of the chart represents the entire dataset, and this area is recursively subdivided into tiles.
When should you use a mosaic plot?
Use a mosaic plot when visualizing the relationship between two categorical variables. It also works well when testing for independence in contingency table data, and when showing both marginal and conditional proportions in one chart.
When should you avoid a mosaic plot?
Avoid a mosaic plot when you have more than 4–5 categories per variable, because the tiles become too small to read. It is also a poor fit when your audience isn't familiar with area-based encoding (bars are simpler) or when you need to show exact counts, since the area metaphor makes precise reading hard.
What is another name for a mosaic plot?
A mosaic plot is sometimes called a Marimekko chart or Mekko chart, names used mainly in business contexts. Usage varies between fields, but the underlying technique of sizing tiles by two categorical variables is the same.
What size of dataset works best for a mosaic plot?
A mosaic plot works best with two or three categorical variables, each with only a few levels (roughly 2–4). With more variables or more levels, the tiles become too small and too numerous to read clearly.
Is a mosaic plot suitable for dashboards?
Yes — a mosaic plot can work well in dashboards as long as the panel is large enough for readers to perceive the encoded values, has a clear title, and includes the legend or axis labels needed to interpret it.