A fundamental skill that every scientist must possess is the ability to thrash other people’s work (AKA peer review).
So today I’ll be thrashing a plot I found in the annual report of the University of Toronto’s Collaborative Program In Neuroscience (CPIN).
CPIN plots the distribution of students across the afferent institutes using the dreadful 3D bar plot (see plot above).
Here is a list of the many things that make me mad about this plot.
Do not use 3D bar plots
Everyone knows that! It’s the very first rule of data visualization: the 3rd dimension (in a bar plot) is useless – and in this case it is confusing, too!
The Y axis reports the number of students, but the (3D) bars never line up with the grid lines (at least, that is how it looks when you try to read the plot in 3D). Does this mean that the Institute of Medical Science has ~7.3 students about to graduate?! Moreover, the labels on the Y axis are very small compared to the other labels in this plot, which makes these important numbers difficult to read.
Cognitive and perceptual overload
Your data should quickly emerge from the plot. Your readers shouldn’t have to waste time and cognitive resources extracting the information embedded in it. The second important rule of data visualization: use the ink to plot the data.
- On the 3D bar plot the grid bends on the Y axis to form a “distorted reference frame”.
- Every bar is represented by 3 different colors (one for each visible surface of the bar).
- There are shadows for each group of bars.
That is a lot to process!
Distortion and perceptual deception
The colour gradient on the bars may seem nice at first glance. It does indeed add some depth and perceptual texture to the bars. Unfortunately, it also distorts the perceived height of the bars and puts extra strain on our cognitive system, already confused by the 3rd dimension of the plot, when we compare the bars.
My proposal (improving the plot)
A traditional (2D) bar plot. It’s that simple!
A well-crafted data visualization process starts (and finishes) with a well-defined question: What do you want to communicate to your audience?
In this case CPIN wants to show how many students are graduating from the different programs at different stages (Masters/PhDs).
Here we’re representing count data, so we should let the audience “count” the items in the graph (while the numbers are still manageable)!
This is how I envision the graphical representation of the question “how many students are graduating from the different programs?”:
Let them count
Since we are depicting actual persons (i.e. students) they can be represented as “tangible” units (small rectangles stacked on top of each other). This will immediately help the reader to understand HOW MANY students can be found in the different institutes/programs. This trick works with such low numbers, of course.
I’ve also sorted the data by the total number of Masters and PhDs per department. This makes it easier to compare the groups, and it tells a story, too!
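For anyone who wants to try the “let them count” idea, here is a minimal Python sketch of it. To be clear about what is assumed: the institute names and counts are made up (they are not the CPIN numbers), the helper names are my own, and matplotlib is just one possible way to draw the result. The layout is computed separately from the drawing, so you can check that every student gets exactly one rectangle.

```python
# Hypothetical counts, (masters, phds) per institute. NOT the CPIN data.
counts = {
    "Institute A": (3, 5),
    "Institute B": (6, 4),
    "Institute C": (1, 2),
}


def sort_by_total(counts):
    """Return (name, masters, phds) tuples, largest total first."""
    rows = [(name, m, p) for name, (m, p) in counts.items()]
    return sorted(rows, key=lambda r: r[1] + r[2], reverse=True)


def unit_rectangles(rows):
    """Return one (x, y, degree) entry per student.

    x is the institute's column, y is the stacking level within it.
    """
    units = []
    for x, (_, masters, phds) in enumerate(rows):
        degrees = ["Masters"] * masters + ["PhD"] * phds
        for y, degree in enumerate(degrees):
            units.append((x, y, degree))
    return units


def draw_unit_chart(counts, colours=None):
    """Draw one small rectangle per student, stacked by institute."""
    # Imported here so the layout helpers above work without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    colours = colours or {"Masters": "tab:blue", "PhD": "tab:orange"}
    rows = sort_by_total(counts)
    fig, ax = plt.subplots()
    for x, y, degree in unit_rectangles(rows):
        # Leave a small gap around each unit so the units stay countable.
        ax.add_patch(Rectangle((x - 0.4, y + 0.05), 0.8, 0.9, color=colours[degree]))
    # Patches do not trigger autoscaling, so set the limits explicitly.
    ax.set_xlim(-0.5, len(rows) - 0.5)
    ax.set_ylim(0, max(m + p for _, m, p in rows) + 0.5)
    ax.set_xticks(range(len(rows)))
    ax.set_xticklabels([name for name, _, _ in rows])
    ax.set_ylabel("Number of students")
    return fig, ax
```

Calling `draw_unit_chart(counts)` and then saving or showing the returned figure should give a sorted stack of countable units, with no third dimension, no gradients and no shadows.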
Use colours to tell a story
Colours should be used with caution [I must admit that for the plot above I picked the colours at random from my own beloved palette (see the figure on the left)]. You can do good (or harm) with colours. Do you want to emphasize a set (or group) in your data? Where do you want to draw the attention of your audience? Colour is your friend here!
The colour dimension depends, once again, on the story you want to tell.
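One simple way to let colour carry the story is to give the group you care about a strong accent and mute everything else. The sketch below is only an illustration: the function name, the default colours and the choice to highlight PhDs are my assumptions, not something taken from the CPIN report.

```python
def emphasis_colours(groups, highlight, accent="tab:red", muted="lightgrey"):
    """Map each group to the accent colour if highlighted, else to a muted one."""
    highlight = set(highlight)
    return {g: (accent if g in highlight else muted) for g in groups}


# Hypothetical story: draw the eye to the PhD students.
colours = emphasis_colours(["Masters", "PhD"], highlight=["PhD"])
```

The resulting dictionary maps each group to a colour name that matplotlib understands, so it can be handed to whatever plotting call you use. Change the `highlight` list and the same data tells a different story.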
After all, design (and data design) is all about finding a visual solution to a well-defined problem!