Blog series: User-Centered Data Visualization. Part 2 - The Pizza Effect

Tobias Komischke / Tuesday, January 5, 2010

The underlying data set is pretty simple. It consists of just eight numbers that form four target/actual comparisons: how many swine flu vaccines each of four organizations requested and how many it actually received. Even so, the way the data was visualized is problematic.

Estimating the area of a circle is hard. A circle's area grows quadratically with its radius: if you double the radius, the area becomes four times as large. You may know this from ordering pizza. The difference between a 14-inch pizza (medium) and a 16-inch pizza (large) doesn't sound significant if you just look at the two numbers. After all, it's only a 2-inch difference. But eating a large pizza is quite a different challenge than eating a medium one. That's the effect of the area growing quadratically with the radius.
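The pizza comparison is easy to check with a little arithmetic. Assuming, as pizzerias usually do, that the advertised size is the diameter, the area is pi * (d/2)^2. This short Python sketch (the helper name `pizza_area` is just for illustration) compares the two pizzas:

```python
import math

def pizza_area(diameter_in: float) -> float:
    """Area in square inches of a round pizza with the given diameter in inches."""
    radius = diameter_in / 2
    return math.pi * radius ** 2

medium = pizza_area(14)
large = pizza_area(16)

print(f"medium: {medium:.1f} sq in")             # medium: 153.9 sq in
print(f"large:  {large:.1f} sq in")              # large:  201.1 sq in
print(f"diameter ratio: {16 / 14:.2f}")          # diameter ratio: 1.14
print(f"area ratio:     {large / medium:.2f}")   # area ratio:     1.31

# Doubling the diameter (and therefore the radius) quadruples the area.
print(f"28-in vs 14-in area: {pizza_area(28) / medium:.1f}x")  # 28-in vs 14-in area: 4.0x
```

A 14% increase in diameter buys about 31% more pizza, and doubling the diameter quadruples the area.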

Columbia University requested twice as many vaccines as Goldman Sachs. This is very hard to see in the circle sizes. Both red circles are the same size (the two organizations received the same number of vaccines), yet Columbia's large dotted circle does not look twice as large as Goldman's, even though it is. That's the pizza effect. If the actual numbers were not shown below the circles, the visualization would convey very little. And because estimating the area of circles is so difficult, it's also hard to compare the red circles with the dotted circles for any of the four organizations.
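Assuming the chart scales each circle's area (not its radius) in proportion to the number it represents, which is what "it is twice as large" implies, the radius only grows with the square root of the value. The sketch below uses relative values only (Goldman's request = 1, Columbia's = 2) because the actual counts are not reproduced here; `radius_for_value` is a hypothetical helper, not part of any charting library:

```python
import math

def radius_for_value(value: float) -> float:
    """Radius of a circle whose area equals `value` (area = pi * r**2)."""
    return math.sqrt(value / math.pi)

# Relative values only: Goldman Sachs requested 1 unit, Columbia requested 2.
goldman_r = radius_for_value(1)
columbia_r = radius_for_value(2)

area_ratio = (math.pi * columbia_r ** 2) / (math.pi * goldman_r ** 2)
radius_ratio = columbia_r / goldman_r

print(f"area ratio:   {area_ratio:.2f}")    # area ratio:   2.00
print(f"radius ratio: {radius_ratio:.2f}")  # radius ratio: 1.41

# For contrast: a chart that doubled the radius instead of the area
# would draw a circle with 2**2 = 4 times the area.
print(f"area ratio if radius were doubled: {2 ** 2}")  # area ratio if radius were doubled: 4
```

Doubling the value makes the circle only about 1.41 times as wide, which is why Columbia's circle does not read as twice as large. Scaling the radius in proportion to the value would overshoot in the other direction and quadruple the area.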

Because it's much easier to estimate the length of a bar than the area of a circle, a bar chart of the same data makes it easy to see that Columbia requested twice as many vaccines as Goldman, even without data labels next to the bars. It's still a challenge to estimate how many vaccines Goldman and Columbia received. A logarithmic scale could resolve this problem, but people have a hard time understanding log scales (see the first post in this series).

People often say that bar charts and column charts are too standard and therefore boring ("not sexy"). But these chart types are standard precisely because they are superior to other ways of visualizing data. There are still ways (within limits) to make them attractive, but from a user-centered design perspective I believe usability takes top priority.

Here's another interesting fact that neither the original visualization nor the bar chart reveals. It doesn't seem to have been Time's intention to focus on this, but since both the requested and the received numbers are shown, it's straightforward to calculate the ratio between them. That ratio makes it very easy to see which of the four organizations fared better than the others.

Again, I didn't put data labels next to the bars; they aren't necessary. It's obvious that the cancer center and Goldman Sachs have approximately the same ratio of received to requested vaccines, roughly 5%. That's about a third of Citigroup's ratio and twice Columbia's.

Interestingly, unlike pie charts, bar charts require you to think about how to scale the x-axis. I set its maximum to 100% to show the distance between the optimum relative return (100%) and the best achieved relative return (17%). The message, then, is that none of the four organizations got a good portion of what they had asked for. Had I set the maximum to 20%, the differences between the organizations would have shown up more clearly, and the message would have been: Citigroup got a much higher relative return than the others. Which scaling is better? There is no right or wrong answer. It depends on the underlying question and on what you want to express. Doesn't this demonstrate how manipulative data visualization can be? You betcha.
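The effect of the axis choice can be made concrete: a bar's on-screen length is its value divided by the axis maximum. This sketch uses only the two approximate figures quoted above (about 17% for Citigroup, about 5% for Goldman Sachs). The function name `bar_fraction` and the 40-character text bars are illustrative choices, not part of any charting library:

```python
def bar_fraction(value_pct: float, axis_max_pct: float) -> float:
    """Fraction of the plot width that a bar occupies for a given x-axis maximum."""
    if axis_max_pct <= 0:
        raise ValueError("axis maximum must be positive")
    return value_pct / axis_max_pct

# Approximate relative returns from the text: Citigroup ~17%, Goldman Sachs ~5%.
returns = {"Citigroup": 17.0, "Goldman Sachs": 5.0}

for axis_max in (100.0, 20.0):
    print(f"x-axis max = {axis_max:.0f}%")
    for org, pct in returns.items():
        frac = bar_fraction(pct, axis_max)
        # Draw a 40-character text bar so the visual difference is visible.
        print(f"  {org:<14} {'#' * round(frac * 40):<40} {frac:.0%} of width")
```

Both settings keep the ratio between the two bars at 3.4 to 1. What changes is how much of the available width the gap between them occupies: 12% of the width with a 100% maximum versus 60% with a 20% maximum, and that is what the eye responds to.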

More posts in this series to come!