New blog series: User-Centered Data Visualization. Part 1: The Chinese Bubbles

Tobias Komischke / Sunday, November 22, 2009

One of the trends in UI design is data visualization. With data volume and complexity continuously rising, data visualization becomes more and more important as a means to translate data into information and, further, information into knowledge. Edward Tufte, one of the best-known visualization experts, says that good graphical treatment of data can be found in scientific journals like Nature, while more popular magazines often feature bad visualizations. What’s good and what’s bad? To me, the quality criteria are meaningfulness first and attractiveness second. A visualization is meaningful when it’s valid (in the sense of true) and understandable. It’s attractive if it’s appealing and intriguing.

Time Magazine has a long tradition of visualizing data about all sorts of things in life. They vary the way they depict data, so I find some of their data visualizations better and some worse. In this blog series I will use some of their work, together with visualizations from other sources, to share some thoughts about data visualization.

Earlier this year Time ran this graphic, which puts the number of unemployed Chinese migrant workers in perspective with the total number of migrant workers, the total labor force and the Chinese population.

Pretty complex stuff. It shows multiple part-to-whole relations, so one number is part of another number. When I saw this graphic I thought it was attractive. Yet it took me a while to understand it. The intersections stand out because of their darker color, so they draw visual attention. It turns out that neither their position nor their size has any meaning. They merely convey that there is a relation between two bubbles/circles. Based on that, one could think that there is no relation between circles that don’t intersect. But the 20 million unemployed migrant workers are part of the Chinese population. The two bubbles are so far apart that this relation is not easy to see, nor is it easy to compare their sizes.

I think there are a couple of other ways that would be more understandable.

This visualization places each bubble inside another one so that the part-to-whole relationships are easier to understand. All bubbles also sit on the same baseline, which makes it easier to compare their sizes. OK, at least easier than in the original graphic. The problem is that it’s hard for people to estimate the sizes of circles, so using bubbles in this chart was a bad call from the beginning. I will write about this in more posts to come. For now, how about using a single stacked column to show the same data?
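As a small aside on why circle sizes are hard to judge: if a bubble's area is meant to encode the value, its radius only grows with the square root of that value, so large differences in the data turn into modest differences in diameter. The following is a minimal sketch of that arithmetic. The function name bubble_radius and its parameters are made up for illustration, and nothing here is claimed to be how the Time graphic was actually constructed.

```python
import math


def bubble_radius(value, reference_value, reference_radius):
    """Return the radius that makes a circle's AREA proportional to value.

    reference_value is drawn with reference_radius; every other value is
    scaled relative to it. All names here are hypothetical helpers.
    """
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be >= 0 and reference_value must be > 0")
    # Area grows with radius squared, so the radius grows with sqrt(value).
    return reference_radius * math.sqrt(value / reference_value)


# A quantity 100 times larger only gets a radius 10 times larger.
small = bubble_radius(1, reference_value=1, reference_radius=5.0)
large = bubble_radius(100, reference_value=1, reference_radius=5.0)
print(small, large)
```

So a reader who compares widths instead of areas will underestimate the bigger number, which is one reason bars are usually the safer choice.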

Because it’s easier for people to estimate the height of a bar than the area of a circle, this chart should be easier to read. As a side note: purists would probably argue that the width of the column does not convey any meaning, yet it still invites you to estimate an area rather than just a height, which is a harder task. Anyway, the column shows that despite the large absolute number of 20 million, the unemployed migrant workers make up only a small fraction of the total labor force and a tiny fraction of the Chinese population. In fact, the number is so small that it’s actually hard to see on this chart. The Nature magazine way of mitigating that scaling issue would probably be to use a log scale, but see for yourself how easy or hard the result is to read.

Now the small number of unemployed migrant workers is heavily emphasized relative to the huge Chinese population. Same data, totally different look, and I think the vast majority of Time magazine readers would have trouble understanding this. I generally don’t use log scales for that reason.
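To make the "same data, totally different look" point concrete, here is a rough sketch that computes how tall each level of the column would be on a linear axis versus a log axis. Only the 20 million figure comes from this post. The other three values are rough placeholders chosen for illustration, not numbers read off the Time graphic, and the function names, the 400-pixel chart height and the axis minimum of 1 million are all assumptions.

```python
import math

# Values in millions. Only the 20 is from the post; the rest are
# illustrative placeholders, NOT figures taken from the Time graphic.
FIGURES = {
    "Unemployed migrant workers": 20,
    "Migrant workers": 130,
    "Labor force": 800,
    "Population": 1300,
}


def linear_heights(figures, chart_height=400):
    """Pixel height of each level on a linear axis that starts at 0."""
    top = max(figures.values())
    return {name: chart_height * value / top for name, value in figures.items()}


def log_heights(figures, chart_height=400, axis_min=1):
    """Pixel height of each level on a log10 axis from axis_min to the maximum."""
    top = max(figures.values())
    span = math.log10(top) - math.log10(axis_min)
    return {
        name: chart_height * (math.log10(value) - math.log10(axis_min)) / span
        for name, value in figures.items()
    }


linear = linear_heights(FIGURES)
logarithmic = log_heights(FIGURES)
for name in FIGURES:
    print(f"{name:28s} linear {linear[name]:6.1f} px   log {logarithmic[name]:6.1f} px")
```

With these placeholder numbers the 20 million occupy roughly 6 of 400 pixels on the linear axis but roughly 167 of 400 on the log axis. The log version makes the small number visible, yet it no longer looks like the tiny share of the whole that it really is, and where the axis starts changes the picture again.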

Also check out a video on Pixel8 that discusses this data visualization example!

More posts in this series to come!