The Role of Convention in Dataviz

Tim Brock / Monday, March 23, 2015

If you've read my other articles you might have realized I have a particular interest in how science and the study of perception can aid us in creating effective visualizations: visualizations in which the viewer can quickly and clearly see the important patterns encoded within and not be misled by errant artifacts. But that doesn't mean that visualization design should be based entirely on matters of perception. Seemingly arbitrary conventions should not be forgotten, as the examples below aim to demonstrate.

Temporal conventions

There may be nothing inherently wrong with the chart below in terms of our ability to perceive patterns. However, the majority of time-series charts you are likely to see will have the time component increasing horizontally from left to right.

This convention may be related to writing conventions: Fuhrman and Boroditsky found that English speakers, who write from left to right, tend to perceive time as increasing in the same direction, while Hebrew speakers, who write from right to left, favor time going from right to left. (That paper is a very interesting read; unfortunately it does use bar charts that don't start at 0.) Meanwhile, Bergen and Lau found that some Taiwanese speakers (then resident in America) envisioned time as running from top to bottom, in line with the predominant direction of writing in Taiwan at the time.

Whether or not you want to follow these results to the letter is probably a matter for debate, but if your data is predominantly to be viewed by a Western audience you are best off following the left-to-right convention (below) unless you have a very good reason not to.

A second temporal issue is with textual labeling. If you're presenting data to a foreign audience, be aware of differences in the formatting of numerical dates. Note that this is not a purely West versus East difference or an English versus not-English difference. Take the time series below. Is the time gap between each tick a day (i.e. the first tick is January 4th, the second is January 5th...), a month (the first tick is April 1st, the second tick is May 1st...) or a year (the first tick is January 2004, the second tick is January 2005...)? Clearly it's important to be explicit with dates. Ideally, use axis labels that are unambiguous; failing that, use a more meaningful axis title or main title to clarify.
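To make the ambiguity concrete, here is a minimal Python sketch. The label "01/04/05" and the three candidate formats are hypothetical choices made for illustration, not the labels from the chart above. The sketch parses one numeric label under each convention and prints the result in ISO 8601 form (year-month-day), which is one reasonably unambiguous way to write a date.

```python
from datetime import datetime

# Hypothetical tick label; not taken from the chart in the article.
AMBIGUOUS_LABEL = "01/04/05"

# Three common numeric date conventions (an illustrative, not exhaustive, list).
CANDIDATE_FORMATS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def interpretations(label):
    """Return every valid reading of `label` as an ISO 8601 date string."""
    results = {}
    for name, fmt in CANDIDATE_FORMATS.items():
        try:
            results[name] = datetime.strptime(label, fmt).date().isoformat()
        except ValueError:
            # The label is not a valid date under this convention; skip it.
            continue
    return results


for name, iso_date in interpretations(AMBIGUOUS_LABEL).items():
    print(f"{name}: {iso_date}")
```

If a label yields more than one distinct reading, as this one does, that is a hint that the axis would be clearer with spelled-out months or ISO-style dates.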

Color conventions

Conventions are also relevant to choosing an appropriate selection of colors. Some color conventions come about naturally - the sea is blue, land is green or brown - and others appear to be arbitrary or a result of branding. Differences abound across cultures.

Take a look at the map below. It may appear to show an island at its center. In fact, that's the Black Sea, and the blue protuberance at the top is the Crimean Peninsula. A map with an actual purpose is likely to have other additions to make it more useful - cities, roads, places of historical interest - and these may make it clearer what is land and what is water. But reversing the conventions still serves no useful purpose. That doesn't mean that maps must follow the blue-green convention (there are plenty of useful maps that don't), just that they shouldn't invert it.

I've mentioned previously the dangers of using green for positive or affirmative outcomes and red for negative ones. To recap, because of the relatively high prevalence of red-green color vision deficiencies (aka color blindness), anything up to ~10% of your audience might struggle. But using red and green for the reverse encodings (below) could well confuse and inadvertently mislead most of your audience.

Political parties also have associated colors. In the UK, the Conservatives are blue and Labour red. The association between Republicans and red states and Democrats and blue states is much more recent. Now that this convention seems enshrined, it would be unwise to go against it: if we color code charts using these "brand" colors, readers who know the context may well not even bother to read the labels. (I think it's also reasonable to argue there's a good case for ignoring the second "rule" in my 7 Do's and Don'ts of Dataviz article here.)
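As a rough sketch of how this might look in practice, the Python snippet below keeps the conventional colors in one lookup keyed by country and party, so a chart never assigns them arbitrarily. The names PARTY_COLORS and colors_for, the plain color names, and the gray fallback are all assumptions made for illustration.

```python
# Conventional party colors as described above; the structure and the
# plain color names are illustrative assumptions.
PARTY_COLORS = {
    ("UK", "Conservative"): "blue",
    ("UK", "Labour"): "red",
    ("US", "Republican"): "red",
    ("US", "Democrat"): "blue",
}


def colors_for(country, parties, fallback="gray"):
    """Return one color per party, using a neutral fallback for unknown ones."""
    return [PARTY_COLORS.get((country, party), fallback) for party in parties]


print(colors_for("UK", ["Conservative", "Labour"]))
print(colors_for("US", ["Republican", "Democrat", "Independent"]))
```

Keying on the country as well as the party matters because the same color sits on different sides of the political spectrum in the UK and the US.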

Andy Kirk at Visualising Data recently conducted a (perhaps not entirely scientific) poll regarding the convention of using blue for males and pink for females. The results suggested this particular convention may not be appreciated. Avoiding gender stereotypes seems like a good idea. The lesson to take from the examples above is that, if you're going to use an alternative pairing of colors, you should pick a pair that is clearly different from the stereotype rather than simply inverting it.