• Aspects of Datasets - Part 2

    This is the second (and final) article looking at key aspects of datasets. Having previously covered relevance, accuracy, and precision, here we will consider consistency, completeness and size.

    Consistency

    On the 23rd of September 1999, NASA's Mars Climate Orbiter entered the Martian atmosphere and burned up. This $125 million mistake was down to inconsistent use of units between two different pieces of software…

    • Fri, Jul 31 2015
  • Don't Dismiss the GIF

    You may associate animated GIFs with questionable website design from the mid-nineties or with modern internet meme culture. In terms of data visualization, when comparing it to an interactive JavaScript application, it's easy to think of an animated GIF as a vestigial organ of the world wide web. Beyond, perhaps, allowing the user to (re)start the animation (using a little JavaScript), the animated GIF is, essentially, inflexible…

    • Tue, Jul 28 2015
  • Aspects of Datasets - Part 1

    Whether compiling your own dataset or using somebody else's, there's huge potential for wasting both time and money if the data isn't fit for purpose. This is the first article in a two-part series taking a (largely qualitative) look at evaluating the usefulness of datasets. In this part I will cover relevance, accuracy and precision; part two will cover consistency, completeness and size.

    Relevance

    It is…

    • Mon, Jul 20 2015
  • Connect the Dots

    Hopefully you're familiar with the standard scatterplot - a selection of points where each x coordinate is determined by one variable and each y coordinate by another. Below is a simple example, showing the number of countries winning one or more medals versus the total number of medals awarded for all Summer Olympic Games from Athens in 1896 to London in 2012. The data was assembled from Wikipedia's articles on Olympic…

    • Tue, Jul 7 2015
  • Outliers, Expertise and Interpolation

    While writing my recent articles on slopegraphs I became intrigued by the unusual shapes of some of the population curves of the countries and decided to read around the subject a bit. It was through this that I stumbled across a Wikipedia article on the Demographics of Japan. A chart concerning birth rates and death rates in the country piqued my curiosity:

    The big picture is one of (generally) declining birth rates…

    • Tue, Jun 30 2015
  • An Introduction to Slopegraphs - Part 2

    In my previous post I looked at showing changes over time using slopegraphs. This was done with reference to changes in population for thirteen of the G20 nations between 1960 and 2013. However, slopegraphs can also be used to show differences between categorical variables. Sticking with the theme of demographics, and with those same thirteen countries, the slopegraph below - created using R (you could also use Excel) - shows…

    • Mon, Jun 8 2015
  • An Introduction to Slopegraphs - Part 1

    When discussing the origin of the slopegraph, the go-to reference seems to be Edward Tufte's "The Visual Display of Quantitative Information". On page 159 (of the second edition) he introduces a chart or "table-graphic" that "when read vertically, ranks 15 countries by government tax collections in 1970 and again in 1979, with names spaced in proportion to the percentages". In addition, each pair of names…

    • Wed, Jun 3 2015
  • Visual Explorations of Sample Size


    Drawing conclusions based on small samples is obviously problematic. At the same time, I also wonder whether the rise to prominence of "Big Data" can lead organisations to blindly collect as much data as possible rather than think logically about how much data is actually necessary to perform whatever analysis tasks are required. I'd rather have a bit more data than necessary than not quite enough, but that doesn't mean…

    • Mon, May 25 2015
  • Bar Charts versus Dot Plots

    Bar charts have a distinct advantage over chart forms that require area or angle judgements. That's because the simple perceptual tasks we require for decoding a bar chart - judging lengths and/or position along a scale - are tasks we're good at. But we also decode dot plots through judging position along a scale. Is there a reason to choose one over the other?

    To explore this question I'm going to create several…

    • Sun, May 10 2015
  • Line Charts: Where to Start?

    I've previously explained that it is essential that the bars of bar charts start at 0. The reasoning is simple: we use relative lengths of bars to compare values, so starting a bar somewhere else leads to false judgements. But what about line charts?
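    The bar-chart reasoning above can be made concrete with a little arithmetic. The Python sketch below compares the ratio of two bar lengths drawn from zero with the ratio obtained when both bars start at a non-zero baseline. The helper name `apparent_ratio`, the values 50 and 55, and the baseline of 45 are all hypothetical, chosen only for illustration.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths (b over a) when bars start at `baseline`.

    With baseline=0 this is the true ratio of the values; any other
    baseline changes the drawn lengths and hence the ratio a viewer sees.
    """
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values, for illustration only.
a, b = 50.0, 55.0
true_ratio = apparent_ratio(a, b)                  # bars drawn from zero
truncated_ratio = apparent_ratio(a, b, baseline=45.0)  # bars drawn from 45

print(f"true ratio: {true_ratio:.2f}, apparent ratio with baseline 45: {truncated_ratio:.2f}")
# prints "true ratio: 1.10, apparent ratio with baseline 45: 2.00"
```

    A value only 10% larger is drawn with a bar twice as long, which is the sort of false judgement a non-zero bar baseline invites.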

    Below is a line chart with three datasets: A, B and C. We can see that:

    1. all lines are well above zero across all the years;
    2. A is roughly flat;
    3. B trends downward with…

    • Tue, May 5 2015
  • Too Big Data: Coping with Overplotting

    Scatter plots are a wonderful way of showing (apparent) relationships in bivariate data. Patterns and clusters that you wouldn't see in a huge block of data in a table can become instantly visible on a page or screen. With all the hype around Big Data in recent years it's easy to assume that having more data is always an advantage. But as we add more and more data points to a scatter plot we can start to lose these patterns…

    • Mon, Apr 20 2015
  • Area Judgements: Areal Problem

    Research suggests we aren't very good at judging areas. We're much better at judging lengths and position. I've touched on this before when discussing whether to choose a pie chart or a bar chart, but area judgements are also required in other popular visualization formats. If we want to be effective at communicating data through visualizations we should favor encodings that require length or position…

    • Mon, Apr 6 2015
  • Should I Use a Dual-axis Chart?

    While it's obviously essential to label charts properly, it's sometimes easier to think about chart design by stripping away some of these components. For example, what kind of information can we learn from the chart below (assume the missing vertical scale is linear and increases from bottom to top according to common convention)?

    Here are a few (not entirely independent) suggestions:

    • A trends upwards with…

    • Thu, Apr 2 2015
  • The Role of Convention in Dataviz

    If you've read my other articles you might have realized I have a particular interest in how science and the study of perception can aid us in creating effective visualizations; visualizations in which the viewer can quickly and clearly see the important patterns encoded within and not be misled by errant artifacts. But that doesn't mean that visualization design should be based entirely on matters of perception. Seemingly…

    • Mon, Mar 23 2015
  • It's The Little Things That Matter: Axes, Tick Marks, Tick Labels, and Grid Lines

    General principles

    While the fundamentals of chart design largely concern the accurate and efficient visual representation of data, it is important not to forget the supporting structures - axes, tick marks, tick labels, and grid lines. These should be subtle (to avoid distracting from the data) but distinct from the background (otherwise what's the point?).

    Perhaps the most important thing to realize is that gray…

    • Wed, Mar 11 2015
  • 7 Do's and Don'ts of DataViz

    As sites like viz.wtf illustrate, there are many ways to create confusing and misleading data visualizations. There are also many common design choices that might be considered sub-optimal. This post outlines 7 common “mistakes”, with alternative solutions that avoid them. It is a personal selection and is certainly not definitive. The data used is all fictitious and the chart labels are somewhat arbitrary, acting…

    • Thu, Feb 26 2015
  • Should I Choose a Pie Chart or a Bar Chart?

    Imagine you want a chart that shows how some whole is divided up among its constituent parts. Popular convention may tell you to think in terms of a pie chart. As an example, here’s a bare-bones pie chart with only three categories, showing some fictitious data about sales of jars of spices.

    It’s clear from the chart that only a quarter of the sales are of cumin. You can also probably tell that more jars of…

    • Thu, Jan 22 2015