Demystifying Box-and-whisker Plots — Part 2

Tim Brock / Tuesday, January 26, 2016

Having shown you how to read range bars and box-and-whiskers in Part 1, I now want to use some real-world data to illustrate why they can be useful. Specifically, I'm going to use data relating to the UK general election of 2015. First, for those not familiar with the UK's political system, I'll give a brief overview of how our electoral system works.

The UK is divided into 650 constituencies for the purpose of elections to the nationwide parliament. At a general election, every eligible voter can vote for one of the candidates standing in their constituency to represent them in the UK's lower house, the House of Commons. The candidate with the most votes in each constituency wins and is elected a Member of Parliament (MP). Hence there are currently 650 MPs in the House of Commons (this changes from time to time as constituency boundaries are redrawn). While this set-up should be easy to follow and means everybody gets a specified representative in parliament, it has led to a system where the proportion of MPs belonging to a given political party is generally not very representative of the share of the votes that party won. For example, around 11.3 million people voted for Conservative Party candidates (about 37% of all votes cast) and the party won over half the "seats" in the House of Commons, while UK Independence Party (UKIP) candidates won close to 3.9 million votes but only one of them was elected.
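To make the mechanism concrete, here is a minimal sketch of winner-takes-all counting. The function name and the three parties "A", "B" and "C" are hypothetical, and the numbers are made up for illustration; they are not real election data.

```python
from collections import Counter

def first_past_the_post(constituencies):
    """Return (seat_share, vote_share) per party, as fractions.

    `constituencies` is a list of dicts mapping party name -> votes
    in one constituency. The candidate with the most votes wins the seat.
    """
    seats = Counter()
    votes = Counter()
    for result in constituencies:
        winner = max(result, key=result.get)  # ties are not handled in this sketch
        seats[winner] += 1
        votes.update(result)  # add this constituency's votes to the totals
    total_votes = sum(votes.values())
    n_seats = len(constituencies)
    seat_share = {party: seats[party] / n_seats for party in votes}
    vote_share = {party: votes[party] / total_votes for party in votes}
    return seat_share, vote_share

# Made-up results for four constituencies (hypothetical parties).
toy = [
    {"A": 45, "B": 35, "C": 20},
    {"A": 40, "B": 38, "C": 22},
    {"A": 30, "B": 50, "C": 20},
    {"A": 42, "B": 33, "C": 25},
]
seat_share, vote_share = first_past_the_post(toy)
for party in sorted(vote_share):
    print(f"{party}: {vote_share[party]:.1%} of votes, {seat_share[party]:.0%} of seats")
```

In this toy case party A and party B receive almost the same number of votes, yet A takes three of the four seats, and party C wins none despite taking more than a fifth of the votes.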

To get a better idea of why there was such a big difference we can look at the spread in the share of votes across constituencies using the data compiled by the House of Commons Library. One option is to bin each party's constituency vote shares and plot the counts for the different political parties. Here's what that looks like for the Conservatives, UKIP and Labour (who came second in terms of both votes and seats won).
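The binning step can be sketched as follows. This assumes vote shares are given as percentages and uses 10-point bins; the bin width, the function name and the example values are my own choices for illustration, not taken from the House of Commons Library data.

```python
from collections import Counter

def bin_vote_shares(shares, width=10):
    """Count constituencies per vote-share bin.

    `shares` are percentages in [0, 100]. Bins are [0, 10), [10, 20), ...
    A share of exactly 100 is folded into the last bin.
    `width` is assumed to divide 100 evenly.
    """
    n_bins = 100 // width
    counts = Counter()
    for share in shares:
        index = min(int(share // width), n_bins - 1)
        counts[index] += 1
    return [(i * width, (i + 1) * width, counts[i]) for i in range(n_bins)]

# Made-up vote shares for one hypothetical party.
example = [8.2, 12.5, 14.1, 19.9, 20.0, 43.7, 51.3, 100.0]
binned = bin_vote_shares(example)
for low, high, count in binned:
    print(f"{low:3d}-{high:3d}%: {'#' * count}")
```

Running the same function once per party gives the counts needed for a grouped or overlaid histogram.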

We can see there's a large number of constituencies where the Conservatives won 40-60% of the vote and a large number where UKIP won only 10-20%. This starts to explain things but, with only three parties, this form of visualization already looks a bit of a mess. There are other significant parties that helped determine the result of the election that we haven't seen yet. We could use small multiples, but box plots also provide an elegant alternative. Let's look at "typical" box-and-whisker plots (as defined and illustrated in Part 1) for the six parties that garnered more than 1,000,000 votes.

We've lost a lot of detailed information compared to the earlier chart but we're free from clutter and cross-party comparisons are easy. We can see that the 75th percentile (the top of the box) for the Liberal Democrats (Lib Dem) is lower than the 25th percentile (the bottom of the box) for UKIP. Despite this, the Liberal Democrats won eight times as many seats (i.e. 8) as UKIP. There is one more thing I'd like to add: a box plot to show the distribution of share of the vote for the winning candidates:

Now we can see the importance of those outliers. Not only does the Liberal Democrat distribution have more of them than UKIP's, it has more in the region above ~35% which, as we see from the Winner distribution on the left, is the kind of percentage you'll typically need to win a seat. (Obviously this is all complicated by the fact that the distributions are not at all independent.) The Green Party also won one seat and, as with UKIP, it was won by their one candidate who took over 40% of the votes in their constituency.
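For readers who want to reproduce the box, whiskers and outliers numerically, here is a minimal sketch. It assumes the common Tukey convention (whiskers reach the most extreme points within 1.5 times the interquartile range of the box), which may differ in detail from the definition used in Part 1, and quartile interpolation rules vary between tools. The data is made up.

```python
from statistics import quantiles

def box_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers for one box plot.

    Assumes Tukey-style fences at k * IQR beyond the box and at least
    two data points. Percentile conventions differ between libraries.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest point inside the fences
        "whisker_high": inside[-1],  # largest point inside the fences
        "outliers": outliers,
    }

# Made-up vote shares (%): mostly single digits plus one strong result.
shares = [2, 3, 4, 5, 5, 6, 7, 8, 9, 44]
stats = box_stats(shares)
print(stats)
```

With this toy data the single 44% result falls well outside the upper fence and is reported as an outlier, which is the kind of point that decides a seat in the charts above.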

So far there's one party in these charts that I haven't yet mentioned, the Scottish National Party (SNP). Their box plot looks more like the Winner box plot than that of any other party does. As their name might suggest, they only field candidates in the 59 constituencies of Scotland. The other parties shown all stood in over 570 constituencies (the Conservatives stood in all but three). The box plots above don't represent the differences in sample size at all. A common way to highlight varying sample size is to scale the width of the boxes accordingly. Frequently it's the square root of the number of data points in the distribution that is used:
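The square-root scaling can be sketched in a few lines. The function name and the maximum width of 0.8 are arbitrary choices of mine; the candidate counts are the ones quoted in the text (59 for the SNP, all but three of 650 for the Conservatives).

```python
from math import sqrt

def box_widths(counts, max_width=0.8):
    """Scale box widths in proportion to sqrt(n).

    The group with the most data points gets `max_width`; every other
    group gets a proportionally narrower box.
    """
    largest = max(counts.values())
    return {name: max_width * sqrt(n / largest) for name, n in counts.items()}

# Number of constituencies contested, as stated in the text.
counts = {"SNP": 59, "Conservative": 650 - 3}
widths = box_widths(counts)
for name, width in widths.items():
    print(f"{name}: box width {width:.2f}")
```

If you draw the chart with matplotlib, the resulting values can be passed to the `widths` argument of `boxplot`. Using the square root rather than the raw count keeps small groups such as the SNP visible while still signalling that they contain far fewer points.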

Another option is to plot all points, not just the outliers, and use jitter in the horizontal direction to separate them. This can be difficult to implement and get right (you'll likely have to deal with overplotting), but it is probably more intuitive than scaling the box width.
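A minimal sketch of horizontal jitter: each point keeps its true vertical value (the vote share) and receives a small random horizontal offset around its party's position on the axis. The function name, the uniform distribution and the half-width of 0.2 are assumptions for illustration; other schemes, such as beeswarm layouts, handle overplotting more carefully.

```python
import random

def jittered_x(center, n, half_width=0.2, seed=0):
    """Return n x-positions spread uniformly around `center`.

    A fixed seed keeps the layout reproducible between redraws.
    """
    rng = random.Random(seed)
    return [center + rng.uniform(-half_width, half_width) for _ in range(n)]

# One x-position per SNP candidate (59 constituencies), centred on x = 1.
xs = jittered_x(center=1.0, n=59)
print(min(xs), max(xs))
```

The half-width needs to stay well below half the spacing between categories so that points from neighbouring parties do not mingle.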

All the SNP's candidates won in excess of 30% of the vote, putting them in the region where winning a seat becomes likely. With this information it's probably unsurprising to learn that they did, in fact, win in 56 of the 59 constituencies in which they stood. As a result, SNP MPs now make up more than 8% of MPs in the House of Commons.

None of this is meant as any kind of political statement. I just think it's a nice collection of data for illustrating the power of box-and-whisker plots.
