Presentation from the September 2010 Columbus Web Analytics Wednesday. The presenter was Tim Wilson of Resource Interactive. Download the presentation (PPT 2007) for notes embedded in the slides and some useful animations.
The challenge with data visualization is that it is the combination of two distinct skillsets, and those skillsets aren’t typically ones that both come naturally to a person. The goal – for the purposes of this presentation and for analysts in business – is not to simply make the data “pretty,” but, rather, to make it as easily understood as possible. You want your audience to spend as few brain cycles as possible understanding the data that is being presented to them so that, rather, they can spend those cycles on interpreting the data and making decisions that drive action. This presentation lays out a number of guidelines and examples for doing this.
No single principle drives effective data visualization more than the data-pixel ratio. The goal for any visualization of data should be to minimize the number of pixels that don’t directly communicate information. As it happens, this slide is a complete violation of this concept! The multiple borders, the titles, the pictures of Few and Tufte, and certainly the green gradient background all take away from the core piece of information – what the ratio is.
Top left: a well-meaning analyst had some data and didn’t want to just present it on a totally white background, so he highlighted all of it and did “all borders.” That looked too plain, so he made a critical mistake and decided to add to it – it’s much easier for us to make lines heavier and invert cells…when that is often a poor choice. The vertical lines really don’t add value, and the column headings, while needed, are simply reference material.
Top right: the vertical lines have been removed and the column headings made simply bold black on white – the data itself is now as weighty as the “decoration.”
Bottom left: the horizontal lines separating the rows have been backed off a bit. The human eye can still easily detect them, but they don’t compete with the information itself.
Bottom right: the horizontal row lines have been removed altogether AND the word “Region” has been removed from each individual entry. It’s not always the right thing to do to remove horizontal lines, but in many cases they don’t add real value.
This is a default chart generated by Excel 2007 from the data on the previous slide. Applying the data-pixel ratio shows a whole host of opportunities to reduce the non-information pixels. The angled text issue and my personal feelings about Calibri are not data-pixel issues…but we can fix those, too.
This horizontal bar chart addresses all of the issues noted on the previous slide. Notice how the “lit” pixels are almost entirely dedicated to communicating the data itself. And, as a horizontal bar chart, the labels are much easier to read. The horizontal bar chart is woefully underused, and it often enables the tightest, easiest-to-consume visualization of data.
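As a rough illustration of the same idea outside of Excel, here is a minimal Python sketch of a horizontal bar chart rendered as plain text. The function name `horizontal_bars`, the sales figures, and the choice to sort the bars largest-first are all assumptions made for the example; they are not taken from the slides.

```python
def horizontal_bars(data, width=40):
    """Render {label: value} as a text horizontal bar chart, largest value first."""
    rows = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    label_width = max(len(label) for label, _ in rows)
    top = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        # Every bar starts at zero, so bar length is proportional to the value.
        bar = "█" * round(value / top * width)
        lines.append(f"{label:<{label_width}}  {bar} ${value:,}")
    return "\n".join(lines)


# Hypothetical regional sales, NOT the figures from the slides.
sales = {
    "Northwest": 200_000,
    "South": 180_000,
    "Northeast": 160_000,
    "Southeast": 150_000,
}
print(horizontal_bars(sales))
```

The labels sit to the left of the bars and read horizontally, there are no gridlines, borders, or legend, and nearly every character printed is either a label or data.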
Microsoft has fallen into the same trap of “improving” Excel through addition. If I look at JUST the first three types of charts available in Excel 2007, only one third of them are ones that I could ever envision myself using. This doesn’t mean the other chart types shouldn’t be available. There may be some oddball scenario where a 3D stacked cone would make sense. But, ALL of these should be buried in the “Other Crazy Options” section of the Chart dialog box. The biggest issue is all of the “3D” options (which, to be clear, are not really going to render anything in true 3D – they’re just going to provide a 2D representation of a 3D visualization, which, we find, leads to more unnecessary effort from your audience to understand the data).
This is the same default chart. We’ve already addressed the issues with this representation, but, to look at the downsides of 3D-ifying examples, it’s a useful place to start. Notice how the Northeast Region is clearly showing Sales that were slightly over $150,000. Now…to the next slide.
When the chart gets made into a 3D chart, the bars are placed in the middle of the “base.” At first blush, it looks like the Northeast Region is right at $150,000. Some people will falsely think that is the actual value. Others will realize (subconsciously) that they need to project the top of the bar back a bit and then follow the gridlines over and around the corner to understand the value actually being represented. This is needlessly making your audience do mental work to understand what is being presented.
At least for 3D charts, Excel applies the “right” math to make the transformation. In the case of shadows, the shadows are actually projected behind the background that has the gridlines. The human brain (again, subconsciously) will be a bit confused by this – is the actual value for each bar, when compared to the gridlines and the values, based on the top of the main bar or on the top of the shadow? Clearly, it’s the top of the main bars…but the viewer has to hesitate for a moment to make sense of the odd shadow behavior.
An example of an “artist” being given some data and told to present it in a way that was engaging and creative. It just isn’t as easily interpreted as it should be.
While not quite “very common,” roughly 5% of the population suffers from some form of color blindness (color blindness is rare in women, but it affects roughly 8% of men). This can make some data visualizations almost impossible to interpret for some people if you are not careful.
In addition to color blindness, everyone sees information in grayscale when it gets printed on a black-and-white printer. These printers are often faster and cheaper than color printers, so expect your data to, at some point, be rendered in grayscale. Notice how, in grayscale, the common red/yellow/green paradigm totally breaks down – only the yellow is distinguishable. The red and green look almost identical! This isn’t to say don’t use color. But, if you do, be sure it merely supplements/reinforces what is already shown. In the example above, for instance, you could only display a circle when the information is bad…and that circle would be red.
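To see why red and green can collapse into the same shade, here is a small Python sketch that converts colors to a gray level. It assumes the Rec. 601 luma weights (one common grayscale conversion; a given printer may use something different) and uses the CSS named colors red, yellow, and green as stand-ins, since the exact colors on the slide are not known.

```python
def to_gray(rgb):
    """Approximate gray level (0-255) using Rec. 601 luma weights (an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


# Assumed palette: CSS named colors, standing in for the slide's actual colors.
STATUS_COLORS = {
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "green": (0, 128, 0),
}

for name, rgb in STATUS_COLORS.items():
    print(f"{name:<6} -> gray level {to_gray(rgb):.0f}")
```

With these assumed colors, red and green land within about one gray level of each other (roughly 76 and 75), while yellow comes out far lighter (roughly 226). Running a palette through a check like this before publishing is a cheap way to find out whether color is carrying information that nothing else in the chart carries.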
This is another example of unnecessarily incorporating color in a visualization. This is the same data set as before, but with each region plotted as its own series. This can happen either inadvertently, or it could be a misguided attempt to solve the “angled text” along the x-axis. The problem is that it requires the viewer to jump back and forth between the legend and the chart to match colors and determine which bar goes with which region.
Viewed in grayscale, the only option for the user is to count – identify, for instance, that the South Region is the third region listed in the legend and that it appears to be a light gray like the third bar in the chart. This is unduly mentally taxing the viewer just to understand the information – mental effort that would be better applied to interpreting the information.
For an exhaustive write-up on these points, see http://bit.ly/evilpie. For one of the funniest write-ups that has pie charts as a central figure, see http://bit.ly/piecharts
One example of a pie chart being appropriately used!
The same base chart. A mis-application of the data-pixel ratio would be to see the range from $0 to $150,000 as being redundant because all of the regions exceeded $150,000 in sales. It can be tempting – especially when there is little variation across the values in a series – to shift the y-axis to start at something other than 0. The next slide shows this.
While shifting the bottom of the y-axis to $150,000 gives you more granularity in the values, you also get a highly distorted view of the differences between the different regions. At first glance, it looks like the Northwest Region’s sales were many times over the sales of the Northeast and Southeast Regions. This isn’t the case, but the column height is what the viewer will assess initially and most easily, and that presents a misleading picture.
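The size of the distortion is easy to work out. The Python sketch below compares the true ratio between two values with the ratio of the column heights as drawn once the axis no longer starts at 0. The function name `apparent_ratio` and the two sales figures are hypothetical; the slides do not give exact values.

```python
def apparent_ratio(a, b, axis_min=0):
    """Ratio of the drawn column heights for a and b when the axis starts at axis_min."""
    if b <= axis_min:
        raise ValueError("b must be above the axis minimum to have a visible column")
    return (a - axis_min) / (b - axis_min)


# Hypothetical sales figures, NOT the values from the slides.
northwest, northeast = 190_000, 155_000

print(f"axis starts at $0:       {apparent_ratio(northwest, northeast):.2f}x")
print(f"axis starts at $150,000: {apparent_ratio(northwest, northeast, 150_000):.2f}x")
```

With these made-up numbers, a real difference of about 1.23x is drawn as columns that differ by 8x, which is exactly the kind of misleading first impression described above.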
It’s important to understand that the human brain has a very limited ability to hold multiple data points (or data series) in short-term memory at once. Therefore, if there is a relationship between how two metrics are moving over time, that relationship is much more likely to be noticed if both data series are shown on the same screen/page, than if they are on separate screens.
There are LOTS of things wrong with this dashboard:
Many violations of the data-pixel ratio – gradient backgrounds, container frames within container frames, an overly heavy logo/navigation area.
The “gauge” is inefficient and ineffective – it doesn’t show how the metric is trending, it is so imprecise that the value itself has to be placed on the gauge, and green/red are vague.
The Regional Performance chart has drop shadows on the lines, and the point markers are unnecessarily large; the lines also rely heavily on color to identify which region is represented by which line.
The bubble chart needlessly uses a 3D effect; the values themselves wind up overlapping each other, which makes them cluttered and difficult to read; and…it’s not clear what the bubble sizes represent. If they represent the “%” values, they actually aren’t accurately doing so (but, humans are notoriously bad at interpreting the difference in size between two different two-dimensional areas – that’s something not covered in this presentation).
Sparklines, in combination with appropriate labels and some additional data points, provide an effective way to convey how a metric has been trending over time without taking up very much room.
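For a sense of how little room a sparkline needs, here is a minimal Python sketch that draws one with Unicode block characters. The function name `sparkline` and the weekly visit numbers are hypothetical, and scaling from the series minimum to its maximum is a design choice made for this example.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled from series min to max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # a flat series would otherwise divide by zero
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


# Hypothetical weekly visits, made up for the example.
visits = [120, 135, 128, 150, 170, 165, 190, 210]

# Pair the sparkline with a label and a few reference values.
print(f"Visits {sparkline(visits)} {visits[-1]} (low {min(visits)}, high {max(visits)})")
```

Because the line is scaled to its own minimum and maximum, it shows the shape of the trend rather than the magnitude, which is why the label and the current, low, and high values next to it matter.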
I use this as a theoretical “ideal” dashboard. It would only show, in big, red, unequivocal terms, what’s going on that is not expected and that is undesirable. That is really the only information that is going to drive analysis and action. Clearly, in reality, this would not fly. But the actual dashboard does adhere to this principle. It doesn’t jump out quite as much as the theoretical one, but the same three trouble spots are clearly evident. Other point: this dashboard is another example of how the data-pixel ratio is something that can continue to guide/drive improvement over time. This dashboard was created by someone who knew and followed the data-pixel ratio concept. So, there are a lot of potentially extraneous things not included. But, after this dashboard design was reviewed by a team of people who were well-versed in the concept, a number of opportunities to increase the data-pixel ratio were identified. The next slide shows how the dashboard style has evolved based on that feedback.
Overall, the dashboard has been lightened up. The headings for each of the groups of metrics have gone from being inverted white-on-dark-gray to being simple headings with a light line underneath for delineation. The trend arrows have evolved from being a 5-option (3 of them yellow) set of red/yellow/green to simple grayscale.
There are a lot of resources out there, but I’ve only included the ones that I actually refer to on a regular basis. Some additional notes on the blogs:
Peltier Tech Blog – this is a fantastic resource both for how to push Excel pretty hard to achieve the results you want, as well as for best practices (with a lot of “here’s a better way to” posts with data visualizations he has found/seen and how the information could have been more effectively presented).
Presentation Zen – Garr Reynolds wrote a book with the same name, and his focus is on presentations, which, really, are a super-set of data visualization; still, his blog is useful, as the principles he espouses for “clearly presenting information” apply to both presentations in general and data visualization in particular.
Flowing Data – some of Yau’s posts fall into the “clever ways to present data” rather than “the clearest way to present data.” Still, he ferrets out a lot of useful content and provides insightful commentary on it.