Edward Tufte has written four books on analytical design:
1983: The Visual Display of Quantitative Information
1990: Envisioning Information
1997: Visual Explanations: Images and Quantities, Evidence and Narrative
2006: Beautiful Evidence
My notes from The Visual Display of Quantitative Information:
1 Graphical Excellence
Graphical excellence is the well-designed presentation of interesting data – a matter of substance, of statistics, and of design. Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency. Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. Graphical excellence is nearly always multivariate. And graphical excellence requires telling the truth about the data.
Graphical displays should:
- show the data
- induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
- avoid distorting what the data have to say
- present many numbers in a small space
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal the data at several levels of detail, from a broad overview to the fine structure
- serve a reasonably clear purpose: description, exploration, tabulation, or decoration
- be closely integrated with the statistical and verbal descriptions of a data set
– Graphics reveal data.
– A silly theory means a silly graphic.
– Data maps: the most extensive data maps place millions of bits of information on a single page before our eyes. No other method for the display of statistical information is so powerful.
– Small, noncomparative, highly labeled data sets usually belong in tables.
– Two great inventors of modern graphical designs: J. H. Lambert and William Playfair (E. J. Marey is also good).
– The problem with time-series is that the simple passage of time is not a good explanatory variable: descriptive chronology is not causal explanation.
– A vivid design (with appropriate data) is the before-after time-series.
– An especially effective device for enhancing the explanatory power of time-series displays is to add spatial dimensions to the design of the graphic, so that the data are moving over space as well as over time (multivariate data). Related: small multiples (very popular), in which the same graphical design structure is repeated for each multiple. Once viewers understand the design of one slice, they have immediate access to the data in all the other slices; the constancy of the design allows the viewer to focus on changes in the data rather than on changes in graphical design.
– The relational graphic (e.g., the scatterplot) links at least two variables, encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables (example: C. Y. Ho, R. W. Powell, and P. E. Liley, “Thermal Conductivity of the Elements: A Comprehensive Review”).
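The small-multiples idea in the notes above can be sketched in plain text. This is an illustrative sketch under stated assumptions, not Tufte's own method: each series becomes a row of Unicode block characters, and every panel shares one global scale, so the repeated design stays constant and only the data change. The function name `small_multiples` and the sample data are hypothetical.

```python
# Sketch: text "small multiples" sharing one vertical scale.
# Assumption: each series is a list of numbers; names and data are illustrative.

BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def small_multiples(series: dict[str, list[float]]) -> str:
    """Render each named series as a sparkline row, all on the same scale."""
    values = [v for vs in series.values() for v in vs]
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against all-equal data
    label_width = max(len(name) for name in series)
    rows = []
    for name, vs in series.items():
        # Every panel maps values onto the same 0..7 index range.
        marks = "".join(
            BLOCKS[round((v - lo) / span * (len(BLOCKS) - 1))] for v in vs
        )
        rows.append(f"{name:<{label_width}} {marks}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical data for three regions.
    print(small_multiples({
        "north": [3, 5, 8, 6],
        "south": [1, 2, 2, 3],
        "east": [4, 4, 7, 8],
    }))
```

Because the scale is global, a flat row really is flat relative to its neighbors; scaling each panel separately would make every row look equally dramatic and break the comparison.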
2 Graphical Integrity
– John Tukey showed how graphics could be used as instruments for reasoning about quantitative information.
– A graphic does not distort if the visual representation of the data is consistent with the numerical representation
– The perceived area of a circle probably grows somewhat more slowly than the actual (physical, measured) area. Different people see the same areas somewhat differently; perceptions change with experience; and perceptions are context-dependent. Better to use a table to show the numbers.
– Tables usually outperform graphics in reporting on small data sets of 20 numbers or fewer.
– The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented
– Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
– Lie factor = (size of effect shown in graphic) / (size of effect in data). A lie factor of 1 means the graphic is faithful; it overstates the effect when log LF > 0 (LF > 1) and understates it when log LF < 0 (LF < 1).
– Each part of a graphic generates visual expectations about its other parts and, in the economy of graphical perception, these expectations often determine what the eye sees. Show data variation, not design variation.
– In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units: the only way to think clearly about money over time is to make comparisons using inflation-adjusted units.
– The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data: Changes in physical area on the surface of a graphic do not reliably produce appropriately proportional changes in perceived areas. The use of two (or three) varying dimensions to show one-dimensional data is a weak and inefficient technique.
– To be truthful and revealing, data graphics must bear on the question at the heart of quantitative thinking: “Compared to what?”
– Graphics must not quote data out of context
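Three of the integrity rules above reduce to arithmetic. The sketch below assumes the size of an effect is measured as the relative change (second value minus first value) divided by the first value; the function names and all numbers are hypothetical. `lie_factor` compares the change drawn with the change in the data, `area_lie_factor` shows what happens when an icon is scaled in both width and height to represent a one-dimensional quantity, and `deflate` converts nominal money amounts into constant base-year units using a price index.

```python
import math

# Sketch of three graphical-integrity checks.
# Function names and all numbers are hypothetical illustrations.


def size_of_effect(first: float, second: float) -> float:
    """Relative change from the first value to the second."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (second - first) / first


def lie_factor(data: tuple[float, float], shown: tuple[float, float]) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    effect_in_data = size_of_effect(*data)
    if effect_in_data == 0:
        raise ValueError("data show no change; lie factor undefined")
    return size_of_effect(*shown) / effect_in_data


def area_lie_factor(ratio: float) -> float:
    """Lie factor when an icon's width AND height are both scaled by `ratio`.

    The data grow by `ratio`, but the icon's area grows by ratio ** 2.
    """
    return lie_factor((1.0, ratio), (1.0, ratio ** 2))


def deflate(nominal: dict[int, float], price_index: dict[int, float],
            base_year: int) -> dict[int, float]:
    """Express nominal amounts in constant base-year money."""
    base = price_index[base_year]
    return {year: amount * base / price_index[year]
            for year, amount in nominal.items()}


if __name__ == "__main__":
    # Data rise 20 -> 30 (+50%); bars drawn 1.0 -> 2.0 cm (+100%).
    lf = lie_factor((20, 30), (1.0, 2.0))
    print(lf, math.log(lf) > 0)  # 2.0 True: the graphic overstates

    # Doubling a value by doubling both sides of an icon.
    print(area_lie_factor(2.0))  # 3.0: area grows 300%, data grow 100%

    # Hypothetical price index and nominal spending.
    cpi = {2000: 100.0, 2010: 125.0, 2020: 160.0}
    spend = {2000: 50.0, 2010: 60.0, 2020: 80.0}
    print(deflate(spend, cpi, base_year=2000))  # {2000: 50.0, 2010: 48.0, 2020: 50.0}
```

In the deflation example the nominal series rises 60% from 2000 to 2020 while the inflation-adjusted series is flat, which is the kind of difference the note on monetary units warns about.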
3 Sources of Graphical Integrity and Sophistication
– If the statistics are boring, then you’ve got the wrong numbers
– What E. B. White said of writing is also true of statistical graphics: “No one can write decently who is distrustful of the reader’s intelligence, or whose attitude is patronizing”
4 Data-Ink and Graphical Redesign
– The fundamental principle of good statistical graphics: Above all else show the data
– Data-ink ratio = data-ink / total ink used to print the graphic
– Erase non-data-ink, within reason. Ink that fails to depict statistical information does not have much interest to the viewer of a graphic; in fact, sometimes such non-data-ink clutters up the data.
– Erase redundant data-ink, within reason.
– Editing and revision are as essential to sound graphical design work as they are to writing.
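The data-ink ratio can be computed once a graphic's ink is broken down by element; equivalently, it is 1 minus the share of ink that can be erased without losing data-information. In this sketch the ink amounts are hypothetical numbers (in practice one might count dark pixels per drawing layer), and the `Element` type and `erase` helper are illustrative assumptions. Erasing non-data-ink raises the ratio without touching the data.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One drawn component of a graphic (hypothetical model)."""
    name: str
    ink: float      # amount of ink, e.g. a dark-pixel count
    is_data: bool   # True if the element depicts data


def data_ink_ratio(elements: list[Element]) -> float:
    """Data-ink divided by the total ink used to print the graphic."""
    total = sum(e.ink for e in elements)
    if total <= 0:
        raise ValueError("graphic has no ink")
    return sum(e.ink for e in elements if e.is_data) / total


def erase(elements: list[Element], names: set[str]) -> list[Element]:
    """Drop the named non-data elements; data elements are never erased."""
    return [e for e in elements if e.is_data or e.name not in names]


if __name__ == "__main__":
    chart = [
        Element("bars", 60, True),
        Element("dark grid", 25, False),
        Element("frame", 10, False),
        Element("axis labels", 5, False),
    ]
    print(data_ink_ratio(chart))  # 0.6
    print(round(data_ink_ratio(erase(chart, {"dark grid", "frame"})), 3))  # 0.923
```

The axis labels are deliberately kept in the erased version: "within reason" means some non-data-ink earns its place.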
5 Chartjunk: Vibrations, Grids, and Ducks
Three widespread types of chartjunk:
- Unintentional optical art, such as moiré effects: an undisciplined ambiguity. It should be replaced with varying densities or shades of a color, and labeled with words rather than encoded with different types of hatching.
- The dreaded grid: the grid should usually be muted relative to the data or completely suppressed so that its presence is only implicit. Dark grid lines are chartjunk.
- The self-promoting graphical duck: when a graphic is taken over by decorative forms or computer debris, when the data measures and structures become Design Elements, when the overall design purveys Graphical Style rather than quantitative information, then that graphic may be called a duck (e.g., fake perspective).
6 Data-Ink Maximization and Graphical Design
– Revision of the box plot: the quartile plot, which can also replace the conventional frame of a scatterplot.
– Redesign of the bar chart / histogram: no frame, no vertical axis, no ticks, and a white grid.
– Mobilize every graphical element, perhaps several times over, to show the data.
– The graphical element that actually locates or plots the data is the data measure; the ink of the data measure can itself carry data.
– Multiple layers of information are created by multiple viewing depths and multiple viewing angles.
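The redesigns in this chapter can be approximated in plain text. This is a sketch under stated assumptions, not a reproduction of Tufte's drawings: `quartile_plot` assumes the variant in which the box is erased, leaving whiskers from the extremes to the quartiles and a dot at the median; `white_grid_bar` draws a bar with no frame, axis, or ticks, letting a gap at every grid step act as the white grid; and `stem_and_leaf` is Tukey's stem-and-leaf display, a familiar case where the data measure's own marks (the leaf digits) carry data. All function names, widths, and data are hypothetical.

```python
import statistics

# Plain-text sketches of three data-ink ideas.
# Names, widths, and data are hypothetical; real graphics would be drawn.


def quartile_plot(values, width: int = 41) -> str:
    """One text line: whiskers from the extremes to the quartiles, a dot at
    the median, and the box itself erased."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    lo, hi = xs[0], xs[-1]
    span = (hi - lo) or 1.0

    def col(x: float) -> int:
        # Map a data value onto a character column.
        return round((x - lo) / span * (width - 1))

    line = [" "] * width
    for c in range(col(lo), col(q1) + 1):
        line[c] = "-"
    for c in range(col(q3), col(hi) + 1):
        line[c] = "-"
    line[col(med)] = "•"
    return "".join(line)


def white_grid_bar(label: str, value: int, step: int = 5,
                   label_width: int = 8) -> str:
    """Bar with no frame, axis, or ticks; a gap every `step` units acts as a
    white grid line cutting through the bar. The value is labeled directly."""
    bar = "".join(" " if i > 0 and i % step == 0 else "█" for i in range(value))
    return f"{label:<{label_width}}{bar} {value}"


def stem_and_leaf(values: list[int]) -> str:
    """Stem = tens digit, leaf = units digit (non-negative integers only)."""
    if not values or any(v < 0 for v in values):
        raise ValueError("sketch handles non-empty, non-negative integers only")
    stems: dict[int, list[int]] = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(quartile_plot(range(1, 10), width=9))  # --- • ---
    for name, n in [("apples", 12), ("pears", 7)]:
        print(white_grid_bar(name, n))
    print(stem_and_leaf([27, 12, 34, 15, 21, 27]))
```

In the stem-and-leaf output every printed leaf is a data value, so the "bars" are made entirely of data-ink.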
Attractive displays of statistical information:
- have a properly chosen format and design
- use words, numbers, and drawing together
- reflect a balance, a proportion, a sense of relevant scale
- display an accessible complexity of detail
- often have a narrative quality, a story to tell about the data
- are drawn in a professional manner, with the technical details of production done with care
- avoid content-free decoration, including chartjunk
– One supertable is far better than a hundred little bar charts.
– For sets of highly labeled numbers, a wordy data graphic works well.
– It is nearly always helpful to write little messages on the plotting field to explain the data, to label outliers and interesting data points, to write equations and sometimes tables on the graphic itself, and to integrate the caption and legend into the design so that the eye is not required to dart back and forth between textual material and the graphic.
– Data graphics are paragraphs about the data and should be treated as such.
– For graphics in exploratory data analysis, words should tell the viewer how to read the design, and not what to read in terms of content.
– If the nature of the data suggests the shape of the graphic, follow that suggestion.