Visual Theory, Human Perception and Coding Style

Statistical Computing, 36-350

Friday - July 12, 2019

Last time: ggplot

Summary:

Part I

Visualization Theory

This part of the lecture combines a blog post I wrote a long time ago (which has lots of spelling errors) with a condensed version of the common theory that is sometimes taught in 315.

Graphics

Why Graphics

When graphics (Maybe better to ask “when not graphics”)

~ Tufte, The Visual Display of Quantitative Information, pg 20 and pg 178 (see extra reading for Hw 2)

Overview

Rules to make Good Graphics:

  1. Avoid misleading visuals
  2. Avoid cluttered graphics (make sure your message comes through)
  3. Leverage graphics for complex understanding

Human Perception

General dos & don’ts for graphics

1. Represent the data as truthfully as possible

1. Truthful Representation

1. Don’t abuse commonly held assumptions

Above example: double scaling

Other examples: Rose Diagrams, etc

“The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.” ~ Tufte, pg 77

1. DO avoid being misleading with labels

e.g.

2. Avoid cluttered graphics

Make sure your message comes through

2. Decluttering (Data ink)

2. “Decorating” and Data-Ink

Graphics should not draw the viewer’s attention away from the data. Extras get in the way.

Note: Decoration does not refer to appropriate graph labeling. Labels should always be clear, detailed, and thorough. Label key parts of the data. Add text explanations if necessary.

Data Ink should primarily present information about the data: the non-erasable, non-redundant core of a graphic

Tufte suggests (within reason) maximizing the data-ink ratio: \[\mathrm{Data\ ink \ ratio} = \frac{\mathrm{ink\ used\ to\ describe\ data}}{\mathrm{total\ ink\ in\ the\ graph}}\]
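As a rough illustration of the idea (not a way to measure the ratio itself), the sketch below draws the same scatterplot twice with ggplot2: once with the default theme and once with some non-data ink removed. The built-in mtcars data and the object names base_plot and lean_plot are assumptions chosen for the example.

```r
library(ggplot2)

# Default theme: grey panel background plus major and minor gridlines.
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Same data and labels, with less ink spent on things that are not data:
# no grey background and no minor gridlines.
lean_plot <- base_plot +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
```

Printing base_plot and lean_plot side by side shows the difference. Note that the axis labels stay: they are labeling, not decoration.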

2. Impact / Truth in today’s world

“… at least a few computer graphics only evoke the response ‘Isn’t it remarkable that the computer can be programmed to draw like that?’ instead of ‘My, what interesting data.’” (Tufte, pg 120)

Today’s graphics (including ggplot) can auto generate:

3. Showcasing Complex ideas

“More information is better than less information, especially when the marginal costs of handling and interpreting additional information are low, as they are for most graphics.” (Tufte, pg 168)

3. Data Density

Maximize (within reason): \[\text{data density of a graphic } = \frac{\text{number of entries in data matrix}}{\text{area of data graphic}}\]
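A small worked example of the formula, as a sketch: the helper data_density() and the figure sizes are hypothetical, and the area is measured in square inches of the printed graphic.

```r
# Hypothetical helper: entries shown per square inch of graphic.
data_density <- function(n_entries, width_in, height_in) {
  n_entries / (width_in * height_in)
}

# A scatterplot of wt vs mpg shows two columns of mtcars.
shown <- mtcars[, c("wt", "mpg")]
n_entries <- nrow(shown) * ncol(shown)  # 32 rows * 2 columns = 64 entries

density_large <- data_density(n_entries, width_in = 6, height_in = 4)  # 64 / 24
density_small <- data_density(n_entries, width_in = 3, height_in = 2)  # 64 / 6
```

Shrinking the same plot from 6x4 to 3x2 inches quadruples the data density, which is one reason small multiples can be so efficient (within reason: the points still have to be readable).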

3. Faceting / Conditional Plots

Faceted (conditional) plots are a great example of showing more complex structure in a single visualization
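A minimal sketch of a faceted plot in ggplot2, assuming the built-in mtcars data: the same scatterplot is drawn once per number of cylinders, so a third variable is shown without adding a new visual encoding.

```r
library(ggplot2)

# One panel per value of cyl; each panel shows weight against fuel economy.
facet_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Printing facet_plot draws three panels (4, 6 and 8 cylinders) that share their axes by default.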

Graphic Integrity (Summary)

Part II

Human Visual Perception

Cleveland: Human visual perception

Cleveland and McGill (1984)

Cleveland: quantitative comparisons experiment

To really understand Cleveland’s work, let’s go through an in-class experiment. I’m going to show you 4 sets of 4 images; for each set, write down the size of each later object relative to the first object.

            A     B     C     D
Positions   1     ?     ?     ?
Lengths     1     ?     ?     ?
Angles      1     ?     ?     ?
Areas       1     ?     ?     ?

Quantitative perceptual tasks: position, aligned

Quantitative perceptual tasks: length

Quantitative perceptual tasks: angle

Quantitative perceptual tasks: area

Quantitative perceptual tasks: answers

            A     B     C     D
Positions   1     3/4   1/4   2/4
Lengths     1     2/4   3/4   1/4
Angles      1     2/3   1/3   4/3
Areas       1     2/4   1/4   3/4

Cleveland and McGill (1984)

Cleveland, The Elements of Graphing Data

Ordering of perceptual tasks

Cleveland and McGill’s ordering

Ordering of perceptual tasks

“generic comparisons” = less accuracy

Quantitative perceptual tasks & ranking

Lessons:

Graphics Dos and Don’ts: More on Human Perception

Ranking: alphabetical

Ranking: informative
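In R, categories are ordered alphabetically by default because that is how factor() sorts its levels. The sketch below shows one way to get an informative ordering instead, using reorder() from base R. The counts data frame is made up for illustration.

```r
# Hypothetical data: one count per category.
counts <- data.frame(
  fruit = c("apple", "banana", "cherry", "date"),
  n     = c(12, 30, 5, 21)
)

# Default: levels are sorted alphabetically.
alphabetical <- factor(counts$fruit)

# Informative: levels are sorted by the count, largest first.
# reorder() sorts levels by increasing score, so negate n.
ranked <- reorder(factor(counts$fruit), -counts$n)

levels(alphabetical)
levels(ranked)
```

Plotting functions (base graphics and ggplot2 alike) lay out categories in level order, so passing ranked instead of alphabetical puts the biggest group first.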

Consistency

Comparing weights of newborns: Which age group weighs the least?

Consistency

Give all small multiples the same structure, usually including axis limits, to make comparisons easier and reduce cognitive load
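One way to do this in base R, as a sketch: compute the axis limits once from the full data and pass the same limits to every panel. The mtcars data and the variable names are assumptions for the example.

```r
# Split the data into one piece per panel.
groups <- split(mtcars, mtcars$cyl)

# Shared limits come from the full data, not from each subset.
x_limits <- range(mtcars$wt)
y_limits <- range(mtcars$mpg)

old_par <- par(mfrow = c(1, length(groups)))  # one row of panels
for (group_name in names(groups)) {
  group <- groups[[group_name]]
  plot(
    group$wt, group$mpg,
    xlim = x_limits, ylim = y_limits,
    xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
    main = paste(group_name, "cylinders")
  )
}
par(old_par)  # restore the previous layout
```

Without xlim and ylim, each panel would pick its own limits and the same position would mean a different value in each panel.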

Consistency

Ensure design changes are meaningful (tied to data changes)

Consistency

More consistent redesign, Stephen Few

Semantic associations

Orange vs blue crab species: real graphic from a talk
(crabs dataset in MASS package)
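A sketch of respecting the semantic association in code: in the crabs data the sp column is coded "B" for blue and "O" for orange, so the plotting colors can be mapped to match. The choice of FL and RW as the plotted measurements and the name species_colors are assumptions for the example.

```r
library(MASS)  # provides the crabs dataset

# Map each species code to the color it is named after.
species_colors <- c(B = "blue", O = "orange")
point_colors <- unname(species_colors[as.character(crabs$sp)])

plot(
  crabs$FL, crabs$RW,
  col = point_colors, pch = 19,
  xlab = "Frontal lobe size (mm)", ylab = "Rear width (mm)"
)
legend("topleft", legend = c("Blue", "Orange"),
       col = species_colors, pch = 19)
```

Swapping the two colors would still be a valid encoding, but the reader would have to fight the association on every glance.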

Ordering, consistency and semantic associations

Lessons:

Part III

Coding Style

What Is Good Coding Style, and Why?

Hadley Wickham

Google Style Guide

Paul E. Johnson, KU

When to have clean code

R Style Guides

Hadley Wickham

Google

Lesser known:

Tidyverse

Quick Walk through of Hadley Wickham Style Guide

Notation and naming

“There are only two hard things in Computer Science: cache invalidation and naming things.”

Phil Karlton

Variable and function names should be lowercase. Use an underscore (_) to separate words within a name. Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!).

Naming Examples:

Good

day_one
day_1

Bad

first_day_of_the_month
DayOne
dayone
djm1

Where possible, avoid using names of existing functions and variables. Doing so will cause confusion for the readers of your code.

Bad

T <- FALSE
c <- 10
mean <- function(x) sum(x)
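To see why the first of these is dangerous, here is a small sketch. TRUE is a reserved word in R, but T is only an ordinary variable that happens to hold TRUE by default, so it can be silently overwritten. The name shadow_demo is made up, and local() is used so the demonstration does not change T in your workspace.

```r
shadow_demo <- local({
  T <- FALSE  # legal: T is just a variable
  # The literal TRUE is unaffected; code that wrote T now gets FALSE.
  c(shorthand = T, literal = TRUE)
})

shadow_demo
```

Any later code that used T as shorthand for TRUE would now quietly do the opposite of what its author meant, which is also a good reason to always write TRUE and FALSE in full.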

Assignment

Use <-, not =, for assignment.

Good

x <- 5

Bad

x = 5

Syntax: Spacing

Place spaces around all infix operators (=, +, -, <-, etc.). The same rule applies when using = in function calls. Always put a space after a comma, and never before (just like in regular English).

Good

average <- mean(feet / 12 + inches, na.rm = TRUE)
x <- 1:10 # no spaces with ":"

if (debug) do(x)
plot(x, y)

Bad

average<-mean(feet/12+inches,na.rm=TRUE)
x <- 1 : 10
if(debug)do(x)
plot (x, y)

Extra spacing (i.e., more than one space in a row) is ok if it improves alignment of equal signs or assignments (<-).

list(
  total = a + b + c, 
  mean  = (a + b + c) / n
)

Do not place spaces around code in parentheses or square brackets (unless there’s a comma, in which case see above).

Good

if (debug) do(x)
diamonds[5, ]

Bad

if ( debug ) do(x)  # No spaces around debug
x[1,]   # Needs a space after the comma
x[1 ,]  # Space goes after comma not before

Curly braces

An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else.

Always indent the code inside curly braces.

Good

if (y < 0 && debug) {
  message("Y is negative")
}

if (y == 0) {
  log(x)
} else {
  y ^ x
}

Bad

if (y < 0 && debug)
message("Y is negative")

if (y == 0) {
  log(x)
} 
else {
  y ^ x
}

It’s ok to leave very short statements on the same line:

if (y < 0 && debug) message("Y is negative")

Line length

Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
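As a sketch of what that can look like: rather than repeating a long expression for each column on one very wide line, pull the repeated work into a helper. The function rescale_01() and the use of mtcars are hypothetical choices for the example.

```r
# Hypothetical helper: rescale a numeric vector to the interval [0, 1].
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each line now stays short, and the intent is in the function name.
scaled <- data.frame(
  mpg = rescale_01(mtcars$mpg),
  wt  = rescale_01(mtcars$wt)
)
```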

Indentation

When indenting your code, use two spaces. Never use tabs or mix tabs and spaces.

The only exception is if a function definition runs over multiple lines. In that case, indent the second line to where the definition starts:

long_function_name <- function(a = "a long argument", 
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

Organisation

Commenting guidelines: Comment your code. Each line of a comment should begin with the comment symbol and a single space ("# "). Comments should explain the why, not the what.

Use commented lines of - to break up your file into easily readable chunks.

# Load data ---------------------------

# Plot data ---------------------------
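Putting the commenting guidelines together, a short script might look like the sketch below. The section names, the variable cars_data and the choice of mtcars are assumptions for the example. Note that the one ordinary comment explains why the conversion is done, not what the line does.

```r
# Load data ---------------------------

cars_data <- mtcars

# Clean data --------------------------

# mtcars records weight in units of 1000 lbs; convert so that
# summaries and axis labels can be read directly in lbs.
cars_data$wt_lbs <- cars_data$wt * 1000

# Summarise data ----------------------

mean_wt_lbs <- mean(cars_data$wt_lbs)
```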