Last time: Data frames and apply

Part I

Plot basics: ggplot

Plotting in R

There are two major styles of plotting in R. You’ve already seen examples of the first: base R plots.

But… the R community has come to embrace a new approach to visualization from the ggplot2 package.

Extremely popular graphics library

Base graphics

Why ggplot2?


Basics: some terminology

Introduction to dataset

library(gapminder)
data(gapminder)

ggplot intro: scatter plot

library(ggplot2)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + 
  geom_point()

ggplot intro: basic structure

ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap)) + 
  geom_point()
p <- ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap))
p + geom_point()
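
The stored-plot pattern above can be taken one step further. The following is a minimal sketch, not part of the original slides: the names p_points and p_both are made up for illustration, and geom_smooth() is used only as an example of a second layer.

```r
library(ggplot2)
library(gapminder)

# Base object: data and aesthetic mapping only, no layers yet
p <- ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap))

# Each `+` returns a new plot object; `p` itself is left unchanged
p_points <- p + geom_point()
p_both   <- p_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Nothing is drawn until an object is printed,
# e.g. by typing p_both at the console
```

Because every step is an ordinary R object, variants of a plot can be built up and compared without retyping the data and mapping.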

ggplot intro: change size

ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + 
  geom_point(size = .1)

ggplot intro: add color

ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent)) + 
  geom_point(size = .1)

ggplot intro: change scales

ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent)) + 
  geom_point(size = .1) + 
  scale_x_log10() + scale_y_log10()
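
A quick way to see what a scale does is to inspect the data a layer actually receives. The sketch below assumes only ggplot2's layer_data() helper; the names p_log and built are made up for illustration.

```r
library(ggplot2)
library(gapminder)

p_log <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(size = 0.1) +
  scale_x_log10()

# layer_data() returns the data as the layer sees it,
# i.e. after the scale transformation has been applied
built <- layer_data(p_log, 1)

head(built$x)                     # values on the log10 scale
head(log10(gapminder$gdpPercap))  # same numbers, computed by hand
```

The axis labels still show the original units; only the positions are transformed.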

ggplot components review:

ggplot2 specifications in R (so far):

Part II

Examples

ggplot: boxplot

ggplot(gapminder, aes(y = lifeExp, x = continent)) + 
  geom_boxplot() 

ggplot: histogram (color vs fill)

ggplot(gapminder, aes(x = lifeExp)) + 
  geom_histogram(color = "black", fill = "blue") 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot: histogram (fill wrong way)

ggplot(gapminder, aes(x = lifeExp)) + 
  geom_histogram(aes(fill = "blue"), color = "black")
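
Why this is the "wrong way": inside aes(), "blue" is treated as data (a categorical variable with one level), so ggplot2 assigns it a colour from the default palette and adds a legend. A small sketch to check this, with made-up object names mapped and fixed (bins = 30 is passed only to silence the binning message):

```r
library(ggplot2)
library(gapminder)

# Mapped: the string "blue" is a one-level categorical variable
mapped <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(aes(fill = "blue"), color = "black", bins = 30)

# Set: fill is a fixed property of the layer, outside aes()
fixed <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(fill = "blue", color = "black", bins = 30)

unique(layer_data(mapped)$fill)  # a default palette colour, not "blue"
unique(layer_data(fixed)$fill)   # the colour that was set
```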

ggplot: histogram (fill mapping)

ggplot(gapminder, aes(x = lifeExp)) + 
  geom_histogram(aes(fill = continent), color = "black")

ggplot: line plots

ggplot(gapminder[gapminder$country %in% c("United States", "Canada", 
                                          "Mexico", "United Kingdom", 
                                          "Ireland",
                                          "Saudi Arabia"),], 
       aes(x = year, y = gdpPercap)) + 
  geom_line(aes(color = country)) 
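
In the plot above, mapping color to country also splits the data into one line per country. A sketch of what happens without it, and of the group aesthetic as an alternative when no colour legend is wanted (the subset north_america and the plot names are made up for illustration):

```r
library(ggplot2)
library(gapminder)

north_america <- subset(gapminder,
                        country %in% c("United States", "Canada", "Mexico"))

# No grouping: all rows are joined into a single zig-zag path
one_path <- ggplot(north_america, aes(x = year, y = gdpPercap)) +
  geom_line()

# group = country: one line per country, all in the same colour
per_country <- ggplot(north_america, aes(x = year, y = gdpPercap)) +
  geom_line(aes(group = country))
```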

Discrete/Categorical Data visualizations

library(tidyverse)
#  Load the data into R
titanic <- read_csv("https://raw.githubusercontent.com/benjaminleroy/stat315summer_data/master/assignments/assignment03/titanic.csv") %>%
  mutate(Pclass = factor(Pclass))

ggplot intro: barchart

ggplot(data = titanic, aes(fill = Pclass)) + 
  geom_bar(aes(x = Pclass)) 
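
geom_bar() does the counting for us. The sketch below makes that explicit on a hypothetical six-row toy table (not the real titanic data) and compares it with geom_col(), which draws heights that have already been computed.

```r
library(ggplot2)

# Hypothetical toy data standing in for the titanic table
toy <- data.frame(Pclass = factor(c(1, 1, 2, 3, 3, 3)))

# geom_bar() counts the rows in each class
bar_counted <- ggplot(toy, aes(x = Pclass)) +
  geom_bar()

# Same picture: count first, then draw the given heights with geom_col()
counts <- as.data.frame(table(Pclass = toy$Pclass))
bar_given <- ggplot(counts, aes(x = Pclass, y = Freq)) +
  geom_col()

layer_data(bar_counted)$count  # the counts computed by the layer
```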

ggplot intro: spine chart

“Spine chart” (or stacked bar):
same as bar, but with constant x-values

ggplot(data = titanic, aes(fill = Pclass)) + 
  geom_bar(aes(x = factor(1))) + coord_cartesian()

ggplot intro: pie chart

“Pie chart”: spine chart, but using polar coordinates
(angle vs. radius), with counts mapped to angle instead of height

ggplot(data = titanic, aes(fill = Pclass)) + 
  geom_bar(aes(x = factor(1)), width = 1) + coord_polar(theta = "y")

Part III

Commentary ++

Grammar of Graphics: why bother

Some of these are bad ideas! But we’ve seen the flexibility of expressing a graph from the ground up using a grammar, instead of a “chart zoo” approach (like Excel’s chart wizard).


“[The grammar] makes it easier for you to iteratively update a plot, changing a single feature at a time. The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving you a framework to think about graphics, and hopefully shortening the distance from mind to paper. It also encourages the use of graphics customised to a particular problem, rather than relying on generic named graphics.”
–Hadley Wickham, ggplot2

Speed of Use: Facets

Facets are helpful: they divide a plot into sub-plots by the values of a variable,
with automatically consistent scales and a common legend

p <- ggplot(titanic, aes(x = Pclass, fill = factor(Survived))) + 
  geom_bar()
p + coord_cartesian() +
  facet_wrap(~ Sex)
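
The shared scales are a default, not a requirement. A minimal sketch on a hypothetical eight-row toy table (same column names as titanic, made-up values) showing how the scales argument of facet_wrap() relaxes it:

```r
library(ggplot2)

# Hypothetical toy data with the same columns as the titanic table
toy <- data.frame(
  Pclass   = factor(c(1, 2, 3, 3, 1, 3, 3, 3)),
  Sex      = c("female", "female", "female", "female",
               "male", "male", "male", "male"),
  Survived = c(1, 1, 1, 0, 0, 0, 0, 1)
)

p_toy <- ggplot(toy, aes(x = Pclass, fill = factor(Survived))) +
  geom_bar()

shared <- p_toy + facet_wrap(~ Sex)                     # common y axis
free   <- p_toy + facet_wrap(~ Sex, scales = "free_y")  # y axis per panel
```

Shared scales make panels directly comparable; free scales are better when one panel would otherwise be squashed.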

ggplot components

ggplot2 specifications in R:

Of course we can also control guides (axes, legends, titles…)

Grammar of Graphics: practice

What data map to which aes here? What geom, scale, coord are used? Any facet?

Part III

More Complicated Structure

ggplot: stacking and multiple images

Stacking

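# after_stat(density) puts the histogram on the density scale so both
# layers share a y axis (it replaces the deprecated ..density.. notation)
a <- ggplot(gapminder, aes(x = lifeExp, y = after_stat(density))) +
  geom_histogram()
a + geom_density()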


Multiple images

library(gridExtra)
a <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram()
b <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_density()
grid.arrange(a, b)

Titles

ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent,
                      shape = continent)) + 
  geom_point() + 
  scale_x_log10() + scale_y_log10() +
  labs(x = "GDP per Capita (log)",
       y = "Average Life Expectancy (log)",
       title = "National GDP per Capita vs Life Expectancy",
       color = "Continent",
       shape = "Continent")  # same title for both, so the legends merge

Part IV

Addendum

Global versus local

g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + # global
  geom_point(color = "blue")  
l_vis <- ggplot() + 
  geom_point(data = gapminder, aes(y = lifeExp, x = gdpPercap)) # local
# ^ notice the need to write data = gapminder
g_vis2 <- g_vis + geom_line() 
l_vis2 <- l_vis + geom_line()  # geom_line doesn't do anything here

grid.arrange(g_vis, l_vis, 
             g_vis2, l_vis2, nrow = 2)

Multiple data sources

We can also override global mapping and data sources.

gapminder2 <- gapminder
gapminder2$lifeExp <- 2 * gapminder2$lifeExp

g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + # global
  geom_point(color = "blue") 
g_vis_plus_another <- g_vis + 
  geom_point(data = gapminder2, aes(y = lifeExp, x = gdpPercap), color = "red")

grid.arrange(g_vis, g_vis_plus_another, nrow = 1)
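
A layer's data argument does not have to be a second data frame: it can also be a function that is applied to the plot's global data. The sketch below uses this to highlight one year; the names base and highlight and the choice of 2007 are made up for illustration.

```r
library(ggplot2)
library(gapminder)

base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "grey70", size = 0.5) +
  scale_x_log10()

# The function receives the global data and returns the rows to draw;
# the layer still inherits the global aes()
highlight <- base +
  geom_point(data = function(d) subset(d, year == 2007),
             color = "red", size = 0.5)
```

This keeps the subset definition next to the layer that uses it, instead of creating a separate copy of the data first.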

Summary