Statistical Computing, 36-350
Wednesday July 10, 2019
apply
subset(): function for extracting rows of a data frame meeting a condition
split(): function for splitting up rows of a data frame, according to a factor variable
apply(): function for applying a given routine to rows or columns of a matrix or data frame
lapply(): similar, but used for applying a routine to elements of a vector or list
sapply(): similar, but will try to simplify the return type, in comparison to lapply()
tapply(): function for applying a given routine to groups of elements in a vector or list, according to a factor variable
Plot basics: ggplot
There are two major styles of plotting in R. You’ve already seen examples of base R plots, including
plot(): generic plotting function
hist(): histogram
But the R community has come to embrace a newer approach to visualization from the ggplot2 package, an extremely popular graphics library.
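To make the contrast between the two styles concrete, here is a minimal sketch that draws the same scatterplot both ways. It assumes the built-in mtcars dataset (not used elsewhere in these notes) purely so that the example runs without extra data packages; the object name p_scatter is hypothetical.

```r
library(ggplot2)

# Assumption: mtcars (built into R) stands in for the course data.

# Base R: one function call draws the finished plot immediately.
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")

# ggplot2: describe the plot as data + aesthetic mappings + a geom.
# The result is an object, which is only drawn when it is printed.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
print(p_scatter)
```

Because the ggplot2 version is an object, it can be stored, modified with further + calls, and printed later.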
ggplot(): the main function, where you specify the dataset and variables to plot
geoms: geometric objects, e.g. geom_point(), geom_bar(), geom_density(), geom_line(), geom_area()
aes: aesthetics, e.g. shape, transparency (alpha), color, fill, linetype
scales: define how your data will be plotted
library(gapminder)
data(gapminder)
library(ggplot2)
# The following three calls draw the same scatterplot
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point()
ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point()
p <- ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap))
p + geom_point()
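Because a plot is an ordinary R object, it can be built up one piece at a time and inspected along the way. The sketch below is illustrative only: it assumes the built-in mtcars data in place of gapminder, and it peeks at the layers field of the plot object, which is an internal detail that may change between ggplot2 versions.

```r
library(ggplot2)

# Assumption: mtcars stands in for gapminder so the sketch is self-contained.
base_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# No geoms yet: printing base_plot would show empty axes.
n_layers_before <- length(base_plot$layers)

# Adding a geom returns a new plot object; base_plot itself is unchanged.
with_points <- base_plot + geom_point()
n_layers_after <- length(with_points$layers)

n_layers_before  # 0
n_layers_after   # 1
```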
Layers added with + inherit the information given in the ggplot() function (in this case, the data and variable mappings).
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point(size = .1)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent)) +
  geom_point(size = .1)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent)) +
  geom_point(size = .1) +
  scale_x_log10() + scale_y_log10()
ggplot components review:
ggplot2 specifications in R (so far):
data
aes: aesthetic attributes (position, length, color, symbol…)
geom: geometric element (point, line, bar…)
scale: scale transformation (axis limits, log scale, …)
Examples
ggplot: boxplot
ggplot(gapminder, aes(y = lifeExp, x = continent)) +
  geom_boxplot()
ggplot: histogram (color vs fill)
# color sets the bar outline, fill sets the bar interior
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(color = "black", fill = "blue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
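ggplot: histogram (fill wrong way)
# Wrong: inside aes(), "blue" is treated as a data value rather than a color,
# so the bars get the default palette color and a legend entry labeled "blue"
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(aes(fill = "blue"), color = "black")
ggplot: histogram (fill mapping)
# Intended use of aes(): map fill to a variable in the data
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(aes(fill = continent), color = "black")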
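The difference between setting fill and mapping it can be checked directly. The sketch below assumes the built-in faithful dataset in place of gapminder and uses layer_data(), a ggplot2 helper that returns the data a layer actually draws; the object names are hypothetical.

```r
library(ggplot2)

# Setting: fill is a fixed parameter, given outside aes().
p_set <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(fill = "blue", color = "black", bins = 20)

# Mapping a constant: the string "blue" becomes a one-level variable
# that the fill scale translates into its own default color.
p_const <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(fill = "blue"), color = "black", bins = 20)

fill_set <- unique(layer_data(p_set)$fill)
fill_const <- unique(layer_data(p_const)$fill)

fill_set    # the literal color that was requested
fill_const  # a default palette color chosen by the scale
```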
ggplot: line plots
ggplot(gapminder[gapminder$country %in% c("United States", "Canada",
                                          "Mexico", "United Kingdom",
                                          "Ireland",
                                          "Saudi Arabia"), ],
       aes(x = year, y = gdpPercap)) +
  geom_line(aes(color = country))
library(tidyverse)
# Load the data into R
titanic <- read_csv("https://raw.githubusercontent.com/benjaminleroy/stat315summer_data/master/assignments/assignment03/titanic.csv") %>%
  mutate(Pclass = factor(Pclass))
ggplot(data = titanic, aes(fill = Pclass)) +
  geom_bar(aes(x = Pclass))
The bar chart above implicitly uses + coord_cartesian(), i.e. it’s in the standard Cartesian coordinate system.
“Spine chart” (or stacked bar): same as bar, but with constant x-values
ggplot(data = titanic, aes(fill = Pclass)) +
  geom_bar(aes(x = factor(1))) + coord_cartesian()
“Pie chart”: spine chart, but using polar coordinates (angle vs. radius), with counts mapped to angle instead of height
ggplot(data = titanic, aes(fill = Pclass)) +
  geom_bar(aes(x = factor(1)), width = 1) + coord_polar(theta = "y")
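As a quick check on the idea that a spine chart and a pie chart differ only in their coordinate system, the sketch below builds both from one small made-up data frame (toy_df is a hypothetical stand-in for titanic) and compares them. Inspecting the coordinates field relies on ggplot2 internals, so treat that part as illustrative rather than as a stable interface.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Titanic passenger classes.
toy_df <- data.frame(cls = factor(c(1, 1, 2, 3, 3, 3)))

# Spine chart: every bar shares the same x value, so the bars stack.
spine_plot <- ggplot(toy_df, aes(fill = cls)) +
  geom_bar(aes(x = factor(1)), width = 1)

# Pie chart: the same layer, drawn in polar coordinates.
pie_plot <- spine_plot + coord_polar(theta = "y")

# The coordinate system differs...
is_polar_spine <- inherits(spine_plot$coordinates, "CoordPolar")
is_polar_pie <- inherits(pie_plot$coordinates, "CoordPolar")

# ...but the counts computed for the bars are identical.
counts_spine <- sort(layer_data(spine_plot)$count)
counts_pie <- sort(layer_data(pie_plot)$count)
```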
Commentary ++
Some of these are bad ideas! But we’ve seen the flexibility of expressing a graph from the ground up using a grammar, instead of a “chart zoo” approach (like Excel’s chart wizard).
“[The grammar] makes it easier for you to iteratively update a plot, changing a single feature at a time. The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving you a framework to think about graphics, and hopefully shortening the distance from mind to paper. It also encourages the use of graphics customised to a particular problem, rather than relying on generic named graphics.”
–Hadley Wickham, ggplot2
Facets are helpful: they divide a plot into sub-plots by values of a variable, with automatically consistent scales and a common legend
p <- ggplot(titanic, aes(x = Pclass, fill = factor(Survived))) +
  geom_bar()
p + coord_cartesian() +
  facet_wrap(~ Sex)
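For readers without the Titanic file at hand, the same faceting pattern can be sketched on built-in data. This assumes mtcars as a stand-in, with cyl playing the role of Sex; layer_data() is used only to confirm how many panels were produced.

```r
library(ggplot2)

# Assumption: mtcars stands in for the Titanic data.
faceted <- ggplot(mtcars, aes(x = factor(gear), fill = factor(am))) +
  geom_bar() +
  facet_wrap(~ cyl)

# Every row of the built layer data records the panel it is drawn in.
built <- layer_data(faceted)
n_panels <- length(unique(built$PANEL))
n_panels  # 3, one panel per distinct value of cyl (4, 6 and 8)
```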
ggplot components
ggplot2 specifications in R:
data
aes: aesthetic attributes (position, length, color, symbol…)
stat: statistical variable transformation (identity, count, smooth, quantile…)
geom: geometric element (point, line, bar…)
scale: scale transformation (axis limits, log scale, …)
coord: Cartesian, polar, map projection…
facet: divide into subplots / small multiples using a …
Of course we can also control guides (axes, legends, titles…)
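Of the components listed, stat is the one the earlier examples only use implicitly. As a sketch (using a made-up data frame, so all names below are hypothetical), geom_bar() counts rows through its default stat = "count", while geom_col() uses stat = "identity" and plots counts that were computed beforehand; both should give bars of the same height.

```r
library(ggplot2)

# Hypothetical raw data: one row per observation.
raw_df <- data.frame(grp = c("a", "a", "b", "c", "c", "c"))

# Pre-computed counts: one row per group, with the count in column n.
count_df <- as.data.frame(table(grp = raw_df$grp), responseName = "n")

# stat = "count" (the geom_bar default) tallies the rows for us.
p_count <- ggplot(raw_df, aes(x = grp)) + geom_bar()

# stat = "identity" (the geom_col default) uses the y values as given.
p_identity <- ggplot(count_df, aes(x = grp, y = n)) + geom_col()

heights_count <- sort(as.numeric(layer_data(p_count)$y))
heights_identity <- sort(as.numeric(layer_data(p_identity)$y))
```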
What data map to which aes here? What geom, scale, coord are used? Any facet?
More Complicated Structure
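ggplot: stacking and multiple images
Stacking
# after_stat(density) puts the histogram on the density scale so that it is
# comparable with geom_density(); it replaces the older ..density.. notation
a <- ggplot(gapminder, aes(x = lifeExp, y = after_stat(density))) +
  geom_histogram()
a + geom_density()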
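Stacking a density curve on a histogram only makes sense when both layers share the same y scale. The sketch below assumes the built-in faithful data in place of gapminder and a ggplot2 version that provides after_stat() (3.3.0 or later); it checks that a histogram drawn on the density scale has total bar area 1, like a density curve.

```r
library(ggplot2)

# Assumption: faithful stands in for gapminder.
hist_density <- ggplot(faithful, aes(x = waiting, y = after_stat(density))) +
  geom_histogram(bins = 20)

# On the density scale, bar height times bar width sums to 1,
# which is why a density curve can share the same axes.
bars <- layer_data(hist_density)
total_area <- sum(bars$density * (bars$xmax - bars$xmin))

# Stack the density curve on top as a second layer.
overlay <- hist_density + geom_density()
```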
Multiple images
library(gridExtra)
a <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram()
b <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_density()
grid.arrange(a, b)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent,
                      shape = continent)) +
  geom_point() +
  scale_x_log10() + scale_y_log10() +
  labs(x = "GDP per Capita (log)",
       y = "Average Life Expectancy (log)",
       title = "National GDP per Capita vs Life Expectancy",
       color = "Continent",
       shape = "Continent")  # same title for both, so the two legends merge
Addendum
aes mappings in the initial ggplot() call are passed to all later elements (e.g. geoms)
aes mappings in later elements (e.g. geoms) are only defined for that specific element
g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + # global
  geom_point(color = "blue")
l_vis <- ggplot() +
  geom_point(data = gapminder, aes(y = lifeExp, x = gdpPercap)) # local
# notice the need to write data = gapminder by name inside a geom
g_vis2 <- g_vis + geom_line()
l_vis2 <- l_vis + geom_line() # geom_line() draws nothing here: no global data or aes to inherit
grid.arrange(g_vis, l_vis,
             g_vis2, l_vis2, nrow = 2)
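Where a mapping is stored can also be inspected on the plot object itself. The sketch below assumes the built-in mtcars data in place of gapminder, and it reads the mapping and layers fields, which are ggplot2 internals and may differ between versions; the object names are hypothetical.

```r
library(ggplot2)

# Assumption: mtcars stands in for gapminder.
global_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
local_plot <- ggplot() +
  geom_point(data = mtcars, aes(x = wt, y = mpg))

# Global: the mapping lives on the plot, so later layers can inherit it.
global_aes <- names(global_plot$mapping)

# Local: the plot-level mapping is empty; only this one layer knows x and y.
local_plot_aes <- names(local_plot$mapping)
local_layer_aes <- names(local_plot$layers[[1]]$mapping)

# A line layer added to the global plot receives every row of mtcars.
rows_in_line_layer <- nrow(layer_data(global_plot + geom_line(), 2))
```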
We can also override global mapping and data sources.
gapminder2 <- gapminder
gapminder2$lifeExp <- 2 * gapminder2$lifeExp
g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + # global
  geom_point(color = "blue")
g_vis_plus_another <- g_vis +
  geom_point(data = gapminder2, aes(y = lifeExp, x = gdpPercap), color = "red")
grid.arrange(g_vis, g_vis_plus_another, nrow = 1)
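A variation on this override, sketched under the assumption that mtcars stands in for gapminder (mtcars_doubled is a hypothetical analogue of gapminder2): when only the data source changes, the layer can override data alone and still inherit the global x and y mappings.

```r
library(ggplot2)

# Assumption: mtcars stands in for gapminder.
mtcars_doubled <- mtcars
mtcars_doubled$mpg <- 2 * mtcars_doubled$mpg

two_layers <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +
  # data = overrides the global data; x and y are still inherited
  geom_point(data = mtcars_doubled, color = "red")

# The second layer plots the doubled values against the same x positions.
y_global <- layer_data(two_layers, 1)$y
y_override <- layer_data(two_layers, 2)$y
```

Repeating the aes() call inside the layer, as the slide code does, gives the same picture; it is only required when the layer needs a different mapping from the global one.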
ggplot is a plotting package for the future of R computing
ggplot acts like a grammar: we are able to extend the layers iteratively
ggplot() defines global information for all following elements (relative to data and aes mappings)
aes allows you to define how to map columns of a data.frame to aesthetics of the graphic
geom_...() functions define geometric attributes of the graphic
facets allow one to divide the graphic up conditional on a factor variable
ggplot2 specifications in R:
data
aes: aesthetic attributes (position, length, color, symbol…)
stat: statistical variable transformation (identity, count, smooth, quantile…)
geom: geometric element (point, line, bar…)
scale: scale transformation (axis limits, log scale, …)
coord: Cartesian, polar, map projection…
facet: divide into subplots / small multiples using a …
Of course we can also control guides (axes, legends, titles…)