apply
subset(): function for extracting rows of a data frame meeting a condition
split(): function for splitting up rows of a data frame, according to a factor variable
apply(): function for applying a given routine to rows or columns of a matrix or data frame
lapply(): similar, but used for applying a routine to elements of a vector or list
sapply(): similar, but will try to simplify the return type, in comparison to lapply()
tapply(): function for applying a given routine to groups of elements in a vector or list, according to a factor variable
Plot basics: ggplot
There are two major styles of plotting in R. You’ve already seen examples of base R plots, including:
plot(): generic plotting function
hist(): histogram
But… the R community has come to embrace a new approach to visualization from the ggplot2 package.
Extremely popular graphics library
ggplot: The main function where you specify the dataset and variables to plot
geoms: geometric objects
  geom_point(), geom_bar(), geom_density(), geom_line(), geom_area()
aes: aesthetics
  shape, transparency (alpha), color, fill, linetype
scales: Define how your data will be plotted
library(gapminder)
data(gapminder)
library(ggplot2)
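# Positional data argument
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point()
# Equivalent: name the data argument explicitly
ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point()
# Equivalent: store the base plot, then add a layer to it
p <- ggplot(data = gapminder, aes(y = lifeExp, x = gdpPercap))
p + geom_point()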
ggplot function (in this case data and variable mappings)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point(size = .1)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent)) +
  geom_point(size = .1)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent)) +
  geom_point(size = .1) +
  scale_x_log10() + scale_y_log10()
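To see what a scale does, it can help to compare it with transforming the variable by hand. The sketch below assumes the ggplot2 and gapminder packages are installed; the names p_scale and p_manual are made up for this illustration. Both plots place the points at the same positions, but the scale version keeps the axis labels in the original units.

```r
library(ggplot2)
library(gapminder)

# Log scale applied through a scale: axis labels stay in dollars
p_scale <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(size = .1) +
  scale_x_log10()

# Log transform applied to the variable itself: axis labels show log10 values
p_manual <- ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp)) +
  geom_point(size = .1)

# layer_data() returns the data a layer actually draws;
# the x positions agree between the two versions
all.equal(layer_data(p_scale)$x, layer_data(p_manual)$x)
```

The scale version is usually the more readable choice, since readers see GDP values rather than their logarithms on the axis.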
ggplot components review:
ggplot2 specifications in R (so far):
data
aes: aesthetic attributes (position, length, color, symbol…)
geom: geometric element (point, line, bar…)
scale: scale transformation (axis limits, log scale, …)
Examples
ggplot: boxplot
ggplot(gapminder, aes(y = lifeExp, x = continent)) +
  geom_boxplot()
ggplot: histogram (color vs fill)
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(color = "black", fill = "blue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot: histogram (fill wrong way)
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(aes(fill = "blue"), color = "black")
ggplot: histogram (fill mapping)
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(aes(fill = continent), color = "black")
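Why is aes(fill = "blue") the wrong way? A rough sketch of the difference, assuming ggplot2 and gapminder are installed (p_set and p_mapped are names invented here): outside aes(), "blue" is a literal color; inside aes(), it is treated as a data value with a single category, and the fill scale assigns that category a color from its default palette.

```r
library(ggplot2)
library(gapminder)

# Setting: fill is a fixed property of the layer
p_set <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(fill = "blue", color = "black", bins = 30)

# Mapping: "blue" becomes a one-level variable handled by the fill scale
p_mapped <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(aes(fill = "blue"), color = "black", bins = 30)

# Inspect the fill values that each layer actually draws
unique(layer_data(p_set)$fill)     # the literal color that was set
unique(layer_data(p_mapped)$fill)  # a palette color chosen by the scale
```

The mapped version also gains a legend with a single entry labeled "blue", which is a quick visual hint that a constant ended up inside aes().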
ggplot: line plots
ggplot(gapminder[gapminder$country %in% c("United States", "Canada",
                                          "Mexico", "United Kingdom",
                                          "Ireland",
                                          "Saudi Arabia"), ],
       aes(x = year, y = gdpPercap)) +
  geom_line(aes(color = country))
library(tidyverse)
# Load the data into R
titanic <- read_csv("https://raw.githubusercontent.com/benjaminleroy/stat315summer_data/master/assignments/assignment03/titanic.csv") %>%
  mutate(Pclass = factor(Pclass))
ggplot(data = titanic, aes(fill = Pclass)) +
  geom_bar(aes(x = Pclass))
+ coord_cartesian() - i.e. it’s in the standard Cartesian coordinate system.
“Spine chart” (or stacked bar):
same as bar, but with constant x-values
ggplot(data = titanic, aes(fill = Pclass)) +
  geom_bar(aes(x = factor(1))) + coord_cartesian()
“Pie chart”: spine chart, but using polar coordinates
(angle vs. radius), with counts mapped to angle instead of height
ggplot(data = titanic, aes(fill = Pclass)) +
  geom_bar(aes(x = factor(1)), width = 1) + coord_polar(theta = "y")
Commentary ++
Some of these are bad ideas! But we’ve seen the flexibility of expressing a graph from the ground up using a grammar, instead of a “chart zoo” approach (like Excel’s chart wizard).
“[The grammar] makes it easier for you to iteratively update a plot, changing a single feature at a time. The grammar is also useful because it suggests the high-level aspects of a plot that can be changed, giving you a framework to think about graphics, and hopefully shortening the distance from mind to paper. It also encourages the use of graphics customised to a particular problem, rather than relying on generic named graphics.”
–Hadley Wickham, ggplot2
Facets are helpful: divide into sub-plots by values of a variable,
with automatically consistent scales and a common legend
p <- ggplot(titanic, aes(x = Pclass, fill = factor(Survived))) +
  geom_bar()
p + coord_cartesian() +
  facet_wrap(~ Sex)
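The same idea can be tried without downloading the titanic file. The sketch below uses gapminder instead (an assumption made only to keep the example self-contained; p_base, p_facet and p_free are names invented here). It shows the default shared scales, plus the scales argument of facet_wrap() for the case where each panel should get its own y-axis.

```r
library(ggplot2)
library(gapminder)

p_base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(size = .1) +
  scale_x_log10()

# One panel per continent, all panels sharing the same x and y scales
p_facet <- p_base + facet_wrap(~ continent)

# Same panels, but each one chooses its own y-axis range
p_free <- p_base + facet_wrap(~ continent, scales = "free_y")

# Every drawn row carries a PANEL label, one level per facet
nlevels(layer_data(p_facet)$PANEL)
```

Shared scales make panels directly comparable; free scales trade that away for more detail within each panel.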
ggplot components
ggplot2 specifications in R:
data
aes: aesthetic attributes (position, length, color, symbol…)
stat: statistical variable transformation (identity, count, smooth, quantile…)
geom: geometric element (point, line, bar…)
scale: scale transformation (axis limits, log scale, …)
coord: Cartesian, polar, map projection…
facet: divide into subplots / small multiples using a …
Of course we can also control guides (axes, legends, titles…)
What data map to which aes here? What geom, scale, coord are used? Any facet?
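The stat component in the list above is the easiest one to overlook, because geoms apply it silently. As a minimal sketch (assuming ggplot2 and gapminder are installed; p_count, p_identity and counts are illustrative names), geom_bar() counts rows with the count stat, while geom_col() draws pre-computed heights as they are, which corresponds to the identity stat.

```r
library(ggplot2)
library(gapminder)

# geom_bar() counts the rows per continent for us
p_count <- ggplot(gapminder, aes(x = continent)) +
  geom_bar()

# The same bars from counts computed by hand and drawn without transformation
counts <- as.data.frame(table(continent = gapminder$continent))
p_identity <- ggplot(counts, aes(x = continent, y = Freq)) +
  geom_col()

# The stat's output is visible in the built layer data
layer_data(p_count)$count
counts$Freq
```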
More Complicated Structure
ggplot: stacking and multiple images
Stacking
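# after_stat(density) puts the histogram on the density scale
# (it replaces the deprecated ..density.. notation; needs ggplot2 >= 3.3.0)
a <- ggplot(gapminder, aes(x = lifeExp, y = after_stat(density))) +
  geom_histogram()
a + geom_density()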
Multiple images
library(gridExtra)
a <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram()
b <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_density()
grid.arrange(a, b)
ggplot(gapminder, aes(y = lifeExp, x = gdpPercap, color = continent,
                      shape = continent)) +
  geom_point() +
  scale_x_log10() + scale_y_log10() +
  labs(x = "GDP per Capita (log)",
       y = "Average Life Expectancy (log)",
       title = "National GDP per Capita vs Life Expectancy",
       color = "Continent",
       shape = "Continent")  # same title for both, so the legends merge
Addendum
aes mappings in the ggplot() initial call are passed to all later elements (e.g. geoms)
aes mappings in later elements (e.g. geoms) are only defined for that specific element
g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + # global
  geom_point(color = "blue")
l_vis <- ggplot() +
  geom_point(data = gapminder, aes(y = lifeExp, x = gdpPercap)) # local
#            ^ notice the need to write data = gapminder
g_vis2 <- g_vis + geom_line()
l_vis2 <- l_vis + geom_line() # geom_line doesn't do anything here
grid.arrange(g_vis, l_vis,
             g_vis2, l_vis2, nrow = 2)
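If the line layer is wanted on the locally specified plot as well, one option is to repeat the data and mapping inside geom_line(). This is a sketch under the same assumptions as above (ggplot2 and gapminder installed); l_vis_fixed is a name introduced only for this illustration.

```r
library(ggplot2)
library(gapminder)

# Plot with no global data or mapping; the point layer carries its own
l_vis <- ggplot() +
  geom_point(data = gapminder, aes(y = lifeExp, x = gdpPercap))

# The line layer inherits nothing, so it needs its own data and aes too
l_vis_fixed <- l_vis +
  geom_line(data = gapminder, aes(y = lifeExp, x = gdpPercap))

# The second layer now has one row per observation to draw
nrow(layer_data(l_vis_fixed, 2))
```

Repeating the arguments in every layer gets tedious quickly, which is the usual argument for putting shared data and mappings in the ggplot() call.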
We can also override global mapping and data sources.
gapminder2 <- gapminder
gapminder2$lifeExp <- 2 * gapminder2$lifeExp
g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) + # global
  geom_point(color = "blue")
g_vis_plus_another <- g_vis +
  geom_point(data = gapminder2, aes(y = lifeExp, x = gdpPercap), color = "red")
grid.arrange(g_vis, g_vis_plus_another, nrow = 1)
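In the example above the red layer repeats the global mapping, so only the data source is really being overridden. A small sketch of that point, assuming ggplot2 and gapminder are installed (explicit and inherited are names made up here): dropping the aes() from the second layer should give the same result, because the layer inherits the global mapping and applies it to its own data.

```r
library(ggplot2)
library(gapminder)

gapminder2 <- gapminder
gapminder2$lifeExp <- 2 * gapminder2$lifeExp

g_vis <- ggplot(gapminder, aes(y = lifeExp, x = gdpPercap)) +
  geom_point(color = "blue")

# Override data and restate the mapping
explicit <- g_vis +
  geom_point(data = gapminder2, aes(y = lifeExp, x = gdpPercap), color = "red")

# Override data only; the mapping comes from ggplot()
inherited <- g_vis +
  geom_point(data = gapminder2, color = "red")

# Both red layers end up drawing the same values
all.equal(layer_data(explicit, 2), layer_data(inherited, 2))
```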
ggplot is a plotting package for the future of R computing
ggplot acts like a grammar - we are able to extend the layers iteratively
ggplot() defines global information for all following elements (relative to data and aes mappings)
aes allows you to define how to map columns of a data.frame to aesthetics of the graphic
geom_... define geometric attributes of the graphic
facets allow one to divide the graphic up conditional on a factor variable