This week’s agenda: learning to master pipes and dplyr.

# Load the tidyverse!
library(tidyverse)

Pipes to base R

Each of the following code blocks is written with pipes. For each one, write equivalent code in base R that does the same thing.
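
As a warm-up, here is a minimal sketch of the translation on a toy vector (the vector and the variable names are made up for illustration and are not part of the exercises). A pipe passes its left-hand side as the first argument of the next function, so a chain of pipes can be rewritten either as nested calls or as a sequence of intermediate variables.

```r
library(magrittr)  # provides %>%; it is also attached by library(tidyverse)

x <- c(1, 4, 9)

# Piped version: x flows into sqrt(), and that result flows into sum().
piped <- x %>%
  sqrt %>%
  sum

# Base R with nested calls: read from the inside out.
nested <- sum(sqrt(x))

# Base R with intermediate variables: one step per line.
roots <- sqrt(x)
stepwise <- sum(roots)
```

All three versions compute the same value (here, 6). Which base R style you choose is a matter of taste.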

letters %>%
  toupper %>%
  paste(collapse="+") 
## [1] "A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X+Y+Z"
"     Ceci n'est pas une pipe     " %>% 
  gsub("une", "un", .) %>%
  trimws
## [1] "Ceci n'est pas un pipe"
rnorm(1000) %>% 
  data.frame(x = .) %>%
  ggplot(.) +  # I'm giving you a hint here
    geom_histogram(aes(x = x, y = after_stat(density))) + 
    labs(title = "N(0,1) draws")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

rnorm(1000) %>% 
  hist(breaks=30, plot=FALSE) %>% # use ?hist to figure out what this returns when plot=FALSE
  .[["density"]] %>%
  max
## [1] 0.405

Base R to pipes

Each of the following code blocks is written in base R. For each one, write equivalent code with pipes that does the same thing.
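
Going in this direction, it may help to find the innermost call first, since that becomes the start of the pipe. The sketch below uses toy inputs (not the exercise code) and also shows the `.` placeholder, which marks where the piped value goes when it is not the first argument, for example inside square brackets.

```r
library(magrittr)  # provides %>%; it is also attached by library(tidyverse)

# Nested calls: sqrt() runs first, then mean(), then round().
base_result <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# The same steps as a pipe, written in the order they run.
pipe_result <- c(1, 4, 9, 16) %>%
  sqrt %>%
  mean %>%
  round(1)

# Indexing: find the longest month name.
base_month <- month.name[which.max(nchar(month.name))]

# The dot stands for the piped value (an index) inside the brackets.
pipe_month <- month.name %>%
  nchar %>%
  which.max %>%
  month.name[.]
```

Both pairs give identical results; `month.name` is a built-in constant in base R.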

paste("Your grade is", sample(c("A","B","C","D","R"), size = 1))
## [1] "Your grade is D"
state.name[which.max(state.x77[,"Illiteracy"])] 
## [1] "Louisiana"
str_url <- 
  paste0("https://raw.githubusercontent.com/benjaminleroy/",
         "36-350-summer-data/master/Week1/endgame.txt")

# Base R:
lines <- readLines(str_url)
text <- paste(lines, collapse = " ")
words <- strsplit(text, split = "[[:space:]]|[[:punct:]]")[[1]]
wordtab <- table(words)
wordtab <- sort(wordtab, decreasing = TRUE)
head(wordtab, 10)
## words
##         the    to     I     a   and   you     s    of    it 
## 10146   780   553   478   466   408   375   374   282   261
# Base R:
lines <- readLines(str_url)
text <- paste(lines, collapse = " ")
words <- strsplit(text, split = "[[:space:]]|[[:punct:]]")[[1]]
words <- words[words != ""]  # drop the empty strings left behind by strsplit
wordtab <- table(words)
wordtab <- sort(wordtab, decreasing = TRUE)
head(wordtab, 10)
## words
## the  to   I   a and you   s  of  it  in 
## 780 553 478 466 408 375 374 282 261 251

Shark attack data, revisited

Below we read in a data frame, shark_attacks, similar to the one we’ve seen in previous labs. It contains information about victims of shark attacks. (Note that the file location is slightly different from before.)

shark_attacks <- read.csv("https://raw.githubusercontent.com/benjaminleroy/36-350-summer-data/master/Week2/shark-attacks-clean.csv", stringsAsFactors = TRUE)
time_factor_to_numeric <- function(time_fac) {
  # fill in and document
  NULL
}
my_fac <- factor(c("13h30"))
my_fac2 <- factor(c(NA))

# my_fac_numeric <- time_factor_to_numeric(my_fac)
# my_fac_numeric == 13.5
# my_fac2_numeric <- time_factor_to_numeric(my_fac2)
# is.na(my_fac2_numeric)
my_fac3 <- factor(c("05h15", NA, "23h59"))

# my_fac3_numeric <- time_factor_to_numeric(my_fac3)
# all.equal(my_fac3_numeric, c(5.25, NA, 23 + 59/60))
time_factor_to_numeric_no_vec <- time_factor_to_numeric
time_factor_to_numeric_vectorized <- Vectorize(time_factor_to_numeric_no_vec)
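
To see roughly what Vectorize() is doing in the two lines above, here is a small sketch with a hypothetical function, describe_sign (made up for illustration; it is not the lab's solution). The function uses `if`, which only accepts a single condition, so it only makes sense for one value at a time. Wrapping it in Vectorize() returns a new function that applies it to each element in turn.

```r
# Hypothetical scalar function: handles exactly one value,
# because `if` needs a single TRUE/FALSE condition.
describe_sign <- function(x) {
  if (is.na(x)) {
    return(NA_character_)
  }
  if (x < 0) "negative" else "non-negative"
}

# Vectorize() returns a wrapper that calls describe_sign on each element
# and combines the results into a vector.
describe_sign_vectorized <- Vectorize(describe_sign)

signs <- describe_sign_vectorized(c(-1, NA, 2))
```

Here signs is a character vector with one entry per input element, including an NA for the missing value.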

dplyr attacks sharks

With your vectorized function we’ll start exploring the dplyr tools we saw in lecture.
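
As a reminder of how dplyr verbs chain together with pipes, here is a minimal sketch on a toy data frame. The data frame, its columns (country, hour), and the chosen verbs are assumptions for illustration only; they are not the real shark_attacks columns, and the lecture may have covered other verbs as well.

```r
library(dplyr)

# Toy data with hypothetical columns; hour is already a decimal hour,
# like the output your conversion function is meant to produce.
toy_attacks <- data.frame(
  country = c("USA", "USA", "USA", "Australia", "Australia"),
  hour    = c(13.5, 9, 15, 16.25, NA),
  stringsAsFactors = FALSE
)

afternoon_counts <- toy_attacks %>%
  filter(!is.na(hour)) %>%           # keep rows with a known time
  mutate(afternoon = hour >= 12) %>% # add a logical column
  group_by(country) %>%              # one group per country
  summarize(n_afternoon = sum(afternoon))
```

The result has one row per country, with the number of toy attacks at or after noon (1 for Australia, 2 for USA).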