Last time: Prediction

Overview for the day:

  1. Advanced\(^2\) Tidyverse (Split-Apply-Combine)
  2. Parallelization
  3. Deep Learning

Part I

Advanced\(^2\) Tidyverse: Split-Apply-Combine

Review of tidyverse: dplyr and tidyr

Split-apply-combine

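Today we will learn a general strategy that can be summarized in three conceptual steps:

  1. Split the data into meaningful chunks
  2. Apply a function to each chunk
  3. Combine the results into a new object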

These are conceptual steps; in practice the apply and combine steps are often carried out over several operations (in the readable tidyverse style), and the combine step may also involve reshaping the result (using gather and spread).
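As a minimal sketch of the strategy, the example below computes a per-group mean twice: once with explicit base R steps and once with dplyr. The data frame `toy_df` and its columns `group` and `value` are hypothetical and made up for illustration; they are not part of the lecture data.

```r
library(dplyr)

# Hypothetical toy data (not from the lecture): two groups, a few values each
toy_df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# Split: one data frame per group
pieces <- split(toy_df, toy_df$group)

# Apply: compute the mean of `value` within each piece
piece_means <- lapply(pieces, function(piece) mean(piece$value))

# Combine: collect the per-group results into one data frame
base_result <- data.frame(
  group = names(piece_means),
  mean_value = unlist(piece_means, use.names = FALSE)
)

# The same idea in dplyr: group_by() marks the split,
# summarize() applies the function and combines the results
tidy_result <- toy_df %>%
  group_by(group) %>%
  summarize(mean_value = mean(value))
```

Both `base_result` and `tidy_result` hold one row per group; the dplyr version simply hides the intermediate list of pieces.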

Simple but powerful

Does split-apply-combine sound simple? It is, but it’s very powerful when combined with the right data structures.

Strikes data set

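Data set on 18 countries over 35 years (compiled by Bruce Western, in the Sociology Department at Harvard University). The measured variables are the eight columns of the data frame: country, year, strike.volume, unemployment, inflation, left.parliament, centralization, and density.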

library(tidyverse)
strikes_df <- read.csv("https://raw.githubusercontent.com/benjaminleroy/36-350-summer-data/master/Week6/strikes.csv")
dim(strikes_df) # 18 × 35 = 630 > 625 rows, so some years are missing for some countries
## [1] 625   8
head(strikes_df)
##     country year strike.volume unemployment inflation left.parliament
## 1 Australia 1951           296          1.3      19.8            43.0
## 2 Australia 1952           397          2.2      17.2            43.0
## 3 Australia 1953           360          2.5       4.3            43.0
## 4 Australia 1954             3          1.7       0.7            47.0
## 5 Australia 1955           326          1.4       2.0            38.5
## 6 Australia 1956           352          1.8       6.3            38.5
##   centralization density
## 1      0.3748588      NA
## 2      0.3751829      NA
## 3      0.3745076      NA
## 4      0.3710170      NA
## 5      0.3752675      NA
## 6      0.3716072      NA
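One way to see which countries are missing years is to count the distinct years per country, which is itself a small split-apply-combine. The sketch below is only an illustration: the helper name `countries_missing_years` and its `expected_years` argument are hypothetical, and it assumes a data frame with `country` and `year` columns like `strikes_df`.

```r
library(dplyr)

# Hypothetical helper: list countries with fewer than `expected_years` distinct years
countries_missing_years <- function(df, expected_years = 35) {
  df %>%
    group_by(country) %>%                     # split by country
    summarize(n_years = n_distinct(year)) %>% # apply: count distinct years
    filter(n_years < expected_years)          # keep only incomplete countries
}
```

Assuming `strikes_df` has been loaded as above, `countries_missing_years(strikes_df)` would return one row per country with an incomplete record.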

An interesting question

Is there a relationship between a country’s ruling party alignment (left versus right) and the volume of strikes?
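One possible way to approach this question with split-apply-combine is to fit a separate linear regression for each country and collect the slopes. The sketch below is an assumption about how one might proceed, not necessarily the approach taken in the lecture: it treats `left.parliament` as a stand-in for ruling party alignment, and the helper name `country_slopes` is hypothetical.

```r
# Hypothetical helper: per-country slope of strike.volume on left.parliament
country_slopes <- function(df) {
  # Split: one data frame per country
  pieces <- split(df, df$country)

  # Apply: fit a simple linear model within each country, keep the slope
  slopes <- sapply(pieces, function(piece) {
    fit <- lm(strike.volume ~ left.parliament, data = piece)
    coef(fit)[["left.parliament"]]
  })

  # Combine: one row per country
  data.frame(country = names(slopes), slope = unname(slopes))
}
```

Assuming `strikes_df` has been loaded as above, `country_slopes(strikes_df)` would give one slope per country. A slope describes an association within that country only and says nothing about causation.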