Statistical Computing, 36-350

Friday July 26, 2019

- Fitting models is critical to both statistical inference and prediction
- Exploratory data analysis is a very good first step and gives you a sense of what you’re dealing with before you start modeling
- Linear regression is the most basic modeling tool of all, and one of the most ubiquitous
`lm()`

allows you to fit a linear model by specifying a formula, in terms of column names of a given data frame- Utility functions
`coef()`

,`fitted()`

,`residuals()`

,`summary()`

,`plot()`

/`autoplot`

,`predict()`

are very handy and should be used over manual access tricks - Logistic regression is the natural extension of linear regression to binary data; use
`glm()`

with`family = "binomial"`

and all the same utility functions - Generalized additive models add a level of flexibility in that they allow the predictors to have nonlinear effects; use
`gam()`

and utility functions

**fill this in**

*Object Oriented Programming ‘Theory’/ Foundation*

- We’ve been treating functions and the object of interest
- applying them to data structures (through loops, etc) to create new objects
- concatenated functions them together (
`x %>% f %>% g`

)

- But we’ve always assumed data would just be in the correct format to go into our functions and that, realistically speaking - our data is imputable and we
**act on it**

- Object Oriented Programming focuses on the object
- specifically, OOP envisions:
- all operations being built around an object (which has a
`class`

) and use have`methods`

that operate on the objects. - that one would have the potential to extend off basic class structures to make more complicated class structures. E.g.

`## [1] "lm"`

`## [1] "glm" "lm"`

- all operations being built around an object (which has a

the first point relates to the idea that OOP tries to

**encapsulates**struction and functions (abstracting away details of an object)`## [1] "lm"`

`## [1] "coefficients" "residuals" "effects" "rank" ## [5] "fitted.values" "assign" "qr" "df.residual" ## [9] "contrasts" "xlevels" "call" "terms" ## [13] "model"`

- the second point (in a more general way) related to the idea that OOP is a
**polymorphism**(many shapes) - encouraging functions to preform different tasks on different objects- E.g. the
`summary`

function

`## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.2000 0.4000 0.7000 0.7979 1.0400 5.0100`

`## Fair Good Very Good Premium Ideal ## 1610 4906 12082 13791 21551`

- E.g. the

- R’s object structure is much less utilized on a day to day level than
`python`

’s class structure - Even so objects in
`R`

are all around - Understanding the structure of objects in
`R`

can help in other areas of`R`

coding

There are many class structures in `R`

.

: is the first and most commonly used OOP system. (Pretty informal, imputable)`S3`

“An S3 class is (most often) a list with a class attribute. It is constructed by the following code class(obj) <- ‘class.name’.”

~ Professor Javier Rojo, Rice University

: more formal than`S4`

`S3`

(specific functions to define the`class`

, a`method`

, etc). Still imputable, uses “@” instead of “$”: Allows for more complicated structure (like`R6`

`python`

classes). Methods below to objects, and objects are mutableothers :

`RC`

,`R.oo`

,`proto`

You can read more about all the class structure options in the Object Oriented part of in Hadley Wickham’s *Advanced R* book.

- I’m going to rely on an example structure taught by Professor Gaston Sanchez at UC Berkeley.
- We’ve going to focus on
`S3`

as most objects in`R`

are this style - Understanding the build blocks of
`S3`

will give you a better understanding of all the functions and objects we’ve seen before

*Motivation: Coin Toss*

Let’s start with a **functional** and **simulation-based** view of a coin flip.

We can image we could toss the coin for some simulation event:

`## [1] "heads"`

Maybe more complicated:

`## [1] "tails" "heads" "heads" "heads" "tails"`

In our simulation lecture we saw the benefit of using functions to encapsulate things we’d like to do multiple times

```
toss <- function(coin, times = 1) {
sample(coin, size = times, replace = TRUE)
}
toss(coin, times = 1)
```

`## [1] "tails"`

Note how this “abstracts” away some of what is going on (**desirable**)

Typical probability problems that have to do with coin tossing, require to compute the total proportion of `"heads"`

or `"tails"`

:

`## [1] 0.6`

It is also customary to compute the relative frequencies of `"heads"`

or `"tails"`

in a series of tosses:

`## [1] 1.0000000 0.5000000 0.6666667 0.5000000 0.6000000`

Or even to visualize this:

```
library(tidyverse)
set.seed(5938)
hundreds <- toss(coin, times = 500)
head_freqs <- cumsum(hundreds == "heads") / 1:500
my_data_vis <- data.frame(index = 1:length(hundreds),
flips = hundreds,
head_freqs = head_freqs)
ggplot(my_data_vis) +
geom_line(aes(x = index, y = head_freqs)) +
labs(y = "Head Frequency")
```

So far we have written code in R that

- simulates tossing a coin one or more times.
- computed proportion of heads
- relative frequencies of heads in a series of tosses.
- produced a plot of the relative frequencies and see how, as the number of tosses increases, the frequency of heads approach 0.5

*Object Oriented Programming*

Take a look at these 2 experiments with coins (notice we basically did a quick “copy & paste”).

```
# random seed
set.seed(534)
# five tosses
five <- toss(coin, times = 5)
# prop of heads in five
sum(five == "heads") / length(five)
```

`## [1] 0.6`

The second experiment involves tossing a coin six times and computing the proportion of heads:

`## [1] 0.8`

To make a class, we should really be doing 3 things

- create a
**constructor**(way to create a new element of the class) - create
**methods**to apply to the class - create a
**validator**to check to see that an element of the class follows the desired structure

S3 objects are usually built on top of lists, or atomic vectors with attributes. You can also turn functions into S3 objects.

To make an object an instance of a class, you just take an existing base object and set the `"class"`

attribute.

- You can do that during creation of the object with
`structure()`

, *or after the object has been created with*.`class <- ()`

```
# object coin
coin1 <- structure(c("heads", "tails"),
class = "coin")
# object coin
coin2 <- c("heads", "tails")
class(coin2) <- "coin"
```

You can also determine if an object inherits from a specific class using `inherits()`

`## [1] TRUE`

A coin could have a function `flip()`

:

`## [1] "tails"`

1 issue with this function is that it will “flip” anything - even things that aren’t coins:

`## [1] "tic"`

We could add a `stop()`

condition that checks if the argument `coin`

is of the right class:

```
flip <- function(coin, times = 1) {
if (class(coin) != "coin") {
stop("\nflip() requires an object 'coin'")
}
sample(coin, size = times, replace = TRUE)
}
# ok
flip(coin1)
```

`## [1] "heads"`

```
## Error in flip(c("tic", "tac", "toe")):
## flip() requires an object 'coin'
```

- A more formal strategy, and one that follows OOP principles, is to create a flip
**method**.- examples of methods:

```
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x7f9f5fc56330>
## <environment: namespace:base>
```

```
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x7f9f5ba174d8>
## <environment: namespace:graphics>
```

- These functions (methods) are not unique functions
- typically comprise a colelction of functions to do similar - Depending on the class of the object, a generic method will look for a specific function for that class:

```
## [1] add1 alias anova case.names
## [5] coerce confint cooks.distance deviance
## [9] dfbeta dfbetas drop1 dummy.coef
## [13] effects extractAIC family formula
## [17] fortify hatvalues influence initialize
## [21] kappa labels logLik model.frame
## [25] model.matrix nobs plot predict
## [29] print proj qqnorm qr
## [33] residuals rstandard rstudent show
## [37] simulate slotsFromS3 summary variable.names
## [41] vcov
## see '?methods' for accessing help and source code
```

```
## [1] anyDuplicated as_tibble as.data.frame as.raster as.tbl_cube
## [6] boxplot coerce determinant duplicated edit
## [11] head initialize isSymmetric Math Math2
## [16] Ops relist subset summary tail
## [21] unique
## see '?methods' for accessing help and source code
```

`flip`

method

When implementing new methods, you begin by creating a **generic** method with the function `UseMethod()`

:

A generic method alone is not very useful. You need to create specific cases for the generic function. In our example, we only have one class `"coin"`

, we follow the naming scheme: “method_name.class_name”

Example:

`## [1] "heads"`

`## Error in UseMethod("flip"): no applicable method for 'flip' applied to an object of class "character"`

Let’s review our class `"coin"`

. The way we defined a coin object was like this:

```
# object coin
coin1 <- c("heads", "tails")
class(coin1) <- "coin"
# bad coin
ttt <- c('tic', 'tac', 'toe')
class(ttt) <- "coin"
flip(ttt) # now flips :/
```

`## [1] "tic"`

**Constructor**

For convenience purposes, we can define a class **constructor** function to initialize a `"coin"`

object:

```
coin <- function(object = c("heads", "tails")) {
class(object) <- "coin"
object
}
# default coin
coin()
```

```
## [1] "heads" "tails"
## attr(,"class")
## [1] "coin"
```

```
## [1] "h" "t"
## attr(,"class")
## [1] "coin"
```

- though not required, for larger objects it’s beneficial to create functions that validate the construction of an object (this is similar to testing, etc that we will talk about Monday)

For now, we could just write our coin as:

```
coin <- function(object = c("heads", "tails")) {
if (length(object) != 2) {
stop("\n'object' must be of length 2")
}
class(object) <- "coin"
object
}
standard <- coin()
standard
```

```
## [1] "heads" "tails"
## attr(,"class")
## [1] "coin"
```

```
## Error in coin(c("tick", "tac", "toe")):
## 'object' must be of length 2
```

but more will be need in the future.

*OOP attributes, standard methods*

The `sample`

function allows for different probabilities in sampling, why not allow our coins to be biased?

```
coin <- function(object = c("heads", "tails"), prob = c(0.5, 0.5)) {
if (length(object) != 2) {
stop("\n'object' must be of length 2")
}
attr(object, "prob") <- prob
class(object) <- "coin"
object
}
coin()
```

```
## [1] "heads" "tails"
## attr(,"prob")
## [1] 0.5 0.5
## attr(,"class")
## [1] "coin"
```

- similar to attributes relative to factors (they have levels)

```
## [1] R R R D D R
## Levels: D R
```

```
## [1] 2 2 2 1 1 2
## attr(,"levels")
## [1] "D" "R"
```

```
check_prob <- function(prob) {
if (!is.numeric(prob)) {
stop("\n'prob' must be a numeric vector")
}
if (length(prob) != 2 | !is.numeric(prob)) {
stop("\n'prob' must be a numeric vector of length 2")
}
if (any(prob < 0) | any(prob > 1)) {
stop("\n'prob' values must be between 0 and 1")
}
if (sum(prob) != 1) {
stop("\nelements in 'prob' must add up to 1")
}
TRUE
}
```

Our updated coin class:

```
coin <- function(object = c("heads", "tails"), prob = c(0.5, 0.5)) {
if (length(object) != 2) {
stop("\n'object' must be of length 2")
}
check_prob(prob)
attr(object, "prob") <- prob
class(object) <- "coin"
object
}
flip.coin <- function(x, times = 1) {
sample(x, prob = attr(x, "prob"),
size = times, replace = TRUE)
}
coin1 <- coin(prob = c(.2, .8))
flip(coin1, times = 10)
```

```
## [1] "tails" "heads" "tails" "tails" "tails" "heads" "tails" "tails"
## [9] "tails" "tails"
```

**Bad coins**

```
## Error in check_prob(prob):
## elements in 'prob' must add up to 1
```

```
## Error in coin(c("tic", "tac", "toe")):
## 'object' must be of length 2
```

Some objects can be multiple classes. For example we saw before that `glm`

objects actually had multiple classes.

`## [1] "glm" "lm"`

**subclass**:`glm`

is a subclass of`lm`

, which we can see by the ordering of the classes.*Benefit:*if a method is applied to a`glm`

object where no`glm`

specific method is available, then the`lm`

method will be applied**superclass**: is the reverse (aka`lm`

is the superclass of`glm`

)

(E.g. there is no `plot.glm`

but there is a `plot.lm`

function (in your console type `methods(plot)`

and you’ll see no `plot.glm`

).)

```
## [1] plot,ANY-method plot,color-method plot.acf*
## [4] plot.ACF* plot.augPred* plot.compareFits*
## [7] plot.data.frame* plot.decomposed.ts* plot.default
## [10] plot.dendrogram* plot.density* plot.ecdf
## [13] plot.factor* plot.formula* plot.function
## [16] plot.ggplot* plot.gls* plot.gtable*
## [19] plot.hcl_palettes* plot.hclust* plot.histogram*
## [22] plot.HoltWinters* plot.intervals.lmList* plot.isoreg*
## [25] plot.lm* plot.lme* plot.lmList*
## [28] plot.medpolish* plot.mlm* plot.nffGroupedData*
## [31] plot.nfnGroupedData* plot.nls* plot.nmGroupedData*
## [34] plot.pdMat* plot.ppr* plot.prcomp*
## [37] plot.princomp* plot.profile.nls* plot.R6*
## [40] plot.ranef.lme* plot.ranef.lmList* plot.raster*
## [43] plot.shingle* plot.simulate.lme* plot.spec*
## [46] plot.stepfun plot.stl* plot.table*
## [49] plot.trellis* plot.ts plot.tskernel*
## [52] plot.TukeyHSD* plot.Variogram*
## see '?methods' for accessing help and source code
```

Some coins are special (people collect them), let’s make a special coin class:

```
rare_coin <- function(name, year, ...){
object <- coin(...)
attr(object, "name") <- name
attr(object, "year") <- year
class(object) <- c("rare_coin", "coin")
object
}
```

`## [1] "rare_coin" "coin"`

`## [1] "heads" "tails" "tails" "tails" "tails"`

We already have a `flip`

method for our class, but there are lots of common functions that we apply to may objects

- Common in statistics:
`summary`

,`plot`

,`predict`

,`print`

, … More specialized

`[`

,`[[`

,`+`

, …“The greatest use of object oriented programming in R is through print methods, summary methods and plot methods. These methods allow us to have one generic function call, plot say, that dispatches on the type of its argument and calls a plotting function that is specific to the data supplied.”

~ R Manual (referring to the S3 system).

```
print.coin <- function(coin){
cat(paste0("Coin: ", coin[1], "/", coin[2], "\n"))
prob <- attr(coin, "prob")
cat(paste0(" Prob: ", prob[1], "/", prob[2], "\n"))
}
print.rare_coin <- function(coin){
cat(paste0("Rare coin: ", attr(coin, "name"),
", ", attr(coin, "year"), "\n"))
print.coin(coin)
}
print(coin1)
```

```
## Coin: heads/tails
## Prob: 0.2/0.8
```

```
## Rare coin: Lincoln penny, 1972
## Coin: heads/tails
## Prob: 0.5/0.5
```

- Object oriented programming (OOP) focuses on the object
- in R, there are multiple class structures, with
`S3`

the base and most standard, and`S4`

and`R6`

additionally commonly used structures `class()`

defined and checks which class an object is (but a functionalized**constructor**is recommended)- (S3) class specific
**methods**start with a generic (defined with`UseMethod()`

), and then have specific methods per class`function_name.class_name`

- there are also standard generic functions which need not be reinitialized.

- Best practice is to create
**validators**to check if the assumptions of the class are met

and for homework