Statistical Computing, 36-350
Friday July 26, 2019
lm()
allows you to fit a linear model by specifying a formula, in terms of column names of a given data framecoef()
, fitted()
, residuals()
, summary()
, plot()
/autoplot
, predict()
are very handy and should be used over manual access tricksglm()
with family = "binomial"
and all the same utility functionsgam()
and utility functionsfill this in
Object Oriented Programming ‘Theory’/ Foundation
x %>% f %>% g
)class
) and use have methods
that operate on the objects.## [1] "lm"
## [1] "glm" "lm"
the first point relates to the idea that OOP tries to encapsulates struction and functions (abstracting away details of an object)
## [1] "lm"
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "contrasts" "xlevels" "call" "terms"
## [13] "model"
summary
function## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2000 0.4000 0.7000 0.7979 1.0400 5.0100
## Fair Good Very Good Premium Ideal
## 1610 4906 12082 13791 21551
python
’s class structureR
are all aroundR
can help in other areas of R
codingThere are many class structures in R
.
S3
: is the first and most commonly used OOP system. (Pretty informal, imputable)
“An S3 class is (most often) a list with a class attribute. It is constructed by the following code class(obj) <- ‘class.name’.”
~ Professor Javier Rojo, Rice University
S4
: more formal than S3
(specific functions to define the class
, a method
, etc). Still imputable, uses “@” instead of “$”R6
: Allows for more complicated structure (like python
classes). Methods below to objects, and objects are mutable
others : RC
, R.oo
, proto
You can read more about all the class structure options in the Object Oriented part of in Hadley Wickham’s Advanced R book.
S3
as most objects in R
are this styleS3
will give you a better understanding of all the functions and objects we’ve seen beforeMotivation: Coin Toss
Let’s start with a functional and simulation-based view of a coin flip.
We can image we could toss the coin for some simulation event:
## [1] "heads"
Maybe more complicated:
## [1] "tails" "heads" "heads" "heads" "tails"
In our simulation lecture we saw the benefit of using functions to encapsulate things we’d like to do multiple times
toss <- function(coin, times = 1) {
sample(coin, size = times, replace = TRUE)
}
toss(coin, times = 1)
## [1] "tails"
Note how this “abstracts” away some of what is going on (desirable)
Typical probability problems that have to do with coin tossing, require to compute the total proportion of "heads"
or "tails"
:
## [1] 0.6
It is also customary to compute the relative frequencies of "heads"
or "tails"
in a series of tosses:
## [1] 1.0000000 0.5000000 0.6666667 0.5000000 0.6000000
Or even to visualize this:
library(tidyverse)
set.seed(5938)
hundreds <- toss(coin, times = 500)
head_freqs <- cumsum(hundreds == "heads") / 1:500
my_data_vis <- data.frame(index = 1:length(hundreds),
flips = hundreds,
head_freqs = head_freqs)
ggplot(my_data_vis) +
geom_line(aes(x = index, y = head_freqs)) +
labs(y = "Head Frequency")
So far we have written code in R that
Object Oriented Programming
Take a look at these 2 experiments with coins (notice we basically did a quick “copy & paste”).
# random seed
set.seed(534)
# five tosses
five <- toss(coin, times = 5)
# prop of heads in five
sum(five == "heads") / length(five)
## [1] 0.6
The second experiment involves tossing a coin six times and computing the proportion of heads:
## [1] 0.8
To make a class, we should really be doing 3 things
S3 objects are usually built on top of lists, or atomic vectors with attributes. You can also turn functions into S3 objects.
To make an object an instance of a class, you just take an existing base object and set the "class"
attribute.
structure()
,class <- ()
.# object coin
coin1 <- structure(c("heads", "tails"),
class = "coin")
# object coin
coin2 <- c("heads", "tails")
class(coin2) <- "coin"
You can also determine if an object inherits from a specific class using inherits()
## [1] TRUE
A coin could have a function flip()
:
## [1] "tails"
1 issue with this function is that it will “flip” anything - even things that aren’t coins:
## [1] "tic"
We could add a stop()
condition that checks if the argument coin
is of the right class:
flip <- function(coin, times = 1) {
if (class(coin) != "coin") {
stop("\nflip() requires an object 'coin'")
}
sample(coin, size = times, replace = TRUE)
}
# ok
flip(coin1)
## [1] "heads"
## Error in flip(c("tic", "tac", "toe")):
## flip() requires an object 'coin'
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x7f9f5fc56330>
## <environment: namespace:base>
## function (x, y, ...)
## UseMethod("plot")
## <bytecode: 0x7f9f5ba174d8>
## <environment: namespace:graphics>
## [1] add1 alias anova case.names
## [5] coerce confint cooks.distance deviance
## [9] dfbeta dfbetas drop1 dummy.coef
## [13] effects extractAIC family formula
## [17] fortify hatvalues influence initialize
## [21] kappa labels logLik model.frame
## [25] model.matrix nobs plot predict
## [29] print proj qqnorm qr
## [33] residuals rstandard rstudent show
## [37] simulate slotsFromS3 summary variable.names
## [41] vcov
## see '?methods' for accessing help and source code
## [1] anyDuplicated as_tibble as.data.frame as.raster as.tbl_cube
## [6] boxplot coerce determinant duplicated edit
## [11] head initialize isSymmetric Math Math2
## [16] Ops relist subset summary tail
## [21] unique
## see '?methods' for accessing help and source code
flip
method
When implementing new methods, you begin by creating a generic method with the function UseMethod()
:
A generic method alone is not very useful. You need to create specific cases for the generic function. In our example, we only have one class "coin"
, we follow the naming scheme: “method_name.class_name”
Example:
## [1] "heads"
## Error in UseMethod("flip"): no applicable method for 'flip' applied to an object of class "character"
Let’s review our class "coin"
. The way we defined a coin object was like this:
# object coin
coin1 <- c("heads", "tails")
class(coin1) <- "coin"
# bad coin
ttt <- c('tic', 'tac', 'toe')
class(ttt) <- "coin"
flip(ttt) # now flips :/
## [1] "tic"
Constructor
For convenience purposes, we can define a class constructor function to initialize a "coin"
object:
coin <- function(object = c("heads", "tails")) {
class(object) <- "coin"
object
}
# default coin
coin()
## [1] "heads" "tails"
## attr(,"class")
## [1] "coin"
## [1] "h" "t"
## attr(,"class")
## [1] "coin"
For now, we could just write our coin as:
coin <- function(object = c("heads", "tails")) {
if (length(object) != 2) {
stop("\n'object' must be of length 2")
}
class(object) <- "coin"
object
}
standard <- coin()
standard
## [1] "heads" "tails"
## attr(,"class")
## [1] "coin"
## Error in coin(c("tick", "tac", "toe")):
## 'object' must be of length 2
but more will be need in the future.
OOP attributes, standard methods
The sample
function allows for different probabilities in sampling, why not allow our coins to be biased?
coin <- function(object = c("heads", "tails"), prob = c(0.5, 0.5)) {
if (length(object) != 2) {
stop("\n'object' must be of length 2")
}
attr(object, "prob") <- prob
class(object) <- "coin"
object
}
coin()
## [1] "heads" "tails"
## attr(,"prob")
## [1] 0.5 0.5
## attr(,"class")
## [1] "coin"
## [1] R R R D D R
## Levels: D R
## [1] 2 2 2 1 1 2
## attr(,"levels")
## [1] "D" "R"
check_prob <- function(prob) {
if (!is.numeric(prob)) {
stop("\n'prob' must be a numeric vector")
}
if (length(prob) != 2 | !is.numeric(prob)) {
stop("\n'prob' must be a numeric vector of length 2")
}
if (any(prob < 0) | any(prob > 1)) {
stop("\n'prob' values must be between 0 and 1")
}
if (sum(prob) != 1) {
stop("\nelements in 'prob' must add up to 1")
}
TRUE
}
Our updated coin class:
coin <- function(object = c("heads", "tails"), prob = c(0.5, 0.5)) {
if (length(object) != 2) {
stop("\n'object' must be of length 2")
}
check_prob(prob)
attr(object, "prob") <- prob
class(object) <- "coin"
object
}
flip.coin <- function(x, times = 1) {
sample(x, prob = attr(x, "prob"),
size = times, replace = TRUE)
}
coin1 <- coin(prob = c(.2, .8))
flip(coin1, times = 10)
## [1] "tails" "heads" "tails" "tails" "tails" "heads" "tails" "tails"
## [9] "tails" "tails"
Bad coins
## Error in check_prob(prob):
## elements in 'prob' must add up to 1
## Error in coin(c("tic", "tac", "toe")):
## 'object' must be of length 2
Some objects can be multiple classes. For example we saw before that glm
objects actually had multiple classes.
## [1] "glm" "lm"
glm
is a subclass of lm
, which we can see by the ordering of the classes. Benefit: if a method is applied to a glm
object where no glm
specific method is available, then the lm
method will be appliedlm
is the superclass of glm
)(E.g. there is no plot.glm
but there is a plot.lm
function (in your console type methods(plot)
and you’ll see no plot.glm
).)
## [1] plot,ANY-method plot,color-method plot.acf*
## [4] plot.ACF* plot.augPred* plot.compareFits*
## [7] plot.data.frame* plot.decomposed.ts* plot.default
## [10] plot.dendrogram* plot.density* plot.ecdf
## [13] plot.factor* plot.formula* plot.function
## [16] plot.ggplot* plot.gls* plot.gtable*
## [19] plot.hcl_palettes* plot.hclust* plot.histogram*
## [22] plot.HoltWinters* plot.intervals.lmList* plot.isoreg*
## [25] plot.lm* plot.lme* plot.lmList*
## [28] plot.medpolish* plot.mlm* plot.nffGroupedData*
## [31] plot.nfnGroupedData* plot.nls* plot.nmGroupedData*
## [34] plot.pdMat* plot.ppr* plot.prcomp*
## [37] plot.princomp* plot.profile.nls* plot.R6*
## [40] plot.ranef.lme* plot.ranef.lmList* plot.raster*
## [43] plot.shingle* plot.simulate.lme* plot.spec*
## [46] plot.stepfun plot.stl* plot.table*
## [49] plot.trellis* plot.ts plot.tskernel*
## [52] plot.TukeyHSD* plot.Variogram*
## see '?methods' for accessing help and source code
Some coins are special (people collect them), let’s make a special coin class:
rare_coin <- function(name, year, ...){
object <- coin(...)
attr(object, "name") <- name
attr(object, "year") <- year
class(object) <- c("rare_coin", "coin")
object
}
## [1] "rare_coin" "coin"
## [1] "heads" "tails" "tails" "tails" "tails"
We already have a flip
method for our class, but there are lots of common functions that we apply to may objects
summary
, plot
, predict
, print
, …More specialized [
, [[
, +
, …
“The greatest use of object oriented programming in R is through print methods, summary methods and plot methods. These methods allow us to have one generic function call, plot say, that dispatches on the type of its argument and calls a plotting function that is specific to the data supplied.”
~ R Manual (referring to the S3 system).
print.coin <- function(coin){
cat(paste0("Coin: ", coin[1], "/", coin[2], "\n"))
prob <- attr(coin, "prob")
cat(paste0(" Prob: ", prob[1], "/", prob[2], "\n"))
}
print.rare_coin <- function(coin){
cat(paste0("Rare coin: ", attr(coin, "name"),
", ", attr(coin, "year"), "\n"))
print.coin(coin)
}
print(coin1)
## Coin: heads/tails
## Prob: 0.2/0.8
## Rare coin: Lincoln penny, 1972
## Coin: heads/tails
## Prob: 0.5/0.5
S3
the base and most standard, and S4
and R6
additionally commonly used structuresclass()
defined and checks which class an object is (but a functionalized constructor is recommended)UseMethod()
), and then have specific methods per class function_name.class_name
and for homework