Last week: Object oriented programming

Object oriented programming (OOP) focuses on the object
In R, there are multiple class systems: S3 is the base and most standard, and S4 and R6 are also commonly used
class() sets and checks the class of an object (but a dedicated constructor function is recommended)
(S3) Class-specific methods start with a generic (defined with UseMethod()), followed by one method per class, named function_name.class_name
- There are also standard generic functions, which need not be redefined.
Best practice is to create validators that check whether the assumptions of the class are met
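
As a reminder of how these pieces fit together, here is a minimal S3 sketch. The class name temperature, the helper names new_temperature() and validate_temperature(), and the generic describe() are all hypothetical, made up for this illustration.

```r
# Constructor: builds the object and sets its class attribute
new_temperature <- function(degrees) {
  structure(list(degrees = degrees), class = "temperature")
}

# Validator: checks the assumptions of the (hypothetical) class
validate_temperature <- function(x) {
  if (!is.numeric(x$degrees) || length(x$degrees) != 1) {
    stop("degrees must be a single number")
  }
  x
}

# Generic, defined with UseMethod()
describe <- function(x, ...) UseMethod("describe")

# Class-specific method, named generic.class
describe.temperature <- function(x, ...) {
  paste(x$degrees, "degrees")
}

temp <- validate_temperature(new_temperature(21.5))
class(temp)
describe(temp)
```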

Part I

Debugging basics

Bug!

The original name for glitches and unexpected defects: dates back to at least Edison in 1876, but better story from Grace Hopper in 1947:

(From Wikipedia)

Debugging: what and why?

Debugging is the process of locating, understanding, and removing bugs from your code

Why should we care to learn about this?

The truth: you’re going to have to debug, because you’re not perfect (none of us are!) and so you can’t write perfect code
Debugging is frustrating and time-consuming, but essential
Writing code that makes it easier to debug later is worth it, even if it takes a bit more time (lots of our design ideas support this)
Simple things you can do to help: use lots of comments, use meaningful variable names!

Debugging: how?

Debugging is (largely) a process of differential diagnosis. Stages of debugging:

Reproduce the error: can you make the bug reappear?
Characterize the error: what can you see that is going wrong?
Localize the error: where in the code does the mistake originate?
Modify the code: did you eliminate the error? Did you add new ones?

Reproduce the bug

Step 0: make it happen again

Can we produce it repeatedly when re-running the same code, with the same input values?
And if we run the same code in a clean copy of R, does the same thing happen?
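
When the code involves random numbers, "the same input values" only holds if the random draws are the same too. A minimal sketch, assuming the bug shows up in code that uses simulated data (the seed value is arbitrary):

```r
# Fix the seed so the "random" input is identical on every run
set.seed(36350)
x1 <- rnorm(5)

# Re-running with the same seed reproduces exactly the same values
set.seed(36350)
x2 <- rnorm(5)

identical(x1, x2)
```

To check the second question, restart R (or start it with R --vanilla) so that no leftover objects from an earlier session can influence the result.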

Characterize the bug

Step 1: figure out if it’s a pervasive/big problem

How much can we change the inputs and get the same error?
Or is it a different error?
And how big is the error?
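
One way to characterize a bug is to sweep over a range of inputs and record which ones fail. The sketch below uses a hypothetical function, mean_above(), invented for this illustration; tryCatch() is used so that one failing input does not stop the sweep.

```r
# Hypothetical buggy function: labels the mean of the values above a threshold
mean_above <- function(x, threshold) {
  keep <- x[x > threshold]
  if (mean(keep) > 0) "positive" else "non-positive"
}

x <- 1:5
thresholds <- c(0, 2, 4, 6, 8)

# Try every threshold and record whether the call succeeded
outcome <- sapply(thresholds, function(t) {
  tryCatch(
    {
      mean_above(x, t)
      "ok"
    },
    error = function(e) "error"
  )
})

data.frame(threshold = thresholds, outcome = outcome)
```

The pattern in the table (failures only once the threshold exceeds every value in x) already hints at the cause: nothing is left to average.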

Localize the bug

Step 2: find out exactly where things are going wrong

This is most often the hardest part!
Understand errors, using traceback(), and also cat(), print()
Interactively debug with the R tool browser()

Localizing can be easy or hard

Sometimes error messages are easier to decode, sometimes they’re harder; this can make locating the bug easier or harder

f <- function(a) g(5 * a)
g <- function(b) h(b - 1)
h <- function(c) {
        c <- log(-c)
      if (c > 2){
        return(c^2)
      } else {
        return(c^3)
      }
}

f(-5)

## [1] 10.61519

f(5)

## Warning in log(-c): NaNs produced

## Error in if (c > 2) {: missing value where TRUE/FALSE needed

What do you mean we have a missing value! c definitely exists right?

`traceback()`

Calling traceback(), after an error: traces back through all the function calls leading to the error

Start your attention at the “bottom”, where you recognize the function you called
Read your way up to the “top”, which is the lowest-level function that produces the error
Often the most useful bit is somewhere in the middle

If you run f(5) in the console, then call traceback(), you’ll see:

> traceback()
3: h(b - 1) at #1
2: g(5 * a) at #1
1: f(5)

We can see that f() is calling g(), which is calling h(), and this last function is throwing the error.

Why? It turns out that taking the log of a negative number returns NaN (with a warning), so the comparison c > 2 evaluates to NA, and if() cannot branch on NA.
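
The chain of events inside h() can be replayed one step at a time. This is only a sketch of what happens for f(5), where h() ends up receiving 24; the variable names c_val and msg are made up here.

```r
# log() of a negative number gives NaN plus a warning, not an error
c_val <- suppressWarnings(log(-24))
c_val
is.nan(c_val)

# Comparing NaN to a number gives NA, not TRUE or FALSE
c_val > 2

# if() cannot branch on NA; this is the error reported for f(5)
msg <- tryCatch(
  if (c_val > 2) "big" else "small",
  error = function(e) conditionMessage(e)
)
msg
```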

Part II

Debugging tools

`cat()`, `print()`

Most primitive strategy: manually call cat() or print() at various points, to print out the state of variables, to help you localize the error

This is the “stone knives and bear skins” approach to debugging; it is still very popular among some people (actual quote from stackoverflow):

I’ve been a software developer for over twenty years … I’ve never had a problem I could not debug using some careful thought, and well-placed debugging print statements. Many people say that my techniques are primitive, and using a real debugger in an IDE is much better. Yet from my observation, IDE users don’t appear to debug faster or more successfully than I can, using my stone knives and bear skins.
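
As a minimal sketch of this approach, here is the function h() from Part I with two cat() calls added. The printed text is arbitrary; the point is only to expose the value of c before and after the log.

```r
h <- function(c) {
  cat("h() called with c =", c, "\n")
  c <- log(-c)
  cat("after log(-c): c =", c, "\n")
  if (c > 2) {
    return(c^2)
  } else {
    return(c^3)
  }
}

# try() lets the script continue after the error, so the printed trace
# can be inspected; suppressWarnings() hides the NaN warning from log()
result <- try(suppressWarnings(h(24)), silent = TRUE)
```

The second printed line shows c = NaN, which localizes the problem to the log() call rather than to the if statement that reports the error.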

Specialized tools for debugging

R provides you with many debugging tools. Why should we use them, and move past our handy cat() or print() statements?

Let’s see what our primitive hunter found on stackoverflow, after receiving a bunch of comments in response to his quote:

Sweet! … Very illuminating. Debuggers can help me do ad hoc inspection or alteration of variables, code, or any other aspect of the runtime environment, whereas manual debugging requires me to stop, edit, and re-execute.

`browser()`

One of the simplest but most powerful built-in debugging tools: browser(). Place a call to browser() at any point in your function that you want to debug. As in:

my_fun <- function(arg1, arg2, arg3) {
  # Some initial code 
  browser()
  # Some final code
}

Then redefine the function in the console, and run it. Once execution gets to the line with browser(), you’ll enter an interactive debug mode

Things to do while browsing

While in the interactive debug mode granted to you by browser(), you can type any normal R code into the console, to be executed within the function environment, so you can, e.g., investigate the values of variables defined in the function

You can also type:

“n” (or simply return) to execute the next command
“s” to step into the next function
“f” to finish the current loop or function
“c” to continue execution normally
“Q” to stop the function and return to the console

(To print any variables named n, s, f, c, or Q, defined in the function environment, use print(n), print(s), etc.)
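
A call to browser() does not have to run every time. One possible pattern, sketched here with a hypothetical variant of h() called h_debug(), is to pause only when the state looks wrong, and only when R is running interactively (checked with the base function interactive()):

```r
h_debug <- function(c) {
  c <- suppressWarnings(log(-c))
  # Pause for inspection only when c is NaN and a person is at the console
  if (is.nan(c) && interactive()) browser()
  if (c > 2) {
    return(c^2)
  } else {
    return(c^3)
  }
}

h_debug(-20)
```

In the console, h_debug(24) would stop at the browser() line, where typing c shows NaN and "Q" returns to the console. With well-behaved input such as -20, the function runs straight through.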

Browsing in R Studio

You have buttons to click that do the same thing as “n”, “s”, “f”, “c”, “Q” in the “Console” panel; you can see the locally defined variables in the “Environment” panel; the traceback in the “Traceback” panel

Knitting and debugging

As with cat(), print(), traceback(), used for debugging, you should only run browser() in the console, never in an Rmd code chunk that is supposed to be evaluated when knitting

But, to keep track of your debugging code (that you’ll run in the console), you can still use code chunks in Rmd, you just have to specify eval=FALSE

# As an example, here's a code chunk that we can keep around in this Rmd doc,
# but that will never be evaluated (because eval=FALSE) in the Rmd file, take 
# a look at it!
big_mat <- matrix(rnorm(1000)^3, 1000, 1000)
big_mat
# Note that the output of big_mat is not printed to the console, and also
# that big_mat was never actually created! (This code was not evaluated)

Part III

Testing

What is testing?

Testing is the systematic writing of additional code to ensure your functions behave properly.

There are many topics related to testing, but we’ll focus on two specific aspects: assertions (i.e., ensuring conditions of your functions are satisfied) and unit tests (i.e., simple tests to ensure your function works).

Benefits:
- Enables you to catch problems early
- Provides natural documentation of your functions
- Encourages you to write simpler functions via refactoring

Of course, this requires you to spend more time upfront to write all these things, but it can dramatically reduce the amount of time you spend fixing bugs afterwards.

Assertions

Assertions are boolean checks to ensure that the inputs to your function are properly formatted. For example, if your function expects a matrix, your first few lines of your function should check it is actually a matrix (as opposed to a vector, for example).

If any of these boolean checks fails, you can have the assertion print a meaningful custom message for the user.

Ideally, if all your assertions pass, your function should never crash afterwards since “everything is as expected”.
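
Before turning to a dedicated package, the idea can be sketched with base R's stopifnot(). This is an alternative to the tool used in the rest of this lecture, shown only to make the concept concrete; the function name col_means_checked() is hypothetical.

```r
col_means_checked <- function(mat) {
  # Assertions: stop immediately if the input is not a numeric matrix
  stopifnot(is.matrix(mat), is.numeric(mat))
  colMeans(mat)
}

col_means_checked(matrix(1:6, 2, 3))

# A vector is not a matrix, so the first assertion fails
res <- tryCatch(
  col_means_checked(1:6),
  error = function(e) conditionMessage(e)
)
res
```

The message produced by stopifnot() is generated automatically from the failed condition, so it is less user-friendly than a message you write yourself.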

`assert_that()`

We use the assert_that() function in the assertthat package to make assertions. The main benefit to using assert_that() is that you can write custom, meaningful error messages.

Example 1: Function that creates an n by n matrix with 0’s.

#not meaningful errors
create_matrix_simple <- function(n){
  matrix(0, n, n)
}

create_matrix_simple(4)

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0

create_matrix_simple(4.1)

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0

create_matrix_simple("asdf")

## Error in matrix(0, n, n): non-numeric matrix extent

library(assertthat)

## Warning: package 'assertthat' was built under R version 3.5.2

#meaningful errors
create_matrix <- function(n){
  assert_that(length(n) == 1 && is.numeric(n) 
              && n > 0 && n %% 1 == 0, 
              msg = "n is not a positive integer")
  matrix(0, n, n)
}

create_matrix(4)

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0

create_matrix(4.1)

## Error: n is not a positive integer

create_matrix("asdf")

## Error: n is not a positive integer

Example 2: Function that does a linear regression.

mat <- matrix(rnorm(20), 10, 2)
colnames(mat) <- paste0("X", 1:2)
dat <- as.data.frame(mat)

#not meaningful errors
run_lm_simple <- function(dat){
  res <- lm(X1 ~ ., data = dat)
  coef(res)
}

run_lm_simple(dat)

## (Intercept)          X2 
##  -0.4974193  -0.3026353

run_lm_simple(mat)

## Error in model.frame.default(formula = X1 ~ ., data = dat, drop.unused.levels = TRUE): 'data' must be a data.frame, not a matrix or an array

#meaningful errors
run_lm <- function(dat){
  assert_that(is.data.frame(dat), msg = "dat must be a data frame")
  
  res <- lm(X1 ~ ., data = dat)
  coef(res)
}

run_lm(dat)

## (Intercept)          X2 
##  -0.4974193  -0.3026353

run_lm(mat)

## Error: dat must be a data frame

Unit testing

Assertions are “tests done on the fly” to ensure that your function can perform properly with the given inputs.

Unit tests, on the other hand, are a suite of tests that your code needs to (or, at least, should) pass at every step of development.

It would seem obvious that when fixing bugs you would want to set up a system of checks that would help to ensure that the bugs do not come back, or that other bugs are not introduced when updating code. But what is less obvious is how to do that.

You can rarely guarantee that your code will behave properly on every possible input. However, if your function behaves properly on a wide range of simple inputs where you know what the intended outcome should be, you can be reasonably confident that it will behave properly for most inputs.

`test_that()`

We use the test_that() function in the testthat package to make unit tests. This allows a clean format to write tests that you can document.

Each test consists of two parts: 1) a message that describes what you are testing, and 2) code to execute that results in a TRUE or FALSE. If TRUE, your test passed. If FALSE (or your testing code crashed), your test failed and you should investigate your function and testing code to see why your test failed.

Typically, you’ll write tests for simple inputs for your function such that you know exactly what the output should be.

Most commonly, you will use expect_true() or expect_error() as the last line of your test.

Example 1: Given a matrix mat and row indices idx, compute the median of each column of mat[idx,].

library(testthat)

## Warning: package 'testthat' was built under R version 3.5.2

#the incorrect version
col_median <- function(mat, idx){
  apply(mat[idx,], 2, median)
}

mat <- matrix(1:16, 4, 4)
mat

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16

test_that("col_median() works", {
  res <- col_median(mat, 1:3)
  
  expect_true(all(res == c(2,6,10,14)))
})

test_that("col_median() works for one row", {
  res <- col_median(mat, 1)
  
  expect_true(all(res == c(1,5,9,13)))
})

## Error: Test failed: 'col_median() works for one row'
## * dim(X) must have a positive length
## 1: col_median(mat, 1) at <text>:18
## 2: apply(mat[idx, ], 2, median) at <text>:5
## 3: stop("dim(X) must have a positive length")

test_that("col_median() errors for non-integer indices", {
  expect_error(col_median(mat, c(1.4, 2)))
})

## Error: Test failed: 'col_median() errors for non-integer indices'
## * `col_median(mat, c(1.4, 2))` did not throw an error.

#the correct version
col_median <- function(mat, idx){
  assert_that(all(idx %% 1 == 0), msg = "idx must be all integers")
  
  apply(mat[idx,,drop=FALSE], 2, median)
}

test_that("col_median() works", {
  res <- col_median(mat, 1:3)
  
  expect_true(all(res == c(2,6,10,14)))
})

test_that("col_median() works for one row", {
  res <- col_median(mat, 1)
  
  expect_true(all(res == c(1,5,9,13)))
})

test_that("col_median() errors for non-integer indices", {
  expect_error(col_median(mat, c(1.4, 2)))
})

Example 2: Given a numeric vector, find the first index (from left to right) of the number 2. Then add 1 to all values from that index to the end (right) of the vector.

#incorrect version
increment <- function(vec){
  n <- length(vec)
  idx <- which(vec == 2)
  vec[idx[1]:n] = vec[idx[1]:n]+1
  vec
}

test_that("increment() works", {
  res <- increment(1:10)
  
  expect_true(all(res == c(1,3:11)))
})

test_that("increment() works when no 2 is in the vector", {
  res <- increment(3:10)
  
  expect_true(all(res == 3:10))
})

## Error: Test failed: 'increment() works when no 2 is in the vector'
## * NA/NaN argument
## 1: increment(3:10) at <text>:16

#correct version
increment <- function(vec){
  n <- length(vec)
  idx <- which(vec == 2)
  if(length(idx) == 0) {
    return(vec)
  }
  
  vec[idx[1]:n] <- vec[idx[1]:n]+1
  vec
}

test_that("increment() works", {
  res <- increment(1:10)
  
  expect_true(all(res == c(1,3:11)))
})

test_that("increment() works when no 2 is in the vector", {
  res <- increment(3:10)
  
  expect_true(all(res == 3:10))
})

Unit testing: how?

Unit testing is essentially a log of all the properties of your function that you want to remember to check. Since you’ll be changing your code constantly, you want to be assured that when “fixing one bug”, “an old bug that used to be fixed does not reappear”.

Useful tips for unit testing:

Always run your unit tests whenever you change your function, and before you run any “expensive” computations.
Keep writing more unit tests. Whenever you encounter a bug during your code development or code usage, add an appropriate unit test.
Write short functions, and functions that call other short functions. It’s hard to write a unit test for a 100+ line function that does so many things, since you’ll have no idea what “simple cases” to test for.
These comments should help motivate Friday’s Lecture
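
To illustrate the second tip, suppose that while using the corrected increment() from above you start to wonder about vectors where 2 appears more than once, or only at the very end. A sketch of the tests you might add to your suite (the test descriptions and inputs are made up; expect_equal() is another expectation from testthat):

```r
library(testthat)

# Corrected version of increment(), repeated so this block runs on its own
increment <- function(vec) {
  n <- length(vec)
  idx <- which(vec == 2)
  if (length(idx) == 0) {
    return(vec)
  }

  vec[idx[1]:n] <- vec[idx[1]:n] + 1
  vec
}

test_that("increment() uses only the first 2 when there are several", {
  expect_equal(increment(c(2, 1, 2)), c(3, 2, 3))
})

test_that("increment() works when the 2 is the last element", {
  expect_equal(increment(c(1, 2)), c(1, 3))
})
```

Once these tests are in the suite, any later change that breaks either case is caught the next time the tests are run.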

Summary

Debugging involves diagnosing your code when you encounter an error or unexpected behavior
- Step 0: Reproduce the error
- Step 1: Characterize the error
- Step 2: Localize the error
- Step 3: Modify the code
Functions such as traceback(), print() and browser() can help you understand how your function is behaving at different points in time during the computations.
Assertions using assert_that() help ensure that the inputs to your function are correct, so your function can proceed without errors.
Unit tests using test_that() give a recorded list of simple properties you want your function to display, so you can ensure that it works correctly as you further modify your code.
Important: It’s hard to teach coding practices. The best way to learn is to use these practices from now onwards whenever you code!!

Debugging and Testing

Statistical Computing, 36-350

Monday July 29, 2019
