# Indexing and Iteration

Tuesday - June 2, 2019

# Logistics:

• Piazza & Homework: please feel free to post questions about Labs / Homework on Piazza (that includes when I make errors :) )
• Wednesday, 1:30-2:30 pm: we will have a review session on the basics to make sure everyone is starting off strong. I'll have some problems to work through, but you can also bring your own questions (same room as class)
• OHs: Thursday 2:30 - 3:30 pm (definitely), and we’ll have an OH Tuesday…

# Last time: R basics

• We write programs by composing functions to manipulate data
• The basic data types let us represent Booleans, numbers, and characters
• Data structures let us group together related values
• Vectors let us group values of the same type
• Arrays add multi-dimensional structure to vectors
• Matrices act like you’d hope they would
• Lists let us combine different types of data
• Data frames are hybrids of matrices and lists, allowing each column to have a different data type

Indexing

# How R indexes vectors, matrices, lists

There are 3 ways to index a vector, matrix, data frame, or list in R:

1. Using explicit integer indices (or negative integers)
2. Using a Boolean vector (often created on-the-fly)
3. Using names

Note: in general, we have to set the names ourselves. Use names() for vectors and lists, and rownames(), colnames() for matrices and data frames

# Indexing with integers

This is the most transparent way. We can index with an integer or an integer vector, or with a negative integer or negative integer vector to drop those elements instead. Examples for vectors:

set.seed(33) # For reproducibility
x_vec <- rnorm(6) # Generate a vector of 6 random standard normals
x_vec
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
x_vec[3] # Third element
## [1] 1.010539
x_vec[c(3,4,5)] # Third through fifth elements
## [1]  1.0105390 -0.1582624 -2.1566375
x_vec[3:5] # Same, but written more succinctly
## [1]  1.0105390 -0.1582624 -2.1566375
x_vec[c(3,5,4)] # Third, fifth, then fourth element
## [1]  1.0105390 -2.1566375 -0.1582624
x_vec[-3] # All but third element
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750  0.49864683
x_vec[c(-3,-4,-5)] # All but third through fifth element
## [1] -0.13592452 -0.04079697  0.49864683
x_vec[-c(3,4,5)] # Same
## [1] -0.13592452 -0.04079697  0.49864683
x_vec[-(3:5)] # Same, more succinct (note the parentheses!)
## [1] -0.13592452 -0.04079697  0.49864683
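Why the parentheses matter: in R, the unary minus binds more tightly than the colon operator, so -3:5 is the sequence from -3 up to 5, not the negation of 3:5. A quick check in base R:

```r
-3:5    # minus is applied to 3 first, giving the sequence -3, -2, ..., 5
## [1] -3 -2 -1  0  1  2  3  4  5
-(3:5)  # the parentheses negate the whole sequence 3:5
## [1] -3 -4 -5
```

So x_vec[-3:5] would mix negative and positive subscripts, which R rejects with an error.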

Examples for matrices:

x_mat <- matrix(x_vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals, in column-major order
x_mat
##             [,1]       [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [3,]  1.01053901  0.4986468
x_mat[2,2] # Element in 2nd row, 2nd column
## [1] -2.156638
x_mat[5] # Same (note this is using column major order)
## [1] -2.156638
x_mat[2,] # Second row
## [1] -0.04079697 -2.15663750
x_mat[1:2,] # First and second rows
##             [,1]       [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
x_mat[,1] # First column
## [1] -0.13592452 -0.04079697  1.01053901
x_mat[,-1] # All but first column 
## [1] -0.1582624 -2.1566375  0.4986468
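Why do x_mat[5] and x_mat[2,2] pick out the same element? In column-major order, element [i, j] of a matrix with nrow rows sits at position (j - 1) * nrow + i. A small sketch, using a hypothetical matrix m filled with 1:6 so that each value equals its storage position:

```r
m <- matrix(1:6, nrow = 3, ncol = 2)  # hypothetical matrix; values match storage positions
i <- 2  # row
j <- 2  # column

# Column-major position of element [i, j]
lin <- (j - 1) * nrow(m) + i
lin
## [1] 5
m[i, j]
## [1] 5
m[lin]
## [1] 5
```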

Practice time: for each line below, what are the dimensions and values of the result? How else could you write it?

y_mat <- matrix(1:100, nrow = 2)
y_mat[1,]
y_mat[,20]
y_mat[3,]

Examples for lists:

x_list <- list(x_vec, letters, sample(c(TRUE,FALSE),size = 4,replace = TRUE))
x_list
## [[1]]
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
##
## [[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[3]]
## [1]  TRUE  TRUE FALSE FALSE
x_list[[3]] # Third element of list
## [1]  TRUE  TRUE FALSE FALSE
x_list[3] # Third element of list, kept as a list
## [[1]]
## [1]  TRUE  TRUE FALSE FALSE
x_list[1:2] # First and second elements of list (note the single brackets!)
## [[1]]
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
##
## [[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x_list[-1] # All but first element of list
## [[1]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[2]]
## [1]  TRUE  TRUE FALSE FALSE

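Note: [[ ]] extracts exactly one element, so neither command above works with double brackets: x_list[[-1]] throws an error, and x_list[[1:2]] is treated as recursive indexing, returning x_list[[1]][[2]] (the second normal) instead of the first two elements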
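To see the difference between [ ] and [[ ]] directly, here is a sketch on a hypothetical three-element list lst (not the x_list above). It shows that single brackets return a list, double brackets return the element itself, and a vector inside [[ ]] indexes recursively:

```r
lst <- list(nums = c(10, 20, 30), chars = c("a", "b"), flag = TRUE)  # hypothetical list

class(lst[3])    # single brackets keep the list structure
## [1] "list"
class(lst[[3]])  # double brackets extract the element itself
## [1] "logical"

# A length-2 index in [[ ]] recurses: lst[[c(1, 2)]] means lst[[1]][[2]]
lst[[c(1, 2)]]
## [1] 20

# A negative index in [[ ]] on a list of length 3 is an error
res <- tryCatch(lst[[-1]], error = function(e) "error")
res
## [1] "error"
```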

# Indexing with booleans

This might seem trickier at first, but it is very useful, especially when we define a Boolean vector “on-the-fly”. Examples for vectors:

x_vec[c(F,F,T,F,F,F)] # Third element
## [1] 1.010539
x_vec[c(T,T,F,T,T,T)] # All but third element
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750  0.49864683
pos_vec <- x_vec > 0 # Boolean vector indicating whether each element is positive
pos_vec
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
x_vec[pos_vec] # Pull out only positive elements
## [1] 1.0105390 0.4986468
x_vec[x_vec > 0] # Same, but more succinct (this is done "on-the-fly")
## [1] 1.0105390 0.4986468

Works the same way for lists; in lab, we’ll explore logical indexing for matrices
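For lists, a Boolean vector with one entry per element works the same way, and single brackets return a (shorter) list. A sketch with a hypothetical list lst; the on-the-fly version uses base R's lengths(), which gives the length of each list element:

```r
lst <- list(nums = c(10, 20, 30), chars = c("a", "b"), flag = TRUE)  # hypothetical list

lst[c(TRUE, FALSE, TRUE)]  # keep the 1st and 3rd elements, still a list
## $nums
## [1] 10 20 30
##
## $flag
## [1] TRUE

# On-the-fly: keep only elements with more than one entry (nums and chars)
names(lst[lengths(lst) > 1])
```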

# Indexing with names

Indexing with names can also be quite useful. We must have names in the first place; with vectors or lists, use names() to set the names

names(x_list) <- c("normals", "letters", "bools")
x_list[["letters"]] # "letters" (second) element
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x_list$letters # Same, just using different notation
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x_list[c("normals","bools")]
## $normals
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
##
## $bools
## [1]  TRUE  TRUE FALSE FALSE
• We will see indexing by names being especially useful when we talk more about data frames, shortly
• In lab, we’ll practice using rownames() and colnames() and named indexing with matrices
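The same idea applies to plain vectors: set names() once, then index with character strings. A sketch with a hypothetical named vector temps:

```r
temps <- c(2.5, 7.1, -1.3)             # hypothetical values
names(temps) <- c("mon", "tue", "wed")

temps["tue"]            # one element by name (the name is kept)
temps[c("wed", "mon")]  # several elements, in the order requested
temps[["tue"]]          # double brackets drop the name
```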

# Part II

Control flow (if, else, etc.)

# Control flow

Summary of the control flow tools in R:

• if(), else if(), else: standard conditionals
• ifelse(): conditional function that vectorizes nicely
• switch(): handy for deciding between several options

# if() and else

Use if() and else to decide whether to evaluate one block of code or another, depending on a condition

x <- 0.5

if (x >= 0) {
  x
} else {
  -x
}
## [1] 0.5
• Condition in if() needs to give one TRUE or FALSE value
• Note that the else statement is optional
• Single-line actions don't need braces; e.g., the code above could be shortened to if (x >= 0) x else -x
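Since the condition must be a single TRUE or FALSE, a comparison on a whole vector has to be collapsed first, typically with any() or all(). (In R 4.2.0 and later, a condition of length greater than one is an error; older versions only warned.) A sketch with a hypothetical vector v:

```r
v <- c(-1, 2, -3)  # hypothetical vector; v > 0 has three values, too many for if()

if (any(v > 0)) {
  msg <- "at least one positive element"
} else {
  msg <- "no positive elements"
}
msg
## [1] "at least one positive element"

all(v > 0)  # not every element is positive
## [1] FALSE
```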

# else if()

We can use else if() arbitrarily many times following an if() statement

x <- -2

if (x^2 < 1) {
  x^2
} else if (x >= 1) {
  2*x-1
} else {
  -2*x-1
}
## [1] 3
• Each else if() only gets considered if the conditions above it were not TRUE
• The else statement gets evaluated if none of the above conditions were TRUE
• Note again that the else statement is optional
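Because the conditions are checked from the top down and only the first TRUE branch runs, the order of the tests matters. In this sketch (values chosen for illustration), the second branch can never run:

```r
x <- 5

if (x > 0) {
  result <- "positive"
} else if (x > 3) {
  result <- "greater than 3"  # never reached: any x > 3 already satisfied x > 0
} else {
  result <- "not positive"
}
result
## [1] "positive"
```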

# Quick decision making

In the ifelse() function we specify a condition, then a value if the condition holds, and a value if the condition fails

ifelse(x > 0, x, -x)
## [1] 2

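One advantage of ifelse() is that it vectorizes nicely: the condition can be a whole Boolean vector, and ifelse() returns a result of the same length, choosing element by element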
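For example, applying the same ifelse() call to a hypothetical vector v gives the absolute value of every element at once, something a single if() cannot do because its condition must be one TRUE or FALSE:

```r
v <- c(-1.5, 0.2, -0.7, 3)  # hypothetical vector

# The condition is checked for each element; x or -x is picked element-wise
ifelse(v > 0, v, -v)
## [1] 1.5 0.2 0.7 3.0
```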

# Deciding between many options

Instead of an if() statement followed by else if() statements (and perhaps a final else), we can use switch(). We pass a variable to select on, then a value for each option

type_of_summary <- "mode"

switch(type_of_summary,
       mean=mean(x_vec),
       median=median(x_vec),
       histogram=hist(x_vec),
       "I don't understand")
## [1] "I don't understand"
• Here we are expecting type_of_summary to be a string, either “mean”, “median”, or “histogram”; we specify what to do for each
• The last passed argument has no name, and it serves as the else clause
• Try changing type_of_summary above and see what happens
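To see that switch() is a compact form of an if() / else if() / else chain, here is a sketch computing the same summary both ways. It uses a hypothetical data vector vals and leaves out the histogram option so that nothing is plotted:

```r
vals <- c(3, 1, 4, 1, 5)  # hypothetical data
type_of_summary <- "median"

# switch() version
via_switch <- switch(type_of_summary,
                     mean = mean(vals),
                     median = median(vals),
                     "I don't understand")

# Equivalent if() / else if() / else chain
if (type_of_summary == "mean") {
  via_if <- mean(vals)
} else if (type_of_summary == "median") {
  via_if <- median(vals)
} else {
  via_if <- "I don't understand"
}

c(via_switch, via_if)
## [1] 3 3
```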

# Reminder: Boolean operators

Remember our standard Boolean operators, & and |. These combine terms elementwise

u_vec <- runif(10, -1, 1)
u_vec
##  [1]  0.54949775 -0.22561403 -0.72846986  0.80071515  0.13290531
##  [6] -0.91453168 -0.02336149 -0.29755356  0.93932343  0.57915778
u_vec[-0.5 <= u_vec & u_vec <= 0.5] <- 999
u_vec
##  [1]   0.5494977 999.0000000  -0.7284699   0.8007152 999.0000000
##  [6]  -0.9145317 999.0000000 999.0000000   0.9393234   0.5791578
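The | operator works the same way: an element is kept if either comparison is TRUE. A sketch with a hypothetical vector w (fixed values, so no random seed is needed), selecting the elements outside [-0.5, 0.5]:

```r
w <- c(0.55, -0.23, -0.73, 0.80, 0.13)  # hypothetical values in [-1, 1]

outside <- w < -0.5 | w > 0.5  # element-wise "or"
outside
## [1]  TRUE FALSE  TRUE  TRUE FALSE
w[outside]
## [1]  0.55 -0.73  0.80
```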

# Lazy Boolean operators

In contrast to the standard Boolean operators, && and || give just a single Boolean, and they do so “lazily”: R evaluates from left to right and stops as soon as the answer is determined

(0 > 0) && all(matrix(0,2,2) == matrix(0,3,3)) 
## [1] FALSE
(0 > 0) && (ThisVariableIsNotDefined == 0) 
## [1] FALSE
• Note that R never evaluates the expression on the right in either line (each would throw an error: the matrices have different dimensions, and the variable is not defined)
• In control flow, we typically just want one Boolean
• Rule of thumb: use & and | for indexing or subsetting, and && and || for conditionals
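A common use of this laziness in conditionals is a guard: check that a value exists before comparing it. A sketch with a hypothetical variable y that may not have been set:

```r
y <- NULL  # hypothetical value that may or may not have been set

# || stops once is.null(y) is TRUE, so y > 0 is never evaluated
if (is.null(y) || y > 0) {
  status <- "missing or positive"
} else {
  status <- "non-positive"
}
status
## [1] "missing or positive"

# The right-hand side is skipped entirely, so even stop() is never reached
TRUE || stop("never evaluated")
## [1] TRUE
```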

Iteration

# Iteration

Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming

Summary of the iteration methods in R:

• for(), while() loops: standard loop constructs
• Vectorization: use it whenever possible! Often faster and simpler
• apply() family of functions: useful alternative to for() loop, we’ll learn these soon

# for()

A for() loop increments a counter variable along a vector. It repeatedly runs a code block, called the body of the loop, with the counter set at its current value, until it runs through the vector

n <- 10
log_vec <- vector(length=n, mode="numeric")
for (i in 1:n) {
  log_vec[i] <- log(i)
}
log_vec
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 2.0794415 2.1972246 2.3025851

Here i is the counter and the vector we are iterating over is 1:n. The body is the code in between the braces

# Breaking from the loop

We can break out of a for() loop early (before the counter has been iterated over the whole vector), using break

n <- 10
log_vec <- vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log_vec[i] <- log(i)
}
## I'm outta here. I don't like numbers bigger than 2
log_vec
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 0.0000000 0.0000000 0.0000000

# Variations on standard for() loops

Many different variations on standard for() are possible. Two common ones:

• Nonnumeric counters: counter variable always gets iterated over a vector, but it doesn’t have to be numeric
• Nested loops: body of the for() loop can contain another for() loop (or several others)

for (str in c("PhD", "Ben", "LeRoy")) {
  cat(paste(str, "declined to comment\n"))
}
## PhD declined to comment
## Ben declined to comment
## LeRoy declined to comment
for (i in 1:4) {
  for (j in 1:i^2) {
    cat(paste(j,""))
  }
  cat("\n")
}
## 1
## 1 2 3 4
## 1 2 3 4 5 6 7 8 9
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
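A nonnumeric counter pairs naturally with named indexing: loop over the names of a list and use each name to pull out the matching element. A sketch with a hypothetical named list scores:

```r
scores <- list(alice = c(90, 85), bob = c(70, 95, 80))  # hypothetical list

# The counter nm takes each name in turn
for (nm in names(scores)) {
  cat(nm, "has", length(scores[[nm]]), "scores\n")
}
## alice has 2 scores
## bob has 3 scores
```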

# while()

A while() loop repeatedly runs a code block, again called the body, until some condition is no longer true

i <- 1
log_vec <- c()
while (log(i) <= 2) {
  log_vec <- c(log_vec, log(i))
  i <- i+1
}
log_vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

# for() versus while()

• for() is better when the number of times to repeat (values to iterate over) is clear in advance

• while() is better when you can recognize when to stop once you’re there, even if you can’t guess it to begin with

• while() is more general, in that every for() could be replaced with a while() (but not vice versa)
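To illustrate the last point, here is the earlier for() loop over 1:n rewritten as a while() loop. The difference is that we now have to initialize and increment the counter ourselves (variable names are illustrative):

```r
n <- 10

log_vec_for <- vector(length = n, mode = "numeric")
for (i in 1:n) {
  log_vec_for[i] <- log(i)
}

# Same computation with while(): start the counter, test it, increment it
log_vec_while <- vector(length = n, mode = "numeric")
i <- 1
while (i <= n) {
  log_vec_while[i] <- log(i)
  i <- i + 1
}

identical(log_vec_for, log_vec_while)
## [1] TRUE
```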

# while(TRUE) or repeat

while(TRUE) and repeat do the same thing: they repeat the body indefinitely, until something (typically break) stops the loop. Example (try running it in your console, since readline() needs interactive input):

repeat {
  ans <- readline("Who is the best PhD of Statistics at CMU? ")
  if (ans == "LeRoy" || ans == "Ben") {
    cat("Yes! You get an 'A'.\n")
    break
  } else {
    cat("Wrong answer!\n")
  }
}
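A non-interactive sketch of the same pattern: find the first integer i with log(i) > 2 (the stopping point of the earlier while() example), first with repeat and then with while (TRUE):

```r
i <- 1
repeat {
  if (log(i) > 2) break  # leave the loop the first time log(i) exceeds 2
  i <- i + 1
}
i
## [1] 8

j <- 1
while (TRUE) {
  if (log(j) > 2) break  # identical logic with while (TRUE)
  j <- j + 1
}
j
## [1] 8
```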

# Avoiding explicit iteration

• Warning: some people have a tendency to overuse for() and while() loops in R
• They aren’t always needed. Remember vectorization should be used whenever possible
• We’ll emphasize this on the lab/homework, and try to hit upon it throughout the course
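For example, the for() loop that filled log_vec needs no loop at all, because log() already works element-wise on a whole vector. A sketch comparing the two (variable names are illustrative):

```r
n <- 10

# Loop version, as in the for() example
log_loop <- vector(length = n, mode = "numeric")
for (i in 1:n) {
  log_loop[i] <- log(i)
}

# Vectorized version: one call on the whole vector 1:n
log_vectorized <- log(1:n)

all.equal(log_loop, log_vectorized)
## [1] TRUE
```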

# Summary

• Three ways to index vectors, matrices, data frames, lists: integers, Booleans, names
• Boolean on-the-fly indexing can be very useful
• Named indexing will be especially useful for data frames
• Indexing lists can be a bit tricky (beware of the difference between [ ] and [[ ]])
• if(), else if(), else: standard conditionals
• ifelse(): vectorized shortcut for using if() and else in combination
• switch(): shortcut for using if(), else if(), and else in combination
• for(), while(), repeat: standard loop constructs
• Don’t overuse explicit for() loops, vectorization is your friend!
• apply() family: can also be very useful (we'll see these functions later)