Indexing
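There are 3 ways to index a vector, matrix, data frame, or list in R:
1. Using explicit integer indices (or negative integers)
2. Using a Boolean vector (often created on-the-fly)
3. Using names
Note: in general, we have to set the names ourselves. Use names() for vectors and lists, and rownames(), colnames() for matrices and data frames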
Indexing with integers
The most transparent way. We can index with a single integer or an integer vector; negative integers (or negative integer vectors) exclude the corresponding elements. Examples for vectors:
set.seed(33) # For reproducibility
x_vec <- rnorm(6) # Generate a vector of 6 random standard normals
x_vec
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
x_vec[3] # Third element
## [1] 1.010539
x_vec[c(3,4,5)] # Third through fifth elements
## [1] 1.0105390 -0.1582624 -2.1566375
x_vec[3:5] # Same, but written more succinctly
## [1] 1.0105390 -0.1582624 -2.1566375
x_vec[c(3,5,4)] # Third, fifth, then fourth element
## [1] 1.0105390 -2.1566375 -0.1582624
x_vec[-3] # All but third element
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750 0.49864683
x_vec[c(-3,-4,-5)] # All but third through fifth elements
## [1] -0.13592452 -0.04079697 0.49864683
x_vec[-c(3,4,5)] # Same
## [1] -0.13592452 -0.04079697 0.49864683
x_vec[-(3:5)] # Same, more succinct (note the parentheses!)
## [1] -0.13592452 -0.04079697 0.49864683
Examples for matrices:
x_mat <- matrix(x_vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals,
# column major order
x_mat
## [,1] [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [3,] 1.01053901 0.4986468
x_mat[2,2] # Element in 2nd row, 2nd column
## [1] -2.156638
x_mat[5] # Same (note this is using column major order)
## [1] -2.156638
x_mat[2,] # Second row
## [1] -0.04079697 -2.15663750
x_mat[1:2,] # First and second rows
## [,1] [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
x_mat[,1] # First column
## [1] -0.13592452 -0.04079697 1.01053901
x_mat[,-1] # All but first column
## [1] -0.1582624 -2.1566375 0.4986468
Practice time:
# for each line: dim, values, how else to write?
y_mat <- matrix(1:100, nrow = 2)
y_mat[1,]
y_mat[,20]
y_mat[3,]
Examples for lists:
x_list <- list(x_vec, letters, sample(c(TRUE,FALSE),size = 4,replace = TRUE))
x_list
## [[1]]
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[3]]
## [1] TRUE TRUE FALSE FALSE
x_list[[3]] # Third element of list
## [1] TRUE TRUE FALSE FALSE
x_list[3] # Third element of list, kept as a list
## [[1]]
## [1] TRUE TRUE FALSE FALSE
x_list[1:2] # First and second elements of list (note the single brackets!)
## [[1]]
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x_list[-1] # All but first element of list
## [[1]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[2]]
## [1] TRUE TRUE FALSE FALSE
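Note: double brackets [[ ]] extract a single element, so they cannot stand in for single brackets in either of the above commands: x_list[[-1]] gives an error, and x_list[[1:2]] does not return the first two elements (it indexes recursively, returning the second entry of the first element)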
Indexing with booleans
This might appear a bit trickier at first, but it is very useful, especially when we define a Boolean vector “on-the-fly”. Examples for vectors:
x_vec[c(F,F,T,F,F,F)] # Third element
## [1] 1.010539
x_vec[c(T,T,F,T,T,T)] # All but third element
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750 0.49864683
pos_vec <- x_vec > 0 # Boolean vector indicating whether each element is positive
pos_vec
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
x_vec[pos_vec] # Pull out only positive elements
## [1] 1.0105390 0.4986468
x_vec[x_vec > 0] # Same, but more succinct (this is done "on-the-fly")
## [1] 1.0105390 0.4986468
Works the same way for lists; in lab, we’ll explore logical indexing for matrices
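As a small illustration of logical indexing on a list, here is a sketch using a hypothetical list demo_list (not one of the objects defined above). It uses sapply(), a member of the apply() family covered later, only to build the Boolean vector on-the-fly:

```r
# Hypothetical list, for illustration only
demo_list <- list(nums = c(1.5, -2, 3), chars = c("a", "b"), flags = c(TRUE, FALSE))

# Explicit Boolean vector: keep the first and third elements
kept <- demo_list[c(TRUE, FALSE, TRUE)]
names(kept)

# Boolean vector built on-the-fly: TRUE for the numeric elements only
is_num <- sapply(demo_list, is.numeric)
num_only <- demo_list[is_num]
names(num_only)
```

Note the single brackets: the result is again a list, just as with integer indexing.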
Indexing with names
Indexing with names can also be quite useful. We must have names in the first place; with vectors or lists, use names() to set the names
names(x_list) <- c("normals", "letters", "bools")
x_list[["letters"]] # "letters" (second) element
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x_list$letters # Same, just using different notation
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x_list[c("normals","bools")]
## $normals
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## $bools
## [1] TRUE TRUE FALSE FALSE
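Named indexing is not limited to lists. The sketch below assumes a hypothetical named vector scores and a hypothetical matrix m (neither appears above), and sets their names with names(), rownames(), and colnames():

```r
# Hypothetical named vector
scores <- c(90, 85, 77)
names(scores) <- c("alice", "bob", "carol")
scores["bob"]                 # Single element, by name
scores[c("carol", "alice")]   # Several elements, in the order requested

# Hypothetical 2 x 3 matrix (filled in column major order)
m <- matrix(1:6, nrow = 2)
rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")
m["r2", "b"]                  # Element in row "r2", column "b"
m[, c("a", "c")]              # Columns "a" and "c"
```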
For matrices, use rownames() and colnames() to set the names; named indexing then works on rows and columns in the same way
Control flow (if, else, etc.)
Summary of the control flow tools in R:
- if(), else if(), else: standard conditionals
- ifelse(): conditional function that vectorizes nicely
- switch(): handy for deciding between several options
if() and else
Use if() and else to decide whether to evaluate one block of code or another, depending on a condition
x <- 0.5
if (x >= 0) {
  x
} else {
  -x
}
## [1] 0.5
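- The condition in if() needs to give one TRUE or FALSE value
- The else statement is optional
- Single-line actions don't need braces, so the above could be shortened to if (x >= 0) x else -x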
else if()
We can use else if() arbitrarily many times following an if() statement
x <- -2
if (x^2 < 1) {
  x^2
} else if (x >= 1) {
  2*x-1
} else {
  -2*x-1
}
## [1] 3
- Each else if() only gets considered if the conditions above it were not TRUE
- The else statement gets evaluated if none of the above conditions were TRUE
- Again, the else statement is optional
In the ifelse() function we specify a condition, then a value if the condition holds, and a value if the condition fails
ifelse(x > 0, x, -x)
## [1] 2
One advantage of ifelse() is that it vectorizes nicely: the condition can be a vector, in which case the result is computed elementwise
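A short sketch of ifelse() applied to a whole vector at once. The input v is a hypothetical example; an if() statement would not work here, since it expects a single TRUE or FALSE:

```r
v <- c(-2, 0.5, 3, -1)          # Hypothetical input vector
abs_v <- ifelse(v > 0, v, -v)   # Elementwise: v where positive, -v otherwise
abs_v
```

Each element of the condition v > 0 picks between the corresponding elements of v and -v, so abs_v holds the absolute values.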
Instead of an if() statement followed by else if() statements (and perhaps a final else), we can use switch(). We pass a variable to select on, then a value for each option
type_of_summary <- "mode"
switch(type_of_summary,
       mean=mean(x_vec),
       median=median(x_vec),
       histogram=hist(x_vec),
       "I don't understand")
## [1] "I don't understand"
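To try several selector values without editing the call above, the switch() can be wrapped in a function. This is only a sketch: summarize_vec() and the input v are hypothetical names, and the histogram option is left out so that nothing gets plotted:

```r
# Hypothetical helper wrapping a switch() like the one above
summarize_vec <- function(v, type_of_summary) {
  switch(type_of_summary,
         mean = mean(v),
         median = median(v),
         "I don't understand")   # Unnamed last argument: the fallback
}

v <- c(1, 2, 3, 10)          # Hypothetical input vector
summarize_vec(v, "mean")     # 4
summarize_vec(v, "median")   # 2.5
summarize_vec(v, "mode")     # No match, so the fallback string is returned
```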
- Here we expect type_of_summary to be a string, either “mean”, “median”, or “histogram”; we specify what to do for each
- The last argument has no name, and it serves as the else clause
- Try changing type_of_summary above and see what happens
Remember our standard Boolean operators, & and |. These combine terms elementwise
u_vec <- runif(10, -1, 1)
u_vec
## [1] 0.54949775 -0.22561403 -0.72846986 0.80071515 0.13290531
## [6] -0.91453168 -0.02336149 -0.29755356 0.93932343 0.57915778
u_vec[-0.5 <= u_vec & u_vec <= 0.5] <- 999
u_vec
## [1] 0.5494977 999.0000000 -0.7284699 0.8007152 999.0000000
## [6] -0.9145317 999.0000000 999.0000000 0.9393234 0.5791578
In contrast to the standard Boolean operators, && and || give just a single Boolean, and they do so “lazily”: evaluation stops as soon as the result is determined
(0 > 0) && all(matrix(0,2,2) == matrix(0,3,3))
## [1] FALSE
(0 > 0) && (ThisVariableIsNotDefined == 0)
## [1] FALSE
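Lazy evaluation makes && handy for guarding a check that would otherwise fail or give NA. A minimal sketch, where first_is_positive() is a hypothetical helper written for this illustration:

```r
# Hypothetical helper: is the first element of v a positive number?
first_is_positive <- function(v) {
  # Each check is only evaluated if all the checks before it were TRUE
  is.numeric(v) && length(v) > 0 && v[1] > 0
}

first_is_positive(c(3, -1))     # TRUE
first_is_positive(numeric(0))   # FALSE; v[1] > 0 is never evaluated
first_is_positive("a")          # FALSE; evaluation stops at is.numeric(v)
```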
Rule of thumb: use & and | for indexing or subsetting, and && and || for conditionals
Iteration
Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming
Summary of the iteration methods in R:
- for(), while() loops: standard loop constructs
- apply() family of functions: useful alternative to for() loop, we’ll learn these soon
for()
A for() loop increments a counter variable along a vector. It repeatedly runs a code block, called the body of the loop, with the counter set at its current value, until it runs through the vector
n <- 10
log_vec <- vector(length=n, mode="numeric")
for (i in 1:n) {
  log_vec[i] <- log(i)
}
log_vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
Here i is the counter and the vector we are iterating over is 1:n. The body is the code in between the braces
We can break out of a for() loop early (before the counter has been iterated over the whole vector), using break
n <- 10
log_vec <- vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log_vec[i] <- log(i)
}
## I'm outta here. I don't like numbers bigger than 2
log_vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 0.0000000 0.0000000 0.0000000
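Variations on standard for() loops
Many different variations on standard for() are possible. Two common ones:
- Nonnumeric counters: the counter always iterates over a vector, but the vector does not have to be numeric
- Nested loops: the body of a for() loop can contain another for() loop (or several others)
for (str in c("PhD", "Ben", "LeRoy")) {
  cat(paste(str, "declined to comment\n"))
}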
## PhD declined to comment
## Ben declined to comment
## LeRoy declined to comment
for (i in 1:4) {
  for (j in 1:i^2) {
    cat(paste(j,""))
  }
  cat("\n")
}
## 1
## 1 2 3 4
## 1 2 3 4 5 6 7 8 9
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
while()
A while() loop repeatedly runs a code block, again called the body, until some condition is no longer true
i <- 1
log_vec <- c()
while (log(i) <= 2) {
  log_vec <- c(log_vec, log(i))
  i <- i+1
}
log_vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
for() versus while()
- for() is better when the number of times to repeat (values to iterate over) is clear in advance
- while() is better when you can recognize when to stop once you’re there, even if you can’t guess it to begin with
- while() is more general, in that every for() could be replaced with a while() (but not vice versa)
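To make the last point concrete, here is a sketch of one for() loop and a while() loop that does the same work. The names sq_for and sq_while are hypothetical; the only real difference is that the while() version has to manage the counter by hand:

```r
n <- 5

# for() version: the counter i runs along 1:n automatically
sq_for <- numeric(n)
for (i in 1:n) {
  sq_for[i] <- i^2
}

# while() version: initialize, test, and increment the counter ourselves
sq_while <- numeric(n)
i <- 1
while (i <= n) {
  sq_while[i] <- i^2
  i <- i + 1
}

identical(sq_for, sq_while)   # TRUE
```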
while(TRUE) or repeat
while(TRUE) and repeat: both do the same thing, just repeat the body indefinitely, until something causes the flow to break. Example (try running in your console):
repeat {
  ans <- readline("Who is the best PhD of Statistics at CMU? ")
  if (ans == "LeRoy" || ans == "Ben") {
    cat("Yes! You get an 'A'.")
    break
  } else {
    cat("Wrong answer!\n")
  }
}
for() and while() loops in R
Summary
- Indexing (with [ ] and [[ ]])
- if(), else if(), else: standard conditionals
- ifelse(): shortcut for using if() and else in combination
- switch(): shortcut for using if(), else if(), and else in combination
- for(), while(), repeat: standard loop constructs
- Explicit for() loops are not always needed: vectorization is your friend!
- apply() family of functions: can also be very useful (we’ll see them later)
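As a sketch of what vectorization buys us, the log example from the for() discussion can be written without an explicit loop. The names log_loop, log_vectorized, and log_sapply are hypothetical; sapply() is one member of the apply() family:

```r
n <- 10

# Explicit loop, as in the for() example earlier
log_loop <- numeric(n)
for (i in 1:n) {
  log_loop[i] <- log(i)
}

# Vectorized: log() works elementwise on the whole vector
log_vectorized <- log(1:n)

# apply()-family version: call log() on each element of 1:n
log_sapply <- sapply(1:n, log)

all.equal(log_loop, log_vectorized)   # TRUE
all.equal(log_loop, log_sapply)       # TRUE
```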