Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by 10pm this Wednesday.

## For reproducibility --- don't change this!
set.seed(07012019)

The binomial distribution

The binomial distribution \(\mathrm{Bin}(m,p)\) is defined by the number of successes in \(m\) independent trials, each having probability \(p\) of success. Think of flipping a coin \(m\) times, where the coin is weighted to have probability \(p\) of landing on heads.
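To make the coin-flipping picture concrete, here is a minimal sketch that builds a single \(\mathrm{Bin}(m,p)\) draw by hand. The values of `m` and `p` and the variable names `flips` and `heads` are illustrative choices, not part of the lab.

```r
m <- 10    # number of trials (illustrative value)
p <- 0.5   # probability of heads on each flip (illustrative value)

# Flip m weighted coins: runif(m) gives m uniform draws on (0, 1),
# and each one falls below p with probability p.
flips <- runif(m) < p   # logical vector, TRUE = heads

# The number of heads is one draw from Bin(m, p).
heads <- sum(flips)
heads
```

Rerunning the last three lines gives a new count each time, always between 0 and `m`.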

The R function rbinom() generates random variables with a binomial distribution. E.g.,

rbinom(n=20, size=10, prob=0.5)

produces 20 observations from \(\mathrm{Bin}(10,0.5)\).
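As a quick sanity check on what this call returns, the sketch below stores the draws and inspects them. The variable name `x` is an assumption made for illustration. Running it consumes random numbers, so later seeded results in the same session will differ from a run without it.

```r
# Store the 20 draws from Bin(10, 0.5) in a vector (name chosen for illustration).
x <- rbinom(n = 20, size = 10, prob = 0.5)

length(x)   # 20: one value per observation
range(x)    # every value lies between 0 and size = 10
mean(x)     # should land near size * prob = 5, up to sampling noise
```

Note the argument roles: `n` is the number of observations returned, while `size` is the number of trials \(m\) behind each observation.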

Some simple manipulations

Some simple plots

Working with matrices and lists

Prostate cancer data set

We’re going to look at a data set on 97 men who have prostate cancer (from the book The Elements of Statistical Learning). There are 9 variables measured on these 97 men:

  1. lpsa: log PSA score
  2. lcavol: log cancer volume
  3. lweight: log prostate weight
  4. age: age of patient
  5. lbph: log of the amount of benign prostatic hyperplasia
  6. svi: seminal vesicle invasion
  7. lcp: log of capsular penetration
  8. gleason: Gleason score
  9. pgg45: percent of Gleason scores 4 or 5

To load this prostate cancer data set into your R session and store it as a matrix pros_data, run:

pros_data <-
  as.matrix(read.table("https://raw.githubusercontent.com/linnylin92/36-350_public/master/dat/pros.dat"))
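The same `read.table()` then `as.matrix()` pipeline can be tried offline on a few lines of inline text. The sketch below uses a made-up three-row table (`toy_text`, `toy`, and all values in it are hypothetical stand-ins, not taken from the real file) that borrows three of the column names listed above. It assumes a layout in which the header row has one fewer field than the data rows; whether the real file is laid out this way is an assumption.

```r
# Hypothetical stand-in data: NOT the real prostate measurements.
# The header has 3 fields and each data row has 4, so read.table()
# treats the first row as a header and the first column as row names.
toy_text <- "lcavol lweight age
1 0.5 3.1 50
2 1.2 3.6 64
3 2.0 3.9 71"

toy <- as.matrix(read.table(text = toy_text))

dim(toy)                    # rows, then columns
colnames(toy)               # variable names taken from the header
toy[, "age"]                # one column, selected by name
toy[toy[, "age"] > 60, ]    # rows where age exceeds 60
```

After loading the real data, calling `dim(pros_data)` and `colnames(pros_data)` in the same way is a reasonable first check against the description of 97 men and 9 variables.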

Basic indexing and calculations

Exploratory data analysis with plots

A bit of Boolean indexing never hurt anyone

Some string basics

c("I'M NOT ANGRY I SWEAR")         # Convert to lower case
c("Mom, I don't want my veggies")  # Convert to upper case
c("Hulk, sMasH")                   # Convert to upper case
c("R2-D2 is in prime condition, a real bargain!") # Convert to lower case

presidents <- c("Clinton", "Bush", "Reagan", "Carter", "Ford")
phrase <- "Give me a break"
ingredients <- "chickpeas, tahini, olive oil, garlic, salt"
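For orientation, here is a short sketch of some base R string functions applied to these three variables. Which operations the lab actually asks for is not shown here, so the selection below is an assumption, and the names `words` and `items` are illustrative.

```r
presidents <- c("Clinton", "Bush", "Reagan", "Carter", "Ford")
phrase <- "Give me a break"
ingredients <- "chickpeas, tahini, olive oil, garlic, salt"

nchar(presidents)           # number of characters in each name
substr(presidents, 1, 2)    # first two characters of each name
toupper(phrase)             # whole phrase in upper case

# strsplit() returns a list with one element per input string,
# so [[1]] extracts the character vector for our single string.
words <- strsplit(phrase, split = " ")[[1]]
items <- strsplit(ingredients, split = ", ")[[1]]

paste(items, collapse = " + ")   # glue the pieces back into one string
```

Splitting `ingredients` on `", "` rather than `" "` keeps "olive oil" together as one item.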

Shakespeare’s complete works

Project Gutenberg offers over 50,000 free online books, especially old books (classic literature), for which copyright has expired. We’re going to look at the complete works of William Shakespeare, taken from the Project Gutenberg website.

To avoid hitting the Project Gutenberg server over and over again, we’ve grabbed a text file from them that contains the complete works of William Shakespeare and put it on our course website. Visit https://raw.githubusercontent.com/linnylin92/36-350_public/master/dat/shakespeare.txt in your web browser and just skim through this text file a little bit to get a sense of what it contains (a whole lot!).
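Once a text file is read into R, a common pattern is one string per line, then split into words, then counted. The sketch below shows that pattern on a two-line stand-in (the opening of Hamlet's soliloquy, typed in by hand rather than read from the file). The names `sample_lines`, `txt`, `words`, and `counts` are illustrative, and the splitting rule is one simple choice among many.

```r
# Stand-in for the file: two lines typed by hand, NOT read from the URL.
sample_lines <- c("To be, or not to be, that is the question:",
                  "Whether 'tis nobler in the mind to suffer")

# readLines() returns one string per line; a text connection lets us
# exercise it without touching the network.
con <- textConnection(sample_lines)
txt <- readLines(con)
close(con)

# Lower-case everything, then split on runs of characters that are
# neither letters nor apostrophes.
words <- unlist(strsplit(tolower(txt), split = "[^a-z']+"))
words <- words[words != ""]   # drop empty strings, if any

# Tabulate and sort so the most frequent words come first.
counts <- sort(table(words), decreasing = TRUE)
head(counts, 3)
```

For the real file, the same steps would start from `readLines()` applied to the URL above instead of the text connection.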

Reading in text, basic exploratory tasks

Computing word counts

A tiny bit of regular expressions