Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by Tuesday 10pm this week.

This week’s agenda: exploratory data analysis, cleaning data, fitting linear models, and using associated utility functions.

Prostate cancer data

Recall the data set on 97 men who have prostate cancer (from the book The Elements of Statistical Learning). Reading it into our R session:

pros_df <- 
  read.table("https://raw.githubusercontent.com/benjaminleroy/36-350-summer-data/master/Week1/pros.dat")
dim(pros_df)
## [1] 97  9
head(pros_df, 3)
##       lcavol  lweight age      lbph svi       lcp gleason pgg45       lpsa
## 1 -0.5798185 2.769459  50 -1.386294   0 -1.386294       6     0 -0.4307829
## 2 -0.9942523 3.319626  58 -1.386294   0 -1.386294       6     0 -0.1625189
## 3 -0.5108256 2.691243  74 -1.386294   0 -1.386294       7    20 -0.1625189

Simple exploration and linear modeling

Exoplanets data set

There are now over 1,000 confirmed planets outside of our solar system. They have been discovered through a variety of methods, with each method providing access to different information about the planet. Many were discovered by NASA’s Kepler space telescope, which observes the “transit” of a planet in front of its host star. In these problems you will use data from the NASA Exoplanet Archive to investigate some of the properties of these exoplanets. (You don’t have to do anything yet; this is just background.)

Reading in, cleaning data

Exploring the exoplanets

Kepler’s third law

In our exoplanet data set, the orbital period \(T\) is stored in the variable pl_orbper, the mass of the host star \(M\) in the variable st_mass, and the semi-major axis \(a\) in the variable pl_orbsmax. Kepler’s third law states that, when the mass \(M\) of the host star is much greater than the mass \(m\) of the planet, the orbital period \(T\) satisfies:

\[ T^2 \approx \frac{4\pi^2}{GM}a^3. \]

Above, \(G\) is Newton’s gravitational constant. (You don’t have to do anything yet; this is just background.)
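
To make the formula concrete, here is a minimal sketch that evaluates Kepler’s third law in SI units for the Earth and the Sun. The helper name kepler_period and the rounded constants are assumptions made for illustration and are not part of the lab; the data set’s variables may be recorded in other units, so this sketch should not be applied to them without conversion. It also checks that taking logs of both sides gives a relationship that is linear in \(\log a\) and \(\log M\).

```r
# Kepler's third law in SI units: T^2 = 4 * pi^2 * a^3 / (G * M).
# kepler_period() is a hypothetical helper name, not part of the lab.
kepler_period <- function(a, M, G = 6.674e-11) {
  # a: semi-major axis in meters; M: host star mass in kilograms
  # returns the orbital period in seconds
  2 * pi * sqrt(a^3 / (G * M))
}

# Approximate Earth-Sun values (assumed here for illustration)
a_earth <- 1.496e11   # meters
M_sun   <- 1.989e30   # kilograms

# Convert seconds to days; the result should be close to one year
T_days <- kepler_period(a_earth, M_sun) / 86400
T_days

# Taking logs turns the law into a linear relationship:
# log(T) = (3/2) * log(a) - (1/2) * log(M) + constant
const <- log(2 * pi) - 0.5 * log(6.674e-11)
log_check <- all.equal(
  log(kepler_period(a_earth, M_sun)),
  1.5 * log(a_earth) - 0.5 * log(M_sun) + const
)
log_check
```

On the log scale the law predicts a slope of 3/2 on \(\log a\) and a slope of -1/2 on \(\log M\), so a linear model fit to logged variables can be compared against those two values.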

Linear regression in deep space