Homework #1

0. Obtain R and install it.

1. Investigate the difference between loops and apply in R. For example,
   run the following:

      foo <- function(n=10000,d=100)
      {
         x <- matrix(rnorm(n*d),ncol=d)
         y <- rep(0,d)
         cat(date(),"\n")
         # do things the hard way
         for(i in 1:n){
            for(j in 1:d){
               y[j] <- y[j]+x[i,j]/n
            }
         }
         cat(date(),"\n")
         # now do them the easy way
         z <- apply(x,2,mean)
         cat(date(),"\n")
         cat("Difference:",sum(abs(y-z)),"\n")
         # Note: any difference is due to roundoff
      }
      foo()

   Try various values of n and d. See how big you can make the
   difference in time, but don't spend cpu hours. Remove the inner
   loop, by:

      y <- y+x[i,]/n

   and see what happens. Do a help on mapply, sapply, lapply, and keep
   these in mind for the future.

3. Go to CRAN and look at the contributed packages. Find one that looks
   interesting, install it, and play with it. Tell me which ones you
   looked at.

4. Generate random numbers uniform on the unit square:

      matrix(runif(2*n),ncol=2)
      # n is the number of points and must be set somewhere

   Plot these. Repeat this many times and look at the patterns of the
   points. Do you see structure in some of the plots? Print out the
   plot that you like the best (most structure, prettiest, makes you
   feel all warm and squishy, whatever). What does this exercise tell
   you about random numbers?

5. Write a function that takes a vector of data and plots the histogram
   of the data and a curve showing the kernel estimator of the data
   (use hist() and density() for these). Make sure you can pass any of
   the parameters that hist() takes to your function. Check out the man
   page for hist, and experiment with the variable breaks. In
   particular, see what happens when you give the breakpoints
   explicitly, as a vector, and vary the starting point, leaving the
   width between them the same.

6. Read http://www.ams.jhu.edu/~marchette/ID08/CMU-ML-06-108.pdf.

7. Read http://www.ams.jhu.edu/~marchette/ID08/jain-review.pdf, through
   section 4 (you may read more if you want).
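
A note on exercise 1: date() only resolves to the second, so small timing differences are easy to miss. One possible way to get finer timings is system.time(). The sketch below is only an illustration, not the required solution; the helper name time_col_means and its default sizes are made up here. It times the double loop, the version with the inner loop removed, and apply(), and reports how far each looped result is from the apply() result.

```r
# Hypothetical helper (not part of the assignment): time three ways of
# computing the column means of an n-by-d matrix of normal draws.
time_col_means <- function(n = 2000, d = 50) {
  x <- matrix(rnorm(n * d), ncol = d)

  # Double loop: one scalar update per matrix element.
  y_loop <- rep(0, d)
  t_loop <- system.time(
    for (i in seq_len(n)) {
      for (j in seq_len(d)) {
        y_loop[j] <- y_loop[j] + x[i, j] / n
      }
    }
  )

  # Inner loop removed: one vector update per row.
  y_row <- rep(0, d)
  t_row <- system.time(
    for (i in seq_len(n)) {
      y_row <- y_row + x[i, ] / n
    }
  )

  # apply() over columns, as in foo().
  t_apply <- system.time(z <- apply(x, 2, mean))

  list(
    elapsed = c(loop  = t_loop[["elapsed"]],
                row   = t_row[["elapsed"]],
                apply = t_apply[["elapsed"]]),
    # Any nonzero difference should be roundoff only.
    max_diff = max(abs(y_loop - z), abs(y_row - z))
  )
}

res <- time_col_means()
print(res$elapsed)
cat("Largest difference from apply():", res$max_diff, "\n")
```

The elapsed times depend on the machine and on n and d, so no particular numbers are claimed here.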
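
A note on exercise 4: rather than replotting one sample at a time, it may help to draw several independent samples side by side. The following is a minimal sketch under that assumption; the function name plot_uniform_grid and its arguments are invented for illustration. It uses the same matrix(runif(2*n),ncol=2) call as the exercise for each panel.

```r
# Hypothetical helper (not part of the assignment): draw `reps`
# independent samples of n uniform points on the unit square and plot
# them in a square grid of panels.
plot_uniform_grid <- function(n = 100, reps = 9) {
  side <- ceiling(sqrt(reps))
  old_par <- par(mfrow = c(side, side), mar = c(1, 1, 1, 1))
  on.exit(par(old_par))          # restore the previous layout on exit

  samples <- vector("list", reps)
  for (k in seq_len(reps)) {
    samples[[k]] <- matrix(runif(2 * n), ncol = 2)
    plot(samples[[k]], xlim = c(0, 1), ylim = c(0, 1),
         xlab = "", ylab = "", axes = FALSE, asp = 1, pch = 20)
    box()
  }
  invisible(samples)             # return the samples for later inspection
}

# Example call (opens a graphics device):
# plot_uniform_grid(n = 100, reps = 9)
```

Calling it repeatedly gives a fresh set of panels each time, which makes it easier to compare the apparent clusters and gaps across samples.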
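
A note on exercise 5: the usual R mechanism for passing arbitrary arguments through to another function is the ... argument. The sketch below shows one way this could look; it is an illustration under stated assumptions, not the intended answer. The names hist_density and shifted_breaks are hypothetical, the histogram is drawn on the density scale (freq=FALSE) unless the caller says otherwise so that the curve is comparable, and density() is used with its default bandwidth.

```r
# Hypothetical helper: histogram of x with a kernel density curve on top.
# Extra arguments in ... are passed on to hist().
hist_density <- function(x, ...) {
  args <- list(...)
  # Default to the density scale unless the caller chose a scale.
  if (is.null(args$freq) && is.null(args$probability)) args$freq <- FALSE
  # do.call() loses the variable name, so supply readable labels.
  if (is.null(args$main)) args$main <- "Histogram with kernel estimate"
  if (is.null(args$xlab)) args$xlab <- "x"

  h <- do.call(hist, c(list(x = x), args))
  d <- density(x)                # default kernel and bandwidth
  lines(d)
  invisible(list(hist = h, density = d))
}

# Hypothetical helper: equal-width breakpoints covering the range of x,
# starting `offset` below min(x). Varying offset in [0, width) moves the
# starting point while leaving the bin width the same.
shifted_breaks <- function(x, width, offset = 0) {
  seq(min(x) - offset, max(x) + width, by = width)
}

# Example calls (open a graphics device):
# x <- rnorm(200)
# hist_density(x, breaks = shifted_breaks(x, width = 0.5, offset = 0))
# hist_density(x, breaks = shifted_breaks(x, width = 0.5, offset = 0.25))
```

The breakpoints must cover the whole range of the data, which is why shifted_breaks runs from below min(x) to past max(x).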