Homework #2

1. Consider the two-class problem where class 1 is drawn from N(0,1) and class 2 is drawn from N(2,1) (normals with unit variance and means 0 and 2).

   a) What is the Bayes error in this case? Write the exact answer (an integral) and a decimal approximation to it. Assume equal priors (equal probability for each class). It might help to draw the picture.

   b) What classifier attains the Bayes error? Describe the classifier as completely as you can.

   c) Implement a k-nearest neighbor classifier: use knn and knn.cv from library(class), a recommended package that comes with R. Sample 100 observations from each class and compute the resubstitution error for this sample using a 1-nearest neighbor classifier.

   d) Sample n observations from each of the classes and build a 1-nearest neighbor classifier, for n=10,100,1000,10000. Using crossvalidation, estimate the error of your classifier, and plot it as a function of n. Do you conclude that the 1-nearest neighbor classifier is or is not consistent for this problem? Here's a snippet of code that does one of these (n=100):

       library(class)
       set.seed(2345123)
       x <- rnorm(100)              # class 1: N(0,1)
       y <- rnorm(100,2)            # class 2: N(2,1)
       data <- c(x,y)
       cls <- rep(1:2,each=100)
       a <- knn.cv(data,cls,k=1)    # leave-one-out predictions
       sum(a!=cls)/length(cls)      # crossvalidated error estimate

   e) Redo d with a k-nearest neighbor classifier for k=floor(log(n)). Do you think the k-nearest neighbor classifier can be consistent?

2. Now redo the above, only with a linear classifier. The easy way to do this is to get the MASS library:

       library(MASS)
       lin <- lda(matrix(data,ncol=1),cls)   # lda wants a matrix or data frame, not a bare vector

   Note: you can set CV=TRUE to do crossvalidation. You should be able to do this by just replacing the knn call with the lda call (with a little fiddling to get the right thing out of what is returned).

3. Simulate the Trunk example as follows:

   i. Set d.

   ii. Draw 100 points each from the two classes N(mu1,I), N(mu2,I) (identity covariance), where mu1 = (1,1/sqrt(2),...,1/sqrt(d)) and mu2 = -mu1.
   Note: there are two ways to draw from uncorrelated normals like this:

       a <- matrix(rnorm(100*d),ncol=d)
       data <- scale(a,center=-1/sqrt(1:d),scale=FALSE)   # subtracting -mu1 shifts the mean to mu1

       # or get package mvtnorm or a similar multivariate normal package
       library(mvtnorm)
       data <- rmvnorm(100,1/sqrt(1:d),diag(1,d))

   The second approach would also allow you to draw from correlated normals, by putting in any covariance matrix you want. The first approach only works for uncorrelated normals (diagonal covariance matrix). In this problem, we want uncorrelated normals, so the first approach works just fine.

   iii. Compute the error using the "full knowledge" classifier: argmin(d(x,mu1),d(x,mu2)) (minimum distance to the known means). One approach to this, with x the matrix of points to classify:

       a <- scale(x,center=mu1,scale=FALSE)
       b <- scale(x,center=mu2,scale=FALSE)
       d1 <- apply(a,1,function(x) sum(x^2))   # or d1 <- apply(a^2,1,sum)
       d2 <- apply(b,1,function(x) sum(x^2))   # or d2 <- apply(b^2,1,sum)
       cls <- d1>d2                            # TRUE means closer to mu2, i.e. class 2

   iv. Compute the error of the "estimated" classifier: argmin(d(x,x1hat),d(x,x2hat)) (minimum distance to the estimated means of the two classes).

   By "errors" in the above is meant the (estimated) probability of error. The errors in iv should be computed on a held-out set: for each class, use 50 points to estimate the mean and the other 50 points to evaluate the classifier.

   Plot the errors on the same plot for d=2,5,10,50,100,500,1000,5000,10000. Does this support the theoretical results of Trunk?
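
As a starting point for problem 3, steps i-iv can be strung together for a single value of d roughly as follows. This is only a sketch, not the required solution: the function name trunk_errors, the helper nearest_mean, the seed, and the choice to evaluate both classifiers on the same 50 held-out points per class are all assumptions made for illustration.

```r
# Hypothetical helper: one run of the Trunk simulation for dimension d.
# Returns the estimated error of the "full knowledge" and "estimated" classifiers.
trunk_errors <- function(d, n = 100) {
  mu1 <- 1 / sqrt(1:d)
  mu2 <- -mu1

  # Step ii: n points per class, identity covariance, means mu1 and mu2.
  x1 <- sweep(matrix(rnorm(n * d), ncol = d), 2, mu1, "+")
  x2 <- sweep(matrix(rnorm(n * d), ncol = d), 2, mu2, "+")

  # Minimum-distance rule: class 1 or 2 for each row of x.
  nearest_mean <- function(x, m1, m2) {
    d1 <- rowSums(sweep(x, 2, m1)^2)   # squared distance to m1
    d2 <- rowSums(sweep(x, 2, m2)^2)   # squared distance to m2
    ifelse(d1 > d2, 2L, 1L)
  }

  # Held-out split (assumed): first half of each class trains, second half tests.
  train <- 1:(n / 2)
  test  <- (n / 2 + 1):n
  x_test   <- rbind(x1[test, , drop = FALSE], x2[test, , drop = FALSE])
  cls_test <- rep(1:2, each = length(test))

  # Step iii: classify with the known means.
  err_full <- mean(nearest_mean(x_test, mu1, mu2) != cls_test)

  # Step iv: classify with means estimated from the training half.
  m1_hat <- colMeans(x1[train, , drop = FALSE])
  m2_hat <- colMeans(x2[train, , drop = FALSE])
  err_est <- mean(nearest_mean(x_test, m1_hat, m2_hat) != cls_test)

  c(full = err_full, estimated = err_est)
}

set.seed(1)                 # arbitrary seed, for reproducibility only
errs <- trunk_errors(10)
```

Something like sapply(c(2,5,10,50), trunk_errors) would then give one column of errors per dimension, which is a convenient shape for plotting both curves against d. A single run is noisy with only 50 test points per class, so averaging several runs per d may make the trend easier to see.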