Homework #5
Due 3/4, via email to me.

1. Using passive.dat, build and test a passive fingerprinting OS classifier. Use only variables 3 through 31 in the classifier (you may choose to select a subset of these), and variable 2 as the class label. Use cross-validation to evaluate your classifier. For example, if you ran:

    library(class)
    a <- read.table("passive.dat")
    k <- knn.cv(a[,3:31], a[,2], k=1)
    sum(k != a[,2])

you would find that you make 236 errors, and would have a confusion matrix like the following (rows are the predicted class, columns are the true class):

           0    1    2    3    4    5     6
      0    4    0    0    0    1    0    32
      1    0    4    1    0    1    1    12
      2    0    1    5    0    0    0    13
      3    0    0    0    0    0    0     6
      4    0    2    0    0    2    1    23
      5    0    1    0    0    1    2    12
      6   40    9   20    7   27   25  3316

So Windows (class 6) is incorrectly classified as Apple (class 0) 32 times. Note that saying "Windows" no matter what the data says gets 155 errors; that is, there are 155 non-Windows machines in the data. Your task is to beat 155. Describe your method, why you picked the variables you did, etc. Give the confusion matrix, as well as the number wrong.

2. Reduce the problem to the two-class problem: Windows and not-Windows. Still using cross-validation, how well can you do?

3. Using the first 1000 Windows machines, build an anomaly detector as follows:

    win <- a[which(a[,2] == 6),]
    train <- win[1:1000,]
    test <- rbind(win[-(1:1000),], a[which(a[,2] != 6),])

So train has the 1000 "normal" (Windows) boxes, and test has all the remaining machines. You will score your anomaly detector as follows: Windows machines will be considered normal, everything else will be abnormal. So incorrectly calling a Windows machine abnormal is a false alarm. Correctly calling a Windows machine "normal" or correctly calling a non-Windows machine "abnormal" are both correct detections.

Here is the anomaly detector: You will set a threshold tau. More on this in a bit. Compute the distance between x and the training data in train, using variables 3-31. If min(d(x,train))