This file will be updated as I think of things you might want to know.

1) If you write a function in R that relies on a library, you can use
require, instead of library, within the function. Neither one reloads a
library that is already loaded. The difference is that library stops
with an error if the library is not installed, while require gives a
warning and returns FALSE, so the function can check the result and
decide what to do.

   myfunction <- function(data,cls)
   {
      require(class)
      # whatever
   }

2) You can use negative indices in a matrix or vector to get
"everything but":

   data[,-c(2,4,6)]

returns all columns of data EXCEPT 2,4,6. This can be useful if you
want to write your own crossvalidation.

   mycrossval <- function(data,cls,FUN=knn)
   {
      n <- length(cls)
      out <- rep(0,n)
      for(i in 1:n){
         out[i] <- FUN(data[-i,],data[i,],cls[-i])
      }
      out
   }

Here I am calling the user-provided classifier FUN (which defaults to
knn, k=1) on the ith observation, training with all but the ith
observation. So, if I've defined my own classifier, myclassifier, I can
crossvalidate it with:

   est <- mycrossval(x,classes,myclassifier)

Note: it is a (very) good idea to get in the habit of using the names
of the arguments, so you don't mix them up, and so that if you change
the code by adding new arguments, the old stuff will continue to work:

   est <- mycrossval(data=x,cls=classes,FUN=myclassifier)

With names, you also don't have to remember the order of the arguments.
Along these lines, I always have to look up knn to remember the order
of its arguments. Instead, you can do:

   knn(test=x,train=y,cl=classes,k=4)

So, in the above, assuming you make sure that you name the arguments of
your classifier functions consistently (train, test, cl, as in knn),
you can change the call to FUN to:

   out[i] <- FUN(train=data[-i,],test=data[i,],cl=cls[-i])

Now, what about passing arguments to FUN? knn has the argument k that
you might want to pass, and your classifier might have an argument
gamma that changes something about the classifier.
You don't want to have to hard-wire k and gamma into the
crossvalidation function, and anyway, how do you tell if an argument
makes sense for a given classifier? So, you use ...:

   mycrossval <- function(data,cls,FUN=knn,...)
   {
      n <- length(cls)
      out <- rep(0,n)
      for(i in 1:n){
         out[i] <- FUN(train=data[-i,],test=data[i,],cl=cls[-i],...)
      }
      out
   }

This passes any further arguments, if any, on to FUN. So:

   est <- mycrossval(data=x,cls=classes,FUN=myclassifier,gamma=1.27)
   est <- mycrossval(data=x,cls=classes,FUN=knn,k=3)

will both work, giving the tested classifier the arguments after FUN.
The same goes for any other classifier whose arguments are named train,
test, and cl. lda does not take these arguments, so to pass it
something like method="t" this way you would first have to wrap it in
a function of your own that does.

   est <- mycrossval(data=x,cls=classes,k=3,FUN=knn)

will also work, since the arguments are named, but it's a good idea to
always put the arguments passed to FUN at the end to keep them
straight.

There is a slightly better way to handle passing functions than that
described above. The above passes the full function code in the
variable FUN. Instead there is a way to just pass the function name,
and use do.call to call that function. For our purposes, the above
works just fine. If you are a die-hard computer scientist, you should
look up the "right" way to do this.

3) You can load in all the definitions in a file using source:

   source("mystuff.R")

I keep a file open in which I define my functions, then I :w it and
source it to test the functions out.

4) If you do a lot of stuff every time you load R, you can put this in
a .Rprofile file. This file, in your home directory (or in the
directory in which you start R), is loaded in first when R is started.
So you may choose to put

   library(class)
   library(MASS)

etc in this. It is also a place to set various options (do a ?options
to learn about this).
For example, my .Rprofile on my laptop contains:

   options("browser"="/usr/local/firefox/firefox")

5) You can save things within an R session, without quitting, (at
least) three different ways:

Use write to write to a text file:

   write(t(data),file="datafile",ncol=ncol(data))

(Note: I had to take the transpose of data, since R is column major
instead of row major. Bloody Fortran programmers.) Now you can use
read.table, or scan, to read the data back in.

Use save.image() to save the entire image of your session (as if you
quit and answered "y" to the save question). This overwrites the .RData
file, so that next time you start R it will look just like it looked
when you called save.image(). It does not write the history file; use
savehistory() for that. Note: you can give save.image a filename
argument to save into something other than .RData.

Use save() to save particular objects in a binary file:

   save(a,b,data,file="foo")

Later, say the next time you start R, you can do:

   load("foo")

and you will have the variables a, b, and data, just the way they were
when you saved.

6) R remembers everything you did (at least back a few hundred
commands) and all your variables, if you answer "y" when you quit. This
means that when you start up next time, it's as if you never left.
Sometimes you don't want all that stuff. You just want to start up a
clean R. You can either delete the .RData file (which is permanent), or
you can start R with (in Unix, I dunno what to do in Windoze):

   R --no-restore

The file that contains your history is .Rhistory. This is a text file,
so you can edit it and see what commands you typed. Or, in R, use the
history command.

7) You can get help in a browser with

   help.start()

This brings up your browser. You can search, look at the libraries you
have available, see some documentation, etc. Your browser option must
be set right (see above) and you have to have JAVA for the search
capability to work right.
You can also use help.search() at the command line if you don't know
the exact name of the function you are looking for:

   help.search("correlation")

Note: this searches all the libraries, even those not loaded in, so if
you find something, you may need to load the library before you can see
the help for that function.

8) NEVER NEVER NEVER name anything T, TRUE, F, FALSE.

R is pretty smart about variables and functions (you can have a
variable plot without overwriting the plot function) but it is a good
idea to take care with variable and function names. If you reuse a
function name for a variable, say:

   plot <- 5

Then call plot:

   plot(1:10)

This will work just fine. But now if you want to see the plot function:

   plot

you'll get 5. You have to rm(plot) in order to get rid of the variable
that is masking the function. If you used the crossval code above
instead of fixing it to use do.call, you will find that very weird
things happen if you give a variable the same name as your classifier
function.

Never use the underscore _. It used to be that a _ b was the same as
a <- b. So, although this has been changed, and you can now name things
my_favorite_function or whatever, on old systems this will not work,
and it's best to avoid it.

9) ls() will list your current objects, rm can delete them.

On Unix systems x11() will bring up a plot window (maybe this works on
Windoze, but I dunno). dev.off() kills the current plot window, and
graphics.off() kills them all. See dev.cur and related functions for
moving around between graphics windows. dev.print will let you print
(to a printer or a file). In Windoze there is a menu function for this.

See demo("graphics") for different ways to display data, manipulate the
plots, etc.

10) In rare cases you may want to garbage collect yourself, instead of
letting R do it automatically. See gc(), gcinfo().

11) There are several ways to read data in.
We've seen read.table:

   a <- read.table(file="mydata",sep=",",header=T)

will read in comma separated data from a file in which the first line
corresponds to column headers. You can also use scan:

   a <- matrix(scan("file"),byrow=T,ncol=7)

will read in data from a file and put it in a 7 column matrix, row
first.

See also readBin and its relatives connections, readLines, and readChar
for setting up file connections and reading various binary or other
data.

12) Assignment in R is done via <-, but it can also be done with =

   a <- 9
   a = 9

are the same.

   a == 9

is a comparison, so that's one reason to use the more cumbersome <-.

13) If you need to do optimization, check the functions

   optimize
   nlm
   nls
   optim
   uniroot

nls takes a "formula". Several R functions do this. The idea is to
specify a functional form in terms of named variables. For example, in
lm,

   y ~ x

is the formula that says y depends linearly on x, so y = m*x + b, and
lm will do some magic to estimate m and b. nls does not assume a form,
so there you write the parameters into the formula yourself, as in
y ~ m*x + b, and give starting values for them. See the examples in the
help pages for usage suggestions.
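
As a rough sketch of how a few of the functions in item 13 are called:
the functions f and g and the data x, y below are made up purely for
illustration, and the intervals and starting values are arbitrary
choices. optimize and uniroot work on a function of one variable over
an interval; lm and nls take a formula.

```r
# Made-up function: minimize f(x) = (x - 2)^2 + 1 over the interval [0, 5]
f <- function(x) (x - 2)^2 + 1
opt <- optimize(f, interval = c(0, 5))
opt$minimum    # location of the minimum, close to 2
opt$objective  # value of f at the minimum, close to 1

# Made-up function: find the root of g(x) = x^2 - 2 in [0, 2]
g <- function(x) x^2 - 2
rt <- uniroot(g, interval = c(0, 2))
rt$root        # close to sqrt(2)

# Made-up data: a line with slope 3 and intercept 5, plus a little noise
set.seed(1)
x <- 1:20
y <- 3 * x + 5 + rnorm(20, sd = 0.1)

# lm: the formula y ~ x stands for y = m*x + b
fit.lm <- lm(y ~ x)
coef(fit.lm)   # intercept and slope

# nls: the parameters m and b are written out in the formula
# and need starting values
fit.nls <- nls(y ~ m * x + b, start = list(m = 1, b = 0))
coef(fit.nls)  # m and b, which should agree with the lm estimates
```

For a straight line lm is all you need; nls is shown on the same data
only so the two kinds of formula can be compared side by side.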