Introduction
Modern Applied Statistics with
S (MASS) was written by W. N. Venables and B. D. Ripley. The R scripts that
accompany the book are located in the scripts subdirectory of the MASS
package.
This page stores my notes as I work through the chapters of this book. I am using R 2.2.1 and R GUI 1.14 for Mac OS X.
Loading the MASS Package
The command for loading the MASS package is:
library( package = "MASS" )
Chapter 1
The R script for this chapter is library/MASS/scripts/ch01.R.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch01.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
Chapter 2
The R script for this chapter is library/MASS/scripts/ch02.R.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch02.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
Chapter 3
The R script for this chapter is library/MASS/scripts/ch03.R.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch03.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
Chapter 4
The R script for this chapter is library/MASS/scripts/ch04.R. The script
requires that the following package be installed:
- lattice
The script contains an error and will not run to completion when I use R 2.2.1 with R GUI 1.14 under Mac OS X 10.4.5. To fix the problem, comment out lines 19 and 20 of the script:
# if(interactive())
# legend(locator(1), c("Males", "Females"), fill = c(2, 3))
When the script is run, it creates the PostScript output file ch04.ps in R’s
working directory. The script omits turning off the tellis device, which means the
PostScript file may not be viewable until the dev.off() function is called.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch04.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
## Close the trellis device to close the PostScript file
dev.off()
Chapter 5
The R script for this chapter is library/MASS/scripts/ch05.R. The script
requires that the following packages be installed:
- lattice
- polspline
- boot
When the script is run, it creates the PostScript output file ch05.ps in R’s
working directory. The script omits turning off the tellis device, which means the
PostScript file may not be viewable until the dev.off() function is called.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch05.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
## Close the trellis device to close the PostScript file
dev.off()
Chapter 6
The R script for this chapter is library/MASS/scripts/ch06.R. The script
requires that the following packages be installed:
- lattice
- boot
- multcomp
- mvtnorm
When the script is run, it creates the PostScript output file ch06.ps in R’s
working directory. The script omits turning off the tellis device, which means the
PostScript file may not be viewable until the dev.off() function is called.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch06.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
## Close the trellis device to close the PostScript file
dev.off()
Chapter 11
§ 11.1
This section covers visualization methods, including principal component analysis.
## Principal component analysis of the iris3 data. ## The iris3 data are stored in a 3-dimensional array. Convert the data to a ## matrix. ir <- rbind( iris3[ , , 1 ], iris3[ , , 2 ], iris3[ , , 3 ] ) ## Create a vector of factors used for plotting below. ir.species <- factor( c( rep( "s", 50 ), rep( "c", 50 ), rep( "v", 50 ) ) ) ## Create a vector of colors used for plotting below. ir.colors <- c( rep( "chartreuse4", 50 ), rep( "blue", 50 ), rep( "deeppink1", 50 ) ) ## Perform principal component analysis using princomp(). The parentheses ## around the call force printing of the ir.pca object. ( ir.pca <- princomp( log( ir ), cor = TRUE ) ) ## View a summary of the princomp object. (See ?summary.princomp for more ## details.) summary( ir.pca ) ## Create the scree plot of the princomp object. get( getOption( "device" ) )() plot( ir.pca ) ## View the loadings of the principal component analysis. ## This actually calls print( ir.pca$loadings ). loadings( ir.pca ) ## Call the predict() method to convert the data to principal component ## analysis coordinates ("principal components"). ir.pc <- predict( ir.pca ) ## Plot the first two principal components on an equal scale plot without ## displaying the points. This plots the axes and labels. ## The eqscplot() function is found in package "MASS". library( package = "MASS" ) get( getOption( "device" ) )() eqscplot( x = ir.pc[ , 1:2 ], type = "n", xlab = "first principal component", ylab = "second principal component", main = "Principal components of log-transformed iris3 data" ) ## Plot the points using single characters that stand for the species. ## The code in the book (p. 304) calls codes() to convert the factor levels ## to integers, but that function is now defunct. One alternative is to ## use as.integer() instead. ## Note that the resultant figure is inverted around the horizontal axis ## compared to figure 11.1 of the book. This figure is not displayed below. text( x = ir.pc[ , 1:2 ], labels = as.character( ir.species ), col = 3 + as.integer( ir.species ) ) ## Create a pairwise plot of all principal components. ## Use the ir.colors vector for setting the colors. get( getOption( "device" ) )() pairs( x = ir.pc, col = ir.colors ) ## Create a plot of the first two principal components ## using the ir.colors vector for setting the colors. get( getOption( "device" ) )() eqscplot( x = ir.pc[ , 1:2 ], type = "n", xlab = "first principal component", ylab = "second principal component" ) text( x = ir.pc[ , 1:2 ], labels = as.character( ir.species ), col = ir.colors )
Scree plot of principal component analysis of log-transformed iris3 data
Pairwise plots of principal components of log-transformed iris3 data
First two principal components of log-transformed iris3 data
Chapter 12
The R script for this chapter is library/MASS/scripts/ch12.R. The script
requires that the following packages be installed:
- e1071
- tree
- rpart
When the script is run, it creates the PostScript output file ch12.ps
in R’s working directory. The script omits turning off the tellis device, which means the
PostScript file may not be viewable until the dev.off() function is called.
## Run the script in computer-independent manner
scriptPath <-
paste(
installed.packages()[ "MASS", "LibPath" ],
'/MASS/scripts/ch12.R',
sep = ""
)
source( file = scriptPath, echo = TRUE )
## Close the trellis device to close the PostScript file
dev.off()
§ 12.5
This section covers support vector machines. Play with the formula to see which predictor variables give the best response. Also, explore the plots; you can view only two dimensions at a time; for the crabs data, there are 20 different views.
> library( package = MASS ) # for crabs data
> library( package = e1071 ) # for svm
Loading required package: class
> data( crabs ) # load crabs data
> lcrabs <- crabs # make a copy
> lcrabs[, 4:8] <- log( crabs[, 4:8]) # log transform all rows, columns 4 - 8
> pairs( x = crabs[, 4:8 ] ) # pairwise scatter plots of raw data
> pairs( x = lcrabs[, 4:8 ] ) # pairwise scatter plots of transformed data
>
> # Support vector machines
>
> lcrabs.svm <-
svm(
formula = sp ~ FL + RW + CL + CW + BD,
data = lcrabs,
cost = 100,
gamma = 1 )
> summary( lcrabs.svm )
Call:
svm(formula = sp ~ FL + RW + CL + CW + BD, data = lcrabs, cost = 100, gamma = 1)
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 100
gamma: 1
Number of Support Vectors: 42
( 24 18 )
Number of Classes: 2
Levels:
B O
> table(
true = lcrabs$sp,
predicted = predict( lcrabs.svm ) )
predicted
true B O
B 100 0
O 0 100
>
> # Drop one variable from the model
>
> lcrabs.svm <-
svm(
formula = sp ~ FL + RW + CL + CW,
data = lcrabs,
cost = 100,
gamma = 1 )
> table(
true = lcrabs$sp,
predicted = predict( lcrabs.svm ) )
predicted
true B O
B 100 0
O 0 100
> plot(
lcrabs.svm,
data = lcrabs,
formula = FL ~ RW,
slice = list(
CL = mean( lcrabs$CL ),
CW = mean( lcrabs$CW ),
BD = mean( lcrabs$BD ) ) )
>
> # Predict sex
>
> lcrabs.svm <-
svm(
formula = sex ~ FL + RW + CL + CW + BD,
data = lcrabs,
cost = 100,
gamma = 1 )
> table(
true = lcrabs$sex,
predicted = predict( lcrabs.svm ) )
predicted
true F M
F 100 0
M 0 100