Notes on Modern Applied Statistics with S

Table of Contents

Introduction

Modern Applied Statistics with S (MASS) was written by W. N. Venables and B. D. Ripley. The R scripts that accompany the book are located in the scripts subdirectory of the MASS package.

This page stores my notes as I work through the chapters of this book. I am using R 2.2.1 and R GUI 1.14 for Mac OS X.

Loading the MASS Package

The command for loading the MASS package is:

library( package = "MASS" )

Chapter 1

The R script for this chapter is library/MASS/scripts/ch01.R.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch01.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

Chapter 2

The R script for this chapter is library/MASS/scripts/ch02.R.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch02.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

Chapter 3

The R script for this chapter is library/MASS/scripts/ch03.R.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch03.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

Chapter 4

The R script for this chapter is library/MASS/scripts/ch04.R. The script requires that the following package be installed:

The script contains an error and will not run to completion when I use R 2.2.1 with R GUI 1.14 under Mac OS X 10.4.5. To fix the problem, comment out lines 19 and 20 of the script:

# if(interactive())
#     legend(locator(1), c("Males", "Females"), fill = c(2, 3))

When the script is run, it creates the PostScript output file ch04.ps in R’s working directory. The script omits turning off the tellis device, which means the PostScript file may not be viewable until the dev.off() function is called.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch04.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

##  Close the trellis device to close the PostScript file

dev.off()

Chapter 5

The R script for this chapter is library/MASS/scripts/ch05.R. The script requires that the following packages be installed:

When the script is run, it creates the PostScript output file ch05.ps in R’s working directory. The script omits turning off the tellis device, which means the PostScript file may not be viewable until the dev.off() function is called.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch05.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

##  Close the trellis device to close the PostScript file

dev.off()

Chapter 6

The R script for this chapter is library/MASS/scripts/ch06.R. The script requires that the following packages be installed:

When the script is run, it creates the PostScript output file ch06.ps in R’s working directory. The script omits turning off the tellis device, which means the PostScript file may not be viewable until the dev.off() function is called.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch06.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

##  Close the trellis device to close the PostScript file

dev.off()

Chapter 11

§ 11.1

This section covers visualization methods, including principal component analysis.

##  Principal component analysis of the iris3 data.

##  The iris3 data are stored in a 3-dimensional array. Convert the data to a
##  matrix.

ir <- rbind( iris3[ , , 1 ], iris3[ , , 2 ], iris3[ , , 3 ] )

##  Create a vector of factors used for plotting below.

ir.species <- factor( c( rep( "s", 50 ), rep( "c", 50 ), rep( "v", 50 ) ) )

##  Create a vector of colors used for plotting below.

ir.colors <- c( rep( "chartreuse4", 50 ), rep( "blue", 50 ), rep( "deeppink1", 50 ) )

##  Perform principal component analysis using princomp(). The parentheses
##  around the call force printing of the ir.pca object.

( ir.pca <- princomp( log( ir ), cor = TRUE ) )

##  View a summary of the princomp object. (See ?summary.princomp for more
##  details.)

summary( ir.pca )

##  Create the scree plot of the princomp object.

get( getOption( "device" ) )()
plot( ir.pca )

##  View the loadings of the principal component analysis.
##  This actually calls print( ir.pca$loadings ).

loadings( ir.pca )

##  Call the predict() method to convert the data to principal component
##  analysis coordinates ("principal components").

ir.pc <- predict( ir.pca )

##  Plot the first two principal components on an equal scale plot without
##  displaying the points. This plots the axes and labels.
##  The eqscplot() function is found in package "MASS".

library( package = "MASS" )

get( getOption( "device" ) )()
eqscplot(
    x    = ir.pc[ , 1:2 ],
    type = "n",
    xlab = "first principal component",
    ylab = "second principal component",
    main = "Principal components of log-transformed iris3 data" )

##  Plot the points using single characters that stand for the species.
##  The code in the book (p. 304) calls codes() to convert the factor levels
##  to integers, but that function is now defunct. One alternative is to
##  use as.integer() instead.

##  Note that the resultant figure is inverted around the horizontal axis
##  compared to figure 11.1 of the book. This figure is not displayed below.

text(
    x      = ir.pc[ , 1:2 ],
    labels = as.character( ir.species ),
    col    = 3 + as.integer( ir.species ) )

##  Create a pairwise plot of all principal components.
##  Use the ir.colors vector for setting the colors.

get( getOption( "device" ) )()
pairs(
    x   = ir.pc,
    col = ir.colors )

##  Create a plot of the first two principal components
##  using the ir.colors vector for setting the colors.

get( getOption( "device" ) )()
eqscplot(
    x    = ir.pc[ , 1:2 ],
    type = "n",
    xlab = "first principal component",
    ylab = "second principal component" )

text(
    x      = ir.pc[ , 1:2 ],
    labels = as.character( ir.species ),
    col    = ir.colors )
Scree plot of principal component analysis of log-transformed iris3 data

Scree plot of principal component analysis of log-transformed iris3 data

Pairwise plots of principal components of log-transformed iris3 data

Pairwise plots of principal components of log-transformed iris3 data

First two principal components of log-transformed iris3 data

First two principal components of log-transformed iris3 data

Chapter 12

The R script for this chapter is library/MASS/scripts/ch12.R. The script requires that the following packages be installed:

When the script is run, it creates the PostScript output file ch12.ps in R’s working directory. The script omits turning off the tellis device, which means the PostScript file may not be viewable until the dev.off() function is called.

##  Run the script in computer-independent manner

scriptPath <-
    paste(
        installed.packages()[ "MASS", "LibPath" ],
        '/MASS/scripts/ch12.R',
        sep = ""
    )
source( file = scriptPath, echo = TRUE )

##  Close the trellis device to close the PostScript file

dev.off()

§ 12.5

This section covers support vector machines. Play with the formula to see which predictor variables give the best response. Also, explore the plots; you can view only two dimensions at a time; for the crabs data, there are 20 different views.

> library( package = MASS )            # for crabs data
> library( package = e1071 )           # for svm
Loading required package: class
> data( crabs )                        # load crabs data
> lcrabs <- crabs                      # make a copy
> lcrabs[, 4:8] <- log( crabs[, 4:8])  # log transform all rows, columns 4 - 8
> pairs( x = crabs[, 4:8 ] )           # pairwise scatter plots of raw data
> pairs( x = lcrabs[, 4:8 ] )          # pairwise scatter plots of transformed data
>
> # Support vector machines
>
> lcrabs.svm <-
      svm(
          formula = sp ~ FL + RW + CL + CW + BD,
          data    = lcrabs,
          cost    = 100,
          gamma   = 1 )
> summary( lcrabs.svm )

Call:
svm(formula = sp ~ FL + RW + CL + CW + BD, data = lcrabs, cost = 100, gamma = 1)


Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  100
      gamma:  1

Number of Support Vectors:  42

 ( 24 18 )


Number of Classes:  2

Levels:
 B O



> table(
      true        = lcrabs$sp,
      predicted   = predict( lcrabs.svm ) )
    predicted
true   B   O
   B 100   0
   O   0 100
>
> # Drop one variable from the model
>
> lcrabs.svm <-
      svm(
          formula = sp ~ FL + RW + CL + CW,
          data    = lcrabs,
          cost    = 100,
          gamma   = 1 )
> table(
      true        = lcrabs$sp,
      predicted   = predict( lcrabs.svm ) )
    predicted
true   B   O
   B 100   0
   O   0 100
> plot(
                lcrabs.svm,
      data    = lcrabs,
      formula = FL ~ RW,
      slice   = list(
                    CL = mean( lcrabs$CL ),
                    CW = mean( lcrabs$CW ),
                    BD = mean( lcrabs$BD ) ) )
>
> # Predict sex
>
> lcrabs.svm <-
      svm(
          formula = sex ~ FL + RW + CL + CW + BD,
          data    = lcrabs,
          cost    = 100,
          gamma   = 1 )
> table(
      true        = lcrabs$sex,
      predicted   = predict( lcrabs.svm ) )
    predicted
true   F   M
   F 100   0
   M   0 100