Introduction
Statistical Models in S was edited by John M. Chambers and Trevor J. Hastie.
The book’s chapters are:
- An appetizer
- Statistical models
- Data for models
- Linear models
- Analysis of variance; designed experiments
- Generalized linear models
- Generalized additive models
- Local regression models
- Tree-based models
- Nonlinear models
This page stores my notes as I work through the chapters of this book. I am using R 2.2.1 and R GUI 1.14 for Mac OS X.
Data sets
It is my understanding that the data sets used in this book are included with S-PLUS. I can't verify this since I don't have a copy of S-PLUS. Many of the data sets are available in various R packages.
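One way to check whether an installed package lists a given data set is to query data( package = ... ). The sketch below is only an illustration: has.data.set is a hypothetical helper name, not something from the book or from R itself, and it is shown against the datasets package so that it runs without extra installs.

```r
## Hypothetical helper (not from the book): does an installed package
## list a data set with the given name?
has.data.set <- function( name, package ) {
  items <- data( package = package )$results[ , "Item" ]
  ## Some items are listed as "name (file)"; keep only the name.
  items <- sub( " .*$", "", items )
  name %in% items
}

has.data.set( "trees", "datasets" )
has.data.set( "market.survey", "datasets" )
```

The same call with package = "rpart" or package = "faraway" should work once those packages are installed.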
car.test.frame
library( package = rpart )
?car.test.frame
data( car.test.frame )
cu.summary
library( package = rpart )
?cu.summary
data( cu.summary )
galaxy
library( package = ElemStatLearn )
?galaxy
data( galaxy )
market.survey
This data set is probably available from S-PLUS, but I haven't checked yet.
solder
The solder data set from package faraway contains 900 rows. The
solder data set from package rpart contains 720 rows and is
identical to the solder.balance data set from S-PLUS, except that some of the
values of the row.names attribute are different.
Since the solder.balance data set is a subset of the original
solder data set (p. 49), it is likely that the solder
data set from package faraway is the original data set.
library( package = faraway )
?solder
data( solder )
solder.balance
One source for this data set is S-PLUS. This data set can be exported from S-PLUS with the following commands:
library( data )
write.table( solder.balance, file = "solder.balance.txt", sep = "\t" )
The data can be imported into R with the following commands:
solder.balance <-
read.table(
file = "solder.balance.txt",
sep = "\t",
header = TRUE,
row.names = 1 )
The data are identical to the solder data set from package
rpart, except that some of the values of the row.names
attribute are different.
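A comparison of this kind (same contents, different row names) can be sketched as follows. This is a minimal illustration on toy data frames, not the actual check against the S-PLUS export; same.ignoring.row.names is a hypothetical helper name.

```r
## Hypothetical helper: compare two data frames while ignoring row names.
same.ignoring.row.names <- function( a, b ) {
  row.names( a ) <- NULL
  row.names( b ) <- NULL
  identical( a, b )
}

## Toy data standing in for the two versions of the data set.
a <- data.frame( x = 1:3, f = factor( c( "u", "v", "u" ) ) )
b <- a
row.names( b ) <- c( "r1", "r2", "r3" )

identical( a, b )                # FALSE: the row names differ
same.ignoring.row.names( a, b )  # TRUE: the contents match
```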
Chapter 1 An Appetizer
§ 1.1
Some of the commands in R 2.2.1 behave differently from the equivalent commands
in S in 1991. In R, the default behavior for plot() on a data.frame of this type
is to call pairs(), so the code below calls plot.design() explicitly.
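The difference can be seen without the solder data. The sketch below uses warpbreaks from package datasets as a stand-in (my choice, not the book's), since it also has one numeric response and several factors.

```r
## warpbreaks has one numeric response (breaks) and two factors.
## plot() on the whole data frame draws a scatterplot matrix via pairs().
plot( warpbreaks )

## plot.design() draws the mean response at each level of each factor.
plot.design( warpbreaks )

## The per-level means can also be computed directly.
wool.means <- tapply( warpbreaks$breaks, warpbreaks$wool, mean )
wool.means
```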
## Load the data. Define the column classes because the Panel column contains
## numeric values that are actually factors.
solder.balance <-
read.table(
file = "solder.balance.txt",
sep = "\t",
header = TRUE,
row.names = 1,
colClasses = c( NA, "factor", "factor", "factor",
"factor", "factor", "numeric" ) )
## View a graphical summary of the relationship between the response variable
## and the factor variables.
get( getOption( "device" ) )()
plot.design( solder.balance )
## View box plots.
get( getOption( "device" ) )()
par( mfrow = c( 1, 2 ) )
plot( skips ~ Opening + Mask, data = solder.balance )
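To see why colClasses matters for a column like Panel, here is a small sketch on made-up data in the same layout that write.table() produces (a header with one field fewer than the data rows). The values are invented for illustration; only the column names Panel and skips come from the data set.

```r
## Three made-up rows: row name, Panel, skips.
lines <- c( "Panel\tskips", "1\t1\t0", "2\t2\t3", "3\t1\t5" )

## Without colClasses, Panel is read as an integer column.
con <- textConnection( lines )
plain <- read.table( con, sep = "\t", header = TRUE, row.names = 1 )
close( con )
class( plain$Panel )

## With colClasses, Panel is read as a factor. The leading NA covers the
## row-name column, as in the call above.
con <- textConnection( lines )
typed <- read.table( con, sep = "\t", header = TRUE, row.names = 1,
                     colClasses = c( NA, "factor", "numeric" ) )
close( con )
class( typed$Panel )
levels( typed$Panel )
```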
Chapter 2
§ 2.2.3
At first, I thought that the function model.matrix() only extracts the design
matrix from an object such as an lm object, but now I understand that
model.matrix() can also create the design matrix directly. The default method
requires a formula (or terms object) and data.
For example, using the trees data set in package datasets
and following the example for model.matrix():
require( package = datasets )
ff <- log( Volume ) ~ log( Height ) + log( Girth )
mm <- model.matrix( object = ff, data = trees )
mm
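As a quick check of that understanding, the sketch below builds the design matrix from the formula and compares it with the one obtained from a fitted lm object. The variable name fit is mine; everything else follows the trees example.

```r
ff <- log( Volume ) ~ log( Height ) + log( Girth )

## Design matrix created directly from the formula and the data.
mm <- model.matrix( object = ff, data = trees )
dim( mm )
colnames( mm )

## Design matrix obtained from a fitted model.
fit <- lm( ff, data = trees )
all.equal( mm, model.matrix( fit ), check.attributes = FALSE )
```

Both routes should give the same matrix, with one row per observation and one column for the intercept and for each term.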
Chapter 3
§ 3.1.1
## Load the cu.summary data into memory.
library( package = rpart )
data( cu.summary )
§ 3.1.2
## Load the solder data into memory.
library( package = faraway )
?solder
data( solder )
§ 3.1.3
I can’t find a source of the market.survey and market.frame
data.
Chapter 8
§ 8.2.4
## Load the galaxy data into memory.
library( package = ElemStatLearn )
?galaxy
data( galaxy )
## Add random noise to the data before plotting it.
## Note that the command for R 2.2.1 requires specification of the
## amount argument in order to get results similar to what is shown
## in the book.
ew.jittered <- jitter( galaxy$east.west, factor = 1/2, amount = 0 )
ns.jittered <- jitter( galaxy$north.south, factor = 1/2, amount = 0 )
lim <- range( ew.jittered, ns.jittered )
plot( ew.jittered, ns.jittered, xlim = lim, ylim = lim )
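The effect of the amount argument can be seen on a small made-up vector. As I read ?jitter, amount = 0 scales the noise by the range of the data (factor * range / 50), while the default amount = NULL scales it by the smallest gap between values (factor / 5 * gap). The vector x below is invented for illustration.

```r
## Made-up data: range is 100, smallest gap between values is 1.
x <- c( 0, 1, 100 )

## amount = 0: noise is uniform on +/- factor * range / 50 = 0.5 * 100 / 50 = 1.
by.range <- jitter( x, factor = 1/2, amount = 0 )
max( abs( by.range - x ) )

## amount = NULL (default): noise is uniform on +/- factor / 5 * gap = 0.1.
by.gap <- jitter( x, factor = 1/2 )
max( abs( by.gap - x ) )
```

The noise is random, so the printed maxima change from run to run, but they stay within 1 and 0.1 respectively.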