## Introduction

Statistical Models in S was edited by John M. Chambers and Trevor J. Hastie.

The book’s chapters are:

- An appetizer
- Statistical models
- Data for models
- Linear models
- Analysis of variance; designed experiments
- Generalized linear models
- Generalized additive models
- Local regression models
- Tree-based models
- Nonlinear models

This page stores my notes as I work through the chapters of this book. I am using R 2.2.1 and R GUI 1.14 for Mac OS X.

## Data sets

It is my understanding that the data sets used in this book are included with S-PLUS. I can't verify this since I don't have a copy of S-PLUS. Many of the data sets are available in various R packages.

`car.test.frame`

    library( package = rpart )
    ?car.test.frame
    data( car.test.frame )

`cu.summary`

    library( package = rpart )
    ?cu.summary
    data( cu.summary )

`galaxy`

    library( package = ElemStatLearn )
    ?galaxy
    data( galaxy )

`market.survey`

This data set is probably available from S-PLUS, but I haven't checked yet.

`solder`

The `solder` data set from package `faraway` contains 900 rows. The `solder` data set from package `rpart` contains 720 rows and is identical to the `solder.balance` data set from S-PLUS, except that some of the values of the `row.names` attribute are different.

Since the `solder.balance` data set is a subset of the original `solder` data set (p. 49), it is likely that the `solder` data set from package `faraway` is the original data set.

    library( package = faraway )
    ?solder
    data( solder )

`solder.balance`

One source for this data set is S-PLUS. This data set can be exported from S-PLUS with the following commands:

    library( data )
    write.table( solder.balance, file = "solder.balance.txt", sep = "\t" )

The data can be imported into R with the following commands:

    solder.balance <- read.table( file = "solder.balance.txt", sep = "\t",
        header = TRUE, row.names = 1 )

The data are identical to the `solder` data set from package `rpart`, except that some of the values of the `row.names` attribute are different.
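The export and import steps above can be tried without S-PLUS on a small stand-in data frame. This is only a sketch: `toy`, `toy2`, and `path` are hypothetical names, and the values are made up rather than taken from `solder.balance`. It shows how `row.names = 1` recovers the row names from the first column of the file, and how `colClasses` can force a numeric-looking column such as `Panel` to be read as a factor.

```r
## Hypothetical stand-in for solder.balance (made-up values).
toy <- data.frame(
    Opening   = c( "L", "M", "S" ),
    Panel     = c( 1, 2, 3 ),
    skips     = c( 0, 3, 7 ),
    row.names = c( "r1", "r2", "r3" )
)

## Export as tab-separated text; row names are written as the first column.
path <- tempfile( fileext = ".txt" )
write.table( toy, file = path, sep = "\t" )

## Import. row.names = 1 takes the row names from the first column.
## colClasses has one entry per column in the file, including the row names.
toy2 <- read.table( file = path, sep = "\t", header = TRUE, row.names = 1,
    colClasses = c( NA, "factor", "factor", "numeric" ) )

str( toy2 )
```

Without the `colClasses` argument, `Panel` would come back as an integer column, which is presumably not what is wanted for a factor in a designed experiment.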

## Chapter 1 An Appetizer

### § 1.1

Some of the commands in R 2.2.1 work differently than the equivalent commands in S in 1991. The default behavior for `plot()` on a data.frame of this type is to call `pairs()`.

    ## Load the data. Define the column classes because the Panel column contains
    ## numeric values that are actually factors.
    solder.balance <- read.table( file = "solder.balance.txt", sep = "\t",
        header = TRUE, row.names = 1,
        colClasses = c( NA, "factor", "factor", "factor", "factor", "factor", "numeric" ) )

    ## View a graphical summary of the relationship between the response variable
    ## and the factor variables.
    get( getOption( "device" ) )()
    plot.design( solder.balance )

    ## View box plots.
    get( getOption( "device" ) )()
    par( mfrow = c( 1, 2 ) )
    plot( skips ~ Opening + Mask, data = solder.balance )
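For anyone without the exported `solder.balance.txt` file, the same two kinds of plot can be sketched on the `warpbreaks` data set from package `datasets`. This is a stand-in chosen only because it has one numeric response and two factors; it is not data from the book. The `tapply()` calls compute the per-level means of the response, which is what `plot.design()` draws by default.

```r
require( datasets )

## Per-level means of the response for each factor.
means.by.wool    <- tapply( warpbreaks$breaks, warpbreaks$wool, mean )
means.by.tension <- tapply( warpbreaks$breaks, warpbreaks$tension, mean )
means.by.wool
means.by.tension

## Graphical summary of the response against each factor.
plot.design( warpbreaks )

## Box plots of the response against each factor, side by side.
par( mfrow = c( 1, 2 ) )
plot( breaks ~ wool + tension, data = warpbreaks )
```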

## Chapter 2

### § 2.2.3

At first, I thought that the function `model.matrix()` extracts the design matrix from an object such as an `lm` object, but now I understand that `model.matrix()` creates the design matrix. The function requires a formula and data.

For example, using the `trees` data set in package `datasets` and following the example for `model.matrix()`:

    require( package = datasets )
    ff <- log( Volume ) ~ log( Height ) + log( Girth )
    mm <- model.matrix( object = ff, data = trees )
    mm
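As a quick check on this understanding, the sketch below builds the design matrix from the formula and data, then compares it with what `model.matrix()` returns for a fitted `lm` object (`model.matrix()` is generic and also has a method for `lm` fits). The name `fit` is introduced here only for illustration.

```r
require( datasets )

ff  <- log( Volume ) ~ log( Height ) + log( Girth )

## Design matrix built directly from the formula and the data.
mm  <- model.matrix( object = ff, data = trees )

## Design matrix recovered from a fitted linear model.
fit <- lm( ff, data = trees )
mm.fit <- model.matrix( fit )

dim( mm )
colnames( mm )

## Compare the values only, ignoring attributes.
all.equal( mm, mm.fit, check.attributes = FALSE )
```

The matrix has one row per observation in `trees` and one column per coefficient, including the intercept.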

## Chapter 3

### § 3.1.1

    ## Load the cu.summary data into memory.
    library( package = rpart )
    data( cu.summary )

### § 3.1.2

    ## Load the solder data into memory.
    library( package = faraway )
    ?solder
    data( solder )

### § 3.1.3

I can’t find a source of the `market.survey` and `market.frame` data.

## Chapter 8

### § 8.2.4

    ## Load the galaxy data into memory.
    library( package = ElemStatLearn )
    ?galaxy
    data( galaxy )

    ## Add random noise to the data before plotting it.
    ## Note that the command for R 2.2.1 requires specification of the
    ## amount argument in order to get results similar to what is shown
    ## in the book.
    ew.jittered <- jitter( galaxy$east.west, factor = 1/2, amount = 0 )
    ns.jittered <- jitter( galaxy$north.south, factor = 1/2, amount = 0 )
    lim <- range( ew.jittered, ns.jittered )
    plot( ew.jittered, ns.jittered, xlim = lim, ylim = lim )
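The effect of the `amount` argument can be seen on a small made-up vector. As I read the documentation for `jitter()`, `amount = 0` sets the noise range to `factor * z / 50`, where `z` is the range of the data, while the default (`amount = NULL`) uses `factor / 5 * d`, where `d` is roughly the smallest gap between distinct values. The vector `x` below is hypothetical and chosen so that the two rules give clearly different ranges.

```r
set.seed( 1 )

## Hypothetical data: range z = 100, smallest gap d = 1.
x <- c( 0, 1, 2, 100 )

## amount = 0: noise drawn from +/- factor * z / 50 = 0.5 * 100 / 50 = 1.
j.zero <- jitter( x, factor = 1/2, amount = 0 )

## Default amount: noise drawn from +/- factor / 5 * d = 0.5 / 5 * 1 = 0.1.
j.default <- jitter( x, factor = 1/2 )

## Largest displacement under each rule.
max( abs( j.zero - x ) )
max( abs( j.default - x ) )
```

When the data contain closely spaced values, the default rule adds much less noise than `amount = 0`, which may be why the plot looks different from the book unless `amount` is given.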