Probability and Statistics

See also books on the S programming language and on the SAS programming language.

Applied Regression Analysis and Other Multivariable Methods
3rd edition
by David G. Kleinbaum, Lawrence L. Kupper, Keith E. Muller, and Azhar Nizam
Duxbury Press
ISBN 0-534-20910-6

This is the text chosen for Boston University's CAS MA 684 (Multivariate Analysis), which I took in the spring semester of 2006. It was an applied statistics course for statistics majors and for graduate students in other fields who need to perform statistical analysis of their data.

The book provides a nice mix of theory and application. The chapters are short and concentrate on one main idea each. There is a lot of SAS output included in the book, but strangely the book doesn't include the SAS programs used to generate the output.

The book’s chapters are:

  1. Concepts and examples of research
  2. Classification of variables and the choice of analysis
  3. Basic statistics: a review
  4. Introduction to regression analysis
  5. Straight-line regression analysis
  6. The correlation coefficient and straight-line regression analysis
  7. The analysis-of-variance table
  8. Multiple regression analysis: general considerations
  9. Testing hypotheses in multiple regression
  10. Correlations: multiple, partial, and multiple partial
  11. Confounding and interaction in regression
  12. Regression diagnostics
  13. Polynomial regression
  14. Dummy variables in regression
  15. Analysis of covariance and other methods for adjusting continuous data
  16. Selecting the best regression equation
  17. One-way analysis of variance
  18. Randomized blocks: special case of two-way ANOVA
  19. Two-way ANOVA with equal cell numbers
  20. Two-way ANOVA with unequal cell numbers
  21. Analysis of repeated measures data
  22. The method of maximum likelihood
  23. Logistic regression analysis
  24. Poisson regression analysis

The Elements of Statistical Learning
by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
October 2001
ISBN 0-387-95284-5

This is a beautifully produced book with 200 full-color figures. It will take me years to master all the methods presented.

The book’s chapters are:

  1. Introduction
  2. Overview of supervised learning
  3. Linear methods for regression
  4. Linear methods for classification
  5. Basis expansions and regularization
  6. Kernel methods
  7. Model assessment and selection
  8. Model inference and averaging
  9. Additive models, trees, and related methods
  10. Boosting and additive trees
  11. Neural networks
  12. Support vector machines and flexible discriminants
  13. Prototype methods and nearest-neighbors
  14. Unsupervised learning

A First Course in Probability
6th edition
by Sheldon Ross
Pearson Prentice Hall
ISBN 0-13-033851-6

This is an excellent book on probability. Ross takes some interesting tangents to solve unusual problems.

The chapters are:

  1. Combinatorial analysis
  2. Axioms of probability
  3. Conditional probability and independence
  4. Random variables
  5. Continuous random variables
  6. Jointly distributed random variables
  7. Properties of expectation
  8. Limit theorems
  9. Additional topics in probability
  10. Simulation

Introduction to Mathematical Statistics
6th edition
by Robert V. Hogg, Joseph W. McKean, and Allen T. Craig
Pearson Prentice Hall
ISBN 0-13-008507-3

This book was the required text for Boston University's MET MA 582 (Mathematical Statistics).

This is a difficult book with enough material for three semesters. The material is well written and rewards hard work. Most unusually, the book contains some S code that can be run using R or S-PLUS.

The chapters are:

  1. Probability and distributions
  2. Multivariate distributions
  3. Some special distributions
  4. Unbiasedness, consistency, and limiting distributions
  5. Some elementary statistical inference
  6. Maximum likelihood methods
  7. Sufficiency
  8. Optimal tests of hypotheses
  9. Inferences about normal models
  10. Nonparametric statistics
  11. Bayesian statistics
  12. Linear models

Introduction to Probability
2nd edition
by Charles M. Grinstead and J. Laurie Snell
American Mathematical Society
ISBN 0-8218-0749-8

A free PDF version of this book is now available.

The chapters are:

  1. Discrete probability distributions
  2. Continuous probability distributions
  3. Combinatorics
  4. Conditional probability
  5. Distributions and densities
  6. Expected value and variance
  7. Sums of random variables
  8. Law of large numbers
  9. Central limit theorem
  10. Generating functions
  11. Markov chains
  12. Random walks

John E. Freund’s Mathematical Statistics with Applications
7th edition
by Irwin Miller and Marylees Miller
Pearson Prentice Hall
ISBN 0-13-142706-7

When I was taking Mathematical Statistics at Boston University, sometimes the assigned text (Introduction to Mathematical Statistics by Hogg, McKean, and Craig) was too difficult. I purchased this book as a supplementary text and discovered that this book had probably provided the source material for the professor’s lectures.

This book covers the same material as Hogg, McKean, and Craig, but at a less rigorous level. There is less emphasis on theory and more emphasis on application.

The chapters are:

  1. Introduction
  2. Probability
  3. Probability distributions and probability densities
  4. Mathematical expectation
  5. Special probability distributions
  6. Special probability densities
  7. Functions of random variables
  8. Sampling distributions
  9. Decision theory
  10. Point estimation
  11. Interval estimation
  12. Hypothesis testing
  13. Tests of hypothesis involving means, variances, and proportions
  14. Regression and correlation
  15. Design and analysis of experiments
  16. Nonparametric tests

Mathematical Statistics and Data Analysis
2nd edition
by John A. Rice
September 1994
Duxbury Press
ISBN 0-534-20934-3

This statistics textbook is recommended for juniors, seniors, or graduate students who have had a year of introductory statistics, three semesters of calculus (through multivariate calculus), and a semester of linear algebra. I use this book regularly, and I recommend it highly.

Mathematical Statistics with Mathematica
by Colin Rose and Murray D. Smith
March 2002
ISBN 0-387-95234-9

The book includes a CD with mathStatica and a trial version of Mathematica 4.

Multivariate Statistical Methods
3rd edition
by Donald F. Morrison
November 1990
McGraw-Hill, Inc.
ISBN 0-07-043187-6

Morrison provides an exceptionally clear presentation of multivariate statistics, including discriminant functions, covariance matrices, principal component analysis, and factor analysis. Linear algebra and matrices are used extensively throughout the book. This book is sadly out of print; I obtained my copy from an online used book dealer.

Principal Component Analysis
2nd edition
by I. T. Jolliffe
July 2002
ISBN 0-387-95442-2

This book provides exhaustive coverage of principal component analysis, a popular method of reducing the dimensionality of multivariate data.

Principles of Data Mining
by David Hand, Heikki Mannila, and Padhraic Smyth
The MIT Press
ISBN 0-262-08290-X

This is a very readable introduction to methods used for data mining. The book’s chapters are:

  1. Introduction
  2. Measurement and data
  3. Visualizing and exploring data
  4. Data analysis and uncertainty
  5. A systematic overview of data mining algorithms
  6. Models and patterns
  7. Score functions for data mining algorithms
  8. Search and optimization methods
  9. Descriptive modeling
  10. Predictive modeling for classification
  11. Predictive modeling for regression
  12. Data organization and databases
  13. Finding patterns and rules
  14. Retrieval by content

Stat Labs: Mathematical Statistics Through Applications
by Deborah Nolan and Terry Speed
ISBN 0-387-98974-9

This book provides a series of in-depth case studies that present real-world data sets for analysis by the student. The book’s web site includes special instructions for users of R and S-PLUS.

Statistics for Experimenters
by George E. P. Box, William G. Hunter, and J. Stuart Hunter
October 1978
John Wiley & Sons
ISBN 0-471-09315-7

A classic, this book is a highly readable introduction to experimental design. The 2nd edition was published in 2005.

Statistical Methods in Bioinformatics
by Warren J. Ewens and Gregory R. Grant
July 2001
ISBN 0-387-95229-2

This book presents probability and statistics for sequence analysis. A second edition has since been published.