## Introduction

While I was working through the examples in chapter 9 of
*Introductory Statistics with R*, I wanted to plot the correlations
of the variables in a data set. However, the results I got from the existing functions were not entirely
satisfactory, so I wrote my own function. All examples below use the `cystfibr`
data set from the `ISwR` package.

## Using `image()`

My first approach was to use the `image()` function directly. By default,
`image()` uses the colors generated by `heat.colors()` (recent versions of R instead
default to `hcl.colors(12, "YlOrRd", rev = TRUE)`, a similar red-to-pale-yellow scale). The
result is that correlations of 1.0 are plotted in white, correlations of 0.0 are
plotted in orange, and correlations of –1.0 are plotted in red. These colors make it
difficult to contrast positive and negative correlations. Furthermore, the colors
are not drawn in the same order as the text output from the `cor()` function:
the diagonal of 1's runs from the lower left to the upper right instead of from the
upper left to the lower right.
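To see why the heat palette gives so little contrast, you can inspect its colors directly. The sketch below assumes you pass `col = heat.colors(12)` explicitly, so the result does not depend on which default palette your version of R uses. The variable names `pal` and `rgb_values` are only for illustration.

```r
## Sketch: inspect the 12-color heat palette that image() spreads over
## zlim = c(-1, 1), one equal-width interval per color.
pal <- heat.colors(12)

## RGB values of the colors used near -1, on either side of 0, and near +1.
rgb_values <- col2rgb(pal[c(1, 6, 7, 12)])
rgb_values

## Every color in the palette has a full red channel. The scale only moves
## from red through orange and yellow toward a pale, nearly white yellow,
## so no contrasting hue separates negative from positive correlations.
all(col2rgb(pal)["red", ] == 255)
```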

```r
## Load the data.
library( package = ISwR )
data( cystfibr )

## View the correlation matrix.
old.digits = getOption( "digits" )
options( digits = 2 )
cor( cystfibr, method = "spearman" )
options( digits = old.digits )

## Use image() to create the color plot of the correlation matrix.
## dev.new() opens a new device whether getOption( "device" ) holds a
## device name or a function.
dev.new()
image( z = cor( x = cystfibr, method = "spearman" ), axes = FALSE,
       zlim = c( -1.0, 1.0 ) )
axis( side = 1, labels = names( cystfibr ),
      at = seq( 0, 1, length = length( names( cystfibr ) ) ), cex.axis = 0.8 )
axis( side = 2, labels = names( cystfibr ),
      at = seq( 0, 1, length = length( names( cystfibr ) ) ), cex.axis = 0.8 )
box()
```

The text output of the correlation matrix is:

```
> cor( cystfibr, method = "spearman" )
         age    sex height weight   bmp  fev1    rv   frc    tlc pemax
age     1.00 -0.163   0.93   0.90  0.51  0.30 -0.58 -0.72 -0.493  0.52
sex    -0.16  1.000  -0.23  -0.20 -0.15 -0.54  0.26  0.15  0.056 -0.26
height  0.93 -0.235   1.00   0.96  0.57  0.43 -0.62 -0.66 -0.473  0.59
weight  0.90 -0.201   0.96   1.00  0.73  0.46 -0.70 -0.67 -0.485  0.49
bmp     0.51 -0.146   0.57   0.73  1.00  0.56 -0.69 -0.55 -0.494  0.22
fev1    0.30 -0.542   0.43   0.46  0.56  1.00 -0.68 -0.60 -0.440  0.31
rv     -0.58  0.257  -0.62  -0.70 -0.69 -0.68  1.00  0.85  0.589 -0.31
frc    -0.72  0.151  -0.66  -0.67 -0.55 -0.60  0.85  1.00  0.672 -0.38
tlc    -0.49  0.056  -0.47  -0.48 -0.49 -0.44  0.59  0.67  1.000 -0.15
pemax   0.52 -0.258   0.59   0.49  0.22  0.31 -0.31 -0.38 -0.148  1.00
```

The graphical output produced by `image()` is:

Correlation of variables from cystfibr data

## Using `plot.cor()` from package `sma`

I used `RSiteSearch()` to find a better solution and hit upon the function
`plot.cor()` from the `sma` package, written by Sandrine Dudoit.
This function provides a good solution for my problem: it generates a color image of the
correlation matrix, with red for a correlation of 1.0, black for a correlation of 0.0, and
green for a correlation of –1.0. The colors are drawn in the same order as the text output,
with the diagonal of 1's running from the upper left to the lower right.

```r
## Load the data.
library( package = ISwR )
data( cystfibr )

## Use plot.cor() to create a color plot of the correlation matrix.
library( package = sma )

## Open a default device with dev.new().
## Adjust the character size to be more readable for the quartz() device for
## Mac OS X.
dev.new()
par( cex = 1.2 )

## Create the plot.
plot.cor( x = cor( cystfibr, method = "spearman" ), new = FALSE,
          labels = names( cystfibr ), zlim = c( -1.0, 1.0 ) )
```

The output is:

Correlation of variables from cystfibr data

This `plot.cor()` function provides most of what I want, and I learned a lot
by reading its source code. However, I wanted two additional features:

- The use of red and green makes it difficult for persons with red-green color blindness to view the results. I would prefer a function that allows the user to choose the three base colors used to generate the plot.
- The function does not create a legend that indicates the correspondence between colors and the correlation value.
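You can sketch the first of these features with base R alone. The helper `make_corr_palette()` below is hypothetical. It is not part of `sma` and is not the author's `plotCorrelation()`. It assumes that `grDevices::colorRampPalette()`, which interpolates linearly between the colors it is given, is an acceptable way to build the ramp.

```r
## Hypothetical helper: build a diverging palette for correlations from three
## user-chosen base colors (low = -1, mid = 0, high = +1) by linear
## interpolation with grDevices::colorRampPalette().
make_corr_palette <- function(low = "yellow", mid = "black", high = "blue",
                              n = 21) {
  ## With an odd n, the middle color covers an interval centered exactly on 0
  ## when the palette is used with zlim = c(-1, 1).
  colorRampPalette(c(low, mid, high))(n)
}

pal <- make_corr_palette()
pal[c(1, 11, 21)]   # "#FFFF00" "#000000" "#0000FF"
```

Calling `make_corr_palette(high = "#8080FF")` should give approximately the same 21 colors as the ramp built by hand with `rgb()` in the `plotCorrelation()` section. Both interpolate linearly from yellow to black and then from black to a light blue.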

## Using `plotCorrelation()`

The code below generates a color image of the correlation matrix using a range of colors from blue through black to yellow, with blue = 1.0, black = 0.0, and yellow = –1.0. Blue and yellow remain easy to distinguish for people with red-green color blindness, and the strong contrast between the two ends of the scale makes positive and negative correlations easy to tell apart.

When plotting a matrix using `image()`, the `x` coordinates come from
the *row numbers* of the matrix and the `y` coordinates come from the
*column numbers* of the matrix. So the value of `corr[1,1]` is drawn at
position (1,1) of the image (the lower left corner), `corr[2,1]` is drawn at
position (2,1), and `corr[10,1]` is drawn at position (10,1) (the lower right
corner), while `corr[1,10]` is drawn at position (1,10) (the upper left corner).
In effect, the matrix is drawn as if it had been rotated 90 degrees counterclockwise.

Therefore, in order to draw the image in the same order as the values in the matrix (with the diagonal of 1’s running from the upper left to the lower right), it is necessary to reverse the order of the columns of the matrix before drawing.
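You can check this index bookkeeping without drawing anything. The sketch assumes `image()`'s documented defaults, `x = seq(0, 1, length.out = nrow(z))` and `y = seq(0, 1, length.out = ncol(z))`. The helper `image_position()` and the 3 × 3 matrix are made up for illustration.

```r
## Hypothetical helper: the (x, y) grid position at which image(z) draws
## z[i, j] when x and y are left at their documented defaults.
image_position <- function(z, i, j) {
  x <- seq(0, 1, length.out = nrow(z))
  y <- seq(0, 1, length.out = ncol(z))
  c(x = x[i], y = y[j])
}

## A small symmetric "correlation" matrix with made-up values.
corr <- matrix(c(1.0, 0.5, 0.2,
                 0.5, 1.0, 0.7,
                 0.2, 0.7, 1.0), nrow = 3)
n <- ncol(corr)

## Rows map to x and columns map to y, so corr[1, 1] lands in the lower-left
## corner and corr[1, n] in the upper-left corner.
image_position(corr, 1, 1)   # x = 0, y = 0
image_position(corr, 1, n)   # x = 0, y = 1

## After reversing the columns, corr[i, i] is stored in corr2[i, n + 1 - i],
## so the diagonal runs from the upper left to the lower right.
corr2 <- corr[, n:1]
diag_pos <- sapply(1:n, function(i) image_position(corr2, i, n + 1 - i))
diag_pos
```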

```r
## Load the data.
library( package = ISwR )
data( cystfibr )

## Start a default device.
dev.new()

## Create the colors: 21 shades from yellow (-1.0) through black (0.0) to
## light blue (1.0). The blue colors are designed to be lighter than pure
## blue by adding equal amounts of red and green.
red <- c( seq( from = 1.0, to = 0.0, by = -0.1 ),
          seq( from = 0.05, to = 0.5, length = 10 ) )
green <- red
blue <- c( rep( x = 0.0, times = 10 ), seq( from = 0.0, to = 1.0, by = 0.1 ) )
colors <- rgb( red = red, green = green, blue = blue )

## Create the Spearman correlation matrix.
corr <- cor( x = cystfibr, method = "spearman" )

## Reverse the columns of the matrix so it will be drawn correctly.
n = ncol( corr )
corr2 <- corr[ , n:1 ]

## Create the image.
image( z = corr2, axes = FALSE, col = colors, zlim = c( -1.0, 1.0 ) )

## Add labels for the y axis (the columns of corr2).
axis( side = 2, labels = colnames( corr2 ),
      at = seq( 0, 1, length = length( colnames( corr2 ) ) ),
      cex.axis = 0.8, las = 2 )

## Add labels for the x axis (the rows of corr2), but along the top.
axis( side = 3, labels = rownames( corr2 ),
      at = seq( 0, 1, length = length( rownames( corr2 ) ) ),
      cex.axis = 0.8, las = 2 )
```

Correlation of variables from cystfibr data
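This plot still lacks the color key listed earlier among the desired features. The sketch below adds one under some assumptions. It uses a two-panel `layout()`. The key is drawn with `image()` itself, by filling each cell with the midpoint of an equal-width interval of `zlim` so that the cell receives the matching color. The function `plot_with_key()` is hypothetical and is not the author's `plotCorrelation()`.

```r
## Hypothetical helper: draw a correlation image next to a vertical color key.
## `z` should already have its columns reversed (like corr2 above).
plot_with_key <- function(z, colors, zlim = c(-1, 1)) {
  ## Restore the caller's graphics settings (and single-panel layout) on exit.
  old.par <- par(no.readonly = TRUE)
  on.exit(par(old.par))

  ## Two panels side by side: a wide one for the matrix, a narrow one for the key.
  layout(matrix(1:2, nrow = 1), widths = c(4, 1))

  ## Panel 1: the correlation matrix, labeled as in the example above.
  image(z = z, axes = FALSE, col = colors, zlim = zlim)
  axis(side = 2, labels = colnames(z), at = seq(0, 1, length.out = ncol(z)),
       cex.axis = 0.8, las = 2)
  axis(side = 3, labels = rownames(z), at = seq(0, 1, length.out = nrow(z)),
       cex.axis = 0.8, las = 2)
  box()

  ## Panel 2: one column of cells whose boundaries split zlim into
  ## length(colors) equal intervals; each cell holds its interval's midpoint.
  par(mar = c(5.1, 0.5, 4.1, 3.1))
  breaks <- seq(zlim[1], zlim[2], length.out = length(colors) + 1)
  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  image(x = c(0, 1), y = breaks, z = matrix(mids, nrow = 1),
        col = colors, zlim = zlim, axes = FALSE, xlab = "", ylab = "")
  axis(side = 4, at = seq(zlim[1], zlim[2], by = 0.5), las = 2)
  box()

  invisible(breaks)
}
```

After running the code above, `plot_with_key(corr2, colors)` should redraw the same image with a key on the right.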