Introduction
While I was working through the examples in chapter 9 of
Introductory Statistics with R, I wanted to plot the correlations
of the variables in a data set. However, the results I obtained were not entirely
satisfactory, so I wrote my own function. All examples below use the cystfibr
data set from the ISwR package.
Using image()
My first approach was to use the image() function directly. In the version of R I was
using, image() colored the cells with heat.colors() by default. As a result, correlations
of 1.0 are plotted in white, correlations of 0.0 in orange, and correlations of -1.0 in
red, which makes it difficult to contrast positive and negative correlations. (Since
R 4.0.0 the default palette is hcl.colors(12, "YlOrRd", rev = TRUE), which is also a
single sequential ramp and has the same drawback.) Furthermore, the cells are not drawn
in the same order as the text output of the cor() function: the diagonal of 1's runs
from the lower left to the upper right instead of from the upper left to the lower right.
## Load the data.
library( package = ISwR )
data( cystfibr )
## View the correlation matrix.
old.digits = getOption( "digits" )
options( digits = 2 )
cor( cystfibr, method = "spearman" )
options( digits = old.digits )
## Use image() to create the color plot of the correlation matrix.
## Open a new graphics device of the default type.
dev.new()
image(
z = cor( x = cystfibr, method = "spearman" ),
axes = FALSE,
zlim = c( -1.0, 1.0 ) )
axis(
side = 1,
labels = names( cystfibr ),
at = seq( 0, 1, length = length( names( cystfibr ) ) ),
cex.axis = 0.8 )
axis(
side = 2,
labels = names( cystfibr ),
at = seq( 0, 1, length = length( names( cystfibr ) ) ),
cex.axis = 0.8 )
box()
The text output of the correlation matrix is:
> cor( cystfibr, method = "spearman" )
         age    sex height weight   bmp  fev1    rv   frc    tlc pemax
age     1.00 -0.163   0.93   0.90  0.51  0.30 -0.58 -0.72 -0.493  0.52
sex    -0.16  1.000  -0.23  -0.20 -0.15 -0.54  0.26  0.15  0.056 -0.26
height  0.93 -0.235   1.00   0.96  0.57  0.43 -0.62 -0.66 -0.473  0.59
weight  0.90 -0.201   0.96   1.00  0.73  0.46 -0.70 -0.67 -0.485  0.49
bmp     0.51 -0.146   0.57   0.73  1.00  0.56 -0.69 -0.55 -0.494  0.22
fev1    0.30 -0.542   0.43   0.46  0.56  1.00 -0.68 -0.60 -0.440  0.31
rv     -0.58  0.257  -0.62  -0.70 -0.69 -0.68  1.00  0.85  0.589 -0.31
frc    -0.72  0.151  -0.66  -0.67 -0.55 -0.60  0.85  1.00  0.672 -0.38
tlc    -0.49  0.056  -0.47  -0.48 -0.49 -0.44  0.59  0.67  1.000 -0.15
pemax   0.52 -0.258   0.59   0.49  0.22  0.31 -0.31 -0.38 -0.148  1.00
The graphical output produced by image() is:
[Figure: Correlation of variables from cystfibr data]
Using plot.cor() from package sma
I used RSiteSearch() to find a better solution and hit upon the function
plot.cor() from the sma package, written by
Sandrine Dudoit.
This function largely solves my problem: it generates a color image of the
correlation matrix, with red for a correlation of 1.0, black for a correlation of 0.0, and
green for a correlation of -1.0. The cells are drawn in the same order as the text output,
with the diagonal of 1's running from the upper left to the lower right.
## Load the data.
library( package = ISwR )
data( cystfibr )
## Use plot.cor() to create a color plot of the correlation matrix.
library( package = sma )
## Open a new graphics device of the default type.
dev.new()
## Enlarge the characters; the default size was hard to read on the
## quartz() device for Mac OS X.
par( cex = 1.2 )
## Create the plot.
plot.cor(
x = cor( cystfibr, method = "spearman" ),
new = FALSE,
labels = names( cystfibr ),
zlim = c( -1.0, 1.0 ) )
The output is:
[Figure: Correlation of variables from cystfibr data]
This plot.cor() function provides most of what I want, and I learned a lot
by reading its source code. However, I wanted two additional features:
- The use of red and green makes the plot hard to read for people with red-green color blindness. I would prefer a function that lets the user choose the three base colors used to generate the plot.
- The function does not draw a legend showing which color corresponds to which correlation value.
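The first wish, user-chosen base colors, can be approached with grDevices::colorRampPalette(), which interpolates between the colors it is given. The sketch below is an assumption of how such a palette could be built; the function make_three_color_palette() and its arguments are hypothetical and are not part of sma or of the script later in this article.

```r
## Hypothetical helper, not part of sma: build an n-color palette that runs
## from `low` (correlation -1) through `mid` (0) to `high` (+1).
## n must be odd so that the middle color covers the interval centered on 0
## when the palette is used with zlim = c(-1, 1).
make_three_color_palette <- function(low, mid, high, n = 21) {
  stopifnot(n %% 2 == 1, n >= 3)
  half <- (n + 1) / 2
  lower <- grDevices::colorRampPalette(c(low, mid))(half)
  upper <- grDevices::colorRampPalette(c(mid, high))(half)
  ## Both ramps contain `mid`; keep it only once.
  c(lower, upper[-1])
}

## Red/black/green, similar in spirit to plot.cor(), and a
## yellow/black/blue alternative without the red-green contrast.
rg_colors <- make_three_color_palette(low = "green", mid = "black", high = "red")
yb_colors <- make_three_color_palette(low = "yellow", mid = "black", high = "blue")
```

Either vector can be passed as the col argument of image() together with zlim = c(-1, 1).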
Using plotCorrelation()
The code below generates a color image of the correlation matrix using a range of colors from blue through black to yellow, with blue = 1.0, black = 0.0, and yellow = -1.0. These colors remain distinguishable for people with red-green color blindness, and they make positive and negative correlations easy to tell apart.
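To check which color a particular correlation receives, the sketch below rebuilds the same 21 colors as the script in this section and bins values the way image() does when zlim is given: the zlim range is split into one equal-width interval per color. The helper color_for() is hypothetical, and values lying exactly on an interval boundary may be binned slightly differently than inside image().

```r
## Rebuild the 21 yellow/black/blue colors used in this section.
red   <- c(seq(from = 1.0, to = 0.0, by = -0.1),
           seq(from = 0.05, to = 0.5, length.out = 10))
green <- red
blue  <- c(rep(x = 0.0, times = 10), seq(from = 0.0, to = 1.0, by = 0.1))
colors <- rgb(red = red, green = green, blue = blue)

## Hypothetical helper: split zlim into length(colors) equal intervals
## and return the color assigned to each value in r.
color_for <- function(r, colors, zlim = c(-1, 1)) {
  breaks <- seq(zlim[1], zlim[2], length.out = length(colors) + 1)
  idx <- cut(r, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  colors[idx]
}

## Colors for correlations of -1, 0 and 1.
color_for(c(-1, 0, 1), colors)
```

With 21 colors the middle interval is centered on 0, so uncorrelated pairs are drawn in pure black.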
When image() plots a matrix, the x coordinates come from
the row numbers of the matrix and the y coordinates come from the
column numbers of the matrix. So the value of corr[1,1] is drawn at
coordinates (1,1) of the image (the lower left corner), corr[2,1] is drawn at
coordinates (2,1), and corr[10,1] is drawn at coordinates (10,1) (the lower right
corner), while corr[1,10] is drawn at coordinates (1,10) (the upper left corner).
In effect, the matrix is drawn as printed but rotated 90 degrees counterclockwise.
Therefore, in order to draw the image in the same order as the values in the matrix (with the diagonal of 1's running from the upper left to the lower right), it is necessary to reverse the order of the columns of the matrix before drawing.
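Reversing the columns is enough here because a correlation matrix is symmetric. For a matrix that is not symmetric, the printed layout is reproduced only if the matrix is also transposed, since image() puts row numbers on the x axis. The helper for_image() below is a hypothetical sketch of that general case; for a symmetric matrix it gives exactly the same result as reversing the columns.

```r
## Hypothetical helper: rearrange m so that image() draws it in the same
## layout as print(m), with row 1 at the top and column 1 at the left.
## image() draws z[i, j] at x = i, y = j, so the printed column index must
## become the first index of z and the printed row index the reversed second.
for_image <- function(m) {
  t(m)[ , nrow(m):1, drop = FALSE]
}

## A small non-symmetric example: 2 rows, 3 columns.
m <- matrix(1:6, nrow = 2)
z <- for_image(m)
## m[1, 1] lands at x = 1, y = 2 (top left); m[2, 3] at x = 3, y = 1 (bottom right).
z[1, 2] == m[1, 1]
z[3, 1] == m[2, 3]

## For a symmetric matrix the transpose changes nothing.
s <- matrix(c(1, 0.5, -0.2, 0.5, 1, 0.3, -0.2, 0.3, 1), nrow = 3)
identical(for_image(s), s[ , 3:1])
```

To draw the result, for_image(m) would be passed as z to image(), for example image(z = for_image(m), axes = FALSE).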
## Load the data.
library( package = ISwR )
data( cystfibr )
## Start a default device.
dev.new()
## Create the colors. The blue colors are designed to be lighter than pure
## blue by adding equal amounts of red and green.
red <-
c(
seq( from = 1.0, to = 0.0, by = -0.1 ),
seq( from = 0.05, to = 0.5, length = 10 ) )
green <- red
blue <-
c(
rep( x = 0.0, times = 10 ),
seq( from = 0.0, to = 1.0, by = 0.1 ) )
colors <- rgb( red = red, green = green, blue = blue )
## Create the Spearman correlation matrix.
corr <- cor( x = cystfibr, method = "spearman" )
## Reverse the columns of the matrix so it will be drawn correctly.
n <- ncol( corr )
corr2 <- corr[ , n:1 ]
## Create the image.
image(
z = corr2,
axes = FALSE,
col = colors,
zlim = c( -1.0, 1.0 ) )
## Add labels for the y axis.
axis(
side = 2,
labels = colnames( corr2 ),
at = seq( 0, 1, length = length( rownames( corr2 ) ) ),
cex.axis = 0.8,
las = 2)
## Add labels for the x axis, but along the top.
axis(
side = 3,
labels = rownames( corr2 ),
at = seq( 0, 1, length = length( colnames( corr2 ) ) ),
cex.axis = 0.8,
las = 2 )
[Figure: Correlation of variables from cystfibr data]
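This script does not yet draw the legend mentioned earlier. One possible approach, sketched here as an assumption and not necessarily how plotCorrelation() does it, is to split the device with layout() and draw a one-column image() as a color bar whose bands use the same zlim and number of colors as the correlation plot. The function draw_color_bar() is hypothetical, and the built-in mtcars data stands in for cystfibr so the sketch runs without ISwR.

```r
## Hypothetical sketch of a color legend for a correlation image.
## Each colored band spans the same equal-width interval of zlim that
## image() assigns to that color in the main plot.
draw_color_bar <- function(colors, zlim = c(-1, 1)) {
  breaks <- seq(zlim[1], zlim[2], length.out = length(colors) + 1)
  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  ## Narrow left margin, room on the right for the axis labels.
  op <- par(mar = c(5.1, 0.5, 4.1, 3.1))
  on.exit(par(op))
  ## x gives the 2 edges of a single cell; y gives one edge per break.
  image(
    x = c(0, 1),
    y = breaks,
    z = matrix(mids, nrow = 1),
    col = colors,
    zlim = zlim,
    axes = FALSE,
    xlab = "",
    ylab = "")
  axis(side = 4, at = seq(zlim[1], zlim[2], by = 0.5), las = 2, cex.axis = 0.8)
  box()
  invisible(breaks)
}

## Usage: correlation plot on the left, narrow color bar on the right.
pal <- colorRampPalette(c("yellow", "black", "blue"))(21)
corr <- cor(mtcars[ , 1:5], method = "spearman")
n <- ncol(corr)
layout(matrix(1:2, nrow = 1), widths = c(5, 1))
image(z = corr[ , n:1], axes = FALSE, col = pal, zlim = c(-1, 1))
draw_color_bar(pal)
layout(1)
```

Because the bar is built from the same zlim and the same number of colors as the main image, each band covers exactly the correlations that receive its color.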