Correlation Plot


Introduction

While I was working through the examples in chapter 9 of Introductory Statistics with R, I wanted to plot the correlations of the variables in a data set. However, the results I obtained were not entirely satisfactory, so I wrote my own function. All examples below use the cystfibr data set from the ISwR package.

Using image()

My first approach was to use the image() function directly. By default, image() used the colors generated by heat.colors() (recent versions of R default to a similar yellow-to-red palette from hcl.colors()). With these colors, correlations of 1.0 are plotted in white, correlations of 0.0 in orange, and correlations of –1.0 in red, which makes it hard to tell positive correlations from negative ones. Furthermore, the cells are not drawn in the same order as the text output of cor(): the diagonal of 1's runs from the lower left to the upper right instead of from the upper left to the lower right.

##  Load the data.

library( package = ISwR )
data( cystfibr )

##  View the correlation matrix.

old.options <- options( digits = 2 )
print( cor( cystfibr, method = "spearman" ) )
options( old.options )

##  Use image() to create the color plot of the correlation matrix.

dev.new()

image(
    z    = cor( x = cystfibr, method = "spearman" ),
    axes = FALSE,
    zlim = c( -1.0, 1.0 ) )

axis(
    side     = 1,
    labels   = names( cystfibr ),
    at       = seq( 0, 1, length.out = ncol( cystfibr ) ),
    cex.axis = 0.8 )

axis(
    side     = 2,
    labels   = names( cystfibr ),
    at       = seq( 0, 1, length.out = ncol( cystfibr ) ),
    cex.axis = 0.8 )

box()

The text output of the correlation matrix is:

> cor( cystfibr, method = "spearman" )
         age    sex height weight   bmp  fev1    rv   frc    tlc pemax
age     1.00 -0.163   0.93   0.90  0.51  0.30 -0.58 -0.72 -0.493  0.52
sex    -0.16  1.000  -0.23  -0.20 -0.15 -0.54  0.26  0.15  0.056 -0.26
height  0.93 -0.235   1.00   0.96  0.57  0.43 -0.62 -0.66 -0.473  0.59
weight  0.90 -0.201   0.96   1.00  0.73  0.46 -0.70 -0.67 -0.485  0.49
bmp     0.51 -0.146   0.57   0.73  1.00  0.56 -0.69 -0.55 -0.494  0.22
fev1    0.30 -0.542   0.43   0.46  0.56  1.00 -0.68 -0.60 -0.440  0.31
rv     -0.58  0.257  -0.62  -0.70 -0.69 -0.68  1.00  0.85  0.589 -0.31
frc    -0.72  0.151  -0.66  -0.67 -0.55 -0.60  0.85  1.00  0.672 -0.38
tlc    -0.49  0.056  -0.47  -0.48 -0.49 -0.44  0.59  0.67  1.000 -0.15
pemax   0.52 -0.258   0.59   0.49  0.22  0.31 -0.31 -0.38 -0.148  1.00

The graphical output produced by image() is:

[Figure: Correlation of variables from cystfibr data]
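To see why the sign of a correlation is hard to read from the default palette, it helps to look at how the colors are assigned. When no breaks are given, image() divides zlim into length( col ) equal-width intervals and uses one color per interval. The helper colorForValue() below is hypothetical (it is not part of R) and only approximates this with findInterval(); values that fall exactly on an interval boundary may be given a neighboring color by image() itself, so treat it as a sketch.

```r
##  Hypothetical helper (not part of base R): approximate the color that
##  image() would use for each element of `value` when no `breaks` are given,
##  by splitting `zlim` into length( col ) equal-width intervals.
colorForValue <- function( value, col, zlim = c( -1.0, 1.0 ) ) {
    breaks <- seq( zlim[ 1 ], zlim[ 2 ], length.out = length( col ) + 1 )
    index  <- findInterval( value, breaks, rightmost.closed = TRUE )
    col[ index ]
}

##  The 12-color heat.colors() palette that image() used by default.
heat <- heat.colors( 12 )

##  -1.0 and 1.0 map to the two ends of the palette (red and a very pale
##  yellow), but a small negative correlation (age vs. sex, -0.16) and a
##  small positive one (sex vs. frc, 0.15) land in adjacent orange shades.
colorForValue( c( -1.0, -0.16, 0.15, 1.0 ), col = heat )
```

Adjacent shades for correlations of opposite sign are exactly what makes the default plot hard to read.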

Using plot.cor() from package sma

I used RSiteSearch() to look for a better solution and found the function plot.cor() in the sma package, written by Sandrine Dudoit. This function provides a good solution for my problem: it draws a color image of the correlation matrix, with red for a correlation of 1.0, black for a correlation of 0.0, and green for a correlation of –1.0. The cells are drawn in the same order as the text output, with the diagonal of 1's running from the upper left to the lower right.

##  Load the data.

library( package = ISwR )
data( cystfibr )

##  Use plot.cor() to create a color plot of the correlation matrix.

library( package = sma )

##  Open a default device.
##  Adjust the character size to be more readable for the quartz() device for
##  Mac OS X.

dev.new()
par( cex = 1.2 )

##  Create the plot.

plot.cor(
    x      = cor( cystfibr, method = "spearman" ),
    new    = FALSE,
    labels = names( cystfibr ),
    zlim   = c( -1.0, 1.0 ) )

The output is:

[Figure: Correlation of variables from cystfibr data]
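The contrast in plot.cor() comes from a diverging palette whose midpoint (a correlation of 0.0) is black. A similar green-black-red ramp can be built in base R with colorRampPalette(). This is my own sketch of the idea, not the code from sma; the choice of 21 colors is arbitrary (an odd number puts 0.0 in a pure-black middle color), and the built-in mtcars data stand in for cystfibr so that the example runs without ISwR.

```r
##  A green -> black -> red palette in the spirit of plot.cor(): green for
##  -1.0, black for 0.0, red for 1.0. With an odd number of colors the
##  middle color is pure black.
redBlackGreen <- colorRampPalette( c( "green", "black", "red" ) )( 21 )

##  The built-in mtcars data stand in for cystfibr here.
corr <- cor( mtcars[ , 1:6 ], method = "spearman" )
n    <- ncol( corr )

##  Reverse the columns, because image() draws column 1 at the bottom.
image(
    z    = corr[ , n:1 ],
    axes = FALSE,
    col  = redBlackGreen,
    zlim = c( -1.0, 1.0 ) )
box()
```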

The plot.cor() function provides most of what I need, and I learned a lot by reading its source code. However, I wanted two additional features:

Using plotCorrelation()

The code below generates a color image of the correlation matrix using a range of colors from blue through black to yellow: light blue for 1.0, black for 0.0, and yellow for –1.0. Unlike red and green, blue and yellow remain distinguishable for people with red-green color blindness, and they make the sign and strength of each correlation easier to identify.
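As a check on which colors the three reference correlations receive, the sketch below rebuilds the same 21-color palette as the script in this section and looks up the colors for –1.0, 0.0, and 1.0. The lookup assumes that the 21 colors are spread over zlim = c( -1, 1 ) in equal-width intervals, as image() does when no breaks are supplied; findInterval() is my stand-in for image()'s internal binning.

```r
##  Rebuild the 21-color blue-black-yellow palette.
red <-
    c(
        seq( from = 1.0,  to = 0.0, by = -0.1 ),
        seq( from = 0.05, to = 0.5, length.out = 10 ) )
green <- red
blue <-
    c(
        rep( x = 0.0, times = 10 ),
        seq( from = 0.0, to = 1.0, by = 0.1 ) )
colors <- rgb( red = red, green = green, blue = blue )

##  Split zlim = c( -1, 1 ) into one equal-width interval per color
##  (an approximation of what image() does when no breaks are given).
breaks <- seq( -1.0, 1.0, length.out = length( colors ) + 1 )

##  Find the interval, and hence the color, for -1.0, 0.0 and 1.0.
index <- findInterval( c( -1.0, 0.0, 1.0 ), breaks, rightmost.closed = TRUE )
colors[ index ]   # yellow, black, light blue
```

Because the palette has an odd number of colors, 0.0 falls in the middle (11th) interval, which is pure black.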

When image() draws a matrix, the x coordinates come from the row numbers of the matrix and the y coordinates come from the column numbers, with column 1 at the bottom. So the value of corr[1,1] is drawn at position (1,1) of the image (the lower left corner), corr[2,1] is drawn at position (2,1), and corr[10,1] is drawn at position (10,1) (the lower right corner), while corr[1,10] is drawn at position (1,10) (the upper left corner). In other words, the image is the printed matrix rotated 90 degrees counterclockwise.

Therefore, to draw the image in the same layout as the printed matrix (with the diagonal of 1's running from the upper left to the lower right), the columns of the matrix must be reversed before drawing, so that column 1 ends up along the top of the image.
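The orientation argument can be checked without drawing anything by computing the picture that image() would produce, read from the top row down. The helper asDisplayed() below is hypothetical (it is not part of R); it encodes the rule from the image() documentation that row i of z is drawn at the i-th x position and column j at the j-th y position, counting from the bottom. The built-in mtcars data stand in for cystfibr so the example runs without ISwR.

```r
##  Hypothetical helper: return the matrix as it appears on screen after
##  image( z ), with the first row of the result being the top row of the
##  picture. image() puts row i of z at x position i (left to right) and
##  column j at y position j (bottom to top).
asDisplayed <- function( z ) {
    t( z )[ ncol( z ):1, , drop = FALSE ]
}

corr <- cor( mtcars[ , 1:5 ], method = "spearman" )
n    <- ncol( corr )

##  Without reversing, the diagonal entries sit on the anti-diagonal of the
##  picture: from the bottom-left cell ( n, 1 ) up to the top-right ( 1, n ).
asDisplayed( corr )[ cbind( n:1, 1:n ) ]

##  After reversing the columns, the picture is the transpose of corr, which
##  equals corr itself because a correlation matrix is symmetric.
all.equal( asDisplayed( corr[ , n:1 ] ), corr )
```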

##  Load the data.

library( package = ISwR )
data( cystfibr )

##  Open a new graphics device.

dev.new()

##  Create the colors. The blue colors are designed to be lighter than pure
##  blue by adding equal amounts of red and green.

red <-
    c(
        seq( from = 1.0,  to = 0.0, by = -0.1 ),
        seq( from = 0.05, to = 0.5, length = 10 ) )

green <- red

blue <-
    c(
        rep( x = 0.0, times = 10 ),
        seq( from = 0.0, to = 1.0, by = 0.1 ) )

colors <- rgb( red = red, green = green, blue = blue )

##  Create the Spearman correlation matrix.

corr <- cor( x = cystfibr, method = "spearman" )

##  Reverse the columns of the matrix so it will be drawn correctly.

n     <- ncol( corr )
corr2 <- corr[ , n:1 ]

##  Create the image.

image(
    z    = corr2,
    axes = FALSE,
    col  = colors,
    zlim = c( -1.0, 1.0 ) )

##  Add labels for the y axis. The y positions correspond to the columns of
##  corr2, drawn from the bottom up.

axis(
    side     = 2,
    labels   = colnames( corr2 ),
    at       = seq( 0, 1, length.out = ncol( corr2 ) ),
    cex.axis = 0.8,
    las      = 2 )

##  Add labels for the x axis, but along the top. The x positions correspond
##  to the rows of corr2, drawn from left to right.

axis(
    side     = 3,
    labels   = rownames( corr2 ),
    at       = seq( 0, 1, length.out = nrow( corr2 ) ),
    cex.axis = 0.8,
    las      = 2 )

The output is:

[Figure: Correlation of variables from cystfibr data]
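The code in this section is written as a top-level script. A minimal sketch of wrapping the same steps into a reusable plotCorrelation() function follows; the argument names (data, method, cex.axis), the extra box() call, and returning the column-reversed matrix invisibly are my own assumptions, not part of the original code. The built-in mtcars data again stand in for cystfibr.

```r
##  Sketch: draw a correlation matrix with light blue = 1.0, black = 0.0 and
##  yellow = -1.0, laid out like the printed matrix.
plotCorrelation <- function( data, method = "spearman", cex.axis = 0.8 ) {
    ##  Blue-black-yellow palette (21 colors, black in the middle).
    red    <- c( seq( from = 1.0,  to = 0.0, by = -0.1 ),
                 seq( from = 0.05, to = 0.5, length.out = 10 ) )
    blue   <- c( rep( x = 0.0, times = 10 ),
                 seq( from = 0.0, to = 1.0, by = 0.1 ) )
    colors <- rgb( red = red, green = red, blue = blue )

    ##  Correlation matrix with its columns reversed, because image()
    ##  draws column 1 at the bottom of the plot.
    corr  <- cor( x = data, method = method )
    n     <- ncol( corr )
    corr2 <- corr[ , n:1 ]

    image( z = corr2, axes = FALSE, col = colors, zlim = c( -1.0, 1.0 ) )

    ##  y axis (left): the columns of corr2, bottom to top.
    axis( side = 2, labels = colnames( corr2 ),
          at = seq( 0, 1, length.out = ncol( corr2 ) ),
          cex.axis = cex.axis, las = 2 )

    ##  x axis (drawn along the top): the rows of corr2, left to right.
    axis( side = 3, labels = rownames( corr2 ),
          at = seq( 0, 1, length.out = nrow( corr2 ) ),
          cex.axis = cex.axis, las = 2 )
    box()

    invisible( corr2 )
}

corr2 <- plotCorrelation( mtcars[ , 1:6 ] )
```

With the ISwR package loaded, plotCorrelation( cystfibr ) should draw the same kind of plot as the script above.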