Beginning Perl Lesson 12

Using Perl’s pseudorandom number generator

For the scripts we're going to examine in this lesson, we need some random data. A computer following a fixed algorithm can't generate truly random numbers, so the numbers it produces are called pseudorandom numbers. However, a good algorithm generates pseudorandom numbers that pass statistical tests for randomness.

Perl’s pseudorandom number generator is available through the rand() function. The rand() function returns pseudorandom decimal values from 0 to 1, including 0 but excluding 1.

    $value = rand();

If you give rand() a positive numeric argument, then rand() will return pseudorandom decimal values from 0 up to the given number, including 0 but excluding the given number.

    $value = rand( 10 );
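As a quick check of that range, here is a minimal sketch that draws 1,000 values from rand( 10 ) and keeps track of the smallest and largest values seen. The variable names and the number of draws are chosen only for illustration.

```perl
use warnings;

my( $smallest );
my( $largest );

#   Start with the extremes reversed so that the first value drawn
#   replaces both of them.

$smallest = 10;
$largest = 0;

foreach ( 1 .. 1000 )
{
    my( $value );

    $value = rand( 10 );
    $smallest = $value if ( $value < $smallest );
    $largest = $value if ( $value > $largest );
}

printf( STDOUT "smallest = %.4f\nlargest = %.4f\n", $smallest, $largest );
```

The exact numbers change from run to run, but the smallest value is never below 0 and the largest value is always below 10.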

Here is a script that generates 1,000 pseudorandom numbers between 0 and 10.

#!/usr/bin/perl
#
#   createRandomData.pl
#   22-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com
#
#   This script uses Perl's pseudorandom number generator, rand(), to create
#   a text file named randomData.txt that contains 1,000 pseudorandom numbers
#   in the interval 0 to 10.
#
#   Use: perl createRandomData.pl

use warnings;

    my( $fileName );

    #   The name of the output file is hard-coded into the script.
    #   Open the file.

    $fileName = 'randomData.txt';
    if ( open( DATAFILE, ">$fileName" ) )
    {
        #   Use a simple foreach loop that will loop 1,000 times.

        foreach ( 1 .. 1000 )
        {
            my( $value );

            #   Get a pseudorandom value between 0 and 10, including 0 but
            #   excluding 10.

            $value = rand( 10 );

            #   Print the value, with 4 decimal places of precision,
            #   to the output file.

            printf( DATAFILE "%.4f\n", $value );
        }
        close( DATAFILE );
    }

    else
    {
        #   The script ends up here if the output file couldn't be opened.
        #   die() with an error message.

        die( "\n  Can't open $fileName for writing: $!.\n\n" );
    }

Subroutines

A subroutine acts like a mini-script inside a script. A subroutine can have its own arguments and its own variables. A subroutine can do anything a script can do. A subroutine can return one or more values to the part of your script that called it.

Writing subroutines is a good way of organizing the code in your script. You want to write a subroutine when you have clearly identified a portion of your script that does one thing in a way that you can isolate from the rest of your script.

Once you have written a subroutine, you have a functionally independent piece of code that is easy to copy and paste into different scripts. It’s also possible to store a subroutine in a Perl module and then call it from any script; we’ll learn how to do that in the next lesson.

Using a subroutine is often termed calling a subroutine. The piece of code that calls the subroutine is often termed the caller.

Variables inside subroutines

We create variables in subroutines just as we have before, using the my() function. The important point here is that a variable declared with my() inside a subroutine can be seen only by the code in that subroutine; no other code in your script can see or change it.
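The short sketch below illustrates this with two variables that happen to share a name. The subroutine changeColor() is a made-up example, not part of the lesson's scripts; its $color is a separate variable from the caller's $color.

```perl
use warnings;

my( $color );

$color = 'red';
changeColor();
print( STDOUT "The caller's color is still $color\n" );

sub changeColor
{
    #   This $color is a separate variable that exists only inside
    #   the subroutine.

    my( $color );

    $color = 'blue';
    return $color;
}
```

Running the sketch prints "The caller's color is still red", because the assignment inside changeColor() touches only the subroutine's own variable.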

Returning results from a subroutine

A subroutine can return nothing (technically, an undefined value), a single value (a scalar), or a list of values (an array). Values are returned using a return statement.

Returning nothing

Here’s a subroutine that returns nothing when it’s called:

    sub printHello
    {
        print( STDOUT "Hello, world!\n" );
        return;
    }

You would use the subroutine like this:

    printHello();
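If the caller does try to store the result, it gets an undefined value. The following sketch checks this with Perl's defined() function; storing the result in $result is done only for the demonstration.

```perl
use warnings;

my( $result );

$result = printHello();
if ( defined( $result ) )
{
    print( STDOUT "printHello() returned a value\n" );
}
else
{
    print( STDOUT "printHello() returned an undefined value\n" );
}

sub printHello
{
    print( STDOUT "Hello, world!\n" );
    return;
}
```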

Returning a scalar

Here’s a subroutine that returns a single value (a scalar), the word 'red', when it’s called:

    sub getRed
    {
        my( $result );

        $result = 'red';
        return $result;
    }

You would use the subroutine like this:

    my( $color );

    $color = getRed();

Returning an array

A Perl subroutine can also return many results in an array. Here’s a subroutine that returns an array containing the words 'yellow', 'green', and 'blue'.

    sub getColors
    {
        my( @results );

        @results = ( 'yellow', 'green', 'blue' );
        return @results;
    }

You would use the subroutine like this:

    my( @colors );

    @colors = getColors();

Returning a hash

A Perl subroutine can return a hash, but only in an indirect way, since the hash gets converted into an array before the values are returned. Here’s a subroutine that returns a hash:

    sub getVegetableColors
    {
        my( %vegetableColors );

        %vegetableColors = (
                celery   => 'green',
                tomato   => 'red',
                potato   => 'brown',
                eggplant => 'purple' );

        return %vegetableColors;
    }

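What you get back is an array of keys and values, such as 'celery', 'green', 'tomato', 'red', 'potato', 'brown', 'eggplant', and 'purple'. Perl doesn’t promise any particular order for the pairs, but each key is always followed by its own value. Fortunately, since you can take an array like this and turn it into a hash, you’re OK. You would call this function like this: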

    my( %veggieColors );

    %veggieColors = getVegetableColors();
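To see the conversion at work, this sketch calls getVegetableColors() twice, once storing the result in an array and once in a hash, and counts what comes back. The @flat variable is introduced here only for the comparison.

```perl
use warnings;

my( @flat );
my( %veggieColors );

@flat = getVegetableColors();
%veggieColors = getVegetableColors();

print( STDOUT "items in the returned array: ", scalar( @flat ), "\n" );
print( STDOUT "keys in the rebuilt hash: ", scalar( keys( %veggieColors ) ), "\n" );

sub getVegetableColors
{
    my( %vegetableColors );

    %vegetableColors = (
            celery   => 'green',
            tomato   => 'red',
            potato   => 'brown',
            eggplant => 'purple' );

    return %vegetableColors;
}
```

The array holds 8 items (4 keys and 4 values), and the rebuilt hash has 4 keys.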

Returning mixed variables

You can even return a mix of scalars, arrays, and hashes, as shown in the subroutine getMoreColors() below, but there’s a side-effect: all returned values get merged into a single array before they’re returned. Unless you know exactly how many items you’ll get back, you won’t know how to sort them back into their original scalar, array, and hash variables.

    sub getMoreColors
    {
        my( $singleColor );
        my( @twoColors );
        my( %vegColors );

        $singleColor = 'magenta';
        @twoColors = ( 'black', 'orange' );
        %vegColors = ( tomato => 'red', potato => 'brown' );

        return ( $singleColor, @twoColors, %vegColors );
    }

When getMoreColors() returns its values, they all get merged into a single array of mixed-up values: 'magenta', 'black', 'orange', and then the hash’s keys and values, such as 'tomato', 'red', 'potato', and 'brown'. The pairs from the hash can come back in any order.

So what if you want to return a hash, an array, and a scalar? Perl will merge all of these into one array and return that, so it can’t be done in a straightforward way. Next week we’ll talk about references and how you can use references to return complicated mixtures of variable types from a subroutine.
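When the caller does know the layout of the merged array, it can sort the values back out by hand. The sketch below assumes the caller knows that getMoreColors() returns one scalar, then exactly two array items, then the hash; the variable names on the left side are made up for this example.

```perl
use warnings;

my( $single );
my( $first );
my( $second );
my( $veggie );
my( %veg );

#   One scalar, then exactly two array items; the hash takes the rest.

( $single, $first, $second, %veg ) = getMoreColors();

print( STDOUT "single color: $single\n" );
print( STDOUT "two colors: $first, $second\n" );
foreach $veggie ( sort( keys( %veg ) ) )
{
    print( STDOUT "  $veggie => $veg{ $veggie }\n" );
}

sub getMoreColors
{
    my( $singleColor );
    my( @twoColors );
    my( %vegColors );

    $singleColor = 'magenta';
    @twoColors = ( 'black', 'orange' );
    %vegColors = ( tomato => 'red', potato => 'brown' );

    return ( $singleColor, @twoColors, %vegColors );
}
```

This only works because the number of items ahead of the hash is fixed. If @twoColors could hold any number of items, the caller would have no way to tell where the array ends and the hash begins.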

A combined example

Let’s combine all these little examples into one script, and test things to make sure they behave as expected.

#!/usr/bin/perl
#
#   testSubroutines.pl
#   21-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com
#
#   This demonstration script shows how to call subroutines and get and use
#   their return values.

use warnings;

    my( $color );
    my( @colors );
    my( $veggie );
    my( %veggieColors );

    print( STDOUT "\n" );

    print( STDOUT "Results from getRed():\n" );
    $color = getRed();
    print( STDOUT "  $color\n" );
    print( STDOUT "\n" );

    print( STDOUT "Results from getColors():\n" );
    @colors = getColors();
    foreach $color ( @colors )
    {
        print( STDOUT "  $color\n" );
    }
    print( STDOUT "\n" );

    print( STDOUT "Results from getVeggieColors():\n" );
    %veggieColors = getVegetableColors();
    foreach $veggie ( keys( %veggieColors ) )
    {
        print( STDOUT "  $veggie => $veggieColors{ $veggie }\n" );
    }
    print( STDOUT "\n" );

    print( STDOUT "Results from getMoreColors():\n" );
    @colors = getMoreColors();
    foreach $color ( @colors )
    {
        print( STDOUT "  $color\n" );
    }
    print( STDOUT "\n" );

#   getRed()
#       return a scalar

sub getRed
{
    my( $result );

    $result = 'red';
    return $result;
}


#   getColors()
#       return an array

sub getColors
{
    my( @results );

    @results = ( 'yellow', 'green', 'blue' );
    return @results;
}


#   getVegetableColors
#       return a hash

sub getVegetableColors
{
    my( %vegetableColors );

    %vegetableColors = (
        celery   => 'green',
        tomato   => 'red',
        potato   => 'brown',
        eggplant => 'purple' );

    return %vegetableColors;
}


#   getMoreColors()
#       return a scalar, an array, and a hash; all get combined into a single array

sub getMoreColors
{
    my( $singleColor );
    my( @twoColors );
    my( %vegColors );

    $singleColor = 'magenta';
    @twoColors = ( 'black', 'orange' );
    %vegColors = ( tomato => 'red', potato => 'brown' );

    return ( $singleColor, @twoColors, %vegColors );
}

The results look like this:

Results from getRed():
  red

Results from getColors():
  yellow
  green
  blue

Results from getVeggieColors():
  eggplant => purple
  potato => brown
  celery => green
  tomato => red

Results from getMoreColors():
  magenta
  black
  orange
  potato
  brown
  tomato
  red

Getting arguments from the @_ variable

A subroutine is like a script in that it can receive arguments. As you’ll remember from the last lesson, a script gets its arguments from Perl’s special @ARGV array variable.

A subroutine gets its arguments from a different special array variable, @_.

Here’s a script that divides one number by another using a subroutine. We pass the dividend and the divisor to the subroutine. The subroutine gets these arguments in the @_ variable, and copies the values to its own variables. The subroutine then carries out the division and returns the result, the quotient.

#!/usr/bin/perl
#
#   divide1.pl
#   21-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com

use warnings;

    my( $result );

    $result = divide( 60, 12 );
    print( "$result\n" );

sub divide
{
    my( $dividend );
    my( $divisor );
    my( $quotient );

    ( $dividend, $divisor ) = @_;
    $quotient = $dividend / $divisor;
    return $quotient;
}

There are many styles for getting the arguments in a subroutine. In the style used above,

    ( $dividend, $divisor ) = @_;

we copy the contents of the array into individual scalar variables. When we do this, we have to put the parentheses around the list of scalar variables.

Sometimes, you’ll use the array directly. When you do this, you don’t need to copy the array into another variable.

sub enumerateArray
{
    my( $item );

    foreach $item ( @_ )
    {
        print( STDOUT "$item\n" );
    }
    return;
}

In this example, we use a foreach loop to look at each item in the array.
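Here is one way the enumerateArray() subroutine might be called. The argument values are arbitrary; the point of the sketch is that a scalar and an array passed together arrive in @_ as one list of items.

```perl
use warnings;

my( @colors );

@colors = ( 'yellow', 'green', 'blue' );

#   The subroutine receives four items in @_: 'red' followed by the
#   three items from @colors.

enumerateArray( 'red', @colors );

sub enumerateArray
{
    my( $item );

    foreach $item ( @_ )
    {
        print( STDOUT "$item\n" );
    }
    return;
}
```

The call prints red, yellow, green, and blue, each on its own line.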

Finally, you can refer to each item in the array using its index number. Since the array is named @_, the individual items are named $_[ 0 ], $_[ 1 ], etc. We could greatly shorten the divide() subroutine we used above by using the array items directly.

#!/usr/bin/perl
#
#   divide2.pl
#   21-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com

use warnings;

    my( $result );

    $result = divide( 60, 12 );
    print( "$result\n" );

sub divide
{
    my( $quotient );

    $quotient = $_[ 0 ] / $_[ 1 ];
    return $quotient;
}

You’ll notice that this code isn’t nearly as readable, and you’d have to add comments to explain what the code is doing.

We can shorten the code even more by removing the $quotient variable from the subroutine.

#!/usr/bin/perl
#
#   divide3.pl
#   21-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com

use warnings;

    my( $result );

    $result = divide( 60, 12 );
    print( "$result\n" );

sub divide
{
    return $_[ 0 ] / $_[ 1 ];
}

This is as brief (and as obscure) as it gets.

Revised structure of scripts

Now that we’re going to use subroutines in our scripts, we’re going to revise the structure of our scripts. This revised structure gives us more control over our variables and helps Perl warn us about things like typographical errors when using variable names in our scripts.

Adding the use strict; line

The first change we’re going to make is that we’ll always include the line

use strict;

near the top of our code, like this:

use strict;
use warnings;

The use strict; line forces us to declare all of our variables. If we then misspell a variable name, Perl reports the undeclared name as an error instead of quietly creating a new variable. This is important as our scripts become more complex.
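The sketch below shows the kind of mistake that use strict; catches. The misspelled code is kept in a string and compiled with eval(), a function this lesson doesn't cover, purely so the demonstration can print the error instead of stopping; the variable names are made up.

```perl
use strict;
use warnings;

my( $code );
my( $error );
my( $ok );

#   The code to test is kept in a string so that this demonstration can
#   report the compile-time error instead of stopping because of it.

$code = q{
    my( $total );

    $total = 10;
    $totl = $total + 1;     #   typographical error: $totl was never declared
    1;
};

$ok = eval( $code );
$error = $@;
if ( ! $ok )
{
    print( STDOUT "Perl refused to compile the code:\n$error" );
}
```

The error message names the undeclared variable $totl, which points straight at the typographical error.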

Marking off main code

The second change we’ll make is that we’ll mark off the main part of our script with a label, MAIN:, and braces (‘{’ and ‘}’) around the code. The label is a convenience for anyone reading the code; it simply marks where the execution of the script will begin. The braces make sure that any variables we declare in our main section of code are invisible to our subroutines. The marked-off main code looks like this:

MAIN:
{
    print( "Hello, world!\n" );
}

Placing subroutines after main code

Finally, we’ll put our subroutines below the main section of our code.

An example

When we restructure the divide1.pl script according to these guidelines, we get the script below:

#!/usr/bin/perl
#
#   divide4.pl
#   21-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com

use strict;
use warnings;

MAIN:
{
    my( $result );

    $result = divide( 60, 12 );
    print( "$result\n" );
}

sub divide
{
    my( $dividend );
    my( $divisor );
    my( $quotient );

    ( $dividend, $divisor ) = @_;
    $quotient = $dividend / $divisor;
    return $quotient;
}

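From now on, we'll organize our scripts in this order: the use strict; and use warnings; lines first, then the main code marked off with the MAIN: label and braces, and finally the subroutines.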

Summary example script

The summary example script reads numeric values from a text file and calculates the mean and standard deviation of the values. The script contains three subroutines that separate the script’s functionality into discrete units. The first subroutine gets the values from the file and puts them into an array variable. The second subroutine takes the array and calculates the mean. The third subroutine takes the array and calculates the standard deviation.

Because we’ve used three subroutines, the main part of our script is very short. Also, we’ve used descriptive names for our subroutines, so the code is self-documenting (meaning we can understand what’s going on by just reading the code, including variable and subroutine names, without having to read the comments).

As you may remember from your statistics classes, the equation for calculating the mean of a set of values is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where x bar is the sample mean, x sub i is each value, and n is the number of values. This formula has been converted into the subroutine calculateMean() in the example script.

The formula for calculating the standard deviation of a sample is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where s is the standard deviation, x sub i is each value, x bar is the mean, and n is the number of values. But in order to use this formula, you have to have already calculated the mean.

It turns out that there’s an alternative formula for the standard deviation that’s easier for a calculator or computer to use. This formula doesn’t require that the mean be calculated first:

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}}$$

where x sub i is each value and n is the number of values. This latter formula has been converted into the subroutine calculateStdDev() in the script.
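The two standard deviation formulas give the same result because the sum of squared differences from the mean can be rewritten using only the sum of the values and the sum of their squares. A short derivation, using the fact that the mean is the sum of the values divided by n:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} x_i\right)^2
     + \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2
\end{aligned}
```

Dividing the last line by n - 1 and taking the square root gives the alternative formula, which needs only one pass through the values.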

In order to run this script, you’ll need the data file randomData.txt, which contains 1,000 pseudorandom numbers, one per line. If you don’t have that file, you can create your own data file using the createRandomData.pl script given above.

#!/usr/bin/perl
#
#   simpleStats1.pl
#   22-Jan-2002
#
#   Conrad Halling
#   conrad.halling@sphaerula.com
#
#   This script reads a set of numeric values from a text file, where each
#   value is on its own line, and calculates the mean and standard deviation
#   of the values.
#
#           Use: perl simpleStats1.pl dataFile
#       Example: perl simpleStats1.pl randomData.txt

use strict;
use warnings;

MAIN:
{
    my( $mean );
    my( $stddev );
    my( @values );

    #   Check arguments.
    #   The script requires the name of the file containing the data.

    if ( 1 != @ARGV )
    {
        die( "\n  Use: perl $0 dataFile\n\n" );
    }

    #   Get the data from the data file.

    @values = getValuesFromFile( $ARGV[ 0 ] );

    #   Calculate the mean and the standard deviation.

    $mean = calculateMean( @values );
    $stddev = calculateStdDev( @values );

    #   Display the results to four decimal places of precision.

    printf( STDOUT
        "mean = %.4f\nstandard deviation = %.4f\n",
        $mean,
        $stddev );
}


#   getValuesFromFile()
#
#   This subroutine opens the given file, reads the values from the lines of
#   the file, and returns an array containing the values.

sub getValuesFromFile
{
    my( $fileName );    #   name of the data file
    my( $dataLine );    #   the contents of each line read from the file
    my( $result );      #   the result of the open() function
    my( @values );      #   the array of values we get from the data file

    #   Initialize the variable from the argument array.

    ( $fileName ) = @_;

    #   Open the input file and check for errors.

    $result = open( DATAFILE, "<$fileName" );
    if ( ! $result )
    {
        die( "\n  Can't open file $fileName for reading: $!.\n\n" );
    }

    #   Read the data from the file.
    #   Push each value onto the array.
    #   No format checking is done here other than skipping blank lines.

    while ( defined( $dataLine = <DATAFILE> ) )
    {
        next if ( $dataLine =~ m/^\s+$/ );

        chomp( $dataLine );
        push( @values, $dataLine );
    }

    #   Close the input file.

    close( DATAFILE );

    #   Return the array.

    return @values;
}


#   calculateMean()
#
#   Given an array of values, calculate and return the mean.

sub calculateMean
{
    my( $mean );   #   the mean of the values, calculated by this subroutine
    my( $n );      #   the number of values
    my( $sum );    #   the sum of the values
    my( $x );      #   set to each value

    #   Initialize variables.

    $n = 0;
    $sum = 0;

    #   Use a foreach loop to get each value from the array @_, which
    #   is the argument array and which contains the values.

    foreach $x ( @_ )
    {
        #   Add each value to the growing sum.

        $sum += $x;

        #   Increment the count of the number of values.

        $n++;
    }

    #   Calculate the mean from the sum and the number of values.

    $mean = $sum / $n;

    #   Return the value of the mean.

    return $mean;
}


#   calculateStdDev()
#
#   Given an array of values, calculate and return the standard deviation
#   of the values.
#
#   The easy way to compute the standard deviation on a calculator or computer
#   is to use the following formulas:
#
#        variance = ( sumOfSquares - ( ( sum * sum ) / n ) ) / ( n - 1 )
#       std. dev. = sqrt( variance )
#
#   where sumOfSquares is the sum of the squares of the values, and sum
#   is the sum of the values.

sub calculateStdDev
{
    my( $n );               #   the number of values
    my( $stddev );          #   the standard deviation, calculated by this
                            #     subroutine, from the variance
    my( $sum );             #   the sum of the values
    my( $sumOfSquares );    #   the sum of the squares of the values
    my( $variance );        #   the variance, calculated by this subroutine
    my( $x );               #   set to each value

    #   Initialize variables.

    $n = 0;
    $sum = 0;
    $sumOfSquares = 0;

    #   Use a foreach loop to get each value from the array @_, which
    #   is the argument array and which contains the values.

    foreach $x ( @_ )
    {
        #   Add the value to the growing sum.

        $sum += $x;

        #   Increment $n, the number of values.

        $n++;

        #   Add the square of the value to the growing sum of the squares.

        $sumOfSquares += $x * $x;
    }

    #   Calculate the variance.

    $variance = ( $sumOfSquares - ( ( $sum * $sum ) / $n ) ) / ( $n - 1 );

    #   Calculate the standard deviation, which is the square root of the
    #   variance.

    $stddev = sqrt( $variance );

    #   Return the calculated standard deviation.

    return $stddev;
}

If you run this script with the provided data file, randomData.txt, you should get the following output:

> perl simpleStats1.pl randomData.txt
mean = 4.9238
standard deviation = 2.8450

If you create your own data file using the createRandomData.pl script, the output will be different. For values spread uniformly over the interval 0 to 10, statistical theory predicts that the sample mean should be close to 5.0000 and the sample standard deviation close to 2.8868 (10 divided by the square root of 12); any one sample will differ a little from these values. Here are the results I got when I ran the createRandomData.pl and the simpleStats1.pl scripts three separate times:

> perl createRandomData.pl
> perl simpleStats1.pl randomData.txt
mean = 5.1585
standard deviation = 2.8836
> perl createRandomData.pl
> perl simpleStats1.pl randomData.txt
mean = 5.0226
standard deviation = 2.8925
> perl createRandomData.pl
> perl simpleStats1.pl randomData.txt
mean = 4.9792
standard deviation = 2.8496
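For comparison with these runs, the values that theory predicts can be computed directly. The sketch below assumes that rand( 10 ) behaves like a uniform distribution on the interval 0 to 10, for which the mean is the midpoint of the interval and the standard deviation is the width of the interval divided by the square root of 12. The subroutine names uniformMean() and uniformStdDev() are made up for this example.

```perl
use strict;
use warnings;

MAIN:
{
    printf( STDOUT
        "expected mean = %.4f\nexpected standard deviation = %.4f\n",
        uniformMean( 0, 10 ),
        uniformStdDev( 0, 10 ) );
}

#   uniformMean()
#       return the mean of a uniform distribution on the interval
#       from $low to $high

sub uniformMean
{
    my( $low );
    my( $high );

    ( $low, $high ) = @_;
    return ( $low + $high ) / 2;
}

#   uniformStdDev()
#       return the standard deviation of a uniform distribution on the
#       interval from $low to $high

sub uniformStdDev
{
    my( $low );
    my( $high );

    ( $low, $high ) = @_;
    return ( $high - $low ) / sqrt( 12 );
}
```

This prints an expected mean of 5.0000 and an expected standard deviation of 2.8868, which the three runs above approach but don't match exactly.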

Homework Assignment

Using the divide1.pl script above as a starting point, write a script containing subroutines that perform multiplication, division, addition, and subtraction. Use each subroutine in your script.