Using Perl’s pseudorandom number generator
For the scripts we're going to examine in this lesson, we need some random data. Because a program follows a fixed algorithm, the numbers it generates are not truly random, so they are called pseudorandom numbers. However, a good algorithm will generate pseudorandom numbers that pass the statistical tests for randomness.
Perl’s pseudorandom number generator is available through the rand() function.
The rand() function returns pseudorandom decimal values from 0 to 1, including
0 but excluding 1.
$value = rand();
If you give rand() a positive numeric argument, then
rand() will return pseudorandom decimal values from 0 to the given
number, including 0 but excluding the given number.
$value = rand( 10 );
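A common next step is turning the decimal value into a whole number. The sketch below is an illustration that is not part of the original lesson scripts: it combines rand() with Perl's built-in int() function to simulate rolling a six-sided die. The variable name $roll is just an example.

```perl
use warnings;

my( $roll );

# rand( 6 ) returns a decimal value from 0 up to, but not including, 6.
# int() drops the fractional part, leaving a whole number from 0 to 5.
# Adding 1 shifts the range to 1 through 6, like a six-sided die.
$roll = int( rand( 6 ) ) + 1;

print( STDOUT "You rolled a $roll\n" );
```

Because rand( 6 ) never returns 6 itself, the result can never be 7.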
Here is a script that generates 1,000 pseudorandom numbers between 0 and 10.
#!/usr/bin/perl
#
# createRandomData.pl
# 22-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
#
# This script uses Perl's pseudorandom number generator, rand(), to create
# a text file named randomData.txt that contains 1,000 pseudorandom numbers
# in the interval 0 to 10.
#
# Use: perl createRandomData.pl
use warnings;
my( $fileName );
# The name of the output file is hard-coded into the script.
# Open the file.
$fileName = 'randomData.txt';
if ( open( DATAFILE, ">$fileName" ) )
{
# Use a simple foreach loop that will loop 1,000 times.
foreach ( 1 .. 1000 )
{
my( $value );
# Get a pseudorandom value between 0 and 10, including 0 but
# excluding 10.
$value = rand( 10 );
# Print the value, with 4 decimal places of precision,
# to the output file.
printf( DATAFILE "%.4f\n", $value );
}
close( DATAFILE );
}
else
{
# The script ends up here if the output file couldn't be opened.
# die() with an error message.
die( "\n Can't open $fileName for writing: $!.\n\n" );
}
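Because the numbers are pseudorandom, the sequence is determined by the generator's starting point, called the seed. Perl picks a seed for you when you don't set one, which is why each run of createRandomData.pl produces different data. If you ever need a repeatable run, a minimal sketch using Perl's built-in srand() function looks like this. The seed value 42 and the variable names are arbitrary choices for illustration.

```perl
use warnings;

my( @firstRun );
my( @secondRun );

# Seed the generator, then collect five values.
srand( 42 );
foreach ( 1 .. 5 )
{
    push( @firstRun, rand( 10 ) );
}

# Re-seed with the same number and collect five more values.
srand( 42 );
foreach ( 1 .. 5 )
{
    push( @secondRun, rand( 10 ) );
}

# Print the two runs side by side; each row shows the same value twice.
foreach ( 0 .. 4 )
{
    printf( STDOUT "%.4f  %.4f\n", $firstRun[ $_ ], $secondRun[ $_ ] );
}
```

The actual values depend on your version of Perl, but the two columns should always match each other.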
Subroutines
A subroutine acts like a mini-script inside a script. A subroutine can have its own arguments and its own variables. A subroutine can do anything a script can do. A subroutine can return one or more values to whatever part of your script that called the subroutine.
Writing subroutines is a good way of organizing the code in your script. You want to write a subroutine when you have clearly identified a portion of your script that does one thing in a way that you can isolate from the rest of your script.
Once you have written a subroutine, you have a functionally independent piece of code that is easy to copy and paste into different scripts. It’s also possible to store a subroutine in a Perl module and then call it from any script; we’ll learn how to do that in the next lesson.
Using a subroutine is often termed calling a subroutine. The piece of code that calls the subroutine is often termed the caller.
Variables inside subroutines
We create variables in subroutines just as we have before, using the
my() function. The important point here is that variables declared
inside a subroutine using the my() function cannot be seen by any code
outside that subroutine.
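To see this in action, here is a small sketch (the subroutine name resetCount() is made up for this illustration). The subroutine declares its own $count with my(), so setting it to 0 has no effect on the $count variable that belongs to the caller.

```perl
use warnings;

my( $count );
my( $returned );

$count    = 10;
$returned = resetCount();

# The caller's $count is untouched by the subroutine.
print( STDOUT "inside: $returned, outside: $count\n" );

sub resetCount
{
    # This $count is a new variable that exists only inside resetCount().
    my( $count );
    $count = 0;
    return $count;
}
```

This prints "inside: 0, outside: 10".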
Returning results from a subroutine
A subroutine can return nothing (technically, an undefined value), a single
value (a scalar), or a list of values (an array). Values are returned using a
return statement.
Returning nothing
Here’s a subroutine that returns nothing when it’s called:
sub printHello
{
print( STDOUT "Hello, world!\n" );
return;
}
You would use the subroutine like this:
printHello();
Returning a scalar
Here’s a subroutine that returns a single value (a scalar), the word
'red', when it’s called:
sub getRed
{
my( $result );
$result = 'red';
return $result;
}
You would use the subroutine like this:
my( $color );
$color = getRed();
Returning an array
A Perl subroutine can also return many results in an array. Here’s a subroutine that returns an array containing the words 'yellow', 'green', and 'blue'.
sub getColors
{
my( @results );
@results = ( 'yellow', 'green', 'blue' );
return @results;
}
You would use the subroutine like this:
my( @colors );
@colors = getColors();
Returning a hash
A Perl subroutine can return a hash, but only in an indirect way, since the hash gets converted into an array before the values are returned. Here’s a subroutine that returns a hash:
sub getVegetableColors
{
my( %vegetableColors );
%vegetableColors = (
celery => 'green',
tomato => 'red',
potato => 'brown',
eggplant => 'purple' );
return %vegetableColors;
}
What you get back is an array that contains 'celery', 'green', 'tomato', 'red', 'potato', 'brown', 'eggplant', and 'purple'. Fortunately, since you can take an array and turn it into a hash, you’re OK. You would call this function like this:
my( %veggieColors );
%veggieColors = getVegetableColors();
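The sketch below shows both views of the same return value: assigned to an array, it is a flat list of keys and values; assigned to a hash, the list is paired back up. The variable name @flatList is only an illustration. Note that Perl does not promise any particular order for the keys of a hash, so the order of the items in the flat list may differ from the order in which they were written.

```perl
use warnings;

my( @flatList );
my( %veggieColors );

# Assigned to an array, the return value is a flat list of 8 items:
# key, value, key, value, and so on.
@flatList = getVegetableColors();
print( STDOUT scalar( @flatList ), " items in the list\n" );

# Assigned to a hash, the same list is paired back up into keys and values.
%veggieColors = getVegetableColors();
print( STDOUT "tomato => ", $veggieColors{ 'tomato' }, "\n" );

sub getVegetableColors
{
    my( %vegetableColors );
    %vegetableColors = (
        celery   => 'green',
        tomato   => 'red',
        potato   => 'brown',
        eggplant => 'purple' );
    return %vegetableColors;
}
```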
Returning mixed variables
You can even return a mix of scalars, arrays, and hashes, as shown in the
subroutine getMoreColors() below, but there’s a side-effect:
all returned values get merged into a single array before they’re
returned. Unless you know exactly how many items you’ll get back, you
won’t know how to sort them back into their original scalar, array, and
hash variables.
sub getMoreColors
{
my( $singleColor );
my( @twoColors );
my( %vegColors );
$singleColor = 'magenta';
@twoColors = ( 'black', 'orange' );
%vegColors = ( tomato => 'red', potato => 'brown' );
return ( $singleColor, @twoColors, %vegColors );
}
When getMoreColors() returns its values, they all get merged into a single
array of mixed-up values, 'magenta', 'black', 'orange',
'tomato', 'red', 'potato', and 'brown'.
So what if you want to return a hash, an array, and a scalar? Perl will merge all of these into one array and return that, so it can’t be done in a straightforward way. Next week we’ll talk about references and how you can use references to return complicated mixtures of variable types from a subroutine.
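Until then, the caller can take the merged array apart by hand, but only if it knows the layout in advance. The following is a sketch under exactly that assumption: one scalar first, then two array items, then the hash's keys and values. It uses Perl's built-in shift() and splice() functions; the variable @all is a name chosen for this illustration.

```perl
use warnings;

my( @all );
my( $singleColor );
my( @twoColors );
my( %vegColors );

@all = getMoreColors();

# The caller must know the layout: 1 scalar, then 2 array items,
# then everything left over is the hash's keys and values.
$singleColor = shift( @all );           # removes and returns the first item
@twoColors   = splice( @all, 0, 2 );    # removes and returns the next two
%vegColors   = @all;                    # the rest pairs up into a hash

print( STDOUT "single: $singleColor\n" );
print( STDOUT "two:    @twoColors\n" );
print( STDOUT "potato: ", $vegColors{ 'potato' }, "\n" );

sub getMoreColors
{
    my( $singleColor );
    my( @twoColors );
    my( %vegColors );
    $singleColor = 'magenta';
    @twoColors = ( 'black', 'orange' );
    %vegColors = ( tomato => 'red', potato => 'brown' );
    return ( $singleColor, @twoColors, %vegColors );
}
```

If getMoreColors() ever returned three colors in its array instead of two, this caller would silently put the wrong items in the wrong variables, which is why this approach is fragile.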
A combined example
Let’s combine all these little examples into one script, and test things to make sure they behave as expected.
#!/usr/bin/perl
#
# testSubroutines.pl
# 21-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
#
# This demonstration script shows how to call subroutines and get and use
# their return values.
use warnings;
my( $color );
my( @colors );
my( $veggie );
my( %veggieColors );
print( STDOUT "\n" );
print( STDOUT "Results from getRed():\n" );
$color = getRed();
print( STDOUT " $color\n" );
print( STDOUT "\n" );
print( STDOUT "Results from getColors():\n" );
@colors = getColors();
foreach $color ( @colors )
{
print( STDOUT " $color\n" );
}
print( STDOUT "\n" );
print( STDOUT "Results from getVeggieColors():\n" );
%veggieColors = getVegetableColors();
foreach $veggie ( keys( %veggieColors ) )
{
print( STDOUT " $veggie => $veggieColors{ $veggie }\n" );
}
print( STDOUT "\n" );
print( STDOUT "Results from getMoreColors():\n" );
@colors = getMoreColors();
foreach $color ( @colors )
{
print( STDOUT " $color\n" );
}
print( STDOUT "\n" );
# getRed()
# return a scalar
sub getRed
{
my( $result );
$result = 'red';
return $result;
}
# getColors()
# return an array
sub getColors
{
my( @results );
@results = ( 'yellow', 'green', 'blue' );
return @results;
}
# getVegetableColors
# return a hash
sub getVegetableColors
{
my( %vegetableColors );
%vegetableColors = (
celery => 'green',
tomato => 'red',
potato => 'brown',
eggplant => 'purple' );
return %vegetableColors;
}
# getMoreColors()
# return a scalar, an array, and a hash; all get combined into a single array
sub getMoreColors
{
my( $singleColor );
my( @twoColors );
my( %vegColors );
$singleColor = 'magenta';
@twoColors = ( 'black', 'orange' );
%vegColors = ( tomato => 'red', potato => 'brown' );
return ( $singleColor, @twoColors, %vegColors );
}
The results look like this:

Results from getRed():
 red

Results from getColors():
 yellow
 green
 blue

Results from getVeggieColors():
 eggplant => purple
 potato => brown
 celery => green
 tomato => red

Results from getMoreColors():
 magenta
 black
 orange
 potato
 brown
 tomato
 red

Getting arguments from the @_ variable
A subroutine is like a script in that it can receive arguments. As you’ll
remember from the last lesson, a script gets its
arguments from Perl’s special @ARGV array variable.
A subroutine gets its arguments from a different special array variable,
@_.
Here’s a script that divides one number by another using a subroutine.
We pass the dividend and the divisor to the subroutine. The subroutine
gets these arguments in the @_ variable, and copies the
values to its own variables. The subroutine then carries out the division
and returns the result, the quotient.
#!/usr/bin/perl
#
# divide1.pl
# 21-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
use warnings;
my( $result );
$result = divide( 60, 12 );
print( "$result\n" );
sub divide
{
my( $dividend );
my( $divisor );
my( $quotient );
( $dividend, $divisor ) = @_;
$quotient = $dividend / $divisor;
return $quotient;
}
There are many styles for getting the arguments in a subroutine. In the style used above,
( $dividend, $divisor ) = @_;
we copy the contents of the array into individual scalar variables. When we do this, we have to put the parentheses around the list of scalar variables.
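Another style you will often see in other people's code uses Perl's built-in shift() function. As a sketch of that alternative (it is not one of the lesson's numbered scripts), here is divide() again with each argument removed from the front of @_ one at a time:

```perl
use warnings;

my( $result );

$result = divide( 60, 12 );
print( STDOUT "$result\n" );

sub divide
{
    my( $dividend );
    my( $divisor );

    # Inside a subroutine, shift() with no argument removes and returns
    # the first item of @_.
    $dividend = shift();
    $divisor  = shift();

    return $dividend / $divisor;
}
```

Both styles give the same result; the list-assignment style keeps all the argument names on one line, which makes the subroutine's expected arguments easy to see at a glance.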
Sometimes, you’ll use the array directly. When you do this, you don’t need to copy the array into another variable.
sub enumerateArray
{
my( $item );
foreach $item ( @_ )
{
print( STDOUT "$item\n" );
}
return;
}
In this example, we use a foreach loop to look at each
item in the array.
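Looping over @_ directly is handy when a subroutine should accept any number of arguments. The sketch below uses a made-up subroutine, sumValues(), to show that when you pass arrays and scalars together, they all arrive in @_ as one flat list.

```perl
use warnings;

my( @firstBatch );
my( @secondBatch );
my( $total );

@firstBatch  = ( 1, 2, 3 );
@secondBatch = ( 10, 20 );

# Both arrays and the extra scalar arrive in @_ as one flat list of 6 items.
$total = sumValues( @firstBatch, @secondBatch, 100 );
print( STDOUT "$total\n" );

sub sumValues
{
    my( $item );
    my( $sum );

    $sum = 0;
    foreach $item ( @_ )
    {
        $sum += $item;
    }
    return $sum;
}
```

This prints 136, the sum of all six values.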
Finally, you can refer to each item in the array using its index number.
Since the array is named @_, the individual items are
named $_[ 0 ], $_[ 1 ], etc. We could
greatly shorten the divide() subroutine we used above
by using the array items directly.
#!/usr/bin/perl
#
# divide2.pl
# 21-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
use warnings;
my( $result );
$result = divide( 60, 12 );
print( "$result\n" );
sub divide
{
my( $quotient );
$quotient = $_[ 0 ] / $_[ 1 ];
return $quotient;
}
You’ll notice that this code isn’t nearly as readable, and you’d have to add comments to explain what the code is doing.
We can shorten the code even more by removing the $quotient
variable from the subroutine.
#!/usr/bin/perl
#
# divide3.pl
# 21-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
use warnings;
my( $result );
$result = divide( 60, 12 );
print( "$result\n" );
sub divide
{
return $_[ 0 ] / $_[ 1 ];
}
This is as brief (and as obscure) as it gets.
Revised structure of scripts
Now that we’re going to use subroutines in our scripts, we’re going to revise the structure of our scripts. This revised structure gives us more control over our variables and helps Perl warn us about things like typographical errors when using variable names in our scripts.
Adding the use strict; line
The first change we’re going to make is that we’ll always include the line
use strict;
near the top of our code, like this:
use strict;
use warnings;
This line forces us to declare all of our variables, which allows Perl to catch a misspelled variable name instead of silently treating it as a new variable. This is important as our scripts become more complex.
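Here is a hedged demonstration of what use strict; catches. A script containing a misspelled variable name would not run at all, so this sketch keeps the faulty code in a string and hands it to Perl's built-in eval() function, which we haven't covered in these lessons; treat it as a demonstration device only. The variable names are invented for the example.

```perl
use strict;
use warnings;

my( $code );
my( $error );
my( $ok );

# $totl is a typing mistake for $total. The code is kept in a string so
# that this demonstration script itself still compiles.
$code = 'use strict; my( $total ); $total = 5; $totl = $total + 1; 1;';

# eval() compiles and runs the string. When the code fails to compile,
# eval() returns an undefined value and puts the error message in $@.
$ok    = eval( $code );
$error = $@;

if ( ! $ok )
{
    print( STDOUT "Perl refused to run the code:\n$error" );
}
```

The error message names the undeclared variable, $totl, which points you straight to the typing mistake.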
Marking off main code
The second change we’ll make is that we’ll mark off the main part of
our script with a label, MAIN:, and braces (‘{’ and
‘}’) around the code. The label is a convenience for anyone reading the
code; it simply marks where the execution of the script will begin. The braces
make sure that any variables we declare in our main section of code are
invisible to our subroutines. This looks like this:
MAIN:
{
print( "Hello, world!\n" );
}
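One consequence of the braces is that a subroutine can only get at a value from the main section if that value is passed in as an argument. A minimal sketch, using a made-up subroutine makeGreeting():

```perl
use strict;
use warnings;

MAIN:
{
    my( $name );

    $name = 'world';
    print( STDOUT makeGreeting( $name ) );
}

sub makeGreeting
{
    my( $who );

    # $name from the MAIN block is not visible here. The only way the
    # value gets into this subroutine is through the argument array.
    ( $who ) = @_;

    return "Hello, $who!\n";
}
```

If makeGreeting() tried to use $name directly, use strict; would stop the script with an error, because no variable of that name is declared where the subroutine can see it.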
Placing subroutines after main code
Finally, we’ll put our subroutines below the main section of our code.
An example
When we restructure the divide1.pl script according to these guidelines,
we get the script below:
#!/usr/bin/perl
#
# divide4.pl
# 21-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
use strict;
use warnings;
MAIN:
{
my( $result );
$result = divide( 60, 12 );
print( "$result\n" );
}
sub divide
{
my( $dividend );
my( $divisor );
my( $quotient );
( $dividend, $divisor ) = @_;
$quotient = $dividend / $divisor;
return $quotient;
}
From now on, we'll organize our scripts in this order:
- the #! (shebang) line
- a comment containing the name of the script
- a comment containing the date the script was written or last modified
- a comment containing the name and address of the author of the script
- the use strict; line
- the use warnings; line, which helps Perl warn us when we’re doing something not quite right
- the MAIN: label and curly braces, which set off the main part of our script
- our subroutines
Summary example script
The summary example script reads numeric values from a text file and calculates the mean and standard deviation of the values. The script contains three subroutines that separate the script’s functionality into discrete units. The first subroutine gets the values from the file and puts them into an array variable. The second subroutine takes the array and calculates the mean. The third subroutine takes the array and calculates the standard deviation.
Because we’ve used three subroutines, the main part of our script is very short. Also, we’ve used descriptive names for our subroutines, so the code is self-documenting (meaning we can understand what’s going on by just reading the code, including variable and subroutine names, without having to read the comments).
As you may remember from your statistics classes, the equation for calculating the mean of a set of values is:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where $\bar{x}$ (x bar) is the sample mean, $x_i$ (x sub i) is each value, and
$n$ is the number of values. This formula has been converted into the
subroutine calculateMean() in the example script.
The formula for calculating the standard deviation of a sample is:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where $s$ is the standard deviation, $x_i$ (x sub i) is each value, $\bar{x}$ (x bar) is the mean, and $n$ is the number of values. But in order to use this formula, you have to have already calculated the mean.
It turns out that there’s an alternative formula for the standard deviation that’s easier for a calculator or computer to use. This formula doesn’t require that the mean be calculated first:
$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}{n - 1}}$$
where $x_i$ (x sub i) is each value and $n$ is the number of values.
This latter formula has been converted into the subroutine
calculateStdDev() in the script.
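If you'd like to convince yourself that the two formulas agree, here is a small sketch that applies both to the same eight values. The subroutine names stdDevTwoPass() and stdDevOnePass() and the data values are made up for this check; they are not part of the summary script.

```perl
use strict;
use warnings;

MAIN:
{
    my( @values );

    @values = ( 2, 4, 4, 4, 5, 5, 7, 9 );

    printf( STDOUT "two-pass: %.4f\n", stdDevTwoPass( @values ) );
    printf( STDOUT "one-pass: %.4f\n", stdDevOnePass( @values ) );
}

# First formula: calculate the mean, then loop over the values a second
# time to sum the squared differences from the mean.
sub stdDevTwoPass
{
    my( $mean, $n, $sum, $sumOfSquaredDiffs, $x );

    $n   = scalar( @_ );
    $sum = 0;
    foreach $x ( @_ )
    {
        $sum += $x;
    }
    $mean = $sum / $n;

    $sumOfSquaredDiffs = 0;
    foreach $x ( @_ )
    {
        $sumOfSquaredDiffs += ( $x - $mean ) * ( $x - $mean );
    }
    return sqrt( $sumOfSquaredDiffs / ( $n - 1 ) );
}

# Alternative formula: one loop collects the sum and the sum of squares.
sub stdDevOnePass
{
    my( $n, $sum, $sumOfSquares, $x );

    $n            = scalar( @_ );
    $sum          = 0;
    $sumOfSquares = 0;
    foreach $x ( @_ )
    {
        $sum          += $x;
        $sumOfSquares += $x * $x;
    }
    return sqrt( ( $sumOfSquares - ( $sum * $sum ) / $n ) / ( $n - 1 ) );
}
```

For these values the sum is 40, the sum of squares is 232, and both subroutines work out to the square root of 32/7, so both lines print 2.1381. One caution: with very large values that are close together, the alternative formula subtracts two nearly equal numbers and can lose precision, so the first formula is the safer choice when accuracy matters more than convenience.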
In order to run this script, you’ll need the data file randomData.txt, which contains 1,000 random numbers. You can use the one provided with this lesson, or you can create your own using the createRandomData.pl script given above.
#!/usr/bin/perl
#
# simpleStats1.pl
# 22-Jan-2002
#
# Conrad Halling
# conrad.halling@sphaerula.com
#
# This script reads a set of numeric values from a text file, where each
# value is on its own line, and calculates the mean and standard deviation
# of the values.
#
# Use: perl simpleStats1.pl dataFile
# Example: perl simpleStats1.pl randomData.txt
use strict;
use warnings;
MAIN:
{
my( $mean );
my( $stddev );
my( @values );
# Check arguments.
# The script requires the name of the file containing the data.
if ( 1 != @ARGV )
{
die( "\n Use: perl $0 dataFile\n\n" );
}
# Get the data from the data file.
@values = getValuesFromFile( $ARGV[ 0 ] );
# Calculate the mean and the standard deviation.
$mean = calculateMean( @values );
$stddev = calculateStdDev( @values );
# Display the results to four decimal places of precision.
printf( STDOUT
"mean = %.4f\nstandard deviation = %.4f\n",
$mean,
$stddev );
}
# getValuesFromFile()
#
# This subroutine opens the given file, reads the values from the lines of
# the file, and returns an array containing the values.
sub getValuesFromFile
{
my( $fileName ); # name of the data file
my( $dataLine ); # the contents of each line read from the file
my( $result ); # the result of the open() function
my( @values ); # the array of values we get from the data file
# Initialize the variable from the argument array.
( $fileName ) = @_;
# Open the input file and check for errors.
$result = open( DATAFILE, "<$fileName" );
if ( ! $result )
{
die( "\n Can't open file $fileName for reading: $!.\n\n" );
}
# Read the data from the file.
# Push each value onto the array.
# No format checking is done here other than skipping blank lines.
while ( defined( $dataLine = <DATAFILE> ) )
{
next if ( $dataLine =~ m/^\s+$/ );
chomp( $dataLine );
push( @values, $dataLine );
}
# Close the input file.
close( DATAFILE );
# Return the array.
return @values;
}
# calculateMean()
#
# Given an array of values, calculate and return the mean.
sub calculateMean
{
my( $mean ); # the mean of the values, calculated by this subroutine
my( $n ); # the number of values
my( $sum ); # the sum of the values
my( $x ); # set to each value
# Initialize variables.
$n = 0;
$sum = 0;
# Use a foreach loop to get each value from the array @_, which
# is the argument array and which contains the values.
foreach $x ( @_ )
{
# Add each value to the growing sum.
$sum += $x;
# Increment the count of the number of values.
$n++;
}
# Calculate the mean from the sum and the number of values.
$mean = $sum / $n;
# Return the value of the mean.
return $mean;
}
# calculateStdDev()
#
# Given an array of values, calculate and return the standard deviation
# of the values.
#
# The easy way to compute the standard deviation on a calculator or computer
# is to use the following formulas:
#
# variance = ( sumOfSquares - ( ( sum * sum ) / n ) ) / ( n - 1 )
# std. dev. = sqrt( variance )
#
# where sumOfSquares is the sum of the squares of the values, and sum
# is the sum of the values.
sub calculateStdDev
{
my( $n ); # the number of values
my( $stddev ); # the standard deviation, calculated by this
# subroutine, from the variance
my( $sum ); # the sum of the values
my( $sumOfSquares ); # the sum of the squares of the values
my( $variance ); # the variance, calculated by this subroutine
my( $x ); # set to each value
# Initialize variables.
$n = 0;
$sum = 0;
$sumOfSquares = 0;
# Use a foreach loop to get each value from the array @_, which
# is the argument array and which contains the values.
foreach $x ( @_ )
{
# Add the value to the growing sum.
$sum += $x;
# Increment $n, the number of values.
$n++;
# Add the square of the value to the growing sum of the squares.
$sumOfSquares += $x * $x;
}
# Calculate the variance.
$variance = ( $sumOfSquares - ( ( $sum * $sum ) / $n ) ) / ( $n - 1 );
# Calculate the standard deviation, which is the square root of the
# variance.
$stddev = sqrt( $variance );
# Return the calculated standard deviation.
return $stddev;
}
If you run this script with the provided data file,
randomData.txt, you should get the following output:
> perl simpleStats1.pl randomData.txt
mean = 4.9238
standard deviation = 2.8450
If you create your own data file using the createRandomData.pl script, the
output will be different. Statistical theory predicts that, for values spread evenly
over the interval 0 to 10, the sample mean should be close to 5.0000 and the sample
standard deviation close to 2.8868 (10 divided by the square root of 12), but neither
will match exactly. Here are the results I got
when I ran the createRandomData.pl and the simpleStats1.pl
scripts three separate times:
> perl createRandomData.pl
> perl simpleStats1.pl randomData.txt
mean = 5.1585
standard deviation = 2.8836

> perl createRandomData.pl
> perl simpleStats1.pl randomData.txt
mean = 5.0226
standard deviation = 2.8925

> perl createRandomData.pl
> perl simpleStats1.pl randomData.txt
mean = 4.9792
standard deviation = 2.8496
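The theoretical figures come from the standard formulas for a uniform distribution, which is a reasonable model for rand( 10 ): the mean is the midpoint of the interval, and the standard deviation is the width of the interval divided by the square root of 12. The sketch below works these out, along with the standard error of the mean for 1,000 values, which gives a rough idea of how far a sample mean typically strays from 5. The subroutine names uniformMean() and uniformStdDev() are made up for this illustration.

```perl
use strict;
use warnings;

MAIN:
{
    my( $low );
    my( $high );
    my( $n );
    my( $stddev );

    $low  = 0;
    $high = 10;
    $n    = 1000;

    $stddev = uniformStdDev( $low, $high );

    printf( STDOUT "expected mean = %.4f\n", uniformMean( $low, $high ) );
    printf( STDOUT "expected standard deviation = %.4f\n", $stddev );
    printf( STDOUT "standard error of the mean = %.4f\n", $stddev / sqrt( $n ) );
}

# The mean of a uniform distribution is the midpoint of the interval.
sub uniformMean
{
    my( $low, $high );
    ( $low, $high ) = @_;
    return ( $low + $high ) / 2;
}

# The standard deviation of a uniform distribution is the width of the
# interval divided by the square root of 12.
sub uniformStdDev
{
    my( $low, $high );
    ( $low, $high ) = @_;
    return ( $high - $low ) / sqrt( 12 );
}
```

This prints 5.0000, 2.8868, and 0.0913. A standard error of about 0.09 means that sample means a tenth or two away from 5, like the ones in the three runs shown, are nothing unusual.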
Homework Assignment
Using the divide1.pl script above as a starting point,
write a script containing subroutines that perform multiplication, division, addition, and
subtraction. Use each subroutine in your script.