Introduction
Pod::Usage is a Perl module that allows the programmer to use POD
(Plain Old Documentation) for creating a usage message
when the user doesn’t supply the desired arguments to a script.
Reading the Documentation
To use Pod::Usage effectively, you need to know three things: how to write
documentation in POD format, how to parse command-line options with the
Getopt::Long module, and how to call the Pod::Usage module itself.
# Shell commands for reading documentation:
perldoc perlpod
perldoc Getopt::Long
perldoc Pod::Usage
A Complete Example
The documentation on all three of these is good, but it takes some close reading followed
by some experimentation to put it all
together. I applied what I learned to a script I was using as part of a bacterial genome
annotation project. Here’s what the code looked like when I finished. Note that this
script will not run correctly without the Bifx::RefSeq module, which I am not yet ready
to distribute, but this script will serve as a template for new scripts that use
Pod::Usage.
#!/usr/local/bin/perl

=head1 NAME

parseRefSeqGenomes.pl - parse RefSeq genomes from NCBI

=head1 SYNOPSIS

Use:

    perl parseRefSeqGenomes.pl [--help] [--man]
                               [--clearDb] [--writeToDb]
                               --path path --division name
                               --out name

Examples:

    perl parseRefSeqGenomes.pl --help

    perl parseRefSeqGenomes.pl --man

    # Load all RefSeq data:
    perl parseRefSeqGenomes.pl --clearDb --writeToDb
                               --path /data/genomes --division bct
                               --out bct.genomes.pep.fasta

=head1 DESCRIPTION

This script is part of the genome annotation pipeline. The script scans
the GenBank-format RefSeq genomes from NCBI and inserts the data into the
MySQL database; optionally, the script creates a Fasta-format output file
containing the protein sequences.

=head1 ARGUMENTS

parseRefSeqGenomes.pl takes the following arguments:

=over 4

=item help

  --help

(Optional.) Displays the usage message.

=item man

  --man

(Optional.) Displays all documentation.

=item path

  --path name

(Required.) Sets the path to the directory in which the division is found.

=item division

  --division name

(Required.) Sets the name of the GenBank division, a subdirectory of the path
containing one or more GenBank flat files.

=over 8

=item bct

The RefSeq microbial genomes from NCBI.

=back

=item writeToDb

  --writeToDb

(Optional.) Include this argument if you want to write data to the database.

=item clearDb

  --clearDb

(Optional.) Include this argument if you want to delete all data from all
database tables before parsing and inserting begins.

=item outFileName

  --outFileName name

(Optional.) The name of the fasta-format output file containing the protein
sequences extracted from the CDS features of each accession. If not specified,
no fasta data will be created.

=back

=head1 AUTHOR

Conrad Halling, E<lt>conrad.halling@sphaerula.comE<gt>.

=head1 COPYRIGHT

This program is distributed under the Artistic License.

=head1 DATE

31-Mar-2006

=cut
package main;

use 5.8.7;
use strict;
use warnings;

use IO::File;
use Getopt::Long ();    # Resist name-space pollution!
use Pod::Usage ();      # Ditto!

use lib '/Users/challing/Perl';    # Darwin, Crick
use Bifx::RefSeq;

MAIN:
{
    # Check arguments.
    my( $help, $man, $clearDb, $writeToDb, $division, $path, $outFileName );

    # Show the usage message if GetOptions reports an invalid option.
    # --outFileName is accepted as an alias for --out so that the option
    # works under both names used in the documentation.
    Getopt::Long::GetOptions(
        'help'              => \$help,
        'man'               => \$man,
        'clearDb'           => \$clearDb,
        'writeToDb'         => \$writeToDb,
        'division=s'        => \$division,
        'path=s'            => \$path,
        'out|outFileName=s' => \$outFileName )
      or Pod::Usage::pod2usage( -exitval => 2 );

    # Check for requests for help or for man (full documentation):
    Pod::Usage::pod2usage( -verbose => 1 ) if ( $help );
    Pod::Usage::pod2usage( -exitval => 0, -verbose => 2 ) if ( $man );

    # Check for required variables.
    unless ( defined( $path ) && defined( $division ) )
    {
        Pod::Usage::pod2usage( -exitval => 2 );
    }

    # Create the refSeq object.
    my $refSeq =
        Bifx::RefSeq->new(
            path     => $path,
            division => $division );

    # Clear the database if requested.
    if ( defined( $clearDb ) )
    {
        $refSeq->clearDb();
    }

    # Parse the files.
    # Right now, this loads data into the Accession table,
    # then creates the Fasta file.
    $refSeq->buildTables(
        outFileName => $outFileName,
        writeToDb   => $writeToDb );

    exit( 0 );
}
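Because the script above cannot run without Bifx::RefSeq, the following is a minimal sketch of just its argument-checking logic that can be run on its own. The helper name usage_args_for is hypothetical, and the sketch assumes a Getopt::Long recent enough to provide GetOptionsFromArray (version 2.36 or later). It spells the exit-status key as -exitval, the name used in the Pod::Usage documentation. Rather than calling pod2usage, the helper returns the arguments that would be passed to it, so the decisions can be inspected without the program exiting.

```perl
use strict;
use warnings;
use Getopt::Long ();

# Hypothetical helper: given a command line, return the pod2usage()
# arguments it calls for, or an empty list when the script should run.
sub usage_args_for
{
    my @argv = @_;
    my %opt;
    my $ok = Getopt::Long::GetOptionsFromArray(
        \@argv, \%opt, 'help', 'man', 'path=s', 'division=s' );

    return ( -exitval => 2 )                if ( !$ok );         # invalid option
    return ( -verbose => 1 )                if ( $opt{help} );   # SYNOPSIS + ARGUMENTS
    return ( -exitval => 0, -verbose => 2 ) if ( $opt{man} );    # full documentation
    return ( -exitval => 2 )                                     # SYNOPSIS only
        unless ( defined( $opt{path} ) && defined( $opt{division} ) );
    return;    # nothing to report: the arguments are acceptable
}

# Show the decision for a few sample command lines.
for my $args ( [], [ '--help' ], [ '--man' ],
               [ '--path', '/data/genomes', '--division', 'bct' ] )
{
    my %usage = usage_args_for( @$args );
    my $cmd   = join( ' ', @$args );
    my $what  = %usage
        ? 'pod2usage( '
          . join( ', ', map { "$_ => $usage{$_}" } sort keys %usage ) . ' )'
        : 'run the script';
    print "[$cmd] -> $what\n";
}
```

In a real script, the returned list would be handed straight to Pod::Usage, along the lines of: my %usage = usage_args_for( @ARGV ); Pod::Usage::pod2usage( %usage ) if %usage;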
Output Examples
Using perldoc
Because the script contains documentation in POD format, the documentation
can be viewed using the perldoc command. The -T option
prevents use of the system pager when viewing the documentation. (Enter the command
perldoc perldoc to learn more about how to use perldoc.)
$ perldoc -T parseRefSeqGenomes.pl
PARSEREFSEQGENOMES(1) User Contributed Perl Documentation PARSEREFSEQGENOMES(1)
NAME
parseRefSeqGenomes.pl - parse RefSeq genomes from NCBI
SYNOPSIS
Use:
perl parseRefSeqGenomes.pl [--help] [--man]
[--clearDb] [--writeToDb]
--path path --division name
--out name
Examples:
perl parseRefSeqGenomes.pl --help
perl parseRefSeqGenomes.pl --man
# Load all RefSeq data:
perl parseRefSeqGenomes.pl --clearDb --writeToDb
--path /data/genomes --division bct
--out bct.genomes.pep.fasta
DESCRIPTION
This script is part of the genome annotation pipeline. The script scans
the GenBank-format RefSeq genomes from NCBI and inserts the data into
the MySQL database; optionally, the script creates a Fasta-format out-
put file containing the protein sequences.
ARGUMENTS
parseRefSeqGenomes.pl takes the following arguments:
help
--help
(Optional.) Displays the usage message.
man
--man
(Optional.) Displays all documentation.
path
--path name
(Required.) Sets the path to the directory in which the division is
found.
division
--division name
(Required.) Sets the name of the GenBank division, a subdirectory
of the path containing one or more GenBank flat files.
bct The RefSeq microbial genomes from NCBI.
writeToDb
--writeToDb
(Optional.) Include this argument if you want to write data to the
database.
clearDb
--clearDb
(Optional.) Include this argument if you want to delete all data
from all database tables before parsing and inserting begins.
outFileName
--outFileName name
(Optional.) The name of the fasta-format output file containing the
protein sequences extracted from the CDS features of each acces-
sion. If not specified, no fasta data will be created.
AUTHOR
Conrad Halling, <conrad.halling@sphaerula.com>.
COPYRIGHT
This program is distributed under the Artistic License.
DATE
31-Mar-2006
perl v5.8.7 2006-03-31 PARSEREFSEQGENOMES(1)
$
Running the Script with Incorrect Arguments
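When the script is run without the required --path and --division arguments
(including no arguments at all), pod2usage is called with an exit status of 2 and the
default verbosity level of 0, so just the SYNOPSIS section of the documentation is
displayed. With an exit status of 2, the message is written to STDERR.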
$ perl parseRefSeqGenomes.pl
Usage:
Use:
perl parseRefSeqGenomes.pl [--help] [--man]
[--clearDb] [--writeToDb]
--path path --division name
--out name
Examples:
perl parseRefSeqGenomes.pl --help
perl parseRefSeqGenomes.pl --man
# Load all RefSeq data:
perl parseRefSeqGenomes.pl --clearDb --writeToDb
--path /data/genomes --division bct
--out bct.genomes.pep.fasta
$
Using the --help Option
When the script is run with the --help option, pod2usage is called with
-verbose => 1, so the SYNOPSIS and ARGUMENTS sections of the documentation
are displayed.
$ perl parseRefSeqGenomes.pl --help
Usage:
Use:
perl parseRefSeqGenomes.pl [--help] [--man]
[--clearDb] [--writeToDb]
--path path --division name
--out name
Examples:
perl parseRefSeqGenomes.pl --help
perl parseRefSeqGenomes.pl --man
# Load all RefSeq data:
perl parseRefSeqGenomes.pl --clearDb --writeToDb
--path /data/genomes --division bct
--out bct.genomes.pep.fasta
Arguments:
parseRefSeqGenomes.pl takes the following arguments:
help
--help
(Optional.) Displays the usage message.
man
--man
(Optional.) Displays all documentation.
path
--path name
(Required.) Sets the path to the directory in which the division is
found.
division
--division name
(Required.) Sets the name of the GenBank division, a subdirectory of
the path containing one or more GenBank flat files.
bct The RefSeq microbial genomes from NCBI.
writeToDb
--writeToDb
(Optional.) Include this argument if you want to write data to the
database.
clearDb
--clearDb
(Optional.) Include this argument if you want to delete all data
from all database tables before parsing and inserting begins.
outFileName
--outFileName name
(Optional.) The name of the fasta-format output file containing the
protein sequences extracted from the CDS features of each accession.
If not specified, no fasta data will be created.
$
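The difference between the two usage messages above comes down to the -verbose level. As a rough, self-contained illustration, the sketch below renders a small POD document at levels 0 and 1. The script name demo.pl, its single --path argument, and the helper usage_text are all invented for the example. It passes in-memory filehandles as -input and -output and uses -exitval => 'NOEXIT' so that pod2usage returns instead of exiting; a real script would normally let pod2usage read the script's own file and exit.

```perl
use strict;
use warnings;
use Pod::Usage ();

# A tiny stand-in for a script's POD (hypothetical content).
my $pod = <<'END_POD';
=head1 NAME

demo.pl - hypothetical script used only for this illustration

=head1 SYNOPSIS

  perl demo.pl --path path

=head1 ARGUMENTS

=over 4

=item --path name

(Required.) Sets the path.

=back

=head1 AUTHOR

Nobody in particular.

=cut
END_POD

# Render the POD at the given verbosity level and return the text
# instead of printing it and exiting.
sub usage_text
{
    my( $verbose ) = @_;
    my $text = '';
    open( my $in,  '<', \$pod )  or die "Cannot read POD: $!";
    open( my $out, '>', \$text ) or die "Cannot capture output: $!";
    Pod::Usage::pod2usage(
        -input   => $in,
        -output  => $out,
        -verbose => $verbose,
        -exitval => 'NOEXIT' );
    close( $out );
    close( $in );
    return $text;
}

print "--- -verbose => 0 ---\n", usage_text( 0 );
print "--- -verbose => 1 ---\n", usage_text( 1 );
```

Level 0 should print only the synopsis, and level 1 should add the ARGUMENTS section; the AUTHOR section appears at neither level.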
Using the --man Option
When the script is run with the --man option, all of the documentation
is displayed using the system pager; this is equivalent to using the command
perldoc parseRefSeqGenomes.pl.