Pod::Usage

Introduction

Pod::Usage is a Perl module that lets a script build its usage message from its own POD (Plain Old Documentation), so that the message can be printed when the user doesn’t supply the required arguments.

Reading the Documentation

To use Pod::Usage effectively, you need to understand three things: how to write documentation in POD format, how to parse command-line options with the Getopt::Long module, and how to call the Pod::Usage module itself.

#   Shell commands for reading documentation:

perldoc perlpod
perldoc Getopt::Long
perldoc Pod::Usage

A Complete Example

The documentation for all three is good, but it takes some close reading followed by some experimentation to put it all together. I applied what I learned to a script I was using as part of a bacterial genome annotation project. Here’s what the code looked like when I finished. Note that this script will not run without the Bifx::RefSeq module, which I am not yet ready to distribute, but it can still serve as a template for new scripts that use Pod::Usage.

#!/usr/local/bin/perl

=head1 NAME

parseRefSeqGenomes.pl - parse RefSeq genomes from NCBI

=head1 SYNOPSIS

Use:

    perl parseRefSeqGenomes.pl [--help] [--man]
                               [--clearDb] [--writeToDb]
                               --path path --division name
                               --out name

Examples:

    perl parseRefSeqGenomes.pl --help

    perl parseRefSeqGenomes.pl --man

    #   Load all RefSeq data:

    perl parseRefSeqGenomes.pl --clearDb --writeToDb
                               --path /data/genomes --division bct
                               --out bct.genomes.pep.fasta

=head1 DESCRIPTION

This script is part of the genome annotation pipeline. The script scans
the GenBank-format RefSeq genomes from NCBI and inserts the data into the
MySQL database; optionally, the script creates a Fasta-format output file
containing the protein sequences.

=head1 ARGUMENTS

parseRefSeqGenomes.pl takes the following arguments:

=over 4

=item help

  --help

(Optional.) Displays the usage message.

=item man

  --man

(Optional.) Displays all documentation.

=item path

  --path name

(Required.) Sets the path to the directory in which the division is found.

=item division

  --division name

(Required.) Sets the name of the GenBank division, a subdirectory of the path
containing one or more GenBank flat files.

=over 8

=item bct

The RefSeq microbial genomes from NCBI.

=back

=item writeToDb

  --writeToDb

(Optional.) Include this argument if you want to write data to the database.

=item clearDb

  --clearDb

(Optional.) Include this argument if you want to delete all data from all
database tables before parsing and inserting begins.

=item outFileName

  --outFileName name

(Optional.) The name of the fasta-format output file containing the protein
sequences extracted from the CDS features of each accession. If not specified,
no fasta data will be created.

=back

=head1 AUTHOR

Conrad Halling, E<lt>conrad.halling@sphaerula.comE<gt>.

=head1 COPYRIGHT

This program is distributed under the Artistic License.

=head1 DATE

31-Mar-2006

=cut


package main;
use 5.8.7;
use strict;
use warnings;
use IO::File;
use Getopt::Long ();    #   Resist name-space pollution!
use Pod::Usage ();      #   Ditto!

use lib '/Users/challing/Perl'; #   Darwin, Crick
use Bifx::RefSeq;


MAIN:
{
    #   Check arguments.

    my( $help, $man, $clearDb, $writeToDb, $division, $path, $outFileName );

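    #   GetOptions returns false when it meets an unknown or malformed
    #   option; show the usage message and exit in that case. The output
    #   file option is accepted as --out (as in the SYNOPSIS) and as
    #   --outFileName (as in the ARGUMENTS section).

    Getopt::Long::GetOptions(
        'help'              =>  \$help,
        'man'               =>  \$man,
        'clearDb'           =>  \$clearDb,
        'writeToDb'         =>  \$writeToDb,
        'division=s'        =>  \$division,
        'path=s'            =>  \$path,
        'out|outFileName=s' =>  \$outFileName )
        or Pod::Usage::pod2usage( -exitstatus => 2 );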

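    #   Check for requests for help or for man (full documentation).
    #   Both exit with status 0; without an explicit status, a call with
    #   -verbose => 1 would exit with status 1.

    Pod::Usage::pod2usage( -exitstatus => 0, -verbose => 1 ) if ( $help );
    Pod::Usage::pod2usage( -exitstatus => 0, -verbose => 2 ) if ( $man  );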

    #   Check for required variables.

    unless ( defined( $path ) && defined( $division ) )
    {
        Pod::Usage::pod2usage( -exitstatus => 2 );
    }

    #   Create the refSeq object.

    my $refSeq =
        Bifx::RefSeq->new(
            path        =>  $path,
            division    =>  $division );

    #   Clear the database if requested.

    if ( defined( $clearDb ) )
    {
        $refSeq->clearDb();
    }

    #   Parse the files.
    #   Right now, this loads data into the Accession table,
    #   then creates the Fasta file.

    $refSeq->buildTables(
        outFileName => $outFileName,
        writeToDb   => $writeToDb );

    exit( 0 );
}
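
The argument checks can be tried out without Bifx::RefSeq. The following is a minimal sketch, not the script itself: check_args is a hypothetical helper that parses a list of arguments with Getopt::Long::GetOptionsFromArray and reports which of the checks would fire. It assumes the same --path, --division, and --out options as the script above.

```perl
use strict;
use warnings;
use Getopt::Long ();

#   check_args() is a hypothetical helper, not part of the original script.
#   It returns 'ok', 'bad option', or 'missing required'.

sub check_args
{
    my @args = @_;
    my( $path, $division, $outFileName );

    #   Getopt::Long reports unknown options with warn(); silence that
    #   here so that only the return value is examined.

    local $SIG{__WARN__} = sub { };

    my $parsed = Getopt::Long::GetOptionsFromArray(
        \@args,
        'path=s'        =>  \$path,
        'division=s'    =>  \$division,
        'out=s'         =>  \$outFileName );

    return 'bad option'         unless $parsed;
    return 'missing required'   unless defined( $path ) && defined( $division );
    return 'ok';
}

print check_args( '--path', '/data/genomes', '--division', 'bct' ), "\n";  # prints "ok"
print check_args( '--path', '/data/genomes' ), "\n";        # prints "missing required"
print check_args( '--bogus' ), "\n";                        # prints "bad option"
```

In a real script, each of the two failing cases would normally lead to a call to pod2usage rather than a returned string.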

Output Examples

Using perldoc

Because the script contains documentation in POD format, the documentation can be viewed using the perldoc command. The -T option prevents use of the system pager when viewing the documentation. (Enter the command perldoc perldoc to learn more about how to use perldoc.)

$ perldoc -T parseRefSeqGenomes.pl
PARSEREFSEQGENOMES(1) User Contributed Perl DocumentationPARSEREFSEQGENOMES(1)



NAME
       parseRefSeqGenomes.pl - parse RefSeq genomes from NCBI

SYNOPSIS
       Use:

           perl parseRefSeqGenomes.pl [--help] [--man]
                                      [--clearDb] [--writeToDb]
                                      --path path --division name
                                      --out name

       Examples:

           perl parseRefSeqGenomes.pl --help

           perl parseRefSeqGenomes.pl --man

           #   Load all RefSeq data:

           perl parseRefSeqGenomes.pl --clearDb --writeToDb
                                      --path /data/genomes --division bct
                                      --out bct.genomes.pep.fasta

DESCRIPTION
       This script is part of the genome annotation pipeline. The script scans
       the GenBank-format RefSeq genomes from NCBI and inserts the data into
       the MySQL database; optionally, the script creates a Fasta-format out-
       put file containing the protein sequences.

ARGUMENTS
       parseRefSeqGenomes.pl takes the following arguments:

       help
             --help

           (Optional.) Displays the usage message.

       man
             --man

           (Optional.) Displays all documentation.

       path
             --path name

           (Required.) Sets the path to the directory in which the division is
           found.

       division
             --division name

           (Required.) Sets the name of the GenBank division, a subdirectory
           of the path containing one or more GenBank flat files.

           bct     The RefSeq microbial genomes from NCBI.

       writeToDb
             --writeToDb

           (Optional.) Include this argument if you want to write data to the
           database.

       clearDb
             --clearDb

           (Optional.) Include this argument if you want to delete all data
           from all database tables before parsing and inserting begins.

       outFileName
             --outFileName name

           (Optional.) The name of the fasta-format output file containing the
           protein sequences extracted from the CDS features of each acces-
           sion. If not specified, no fasta data will be created.

AUTHOR
       Conrad Halling, <conrad.halling@sphaerula.com>.

COPYRIGHT
       This program is distributed under the Artistic License.

DATE
       31-Mar-2006



perl v5.8.7                       2006-03-31             PARSEREFSEQGENOMES(1)
$

Running the Script with Incorrect Arguments

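When the script is run without the required --path and --division arguments (including with no arguments at all), pod2usage is called with an exit status of 2 and no verbosity level, so just the SYNOPSIS section of the documentation is displayed.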

$ perl parseRefSeqGenomes.pl
Usage:
    Use:

        perl parseRefSeqGenomes.pl [--help] [--man]
                                   [--clearDb] [--writeToDb]
                                   --path path --division name
                                   --out name

    Examples:

        perl parseRefSeqGenomes.pl --help

        perl parseRefSeqGenomes.pl --man

        #   Load all RefSeq data:

        perl parseRefSeqGenomes.pl --clearDb --writeToDb
                                   --path /data/genomes --division bct
                                   --out bct.genomes.pep.fasta

$

Using the --help Option

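When the script is run with the --help option, pod2usage is called with -verbose => 1, which displays the SYNOPSIS section along with any section headed OPTIONS, ARGUMENTS, or OPTIONS AND ARGUMENTS. For this script, that means the SYNOPSIS and ARGUMENTS sections of the documentation are displayed.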

$ perl parseRefSeqGenomes.pl --help
Usage:
    Use:

        perl parseRefSeqGenomes.pl [--help] [--man]
                                   [--clearDb] [--writeToDb]
                                   --path path --division name
                                   --out name

    Examples:

        perl parseRefSeqGenomes.pl --help

        perl parseRefSeqGenomes.pl --man

        #   Load all RefSeq data:

        perl parseRefSeqGenomes.pl --clearDb --writeToDb
                                   --path /data/genomes --division bct
                                   --out bct.genomes.pep.fasta

Arguments:
    parseRefSeqGenomes.pl takes the following arguments:

    help
          --help

        (Optional.) Displays the usage message.

    man
          --man

        (Optional.) Displays all documentation.

    path
          --path name

        (Required.) Sets the path to the directory in which the division is
        found.

    division
          --division name

        (Required.) Sets the name of the GenBank division, a subdirectory of
        the path containing one or more GenBank flat files.

        bct     The RefSeq microbial genomes from NCBI.

    writeToDb
          --writeToDb

        (Optional.) Include this argument if you want to write data to the
        database.

    clearDb
          --clearDb

        (Optional.) Include this argument if you want to delete all data
        from all database tables before parsing and inserting begins.

    outFileName
          --outFileName name

        (Optional.) The name of the fasta-format output file containing the
        protein sequences extracted from the CDS features of each accession.
        If not specified, no fasta data will be created.

$
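
The difference between the two outputs above comes down to the -verbose level passed to pod2usage. The following is a minimal sketch, not part of parseRefSeqGenomes.pl, that renders a small made-up POD document at levels 0 and 1 and captures the text instead of printing it and exiting. The demo.pl documentation and the usage_text helper are assumptions made for illustration; -input, -output, -verbose, and -exitval => 'NOEXIT' are documented pod2usage options.

```perl
use strict;
use warnings;
use Pod::Usage ();

#   A small stand-in for a script's documentation (hypothetical content).

my $pod = <<'END_POD';
=head1 NAME

demo.pl - hypothetical example

=head1 SYNOPSIS

    perl demo.pl --path path

=head1 ARGUMENTS

=over 4

=item path

(Required.) Sets the path to the input directory.

=back

=head1 DESCRIPTION

Shown only in the full documentation.

=cut
END_POD

#   usage_text() is a hypothetical helper: it renders $pod at the given
#   verbosity level and returns the text instead of printing and exiting.

sub usage_text
{
    my( $verbose ) = @_;
    my $text = '';

    open( my $in,  '<', \$pod  ) or die "Cannot read POD string: $!";
    open( my $out, '>', \$text ) or die "Cannot open output string: $!";

    Pod::Usage::pod2usage(
        -input      =>  $in,
        -output     =>  $out,
        -verbose    =>  $verbose,
        -exitval    =>  'NOEXIT' );

    close( $out );
    return $text;
}

print "--- verbose 0 ---\n", usage_text( 0 );
print "--- verbose 1 ---\n", usage_text( 1 );
```

At level 0 only the SYNOPSIS is rendered; at level 1 the ARGUMENTS section follows it.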

Using the --man Option

When the script is run with the --man option, pod2usage is called with -verbose => 2, and all of the documentation is displayed using the system pager; this is equivalent to using the command perldoc parseRefSeqGenomes.pl.
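
If a pager is not wanted, Pod::Usage documents a -noperldoc option that asks it to format the text itself rather than hand off to perldoc. The following is a minimal sketch, not taken from the script: the POD text and the full_text helper are hypothetical, and the output is captured in a string only to keep the example self-contained.

```perl
use strict;
use warnings;
use Pod::Usage ();

#   Hypothetical documentation used only for this illustration.

my $pod = <<'END_POD';
=head1 NAME

demo.pl - hypothetical example

=head1 DESCRIPTION

This text appears only in the full documentation.

=cut
END_POD

#   full_text() is a hypothetical helper that renders every section of
#   $pod (-verbose => 2) and returns the result as a string.

sub full_text
{
    my $text = '';

    open( my $in,  '<', \$pod  ) or die "Cannot read POD string: $!";
    open( my $out, '>', \$text ) or die "Cannot open output string: $!";

    Pod::Usage::pod2usage(
        -input      =>  $in,
        -output     =>  $out,
        -verbose    =>  2,
        -noperldoc  =>  1,          #   format the text without calling perldoc
        -exitval    =>  'NOEXIT' );

    close( $out );
    return $text;
}

print full_text();
```

In a script that reads its own POD, the corresponding call would be along the lines of Pod::Usage::pod2usage( -exitstatus => 0, -verbose => 2, -noperldoc => 1 ).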