SERAM(1)                OGMP SEQUENCE UTILITIES                 SERAM(1)

NAME
    seram - a tool to retrieve sequences in batch and call applications
            (scripts) on them.

SYNOPSIS
    seram -s sequences_file -f scriptname -r|-R [-I] [-D] [-F] [-T info]

DESCRIPTION
    seram is a tool that allows a user or another program to request
    a list of sequences using ferret(1) and execute a custom script for
    each of these sequences, or only once for all of them.

OPTIONS
    -s sequences_file is the file containing sequence specifications
    (called the "seqspec" file). Each line in this file is either
    a path to a file containing a sequence or an NCBI format identifier
    for a sequence. Paths are prefixed by the characters "F-" and
    identifiers by "S-". A relative path is made absolute by first
    looking for the file relative to the directory from which seram
    was started; if that fails, the file is looked up relative to the
    $FERRETBANK directory (hardcoded in seram). Here is an example
    seqspec file:

        # Example seqspec file
        F-/usr/local/myseqbank/a_sequence
        F-another_sequence
        S-gb|access|locus other information

    The first line is an absolute path to a_sequence. In the second line,
    the path to another_sequence is a relative one. The third line
    is an NCBI identifier for a GenBank sequence. Comments must start
    with a "#" and be on separate lines.
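
    As a minimal sketch of the format just described (not seram's own
    parser), the following Bourne shell function labels each line of a
    seqspec file. The function name classify_seqspec and the labels it
    prints are hypothetical; anything that is not a comment, an F- line
    or an S- line is simply reported as unrecognized.

```shell
#!/bin/sh
# Hypothetical helper: classify each seqspec line read from stdin.
# It mirrors the documented format only; seram's parser may differ.
classify_seqspec() {
    while IFS= read -r line; do
        case "$line" in
            "#"*) echo "comment" ;;
            F-/*) echo "absolute-path ${line#F-}" ;;
            F-*)  echo "relative-path ${line#F-}" ;;
            S-*)  echo "identifier ${line#S-}" ;;
            *)    echo "unrecognized $line" ;;
        esac
    done
}

# Feed it the example seqspec file from the text.
seqspec_demo=$(classify_seqspec <<'EOF'
# Example seqspec file
F-/usr/local/myseqbank/a_sequence
F-another_sequence
S-gb|access|locus other information
EOF
)
printf '%s\n' "$seqspec_demo"
```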

    -f scriptname is the file containing the template for the script to
    be executed. It can be any kind of script (Bourne shell, C shell,
    Perl, etc.) as long as the first line contains the correct "#!"
    sequence. The script doesn't need to have the executable bit set,
    since this file is going to be copied by seram into a temporary
    working directory and its executable bit set there. Unless the -I
    option is specified, all occurrences of the two characters "{}"
    will be replaced by the filename of a sequence before execution.
    The script will be started with its current directory set to
    $FERRETBANK.
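
    The effect of the "{}" substitution can be imitated by hand, which
    is a convenient way to try a template before giving it to seram.
    The sketch below uses sed for the replacement; that is an assumption
    about the effect only, not seram's actual code, and the file names
    are hypothetical. It also assumes the sequence filename contains no
    "|" or "&" characters, which would confuse the sed command.

```shell
#!/bin/sh
# Imitate the documented "{}" substitution on a scratch copy of a template.
workdir=$(mktemp -d)
seqfile="$workdir/a_sequence"          # hypothetical sequence file
printf '>demo\nACGT\n' > "$seqfile"

# A tiny script template; it does not need the executable bit.
cat > "$workdir/template" <<'EOF'
#!/bin/sh
wc -l < "{}"
EOF

# Replace every "{}" by the sequence filename, then set the executable
# bit on the copy, as the text says seram does in its working directory.
sed "s|{}|$seqfile|g" "$workdir/template" > "$workdir/script"
chmod +x "$workdir/script"

line_count=$("$workdir/script" | tr -d ' ')
echo "$line_count"
rm -rf "$workdir"
```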

    -r or -R The -r option tells seram that the script must be run for
    EACH sequence specified in the seqspec file; the -R option tells
    it that the script must be run only once, when ALL sequences
    are ready (i.e. have been received by ferret(1)). In the case of
    -R, all sequences are concatenated together in a temporary file
    before being supplied to the script.
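
    The difference between the two modes can be pictured on local files
    without involving seram or ferret(1) at all. In this sketch the
    function count_lines stands in for the user's script, and all file
    names are hypothetical.

```shell
#!/bin/sh
# Contrast "run for each sequence" with "run once on the concatenation".
workdir=$(mktemp -d)
printf '>one\nAAAA\n' > "$workdir/seq1"
printf '>two\nCCCC\n' > "$workdir/seq2"

# Stand-in for the user's script: print the number of lines in a file.
count_lines() { wc -l < "$1" | tr -d ' '; }

# -r style: one run per sequence.
each_runs=""
for f in "$workdir/seq1" "$workdir/seq2"; do
    each_runs="$each_runs$(count_lines "$f") "
done

# -R style: concatenate everything into one temporary file, run once.
cat "$workdir/seq1" "$workdir/seq2" > "$workdir/all"
once_run=$(count_lines "$workdir/all")

echo "per-sequence: $each_runs"
echo "all-at-once:  $once_run"
rm -rf "$workdir"
```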

    -I The presence or absence of this flag tells seram HOW to
    feed sequences to the script. When present, the sequences are fed
    by standard input to the script, which is executed exactly as
    supplied by the user of seram. When it is absent, all occurrences
    of the two characters "{}" in the script are replaced by the
    filename of the sequence.

    -D Tells seram to DELETE the two files scriptname and sequences_file
    after reading them. This can be useful when they are temporary
    files created by another application which doesn't want to wait
    until seram has completed its work (which can be very long if
    ferret(1) expects some sequences to be received by e-mail) before
    deleting them. Since seram makes a copy of these files in
    a temporary directory managed by itself, the original files are no
    longer needed after seram is invoked and therefore when the -D
    option is present the two files will be deleted soon after
    execution.
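
    A calling program might hand its temporary files to seram along
    these lines. This is only a sketch: the file contents and the -T
    text are illustrative, and whether seram returns promptly or has to
    be started in the background is not stated here, so the call is
    left in the foreground. When seram is not installed, the sketch
    just cleans up after itself.

```shell
#!/bin/sh
# Sketch of a caller that creates temporary files and passes them with -D.
script_tmp=$(mktemp)
seqspec_tmp=$(mktemp)

cat > "$script_tmp" <<'EOF'
#!/bin/sh
wc -c "{}"
EOF
printf 'F-another_sequence\n' > "$seqspec_tmp"

if command -v seram >/dev/null 2>&1; then
    # seram copies both files and deletes the originals itself (-D),
    # so the caller does not remove them.
    seram -s "$seqspec_tmp" -f "$script_tmp" -r -D -T "demo caller"
else
    echo "seram not found; removing the temporary files ourselves" >&2
    rm -f "$script_tmp" "$seqspec_tmp"
fi
```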

    -F Tells seram to ask ferret to retrieve the Full reports associated
    with the sequence specifications, rather than simply the sequence.

    -T info Supplies seram with additional information about the program
    that called it. info can be any descriptive text string.

EXECUTION
    seram does the following processing on its input:

        1- Validate command line arguments.
        2- Validate the seqspec file. This includes:
           - Making sure all paths on F- lines are accessible,
           - Making sure databases specified by S- lines in the seqspec
             file are supported by ferret(1),
           - Reporting errors and commenting out offending lines.
        3- Send a ferret(1) request for each S- line in the seqspec
           file (if any). Some support files are created when this step
           starts (see the section JOB-SPECIFIC SUPPORT FILES).
        4- If the -r option was specified, seram will:
           a- Execute the script for all sequences specified as F-,
           b- Access ferret(1)'s index file to find out which sequences
              specified as S- have been received, and process them too,
           c- Remove from the seqspec list the lines processed in 4a and 4b,
           d- Wait 15 minutes and try again at step 4b, as long as
              there are still sequences being waited for.
        5- If the -R option was specified, seram will:
           a- Access ferret(1)'s index file to find out which sequences
              specified as S- are still being waited for,
           b- If any such sequence exists, wait 15 minutes and go back to 5a,
           c- Else, all sequences are available, and seram will
              concatenate them all into a temporary file and feed
              this file to the script.
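
    The polling behaviour of steps 4b to 4d can be pictured with a small
    simulation. Everything in it is an assumption made for the demo: the
    one-identifier-per-line "index" file, the identifiers seqA and seqB,
    and the 1-second wait. seram itself waits 15 minutes between rounds
    and reads the real index maintained by ferret(1).

```shell
#!/bin/sh
# Simulated polling loop: process what has arrived, keep waiting for the rest.
workdir=$(mktemp -d)
index="$workdir/INDEX"        # stand-in for ferret's index file
pending="$workdir/pending"    # identifiers still being waited for
: > "$index"
printf '%s\n' seqA seqB > "$pending"
processed=""
rounds=0

while [ -s "$pending" ]; do
    rounds=$((rounds + 1))
    : > "$workdir/still_waiting"
    while IFS= read -r id; do
        if grep -qx "$id" "$index"; then
            processed="$processed$id "      # 4b: the script would run here
        else
            echo "$id" >> "$workdir/still_waiting"
        fi
    done < "$pending"
    mv "$workdir/still_waiting" "$pending"  # 4c: drop the processed lines
    [ -s "$pending" ] || break
    # Pretend one more sequence arrives during each waiting period (4d).
    if [ "$rounds" -eq 1 ]; then echo seqA >> "$index"; else echo seqB >> "$index"; fi
    sleep 1
done

echo "processed: $processed(rounds: $rounds)"
rm -rf "$workdir"
```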

    If seram receives a HANGUP (kill -1) signal during the 15-minute
    waiting period, it will stop waiting and immediately proceed to
    checking the arrival states of the expected sequences.
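
    The sketch below shows the general mechanism with a stand-in job,
    not with seram itself: a background shell that sleeps and wakes up
    early when it receives HANGUP. The stand-in and its flag file are
    hypothetical; for a real job only the "kill -HUP" line applies, with
    the PID of the seram process.

```shell
#!/bin/sh
# Stand-in for a waiting job: sleep, but wake up early on HANGUP.
flag=$(mktemp)
(
    trap 'echo woken > "$flag"' HUP
    sleep 30 &
    wait $!                 # wait returns early when the trapped signal arrives
    kill $! 2>/dev/null     # discard the rest of the waiting period
) &
job_pid=$!

sleep 1                     # give the stand-in time to install its trap
kill -HUP "$job_pid"        # equivalent to "kill -1"
wait "$job_pid"
hup_result=$(cat "$flag")
echo "$hup_result"
rm -f "$flag"
```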

JOB-SPECIFIC SUPPORT FILES
    Starting at step 3 of the preceding execution breakdown, seram creates
    in a private working directory (usually $FERRETBANK/Jobs) four files
    that are a snapshot of the state of the current processing. These files
    can be used if the script is killed and the user wants to restart
    the processing. The "seram.restart.*" file contains the command(s)
    necessary to restart everything. The "seram.script.*" file contains
    a copy of the script template exactly as supplied by the -f
    command line argument. The "seram.data.*" file contains the
    sequence specifications still to be processed; it may differ from
    the sequences_file originally supplied, since some sequences may
    already have been processed. The "seram.info.*" file contains
    relevant textual information about the job (this file is mostly
    used by badger(1)).

KILLING SERAM
    A seram job can be cleanly killed by removing the "seram.info.*" file
    related to that job.
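
    The following sketch demonstrates the file-name pattern on a scratch
    directory rather than on a live job. The user name and PID are
    hypothetical; for a real job the directory would be the Jobs
    directory and the suffix would be the job's own user and PID. What
    seram does with the remaining files once it stops is not shown here.

```shell
#!/bin/sh
# Recreate the four per-job file names in a scratch directory.
jobs_dir=$(mktemp -d)               # stand-in for $FERRETBANK/Jobs
job_user=demo; job_pid=12345        # hypothetical values
for kind in info data script restart; do
    : > "$jobs_dir/seram.$kind.$job_user.$job_pid"
done

# Removing only the info file is the documented way to stop the job.
rm -f "$jobs_dir/seram.info.$job_user.$job_pid"

remaining=$(ls "$jobs_dir" | sort | tr '\n' ' ')
echo "$remaining"
rm -rf "$jobs_dir"
```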

REPORTING
    If sequences are still expected after 24 hours, seram will report
    that to the user by sending her/him an e-mail message. Also, any
    sequence specification which resulted in an error will be reported
    the same way. When the job completes, a final report with information
    regarding all correctly and incorrectly processed specifications will be
    sent. If a serious error occurs, then a FATAL error message will be
    sent to both the user who submitted the job and to the OPERATOR,
    as defined by the ferret(1) configuration file.

EXAMPLE
    Here is an example of a working seram invocation. Let scriptfile
    be this:

        ---- Begin scriptfile ----
        #!/bin/sh
        # This is a Bourne shell script that sends a sequence to a user.
        if [ "{}" = \{\} ] ; then
            cat | mail -s ScriptOutput username      # Seq via stdin
        else
            cat "{}" | mail -s ScriptOutput username # Seq file replaces {}
        fi
        ---- End scriptfile ----

    and sequences_file contains this:

        ---- Begin sequences_file ----
        # A seqspec file
        F-/usr/local/ferretbank/human-complete-genome
        F-human-mitochondrial-genome
        S-emb|X54252|MTCE C. elegans complete mitochondrial genome
        ---- End sequences_file ----

    When seram is invoked as "seram -f scriptfile -s sequences_file -r",
    the script will be run three times for the three sequences,
    with the possibility that seram will have to wait for the third sequence
    to be retrieved by ferret(1).

    When seram is invoked as "seram -f scriptfile -s sequences_file -R",
    seram will wait until the third sequence is received by
    ferret(1) and THEN run the script once with the three sequence
    files concatenated.

    Note that this script is made to work both with and without the
    -I option of seram; when run, it will detect whether the sequence is
    supplied via stdin or through a filename substituted in its body.
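
    The detection trick can be checked locally. The sketch below keeps
    only the test from the script and replaces the mail commands by echo;
    the sed command imitates seram's substitution and is an assumption
    about its effect, not its actual code. The file names are
    hypothetical.

```shell
#!/bin/sh
# Show why the test [ "{}" = \{\} ] tells the two modes apart.
workdir=$(mktemp -d)
printf '>demo\nACGT\n' > "$workdir/seq"

cat > "$workdir/template" <<'EOF'
#!/bin/sh
if [ "{}" = \{\} ] ; then
    echo "stdin mode"
else
    echo "file mode"
fi
EOF

# With -I the template runs unchanged, so both sides of the test are
# the two characters themselves and the comparison succeeds.
mode_stdin=$(sh "$workdir/template" < "$workdir/seq")

# Without -I every adjacent brace pair is replaced by the filename; the
# backslash-escaped pair is not adjacent, so it survives and no longer matches.
sed "s|{}|$workdir/seq|g" "$workdir/template" > "$workdir/script"
mode_file=$(sh "$workdir/script")

echo "$mode_stdin / $mode_file"
rm -rf "$workdir"
```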
    
FILES
    scriptfile                 Supplied as argument.
    sequences_file             Supplied as argument.
    $PERLLIB/ogmp/ferret.conf  A configuration file where many absolute
                               paths are defined. $PERLLIB is the standard
                               perl library path.
    $FERRETBANK/               The directory where ferret(1) stores sequences.
    $FERRETBANK/Reports        The directory where ferret(1) stores reports.
    $FERRETBANK/Support/INDEX  Index of sequences/reports.
    $FERRETBANK/Jobs/          A working directory where the seram.xxx.yyy.zz
                               files are created.
    seram.info.$USER.$PID      Main info related to the job.
    seram.data.$USER.$PID      List of sequence specifications.
    seram.script.$USER.$PID    Script to be run for each/all sequences.
    seram.restart.$USER.$PID   Small script used to restart this job.

BUGS
    seram is dependent on ferret(1); if ferret doesn't properly update the
    INDEX file, seram may wait for sequences indefinitely. The tool
    badger(1) can be used to monitor (and cancel) seram jobs that have
    been running for too long, or to restart killed seram jobs.

SEE ALSO
    ferret(1), badger(1)

AUTHORS
    Pierre Rioux,
    Tim Littlejohn (Project Management)
    Organelle Genome Megasequencing Program, March 1994, January 1995.