
                           Maildrop tips and tricks

   Last update: 10/11/1998.
   ______________________________________________________________________

   The include statement                           
   The $LINES and $SIZE environment                
   Partial matches using the ! operator            
   Short-circuit evaluation using logical operators
   The foreach statement                           
   The $HOME/.tmp directory                        
   ______________________________________________________________________

   Here's a list of tips and tricks to write efficient mail filters using
   maildrop.  Maildrop  -  just  like  any  programming language - can do
   certain things better than others.


The include statement

   When  maildrop  is  started, it reads $HOME/.mailfilter, or some other
   file  containing  mail  filtering  instructions. Before doing anything
   else,  this  entire file is read and semi-compiled. Any grammatical or
   syntax  errors  will  result  in maildrop terminating with a temporary
   error code, without doing anything with the message.

   The  entire  file  must  be  'semi-compiled".  The  'semi-compilation'
   process   consists  of  building  an  logical  representation  of  the
   filtering statements in memory.

   If  you  have  a  long and complicated set of instructions, it will be
   semi-compiled  whether  or  not  it  will  actually  be  executed. For
   example, consider the following code:

if ( /^Subject: rosebud/ )
{
    ...
    some
    really
    long
    set
    of
    instructions
    ...
}

   If  the  contents  of the "if" construct are large, maildrop may spend
   considerable  amount  of  time  and  memory  semi-compiling  filtering
   instructions  that  may  never  be  used.  Consider  putting all these
   instructions  into a separate file, and using the include statement to
   execute them.

   The  include  statement is processed only when it is actually executed
   by  maildrop.  That  carries  certain  side  effects.  One  feature of
   maildrop  is  the  semi-compilation  which  weeds  out  errors  in the
   filtering  recipe  file, holding all mail and not doing anything until
   the  errors are fixed. The include statement is not processed until it
   is  actually  executed.  If there are errors in the file referenced by
   the  include  statement,  maildrop  will  not know about them until it
   executes  the  include  statement. When maildrop detects the error, it
   will terminate with a temporary error code, holding all mail. However,
   any  filtering  instructions  before  the  include  statement would've
   already been executed.

The $LINES and $SIZE environment variables.

   These variables are initialized to the number of lines in the message,
   and  the  size  of the message in bytes. They may not be exact, due to
   the  presence  of the From_ line, or -A arguments on the command line,
   but they will always be a reasonable approximation.

   Scanning  headers  for  patterns  will  take  the  same amount of time
   whether  the  body of the message has 100 or 1,000 lines. However, the
   "b"  option  causes  the message body to be searched. If someone sends
   you a 5 megabyte binary file, each pattern matched against the message
   body will take a significant time to complete.

   Whenever  possible,  the  first  thing  you  should  do  is check both
   variables  to  see if the message  is excessively large. If so, divert
   the message to a separate mailbox, or reject it.

   Weighted  scoring  is  especially sensitive to the size of the message
   being  delivered.  Normally,  as  soon  as  a  pattern match is found,
   maildrop terminates the pattern scan. If weighted scoring is used then
   the  pattern  scan  is  immediately  restarted,  until  the end of the
   message is reached.

   A  fixed  overhead  is required to start a pattern scan going, so that
   weighted   scoring   that   matches   a   very   short  pattern,  like
   /[:lower:]/:Dbw,1,1  will  take  a  significant  amount of CPU time to
   complete.  Maildrop  is  simply  not  optimized  for  these  kinds  of
   situations.

   Nevertheless, on a Pentium 200, /[:lower:]/:Dbw,1,1 perform reasonably
   well for messages less than 60K long.

   Here's  an  example of using $LINES and $SIZE to divert large messages
   to a separate mailbox:

    if ($LINES > 1000 || $SIZE > 60000)
    {
        to "mail/IN.large"
    }

    .
    .
    .

Partial matches using the ! operator

   Maildrop  implements  the  !  operator  in patterns by breaking up the
   pattern  into  two  separate patterns, which are compiled and executed
   separately.

   Whenever  the  first pattern is succesfully matched, maildrop runs the
   second pattern. If the second pattern does not match, maildrop resumes
   matching the first pattern until it matches again.

   Since   starting   a  pattern  match  involves  some  fixed  overhead,
   minimizing  the  number  of  times  the  first pattern will match will
   reduce the amount of time it takes to match the entire pattern.

   For  example:  you're  trying  to extract an IP address from the first
   Received header. Here's your first attempt at doing so:

    /^Received:.*![0-9\.]+/
    IP=$MATCH2

   This will not work at all. From the manual page for maildropfilter:

       When  there  is  more  than  one  way  to  match a string,
       maildrop favors matching as much  as  possible  initially.

   So this fragment will set IP to the last digit, or period found in the
   first  Received:  header,  because  the first half of the pattern will
   match as much as possible.

   Here's your second attempt, then:

    /^Received:.*[^0-9\.]![0-9\.]+/
    IP=$MATCH2

   This  will  work,  except  the  first  half of the pattern will end up
   matching every character in the Received: header that's not a digit or
   a  period.  Maildrop  will then stop and try to run the second half of
   the pattern.

   Your  mail  server  will  probably  put  a  left bracket before the IP
   address  of the relay. Noting that, the following code picks up the IP
   address as efficiently as possible:

    /^Received:.*\[![0-9\.]+/
    IP=$MATCH2

Short-circuit evaluation using logical operators

   The  logical  operators || and && work in maildrop just like they work
   in C. Consider the following expression:

    if (/ ... some pattern ... / || / ... some other pattern ... /)
    {
        .
        .
        .
    }

   If  the  first  pattern is found, maildrop will not execute the second
   pattern, since the logical expression will be true no matter what. You
   can  use  that to your advantage. If you have two patterns, and one of
   them  takes  more  time  to process, put the other first. Even if both
   patterns  are  relatively quick, if you expect one of them to be found
   almost every time, put that one first so that it will not be necessary
   to run the second pattern most of the time.

   The same concept works for the && operator:

    if (/ ... some pattern ... / && / ... some other pattern ... /)
    {
        .
        .
        .
    }

   If  the  first  pattern  is  not  found, maildrop will not execute the
   second  pattern,  since the logical expression will be false no matter
   what.

The foreach statement

   The foreach statement is implemented as follows. Maildrop will execute
   the  regular expression, and compile a list of all the patterns in the
   message  matched  by the regular expression. Afterwards, maildrop will
   execute the foreach statement, once for each matched pattern.

   It  is  important  to  note that maildrop will create a list in memory
   containing  every  pattern  matched  by the regular expression. if the
   regular  expression  is  found  many  times  in the message, this will
   require  a  lot  of  memory.  For example, the following is a Very Bad
   Thing[tm]:

    foreach /[:alpha:]/
    {
        .
        .
        .
    }

   Maildrop  does  not  include  any  built-in  limits on resource usage,
   except  for  the watchdog timer. Therefore, as a system administrator,
   it  is  your  responsibility  to  limit resources used by the maildrop
   process.

   This  implementation  of  the foreach statement is the one that yields
   the  least  number  of  surprises.  Observe  that  the contents of the
   foreach  construct can modify the message using the xfilter statement.
   Furthermore, the regular expression may include variable substitution,
   and those variables could be modified by the foreach statement. If the
   foreach  statement  were  to  be  executed "on the fly", these factors
   would yield unpredictable - and rather messy - results.

The $HOME/.tmp directory

   Under  certain conditions, maildrop will need to use a temporary file.
   Temporary  files  are  created  in the $HOME/.tmp directory. Temporary
   files  are  used  to  store large messages (instead of storing them in
   memory).  Temporary  files may also be used by the xfilter command, or
   in other situations.

   Maildrop   will   automatically  delete  all  temporary  files  before
   terminating. However, if the maildrop process is killed, or terminates
   abnormally,  a  temporary  file  may remain and take up disk space. To
   have  those leftover files automatically cleaned up, put the following
   code  in  /etc/profile,  or  some  other file that all users run after
   logging in:

    find $HOME/.tmp -depth -mtime +7 \( \( \
        -type d -exec rmdir {} \; \) -o -exec rm -f {} \; \) 2>/dev/null
