Next: 6. Writing programs to Up: Aspell .33.7.1 alpha A Previous: 4.
Managing Word Lists   Contents

Subsections

  * 5.1 Specifying Options
      + 5.1.1 At the Command Line
          o 5.1.1.1 Boolean
          o 5.1.1.2 Value
          o 5.1.1.3 List
      + 5.1.2 Via a Configuration File
          o 5.1.2.1 Boolean
          o 5.1.2.2 Value
          o 5.1.2.3 List
      + 5.1.3 Via an Environmental Variable
  * 5.2 The Options
      + 5.2.1 Basic Options
      + 5.2.2 Dictionary Options
      + 5.2.3 Run-together Word Options
      + 5.2.4 Filter Options
      + 5.2.5 Aspell Utility Options
  * 5.3 Dumping Configuration Values
  * 5.4 Notes on various Options
      + 5.4.1 Pertaining to which word lists to use
          o 5.4.1.1 How Aspell Selects an Appropriate Dictionary
          o 5.4.1.2 About Multi Dictionaries
          o 5.4.1.3 Provided Word Lists
      + 5.4.2 Notes on Various Filters and Filter Modes
          o 5.4.2.1 None Mode
          o 5.4.2.2 Url Filter/Mode
          o 5.4.2.3 Email Filter/Mode
          o 5.4.2.4 SGML Filter/Mode
          o 5.4.2.5 TeX Filter/Mode
      + 5.4.3 Notes on the Prefix Option
      + 5.4.4 Notes on Typo-Analysis and the Keyboard Definition File
      + 5.4.5 Notes on the Different Suggestion Modes

--------------------------------------------------------------------------


5. Customizing Aspell

The behavior of Aspell can be changed by any number of options, which can
be specified at the command line, in the environmental variable
ASPELL_CONF, in a personal configuration file, or in a global
configuration file. Options specified on the command line override options
specified by the environmental variable. Options specified by the
environmental variable override options specified by either of the
configuration files. Finally, options specified by the personal
configuration file override options specified in the global configuration
file. Options specified in the environmental variable ASPELL_CONF, a
personal configuration file, or a global configuration file will take
effect no matter how Aspell is used, including when it is used by other
applications.
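
The precedence rules above can be modelled as a simple merge from the
lowest-priority source to the highest. The following is only a sketch: the
function name and the use of dictionaries are assumptions made for
illustration, not how Aspell stores its configuration, and list options
(which accumulate entries) are not modelled.

```python
# Illustrative model only: each dictionary stands in for one option source.
def effective_options(global_conf, personal_conf, env_conf, command_line):
    """Merge option sources from lowest to highest precedence."""
    merged = {}
    for source in (global_conf, personal_conf, env_conf, command_line):
        merged.update(source)  # later (higher-precedence) sources win
    return merged


options = effective_options(
    global_conf={"lang": "english", "sug-mode": "normal"},
    personal_conf={"sug-mode": "fast"},
    env_conf={"lang": "german"},
    command_line={"sug-mode": "ultra"},
)
print(options)
```

Here the command line decides sug-mode and the environmental variable
decides lang, because each is the highest-priority source that sets the
option.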

Aspell has three basic types of options: boolean, value, and list. Boolean
options are either enabled or disabled, value options take a specific
value, and list options can have entries added to or removed from the
list.

5.1 Specifying Options

5.1.1 At the Command Line

All options specified at the command line have the following basic format:

    --<option>[=<value>]

where the '=' can be replaced by whitespace.

However, some options also have single-letter abbreviations of the form:

    -<letter>[<optional whitespace><value>]

5.1.1.1 Boolean

To enable a boolean option, simply specify the option without any
corresponding value. For example, to ignore accents when checking words
use "--ignore-accents". To disable a boolean option, prefix the option
name with "dont-". For example, to not ignore accents when checking words
use "--dont-ignore-accents".

If a boolean option has a single-letter abbreviation, simply give the
letter corresponding to either enabling or disabling the option without
any corresponding value. For example, to consider run-together words legal
use "-C", or to consider them illegal use "-B".

5.1.1.2 Value

To specify a value option, simply specify the option with its
corresponding value. For example, to set the filter mode to TeX use
"--mode=tex".

If a value option has a single-letter shortcut, simply specify the
single-letter shortcut with its corresponding value. For example, to use a
large American dictionary use "-d american-lrg".
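
As a small sketch of the boolean and value conventions described so far,
the helpers below build command-line arguments for a given option. The
function names are made up for this example and are not part of Aspell;
only the argument formats come from the text above.

```python
def boolean_arg(name, enabled):
    """Long form of a boolean option: --name or --dont-name."""
    return f"--{name}" if enabled else f"--dont-{name}"


def value_arg(name, value):
    """Long form of a value option: --name=value."""
    return f"--{name}={value}"


def short_value_args(letter, value):
    """Single-letter form of a value option, value given after whitespace."""
    return [f"-{letter}", value]


print(boolean_arg("ignore-accents", True))
print(boolean_arg("ignore-accents", False))
print(value_arg("mode", "tex"))
print(short_value_args("d", "american-lrg"))
```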

5.1.1.3 List

To add a value to a list option, prefix the option name with "add-" and
then specify the value to add. For example, to add the URL filter use
"--add-filter url". To remove a value from a list option, prefix the
option name with "rem-" and then specify the value to remove. For
example, to remove the URL filter use "--rem-filter url". To remove all
items from a list, prefix the option name with "rem-all-" without
specifying any value. For example, to remove all filters use
"--rem-all-filter".

5.1.2 Via a Configuration File

Aspell can also accept options via a personal or global configuration
file. The exact files to use are specified by the options per-conf and
conf, respectively. The personal configuration file is normally
".aspell.conf", located in the HOME directory, and the global one is
normally "aspell.conf", located in the etc directory, which is normally
"/usr/etc" or "/usr/local/etc". To find out the values for your particular
system use "aspell dump config".

Each line of the configuration file has the format:

    option [value]

There may be any number of spaces between the option and the value;
however, only spaces may be used, i.e. there is no '=' between the option
name and the value.

Comments may also be included by preceding them with a '#', as anything
from a '#' to a newline is ignored. Blank lines are also allowed.

Values set in the personal configuration file override those in the global
file. Options specified at either the command line or via an environmental
variable override those specified by either configuration file.

5.1.2.1 Boolean

To specify a boolean option, simply include the option followed by
"true" to enable it or "false" to disable it. For example, to allow
run-together words use "run-together true".

5.1.2.2 Value

To specify a value option, simply include the option followed by the
corresponding value. For example, to set the default language to German
use "lang german".

5.1.2.3 List

To add a value to a list option, prefix the option name with "add-" and
then specify the value to add. For example, to add the URL filter use
"add-filter url". To remove a value from a list option, prefix the option
name with "rem-" and then specify the value to remove. For example, to
remove the URL filter use "rem-filter url". To remove all items from a
list, prefix the option name with "rem-all-" without specifying any value.
For example, to remove all filters use "rem-all-filter".
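
To make the file format concrete, here is a rough sketch of a reader for
it. It is not Aspell's own parser: the function names are invented, and
the sketch guesses an option's type purely from its prefix or value (so a
string option whose value happens to be "true" would be misread as a
boolean).

```python
def parse_config(text):
    """Return a list of (option, value) pairs; value is None when absent."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue  # blank or comment-only line
        option, _, value = line.partition(" ")
        entries.append((option, value.strip() or None))
    return entries


def apply_entries(entries):
    """Fold parsed entries into a dict; list options become Python lists."""
    options = {}
    for option, value in entries:
        if option.startswith("rem-all-"):
            options[option[len("rem-all-"):]] = []
        elif option.startswith("add-"):
            options.setdefault(option[len("add-"):], []).append(value)
        elif option.startswith("rem-"):
            items = options.setdefault(option[len("rem-"):], [])
            if value in items:
                items.remove(value)
        elif value in ("true", "false"):
            options[option] = (value == "true")
        else:
            options[option] = value
    return options


sample = """\
# personal settings
run-together true
lang    german      # default language
add-filter url
add-filter email
rem-filter url
"""
print(apply_entries(parse_config(sample)))
```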

5.1.3 Via an Environmental Variable

The environmental variable ASPELL_CONF may also be used and it overrides
any options set in the configuration file. The format of the string is
exactly the same as the configuration file except that semicolons ( ; )
are used instead of newlines.
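
Since the only difference is the separator, configuration file lines can
be turned into an ASPELL_CONF string mechanically. The helpers below are a
sketch with made-up names; they drop comments and blank lines on the
assumption that those are not needed in the variable.

```python
def to_aspell_conf(config_text):
    """Join the option lines of a configuration file with semicolons."""
    lines = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # comments are not carried over
        if line:
            lines.append(line)
    return ";".join(lines)


def from_aspell_conf(value):
    """Split an ASPELL_CONF string back into individual option lines."""
    return [part.strip() for part in value.split(";") if part.strip()]


print(to_aspell_conf("lang german\n\nadd-filter url\n"))
print(from_aspell_conf("lang german; add-filter url"))
```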

5.2 The Options

The following is a list of available options broken down by category. Each
entry has the following format:

option[,single letter abbreviations]
    (type) description

Single-letter options are shown as they would appear at the command line,
i.e. with the preceding dash. Boolean single-letter options are shown in
the following format:

    -abbreviation to enable|-abbreviation to disable

The type is one of the following: boolean, string, file, dir, integer, or
list. String, file, dir, and integer types are all value options which can
only take a specific type of value.

5.2.1 Basic Options

conf
    (file) main configuration file
conf-dir
    (dir) location of main configuration file
data-dir
    (dir) location of language data files
local-data-dir
    (dir) alternative location of language data files. This directory is
    searched before data-dir. It defaults to the same directory the actual
    main word list is in (which is not necessarily dict-dir).
filter
    (list) adds or removes a filter
home-dir
    (dir) location for personal files
ignore,-W
    (integer) ignore words <= n chars
ignore-case
    (boolean) ignore case when checking words
ignore-accents
    (boolean) ignore accents when checking words
ignore-repl
    (boolean) ignore commands to store replacement pairs
save-repl
    (boolean) save the replacement word list on save all
lang
    (string) default language to use when creating a dictionary or all
    else fails
language-tag
    (string) language code to use when selecting a dictionary. It follows
    the same format as the LANG environmental variable on most systems. In
    fact, it defaults to the value of the LANG environmental variable if it
    is set.
mode
    (string) sets the filter mode. Mode is one of none, url, email, sgml,
    or tex. (The shortcut options '-e' may be used for email, '-H' for
    HTML/SGML, or '-t' for TeX.)
per-conf
    (file) personal configuration file
personal,-p
    (file) personal word list file name
prefix
    (dir) prefix directory
set-prefix
    (boolean) set the prefix based on executable location (only works on
    Win32 and when compiled with --enable-win32-relocatable)
repl
    (file) replacements list file name
keyboard
    (file) the base name of the keyboard definition file to use (see
    section 5.4.4)
sug-mode
    (mode) suggestion mode = ultra | fast | normal | bad-spellers (see
    section 5.4.5)

5.2.2 Dictionary Options

The following options may be used to control which dictionaries to use and
how they behave (see section 5.4.1 for more information):

master,-d
    (string) base name of the main dictionary to use. The default Aspell
    installation provides the following dictionaries: american, british,
    and canadian.
dict-dir
    (dir) location of the main word list
extra-dicts
    (list) extra dictionaries to use
strip-accents
    (boolean) strip accents from all words in the dictionary

5.2.3 Run-together Word Options

These may be used to control the behavior of run-together words (see
section 7.4 for more information):

run-together,-C|-B
    (boolean) consider run-together words legal
run-together-limit
    (integer) maximum number of words that can be strung together
run-together-min
    (integer) minimal length of interior words

5.2.4 Filter Options

These options modify the behavior of the various filters (see section 
5.4.2 for more information):

add|rem-email-quote
    (list) email quote characters
email-margin
    (integer) number of characters that can appear before the quote char
sgml-check
    (list) sgml tags to always check.
sgml-extension
    (list) sgml file extensions.
tex-command
    (list) TeX commands
tex-check-comments
    (boolean) check TeX comments

5.2.5 Aspell Utility Options

These options may only be specified at the command line, as they are
specific to the aspell utility:

backup,-b|-x
    (boolean) create a backup file by appending ".bak" to the file name.
    (Only applies when the command is check)
time
    (boolean) measure the load time and suggest time in pipe mode.
reverse
    (boolean) reverse the order of the suggestions list.

5.3 Dumping Configuration Values

To find out the current value of all the options use the command "aspell
dump config". This will dump the current configuration to standard
output. The format of the contents dumped is such that it can be used as
either the global or personal configuration file.

5.4 Notes on various Options


5.4.1 Pertaining to which word lists to use

5.4.1.1 How Aspell Selects an Appropriate Dictionary

Aspell will go through the following steps to find an appropriate
dictionary:

 1. If the master option is set in any fashion (via the command line, the
    ASPELL_CONF environmental variable, or a configuration file), look for
    a dictionary of that name. If one cannot be found, complain.
 2. If the language-tag (not lang) option or the LANG environmental
    variable is set and the master option is not, then use it (giving
    preference to language-tag over LANG) to search for an appropriate
    dictionary. Aspell will use the same strategy that Pspell does, which
    is based on the installed .pwli files, to find an appropriate word
    list. For more information on how this is done see the Pspell manual.
 3. If step 2 fails, then look for a dictionary with the same name as the
    current setting of the lang option. This will currently work even if
    the language name is invalid, but this fact should not be relied upon
    as it is an implementation detail.
 4. Finally, if all else fails, complain.
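
The steps above can be sketched as a single function. This is only a model
of the decision order: the set of installed dictionaries and the
find_by_tag callable (standing in for the Pspell-style lookup, whose
details are not covered here) are hypothetical inputs invented for the
example.

```python
def select_dictionary(master, language_tag, lang_env, lang,
                      installed, find_by_tag):
    """Model of the four selection steps.

    installed   -- set of available dictionary names (hypothetical)
    find_by_tag -- callable standing in for the Pspell-style lookup;
                   returns a dictionary name or None (hypothetical)
    """
    # Step 1: an explicit master setting is either found or is an error.
    if master is not None:
        if master in installed:
            return master
        raise LookupError(f"dictionary {master!r} not found")
    # Step 2: language-tag takes preference over LANG.
    tag = language_tag or lang_env
    if tag:
        found = find_by_tag(tag)
        if found is not None:
            return found
    # Step 3: fall back to a dictionary named after the lang option.
    if lang in installed:
        return lang
    # Step 4: nothing worked.
    raise LookupError("no appropriate dictionary found")


installed = {"american", "english"}


def lookup(tag):
    # Toy lookup: only knows about US English.
    return "american" if tag.startswith("en_US") else None


print(select_dictionary(None, None, "en_US", "english", installed, lookup))
print(select_dictionary(None, "de_DE", "en_US", "english", installed, lookup))
```

In the second call the language-tag setting wins over LANG, the lookup
finds nothing for it, and the function falls through to step 3.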

5.4.1.2 About Multi Dictionaries

As with previous versions of Aspell you can specify the main dictionary to
use via the -d or --master option. However, as of Aspell .32 you can now
also:

 1. Specify more than one word list to use with the extra-dicts option.
 2. Optionally have all accents stripped from the word lists using the
    strip-accents option. This is not the same thing as the ignore-accents
    option. Enabling ignore-accents would accept both cafe and café
    (notice the accent on the e), but enabling only strip-accents would
    accept only cafe, even if café is in the original dictionary.
    Specifying strip-accents is just like using a word list without the
    accents.
 3. Specify special "multi" dictionaries.

A "multi" dictionary is a special file which is basically a list of
dictionary files to use. A multi dictionary must end in .multi and has
roughly the same format as a configuration file, where the two valid keys
are add and strip-accents. The add key is used for adding individual word
lists, or other "multi" files. The strip-accents key is used to control
whether accents are stripped from the dictionaries. Unlike the global
strip-accents option, this option only affects word lists that come after
the option. For example:

    strip-accents yes
    add english
    strip-accents no
    add must-accent

will strip accents from the english word list but not the must-accent word
list. If the global strip-accents option is specified the local
strip-accents options are ignored.
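
The way the local strip-accents key applies only to later add lines can be
sketched as follows. This is an illustration, not Aspell's reader: the
function name is invented, nested "multi" files are not followed, and the
starting state before any strip-accents line is assumed to be "no".

```python
def read_multi(text, global_strip=None):
    """Return (word_list, strip_accents) pairs in file order.

    global_strip -- None if the global strip-accents option is not
                    specified, otherwise True or False; when given, the
                    local strip-accents lines are ignored.
    """
    strip = False  # assumed starting state
    result = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # this sketch skips blank or unrecognized lines
        key, value = fields
        if key == "strip-accents":
            strip = (value == "yes")
        elif key == "add":
            effective = strip if global_strip is None else global_strip
            result.append((value, effective))
    return result


example = """\
strip-accents yes
add english
strip-accents no
add must-accent
"""
print(read_multi(example))
print(read_multi(example, global_strip=True))
```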

5.4.1.3 Provided Word Lists

Aspell now provides multi dictionaries for three varieties of English:
american-med, british-med, and canadian-med. The word lists themselves all
contain accented words; however, the strip-accents option is enabled by
default for all the individual word lists. If you wish to use the accented
words you can set the global strip-accents option to false or create a new
multi word list.

Great care has been taken so that only one spelling for any particular
word is included in the main list. When two variants were considered equal
I randomly picked one for inclusion in the main word list. Unfortunately
this means that my choice in how to spell a word may not match your
choice. If this is the case you can try to include one of the special
variant dictionaries with the add-extra-dicts option. You can choose from
english-variant-0, english-variant-1, and english-variant-2. Each of these
word lists includes all the variants from the previous variant level, thus
there is no need to include more than one. English-variant-0 includes most
variants which are considered almost equal, english-variant-1 includes
variants which are also generally considered acceptable, and
english-variant-2 contains variants which are seldom used. These special
variant dictionaries are an experimental feature, so please let me know if
you take advantage of them. If no one seems to be using them I may no
longer provide them in a future release of Aspell.

Many other dictionary sizes and varieties can be created. See the scowl/
directory in the source distribution for information on the different
varieties you can create and section 4 for how to create an individual
dictionary.


5.4.2 Notes on Various Filters and Filter Modes

Aspell now has rudimentary filter support. You can either select from
individual filters or choose a filter mode. To select a filter mode use
the mode option. You may choose from none, url, email, sgml, and tex. The
default mode is url. Individual filters can be added with the add-filter
option and removed with the rem-filter option. The currently available
filters are url, email, sgml, and tex, as well as a number of filters
which translate the text from one format to another.

5.4.2.1 None Mode

This mode is exactly what it says. It turns off all filters.

5.4.2.2 Url Filter/Mode

The url filter/mode skips over URLs, host names, and email addresses.
Because this filter is almost always useful and rarely does any harm it is
enabled in all modes except none. To turn it off either select the none
mode or use the rem-filter option after the desired mode is selected.

5.4.2.3 Email Filter/Mode

The email filter/mode skips over quoted text. It currently does not
support skipping over headers; however, a future version should. In the
meantime I suggest you use Aspell with Newsbody, which can be found at
http://home.worldonline.dk/~byrial/newsbody/. The option email-margin
controls the number of characters that can appear before the email quote
char; the default is 10. The option add|rem-email-quote controls the
characters that are considered quote characters; the default is '>' and
'|'.
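
One way to picture how the quote characters and the margin interact is the
sketch below. It is a guess at the rule for illustration only, not the
filter's actual code: the function name is invented, and it assumes a line
counts as quoted when a quote character is preceded by at most "margin"
characters.

```python
def is_quoted(line, quote_chars=(">", "|"), margin=10):
    """True if a quote character has at most `margin` characters before it."""
    for position, char in enumerate(line):
        if position > margin:
            break  # too far into the line to count as a quote prefix
        if char in quote_chars:
            return True
    return False


print(is_quoted("> quoted text"))
print(is_quoted("John> quoted with a name prefix"))
print(is_quoted("a long line where the > appears far too late"))
```

The defaults used here (a margin of 10 and the characters '>' and '|') are
the ones given in the text.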


5.4.2.4 SGML Filter/Mode

The sgml filter/mode will skip over SGML commands. It currently does not
handle nested < > unless they are in quotes. It also does not handle the
null end tag (NET) minimization feature of SGML, such as

    <emphasis/important/

The option add|rem-sgml-check controls which sgml tags should always be
checked. The default is "alt".

The option add|rem-sgml-extension controls which file extensions are
recognized as sgml/html files. The default is html, htm, php, and sgml.
The extensions are not case sensitive, so extensions like .HTM will also
be recognized.

The sgml mode also enables a filter which will recognize SGML character
commands such as &amp; and convert them into the proper iso8859-1
character. Currently only the iso8859-1 character set is used; however, in
future versions it will convert them to the encoding that is specified in
the language data file. You can specifically turn on this filter by
enabling the SGML&charset/charset filter.

5.4.2.5 TeX Filter/Mode

The tex (all lowercase) filter/mode skips over TeX commands and parameters
and/or options to certain commands. It also skips over TeX comments by
default. The option [dont-]tex-check-comments controls whether or not
Aspell will skip over TeX comments. The option add|rem-tex-command
controls which TeX commands should have certain parameters and/or options
also skipped over. Commands that are not specified will have all their
parameters and/or options checked. The format for each item is

    command  a list of p,P,o and Os

The first item is simply the command name. The second item controls which
parameters to skip over. A 'p' skips over a parameter while a 'P' won't.
Similarly, an 'o' will skip over an optional parameter while an 'O' won't.
The first letter on the list will apply to the first parameter, the second
letter will apply to the second parameter, etc. If there are more
parameters than letters Aspell will simply check them as normal. For
example, the option

    add-tex-command rule pp

will skip over the first two parameters of the "rule" command while the
option

    add-tex-command foo Pop

will check the first parameter of the "foo" command, skip over the next
optional parameter, if it is present, and will skip over the second
parameter -- even if the optional parameter is not present -- and will
check any additional parameters.
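
The matching of letters to parameters described above can be sketched as
follows. This is an illustration, not the filter's code: the function
name and the (kind, text) representation of parameters are made up, and
the handling of an optional parameter that the letter list does not
anticipate (it is simply checked) is an assumption.

```python
def params_to_check(spec, params):
    """Return the texts of the parameters that would be spell checked.

    spec   -- letter list such as "Pop"
    params -- list of (kind, text), kind being "mandatory" or "optional"
    """
    checked = []
    i = 0  # position in the letter list
    for kind, text in params:
        if kind == "mandatory":
            # Letters for optional parameters that are absent are passed over.
            while i < len(spec) and spec[i] in "oO":
                i += 1
        if i >= len(spec):
            checked.append(text)  # more parameters than letters
            continue
        letter = spec[i]
        if kind == "optional" and letter not in "oO":
            checked.append(text)  # assumption: unanticipated option is checked
            continue
        i += 1
        if letter in "PO":
            checked.append(text)
    return checked


with_option = [("mandatory", "first"), ("optional", "opt"),
               ("mandatory", "second"), ("mandatory", "third")]
without_option = [("mandatory", "first"),
                  ("mandatory", "second"), ("mandatory", "third")]
print(params_to_check("Pop", with_option))
print(params_to_check("Pop", without_option))
```

Both calls check "first" and "third" only, which matches the description
of the "foo Pop" example: the second parameter is skipped whether or not
the optional parameter is present.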

A '*' at the end of the command is simply ignored. For example, the option

    add-tex-command enlargethispage p

will ignore the first parameter in both enlargethispage and
enlargethispage*.

To remove a command simply use the rem-tex-command option. For example

    rem-tex-command foo

will remove the command foo, if present, from the list of TeX commands.

5.4.3 Notes on the Prefix Option

The prefix option is there to allow Aspell to be relocated easily.
Changing prefix will change all directory names that are not explicitly
set so that they are relative to the new prefix. For example, if prefix
was "/usr/local/aspell" and dict-dir has a default value of
"/usr/local/aspell/dict", then changing prefix to "/opt/aspell" will also
change the default value of dict-dir to "/opt/aspell/dict". Note that
modifying prefix will only affect the default compiled-in values of
directories. If a directory option is explicitly given a value, then
changing the value of prefix has no effect on that directory option.
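
The effect of prefix on default and explicit directory values can be
modelled in a few lines. This is a sketch under assumptions: the table of
defaults and the function name are invented for the example, and only
dict-dir is shown.

```python
import posixpath

# Hypothetical table of compiled-in defaults, stored relative to the prefix.
RELATIVE_DEFAULTS = {"dict-dir": "dict"}


def resolve_dir(option, prefix, explicit=None):
    """Return the directory for `option` under the given prefix.

    explicit -- mapping of directory options that were set by the user;
                these are returned unchanged regardless of the prefix.
    """
    explicit = explicit or {}
    if option in explicit:
        return explicit[option]
    return posixpath.join(prefix, RELATIVE_DEFAULTS[option])


print(resolve_dir("dict-dir", "/usr/local/aspell"))
print(resolve_dir("dict-dir", "/opt/aspell"))
print(resolve_dir("dict-dir", "/opt/aspell", {"dict-dir": "/srv/dict"}))
```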


5.4.4 Notes on Typo-Analysis and the Keyboard Definition File

Aspell .33 and later will, in general, give a higher priority to certain
misspellings which are likely to be due to typos, such as "teh" instead of
"the" or "hapoy" instead of "happy". However, in order to do this
well Aspell needs to know the layout of the keyboard. The keyboard
definition file simply identifies keys that are right next to each other.
The file has an extension of .kbd and each line consists of two letters
corresponding to two keys that are right next to each other. For example,
the line "as" indicates that 'a' and 's' are right next to each
other. If "as" is listed as an entry it is not necessary to list "sa"
as an entry, as that will be done automatically. Also, by "right next to
each other" I mean two keys that are close enough together that it is easy
to type one instead of the other. On most keyboards this means keys that
are to the left or to the right of each other, and not keys that are below
or above each other.
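
As an illustration of the file format, the sketch below reads two-letter
entries into a set of adjacent key pairs and adds the reverse of each pair
automatically. The function name is invented, and the sketch simply
ignores any line that is not exactly two characters long, which may not
match how Aspell treats such lines.

```python
def parse_keyboard(text):
    """Return the set of (key, neighbor) pairs described by .kbd lines."""
    adjacent = set()
    for line in text.splitlines():
        pair = line.strip()
        if len(pair) != 2:
            continue  # this sketch only understands two-letter entries
        a, b = pair
        adjacent.add((a, b))
        adjacent.add((b, a))  # the reverse entry is implied
    return adjacent


keys = parse_keyboard("as\nsd\n")
print(("s", "a") in keys)
print(("a", "d") in keys)
```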

The default for this option is normally "standard". However, the default
can be changed via the language data file. The normal default,
"standard", should work well for most QWERTY-like keyboard layouts. It
may need minor adjusting for foreign keyboards and will need to be
completely rewritten for a Dvorak layout. When creating a keyboard
definition file for a foreign language please keep in mind that Aspell
completely ignores accents when scoring words, so that the key 'o' and the
key for an accented 'o' (such as 'ö') will appear to be the same key to
Aspell even if they are in fact separate keys on your keyboard.


5.4.5 Notes on the Different Suggestion Modes

In order to understand what these suggestion modes do, a basic
understanding of how aspell works is required. See section 8 for that. The
suggestion modes are as follows.

ultra
    This method will use the fastest method available to come up with
    decent suggestions. This currently means that it will look for
    soundslikes within one edit distance without doing any typo
    analysis. It is slower than Ispell by a factor of 1.5 to 2 when a
    single word list is used. Its speed is only slightly affected by the
    size of the word list, if at all, but it is strongly affected by the
    number of word lists used. In this mode Aspell gets about 87% of the
    words from my small test kernel of misspelled words. (Go to
    http://aspell.sourceforge.net/test for more info on the test kernel as
    well as comparisons of this version of Aspell with previous versions
    and other spell checkers.)
fast
    This method is like ultra except that it also performs typo analysis
    unless it is turned off by setting the keyboard to none. The typo
    analysis brings words which are likely to be due to typos to the
    beginning of the list but slows things down by a factor of about two.
    This mode should get around the same number of words that the ultra
    method does.
normal
    This method looks for soundslikes within two edit distances and
    performs typo-analysis unless it is turned off. It is around 10 times
    slower than fast mode with the English word list but returns better
    suggestions. Its speed is directly proportional to the size of the
    word list. This mode gets 93% of the words.
bad-spellers
    This method also looks for soundslikes within two edit distances
    but is more tailored for the bad speller, whereas fast or normal are
    more tailored to strike a good balance between typos and true
    misspellings. This mode never performs typo-analysis and returns a
    huge number of words for the really bad spellers who can't seem to get
    the spelling anything close to what it should be. If the misspelled
    word looks anything like the correct spelling it is bound to be found
    somewhere on the list of 100 or more suggestions. This mode gets 98%
    of the words.
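
To give a feel for what "within one or two edit distances" means, the
sketch below uses the plain Levenshtein distance as a stand-in. This is an
assumption made for illustration: it is not claimed to be the measure
Aspell uses, and the strings in the example merely stand in for soundslike
codes.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def within(candidates, target, limit):
    """Keep the candidates whose distance to target is at most limit."""
    return [c for c in candidates if edit_distance(c, target) <= limit]


candidates = ["hapy", "hppy", "hxxxy"]
print(within(candidates, "happy", 1))
print(within(candidates, "happy", 3))
```

Raising the limit lets more distant candidates through, which is why
modes that search at a distance of two can return more suggestions and
take longer.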

--------------------------------------------------------------------------
Kevin Atkinson 2001-08-19
