Imagine the following situation: when German texts containing umlaut
characters are received, the umlauts may not be displayed properly in
a Unix terminal window or on an output device. Furthermore, a simple
search and replace of individual characters only makes sense if the
translation relation between input and output is known. When texts come
from several sources, i.e. different text processors and/or machine
platforms, this relation may vary from source to source.
The umlaut utility allows you to specify the type of input and output
separately. The input filter is responsible for recognizing the
individual characters. It consists of associations that map each of the
seven characters Ä, ä, Ö, ö, Ü, ü and ß
to certain values: if such a value occurs in the input stream, it is
treated as an umlaut character. Recognized umlaut characters are then
passed on to the second filter, the output filter (see below).
Non-umlaut characters are simply forwarded to the output.
Apart from the built-in default associations, individual
definitions may be specified on the command line or in a separate
definition file.
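The following Python sketch illustrates how such an input filter might work. It is not the utility's own code: the table names `LATIN1` and `CP437`, the functions `input_filter` and `parse_assoc`, and the `char=value` definition syntax are all hypothetical. The byte values themselves are the standard ISO 8859-1 and IBM code page 437 positions of the seven characters.

```python
# Hypothetical built-in associations: byte value in the input -> character.
LATIN1 = {0xC4: "Ä", 0xE4: "ä", 0xD6: "Ö", 0xF6: "ö",
          0xDC: "Ü", 0xFC: "ü", 0xDF: "ß"}
CP437 = {0x8E: "Ä", 0x84: "ä", 0x99: "Ö", 0x94: "ö",
         0x9A: "Ü", 0x81: "ü", 0xE1: "ß"}


def parse_assoc(lines):
    """Parse hypothetical 'char=value' lines (e.g. 'ü=0x81') into a table."""
    assoc = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        char, value = line.split("=", 1)
        assoc[int(value, 0)] = char.strip()  # base 0 accepts 0x81 or 129
    return assoc


def input_filter(data: bytes, assoc: dict) -> list:
    """Split raw input into recognized characters (str) and other bytes (int)."""
    tokens = []
    for b in data:
        if b in assoc:
            tokens.append(assoc[b])  # recognized: handed on to the output filter
        else:
            tokens.append(b)         # non-umlaut: forwarded unchanged
    return tokens


# User definitions extend the defaults: 0x81 is now recognized as well.
assoc = {**LATIN1, **parse_assoc(["ü=0x81"])}
print(input_filter(b"M\xfcller", assoc))  # [77, 'ü', 108, 108, 101, 114]
```

Merging the user table over the built-in one keeps the defaults while letting a definition file add or redefine individual values.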
The second filter controls the utility's output behaviour. At
present, translations to ASCII sequences, HTML, TeX and LaTeX are
supported. ASCII sequences mean that each recognized value is translated
into one of the strings 'Ae', 'ae', 'Oe', 'oe', 'Ue', 'ue'
or 'sz'. HTML sequences replace these values by &Auml;, &auml;
and so on instead of 'Ae', 'ae', etc. The TeX and LaTeX control sequences
for umlauts are \"A or \"a and "A or "a, respectively.
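A possible shape for the output side, again as a hypothetical Python sketch: each output type is a table from the seven characters to the replacement strings listed above, and tokens are either recognized characters (`str`) or untouched byte values (`int`). The text does not give the ß sequences for HTML, TeX and LaTeX; the sketch assumes the standard HTML entity `&szlig;`, plain TeX's `\ss{}`, and the `"s` shorthand of the German language packages that also provide `"A`.

```python
# Hypothetical output tables; the non-ASCII ß entries are assumptions.
ASCII = {"Ä": "Ae", "ä": "ae", "Ö": "Oe", "ö": "oe",
         "Ü": "Ue", "ü": "ue", "ß": "sz"}
HTML = {"Ä": "&Auml;", "ä": "&auml;", "Ö": "&Ouml;", "ö": "&ouml;",
        "Ü": "&Uuml;", "ü": "&uuml;", "ß": "&szlig;"}
TEX = {"Ä": '\\"A', "ä": '\\"a', "Ö": '\\"O', "ö": '\\"o',
       "Ü": '\\"U', "ü": '\\"u', "ß": "\\ss{}"}
LATEX = {"Ä": '"A', "ä": '"a', "Ö": '"O', "ö": '"o',
         "Ü": '"U', "ü": '"u', "ß": '"s'}


def output_filter(tokens, table):
    """Replace recognized characters via `table`; copy other bytes unchanged."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, str):
            out += table[tok].encode("ascii")  # every replacement is plain ASCII
        else:
            out.append(tok)                    # non-umlaut byte, passed through
    return bytes(out)


tokens = [0x4D, "ü", 0x6C, 0x6C, 0x65, 0x72]  # "Müller" after input filtering
print(output_filter(tokens, ASCII))  # b'Mueller'
print(output_filter(tokens, HTML))   # b'M&uuml;ller'
```

Because every table maps to pure ASCII strings, adding a further output type only means adding another seven-entry table.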
In addition to the character translation, individual characters can
simply be discarded if their value exceeds 127. This option
guarantees 7-bit clean output.
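The discarding option amounts to one final pass over the output bytes. A minimal Python sketch, using the hypothetical name `strip_8bit` and assuming the pass runs after translation, so that only unrecognized bytes can still exceed 127:

```python
def strip_8bit(data: bytes) -> bytes:
    """Drop every byte whose value exceeds 127, yielding 7-bit clean output."""
    return bytes(b for b in data if b <= 127)


out = strip_8bit(b"M&uuml;ller \xa7 42")  # 0xA7 is an unrecognized 8-bit byte
print(out)                                # b'M&uuml;ller  42'
```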