Regular Expressions and Character Sets

This table shows the Check Point implementation of standard regular expression metacharacters.

Non-Printable Characters

Character Types

Supported Character Sets

The DLP Gateway scans text in the UTF-8 Unicode character encoding. It therefore converts the messages and files that it scans from their original encoding to UTF-8.
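As a rough illustration of what such a conversion involves, the Python sketch below decodes bytes from a known source encoding and re-encodes them as UTF-8. It relies on Python's built-in codecs, not on the DLP Gateway's own converter, and the function name to_utf8 is hypothetical.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode raw bytes from source_charset and re-encode them as UTF-8.

    Illustrative only: this uses Python's codec registry, which is an
    assumption and may not match the gateway's supported character sets.
    """
    text = raw.decode(source_charset)  # raises UnicodeDecodeError on invalid input
    return text.encode("utf-8")


# Example: Russian text stored as KOI8-R, converted to UTF-8.
koi8_bytes = "Привет".encode("koi8-r")
utf8_bytes = to_utf8(koi8_bytes, "koi8-r")
```

The conversion can only succeed when the source encoding is known, which is why the encoding has to be identified first.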

Before it can convert a message or file, the DLP Gateway must identify its encoding. It does this from the metadata or the MIME headers. If neither exists, the default gateway encoding is used.

The DLP Gateway determines the encoding of the message or file it scans as follows:

  1. If the file contains metadata, the DLP Gateway reads the encoding from there. For example, Microsoft Word files store the encoding in the file.

  2. Some files have no metadata but do have MIME headers, for example text files or the body of an email. For those files, the DLP Gateway reads the encoding from the MIME headers:

    Content-Type: text/plain; charset="iso-2022-jp"

  3. Some files have neither metadata nor MIME headers. For those files, the DLP Gateway assumes that the encoding of the original message or file is the default encoding of the gateway. A log message is written to $DLPDIR/log/dlpe_problem_files.log:

    Charset for file <file name> is not provided. Using the default: <charset name>

    The out-of-the-box default encoding is Windows Code Page 1252 (Latin I). This default can be changed.
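The order of precedence described in the steps above can be sketched in Python as follows. This is a minimal illustration, not the gateway's implementation: the function pick_charset and its parameters are hypothetical, and the MIME parsing uses Python's standard email package.

```python
from email.message import Message
from typing import Optional

# Out-of-the-box default named in the text above.
DEFAULT_CHARSET = "windows-1252"


def pick_charset(metadata_charset: Optional[str],
                 content_type: Optional[str],
                 default: str = DEFAULT_CHARSET) -> str:
    """Return the charset to assume, in the order: metadata, MIME header, default."""
    # 1. Encoding recorded in the file's own metadata, if any.
    if metadata_charset:
        return metadata_charset
    # 2. charset parameter of the MIME Content-Type header, if any.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        mime_charset = msg.get_content_charset()
        if mime_charset:
            return mime_charset
    # 3. Neither source is available: fall back to the default.
    return default


charset = pick_charset(None, 'text/plain; charset="iso-2022-jp"')
```

With the header from step 2 and no metadata, the sketch selects the MIME charset; with neither input, it falls back to the default.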

To change the default encoding of the DLP Gateway:

  1. On the DLP Gateway, edit the file:

    • R77, R77.10, R77.20 - $DLPDIR/config/dlp.conf

    • R77.30 - $FWDIR/conf/file_convert.conf

  2. In the engine section, search for the default_charset_for_text_files field.

    For example:

    :default_charset_for_text_files (windows-1252)

    Use one of the supported aliases as the value of this field. Each character set has one or more optional aliases.

    For example, to make the default character set encoding Russian KOI8-R, change the field value as follows:

    :default_charset_for_text_files (KOI8-R)
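If the change needs to be scripted, the edit can be sketched as a text substitution. The Python example below works on the configuration text in memory only and assumes the field appears on a single line in the form shown above; the function name set_default_charset is hypothetical, and the sketch does not check that the alias is one the gateway supports. Back up the real file before changing it.

```python
import re

# Matches the field as shown in the example above; the exact layout of the
# surrounding file is an assumption.
FIELD_RE = re.compile(r"(:default_charset_for_text_files\s*\()([^)]*)(\))")


def set_default_charset(config_text: str, alias: str) -> str:
    """Return config_text with the default charset value replaced by alias."""
    new_text, count = FIELD_RE.subn(
        lambda m: m.group(1) + alias + m.group(3),
        config_text,
        count=1,
    )
    if count == 0:
        raise ValueError("default_charset_for_text_files field not found")
    return new_text


line = ":default_charset_for_text_files (windows-1252)"
updated = set_default_charset(line, "KOI8-R")
```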

If the DLP Gateway cannot use an encoding for a message or file, an error message is written to $DLPDIR/log/dlpe_problem_files.log:

File <file name> has unsupported charset: <charset name>. Trying to convert anyway

When this happens, the DLP Gateway may be unable to convert the message, or parts of it, to UTF-8. In that case, the DLP Gateway does not fully scan the message.
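To review which files triggered either log message, the two message formats quoted in this section can be matched with regular expressions. The Python sketch below is an illustration under assumptions: real log lines may carry timestamps or other prefixes (so the patterns search within the line), and the function name classify and the labels it returns are hypothetical.

```python
import re

# Patterns derived from the two message formats quoted in this section.
MISSING_RE = re.compile(
    r"Charset for file (?P<file>.+) is not provided\. "
    r"Using the default: (?P<charset>\S+)"
)
UNSUPPORTED_RE = re.compile(
    r"File (?P<file>.+) has unsupported charset: (?P<charset>.+?)\. "
    r"Trying to convert anyway"
)


def classify(line: str):
    """Return (kind, file name, charset) for a recognized line, else None."""
    m = MISSING_RE.search(line)
    if m:
        return ("default_used", m["file"], m["charset"])
    m = UNSUPPORTED_RE.search(line)
    if m:
        return ("unsupported", m["file"], m["charset"])
    return None


result = classify("File notes.txt has unsupported charset: x-foo. Trying to convert anyway")
```

Lines classified as unsupported point to files that may not have been fully scanned.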

Character Set Aliases