Regular Expressions
Regular expressions are patterns built from special characters that match or capture portions of a field. This section covers the special characters supported by Check Point and the rules that govern them.
Overview of Regular Expressions
Some IPS protections allow granular configuration with regular expressions. A regular expression is made up of two basic types of characters:
Metacharacters: characters that have a special meaning, such as \, (, ., |, and *.
Simple characters: any character that is not a metacharacter, for example, an alphanumeric character that is not preceded by a backslash. These characters are treated as literals.
For example, in the Header Rejection protection, you can configure header patterns with regular expressions. The protection blocks packets with matching headers. The Header Rejection protection blocks PeoplePage Spyware by matching packet headers with the (O|o)(C|c)(S|s)(L|l)ab (A|a)uto(U|u)pdater regular expression.
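As a rough illustration, the sketch below applies the PeoplePage pattern with Python's re module. Python uses a similar Perl-style syntax but is not the Check Point engine. The header values are made up, and scanning the whole value with search() is an assumption about how the protection matches.

```python
import re

# Pattern quoted in the Header Rejection example above.
PEOPLEPAGE = re.compile(r"(O|o)(C|c)(S|s)(L|l)ab (A|a)uto(U|u)pdater")

def header_matches(header_value: str) -> bool:
    # Assumption: the pattern may occur anywhere in the header value.
    return PEOPLEPAGE.search(header_value) is not None

print(header_matches("User-Agent: OCSLab AutoUpdater 1.0"))  # True
print(header_matches("User-Agent: ocslab autoupdater"))      # True
print(header_matches("User-Agent: OcSLab AUTOUpdater"))      # False: "UTO" is not "uto"
```

The alternations make only selected letters case-insensitive; a fully caseless pattern would also match "AUTOUpdater".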
Metacharacters
Some metacharacters are recognized anywhere in a pattern, except within square brackets; other metacharacters are recognized only in square brackets.
The Check Point set of regular expressions has been enhanced for R70 and higher.
| Metacharacter | Meaning | Supported before R70? |
|---|---|---|
| \ (backslash) | escape character, and other meanings | partial |
| [ ] (square brackets) | character class definition | yes |
| ( ) (parentheses) | subpattern | yes |
| { } (curly brackets) | min/max quantifier | no |
| . (dot) | match any character | yes |
| ? (question mark) | zero or one quantifier | yes |
| * (asterisk) | zero or more quantifier | yes |
| + (plus) | one or more quantifier | yes |
| \| (vertical bar) | start alternative branch | yes |
| ^ (circumflex anchor) | anchor pattern to beginning of buffer | yes |
| $ (dollar anchor) | anchor pattern to end of buffer | yes |
Backslash
The meaning of the backslash (\) character depends on the context. Not all of the following uses are supported in earlier versions.
In R70 and above, a backslash escapes metacharacters both inside and outside character classes.
Escaping Symbols
If a backslash is followed by a non-alphanumeric character, it removes any special meaning that character has. For example, \* matches a literal asterisk instead of acting as the zero-or-more quantifier.
You cannot use \ before a letter that has no defined escape meaning. For example, because g is neither a metacharacter nor one of the escape sequences described below, \g is not valid.
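The following sketch shows both rules using Python's re module (Python 3.7 or later) as an approximation of the Check Point behavior. The helper name is_valid_pattern is hypothetical.

```python
import re

def is_valid_pattern(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# \* is a literal asterisk; an unescaped * repeats the preceding item.
print(bool(re.fullmatch(r"a\*b", "a*b")))   # True
print(bool(re.fullmatch(r"a*b", "a*b")))    # False: here * repeats "a"
# An escaped letter with no defined meaning is rejected.
print(is_valid_pattern(r"\g"))              # False
```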
Encoding Non-Printable Characters
To include non-printable characters in a pattern, use these backslash escape sequences:

| Character | Description |
|---|---|
| \a | alarm; the BEL character (hex 07) |
| \cx | "control-x", where x is any character |
| \e | escape (hex 1B) |
| \f | formfeed (hex 0C) |
| \n | newline (hex 0A) |
| \r | carriage return (hex 0D) |
| \t | tab (hex 09) |
| \ddd | character with octal code ddd |
| \xhh | character with hex code hh |
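A quick check of these escapes with Python's re module, used here only as an approximation. Python's re does not accept \e or \cx, so the escape character is written in hex instead.

```python
import re

# Each pattern uses one escape from the table; each string holds the raw character.
samples = {
    r"\a": "\x07",     # BEL
    r"\f": "\x0c",     # formfeed
    r"\n": "\x0a",     # newline
    r"\r": "\x0d",     # carriage return
    r"\t": "\x09",     # tab
    r"\x1b": "\x1b",   # escape character, written in hex (no \e in Python's re)
    r"\101": "A",      # octal 101 = decimal 65 = "A"
    r"\x41": "A",      # hex 41 = "A"
}
for pattern, text in samples.items():
    print(pattern, bool(re.fullmatch(pattern, text)))  # prints True for every entry
```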
Specifying Character Types
To match a type of character, such as any digit or any whitespace character, use these backslash escape sequences:

| Character | Description |
|---|---|
| \d | any decimal digit [0-9] |
| \D | any character that is not a decimal digit |
| \s | any whitespace character |
| \S | any character that is not whitespace |
| \w | any word character (underscore or alphanumeric character) |
| \W | any non-word character (not underscore or alphanumeric) |
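The sketch below exercises these character types with Python's re module as a stand-in. The re.ASCII flag is an assumption made to match the [0-9] definition above; without it, Python's string patterns also match non-ASCII digits and letters.

```python
import re

print(re.findall(r"\d+", "id=42, port 8080", flags=re.ASCII))  # ['42', '8080']
print(re.findall(r"\w+", "user_1 logged-in", flags=re.ASCII))  # ['user_1', 'logged', 'in']
print(re.findall(r"\S+", "GET /index.html\tHTTP/1.1"))         # ['GET', '/index.html', 'HTTP/1.1']
print(bool(re.fullmatch(r"\D\W", "a-")))                        # True: non-digit, then non-word
```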
Square Brackets
Square brackets ([ ]) designate a character class and match a single character in the string. Inside a character class, only the character class metacharacters (backslash, circumflex anchor and hyphen) have special meaning.
To use a character class metacharacter as a literal inside a character class, escape it with a backslash. Square brackets used as literals must always be escaped with a backslash, both inside and outside a character class.
For example, write [[abc] as [\[abc].
| Metacharacter | Meaning |
|---|---|
| \ (backslash) | general escape character |
| ^ (circumflex anchor) | negate the class, if this is the first character in the brackets (if ^ is not the first character, it is not a metacharacter) |
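A short sketch of character classes using Python's re module as an approximation: an escaped literal bracket, a negated class, and an escaped circumflex.

```python
import re

# [\[abc] matches one character: a literal "[" or a, b, or c.
print(re.findall(r"[\[abc]", "x[a]y"))   # ['[', 'a']
# [^0-9] is negated: any single character that is not a digit.
print(re.findall(r"[^0-9]", "a1-2"))     # ['a', '-']
# An escaped circumflex inside the class is a literal ^.
print(re.findall(r"[\^a]", "a^b"))       # ['a', '^']
```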
Parentheses
Parentheses ( ) designate a subpattern. To match a literal opening or closing parenthesis, escape it with a backslash: \( or \).
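In Python's re module, used here as a stand-in, a quantifier after a subpattern repeats the whole group, while \( and \) match literal parentheses.

```python
import re

print(bool(re.fullmatch(r"c(ab)+r", "cababr")))      # True: "ab" repeated twice
print(bool(re.fullmatch(r"c(ab)+r", "caabr")))       # False: "aab" is not a run of "ab"
print(re.findall(r"\((\d+)\)", "err(404) ok(200)"))  # ['404', '200']
```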
Hyphen
A hyphen (-) indicates a character range inside a character class. To use a hyphen as a simple character inside a character class, escape it with a backslash (\-).
For example: [a-z] matches the lower-case alphabet.
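The difference between a range and an escaped hyphen, sketched with Python's re module as an approximation:

```python
import re

print(re.findall(r"[a-c]", "a-bc"))    # ['a', 'b', 'c']: range a through c
print(re.findall(r"[a\-c]", "a-bc"))   # ['a', '-', 'c']: a, literal hyphen, c
```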
Dot
Outside a character class, a dot (.) matches any one character in the string.
For example: .* matches zero or more occurrences of any character.
Inside a character class, it matches a dot (.).
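A sketch of both meanings of the dot using Python's re module. Note that Python's dot does not match a newline unless re.DOTALL is set; whether the Check Point engine behaves the same way is not stated here.

```python
import re

print(bool(re.fullmatch(r"a.c", "abc")))     # True: . outside a class is any character
print(bool(re.fullmatch(r"a[.]c", "abc")))   # False: [.] is a literal dot
print(bool(re.fullmatch(r"a[.]c", "a.c")))   # True
print(bool(re.fullmatch(r"a.c", "a\nc")))    # False in Python without re.DOTALL
```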
Quantifiers
Quantifiers specify how many instances of a character, character set, or character class must be matched. A quantifier must not follow another quantifier or an opening parenthesis, and must not be the first character of the expression.
A quantifier can follow any of these items:
- a literal data character
- an escape such as \d that matches a single character
- a character class
- a subpattern in parentheses
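The placement rule can be checked mechanically. The sketch below is a hypothetical helper, misplaced_quantifiers, not part of any Check Point tool: it scans a pattern, treats escapes and whole character classes as single items, and reports quantifiers that appear first, after an opening parenthesis, or after another quantifier. It ignores internal option prefixes and validates nothing else.

```python
import re

# {n}, {n,} or {n,m} at the current position.
BRACE_QUANTIFIER = re.compile(r"\{\d+(,\d*)?\}")

def misplaced_quantifiers(pattern):
    """Return the indexes of quantifiers that break the placement rule."""
    bad = []
    prev = None        # previous token: None (start), "(", "quant" or "atom"
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # An escape such as \* or \d counts as one item.
            i += 2
            if not in_class:
                prev = "atom"
            continue
        if in_class:
            if ch == "]":
                in_class = False
                prev = "atom"   # a whole character class is one item
            i += 1
            continue
        if ch == "[":
            in_class = True
            i += 1
            continue
        brace = BRACE_QUANTIFIER.match(pattern, i)
        if ch in "*+?" or brace:
            if prev in (None, "(", "quant"):
                bad.append(i)
            prev = "quant"
            i = brace.end() if brace else i + 1
            continue
        prev = "(" if ch == "(" else "atom"
        i += 1
    return bad

print(misplaced_quantifiers("c([ab]+)r"))  # []
print(misplaced_quantifiers("(*a)"))       # [1]
print(misplaced_quantifiers("a+{2}"))      # [2]
```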
Curly Brackets
Curly brackets { } are general repetition quantifiers. They specify a minimum and maximum number of permitted matches.
{n,m} - at least n times and not more than m times
For example: a{2,4} matches aa, aaa, or aaaa, but not a or aaaaa.
{n} - exactly n times
{n,} - at least n times, with no maximum
For example:
\d{8} matches exactly 8 digits.
[aeiou]{3,} matches at least 3 successive vowels, but may match many more.
Note - A closing curly bracket (}) that is not preceded by an opening curly bracket ({) is treated as a simple character. It is good practice to escape it with a backslash (\}) when you use it as a simple character.
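The curly-bracket examples can be reproduced with Python's re module as an approximation. fullmatch() is used so the whole string must match, which is how the examples read; a substring search would find aa inside aaaaa.

```python
import re

for text in ["a", "aa", "aaa", "aaaa", "aaaaa"]:
    print(text, bool(re.fullmatch(r"a{2,4}", text)))
# a False / aa True / aaa True / aaaa True / aaaaa False
print(bool(re.fullmatch(r"\d{8}", "20240131")))           # True
print(re.findall(r"[aeiou]{3,}", "queueing beautiful"))   # ['ueuei', 'eau']
print(bool(re.fullmatch(r"x\}", "x}")))                   # True: escaped closing bracket
```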
Question Mark
Outside a character class, a question mark (?) matches zero or one occurrence of the preceding item. It is the same as using {0,1}.
For example: c([ab]?)r matches car, cbr, and cr
Inside a character class, it matches a question mark: [?] matches ? (question mark).
Asterisk
Outside a character class, an asterisk (*) matches zero or more occurrences of the preceding item. It is the same as using {0,}.
For example: c([ab]*)r matches car, cbr, cr, cabr, and caaabbbr
Inside a character class, it matches an asterisk: [*] matches * (asterisk).
Plus
Outside a character class, a plus (+) matches one or more occurrences of the preceding item. It is the same as using {1,}.
For example: c([ab]+)r matches character strings such as car, cbr, cabr, caaabbbr; but not cr
Inside a character class, it matches a plus: [+] matches + (plus).
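The three quantifier examples above, run side by side with Python's re module as an approximation. The helper name matching is hypothetical.

```python
import re

WORDS = ["cr", "car", "cbr", "cabr", "caaabbbr"]

def matching(pattern):
    """Return the sample words that the whole pattern matches."""
    return [w for w in WORDS if re.fullmatch(pattern, w)]

for pattern in [r"c([ab]?)r", r"c([ab]*)r", r"c([ab]+)r"]:
    print(pattern, matching(pattern))
# c([ab]?)r ['cr', 'car', 'cbr']
# c([ab]*)r ['cr', 'car', 'cbr', 'cabr', 'caaabbbr']
# c([ab]+)r ['car', 'cbr', 'cabr', 'caaabbbr']
```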
Vertical Bar
A vertical bar (|) separates alternative patterns.
For example: a|b matches a or b.
If the right side is empty, that branch matches the null (empty) string: a| matches a or the empty string.
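Both forms of alternation, sketched with Python's re module as an approximation of the Check Point behavior:

```python
import re

print([bool(re.fullmatch(r"a|b", s)) for s in ["a", "b", "c"]])  # [True, True, False]
print([bool(re.fullmatch(r"a|", s)) for s in ["a", ""]])         # [True, True]
```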
Circumflex Anchor
A circumflex anchor (^, also known as a caret) matches only the beginning of the buffer. It is treated as an anchor only when it is the first character of the pattern. A circumflex can also negate a character class, but only when it is the first character inside the brackets.
A circumflex used as a literal must always be escaped with a backslash, both inside and outside a character class.
Dollar Anchor
A dollar anchor ($) is a metacharacter only when it is the last character of a pattern, where it matches the end of the buffer.
A dollar sign used as a literal must be escaped with a backslash when it is outside a character class.
For example: ab$ matches a string that ends in ab.
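A sketch of both anchors and their escaped literal forms, using Python's re module as a stand-in. Python differs in two ways: it treats ^ and $ as anchors anywhere in the pattern, not only at the start or end, and its $ also matches just before a trailing newline.

```python
import re

print(bool(re.search(r"^ab", "abc")))         # True: ab at the start
print(bool(re.search(r"^ab", "cab")))         # False
print(bool(re.search(r"ab$", "cab")))         # True: ab at the end
print(bool(re.search(r"\^ab\$", "x^ab$y")))   # True: escaped ^ and $ are literals
```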
Internal Options
To set compilation options from within the pattern, enclose the option strings in curly brackets followed by a colon: {options}:
To specify multiple option strings, separate them with a semicolon (;).
Internal option settings must appear at the beginning of the pattern, and they apply to the whole pattern.
For example: {case;literal}:*a matches the string "*a".
The option strings are described in the following table.
Internal Option Strings
| Option String | Description |
|---|---|
| case | Treat all characters in the pattern as case-sensitive |
| caseless | Treat all characters in the pattern as case-insensitive |
| literal | Treat all characters in the pattern as literals (metacharacters are treated as regular characters) |
| LSS(string) | Force string to be the pattern's LSS |
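To make the option syntax concrete, the sketch below splits off an internal option prefix and approximates case, caseless, and literal with Python's re flags and re.escape(). Both function names are hypothetical. The sketch assumes case-sensitive matching by default, ignores LSS(string) because Python has no equivalent, and does not handle option values that themselves contain "}:".

```python
import re

def split_internal_options(pattern):
    """Split "{opt1;opt2}:body" into (["opt1", "opt2"], "body")."""
    if pattern.startswith("{"):
        end = pattern.find("}:")
        if end != -1:
            return pattern[1:end].split(";"), pattern[end + 2:]
    return [], pattern

def to_python_regex(pattern):
    """Approximate a pattern with internal options using Python's re module."""
    options, body = split_internal_options(pattern)
    caseless = False   # assumption: case-sensitive unless "caseless" is given
    literal = False
    for opt in options:
        if opt == "case":
            caseless = False
        elif opt == "caseless":
            caseless = True
        elif opt == "literal":
            literal = True
        elif opt.startswith("LSS(") and opt.endswith(")"):
            pass  # engine hint with no Python equivalent; ignored here
        else:
            raise ValueError(f"unknown internal option: {opt!r}")
    if literal:
        body = re.escape(body)   # every metacharacter becomes a literal
    return re.compile(body, re.IGNORECASE if caseless else 0)

print(split_internal_options("{case;literal}:*a"))                 # (['case', 'literal'], '*a')
print(bool(to_python_regex("{case;literal}:*a").fullmatch("*a")))  # True
print(bool(to_python_regex("{caseless}:autoupdater").search("AutoUpdater")))  # True
```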
Earlier Versions
If you have gateways of earlier versions and you create a regular expression for a protection that is enabled on such a gateway, IPS checks whether the pattern is supported. If the pattern is not supported by both the earlier and the newer versions of Check Point regular expressions, you are notified.
If you have earlier gateways as well as newer ones, and you want to configure a protection against a pattern, you can do one of the following:
- Change the pattern to use metacharacters that are supported by both the newer version of Check Point software and the earlier versions.
- Configure GuiDBedit for both patterns.
Support for Internal Option Settings
Internal compilation options are not supported in earlier versions.
Support for Backslash
- Escaping symbols: In earlier versions, a backslash escapes metacharacters only outside character classes. For example, \* matches *, but [\*] matches a single \ or *.
- Specifying character types: In earlier versions, this usage of backslash is not supported.
- Encoding non-printable characters: In earlier versions, this usage of backslash is not supported.
Support for Square Brackets
To make a closing square bracket be part of a character class (to match ] as a character in a string):
- In earlier versions: put the closing bracket first in the class or, in a negated class, directly after the circumflex: []] or [^]]
- In R70 and above: escape the closing bracket with a backslash: [\]]
Support for Quantifiers
In earlier versions, only * (zero or more), + (one or more), and ? (zero or one) are supported. The minimum/maximum quantifiers (using curly brackets) are not supported in earlier versions.
Support for Circumflex and Dollar Anchors
If the protection against the pattern is for earlier gateways as well as for newer ones, do not write a circumflex or dollar in the middle of the pattern.
If you want to specify a literal circumflex or dollar outside square brackets, always add a preceding backslash.
- In earlier versions, the circumflex or dollar anchor is always a metacharacter (unless preceded by backslash or inside a character class).
- In R70 and above, the circumflex anchor is a metacharacter only if it is the first character of a pattern; and the dollar anchor is a metacharacter only if it is the last character of a pattern.
Support for Hyphen
A hyphen (-) is used to specify a range of characters in a character class. For example, [a-z] matches the lower-case alphabet.
- In earlier versions, if a hyphen is required as a character without special meaning, it must be the first or last character in a character class.
- In R70 and above, if a hyphen is required as a regular character, it must be escaped with a backslash.
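The compatibility rules in this section can be approximated by a simple scan. The sketch below is a hypothetical helper, pre_r70_issues, not the check that IPS performs: it flags internal options, character-type and non-printable escapes, curly-bracket quantifiers, backslashes inside character classes, and unescaped ^ or $ in the middle of a pattern.

```python
import re

CHAR_TYPE_ESCAPES = set("dDsSwW")
NON_PRINTABLE_ESCAPES = set("acefnrtx01234567")  # \a \cx \e \f \n \r \t \xhh \ddd
BRACE_QUANTIFIER = re.compile(r"\{\d+(,\d*)?\}")

def pre_r70_issues(pattern):
    """List constructs that are unsupported or behave differently before R70."""
    issues = []
    start = 0
    if pattern.startswith("{") and "}:" in pattern:
        issues.append("internal options")
        start = pattern.find("}:") + 2
    last = len(pattern) - 1
    in_class = False
    i = start
    while i <= last:
        ch = pattern[i]
        if ch == "\\" and i < last:
            nxt = pattern[i + 1]
            if in_class:
                issues.append(f"backslash in character class at {i}")
            elif nxt in CHAR_TYPE_ESCAPES:
                issues.append(f"character-type escape \\{nxt} at {i}")
            elif nxt in NON_PRINTABLE_ESCAPES:
                issues.append(f"non-printable escape \\{nxt} at {i}")
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif BRACE_QUANTIFIER.match(pattern, i):
            issues.append(f"curly-bracket quantifier at {i}")
        elif ch == "^" and i != start:
            issues.append(f"circumflex in mid-pattern at {i}")
        elif ch == "$" and i != last:
            issues.append(f"dollar in mid-pattern at {i}")
        i += 1
    return issues

print(pre_r70_issues("(O|o)(C|c)(S|s)(L|l)ab (A|a)uto(U|u)pdater"))  # []
print(pre_r70_issues(r"\d{8}"))
# ['character-type escape \\d at 0', 'curly-bracket quantifier at 2']
```

A pattern that returns an empty list uses only constructs that, according to the rules above, mean the same thing on both earlier and newer gateways.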