Regular Expressions

Regular expressions are patterns built from special characters that match or capture portions of a field. This section covers the special characters supported by Check Point and the rules that govern them.

In This Appendix

Overview of Regular Expressions

Metacharacters

Internal Options

Earlier Versions

Overview of Regular Expressions

Some IPS protections allow granular configuration with regular expressions. A regular expression is made up of two basic types of characters:

Metacharacters: characters that have special meaning, such as \ ( . | and *.

Simple characters: any character that is not a metacharacter, for example, an alphanumeric character that is not preceded by a backslash. These characters are treated as literals.

For example, in the Header Rejection protection, you can configure header patterns with regular expressions. The protection blocks packets with matching headers. The Header Rejection protection blocks PeoplePage Spyware by matching packet headers with the (O|o)(C|c)(S|s)(L|l)ab (A|a)uto(U|u)pdater regular expression.
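The pattern above can be tried out with a short sketch. It uses Python's re module as a stand-in for the IPS engine (an assumption: the two engines are not the same and may differ in corner cases), and header_matches is a hypothetical helper name.

```python
import re

# Pattern quoted in the text for the PeoplePage Spyware header.
PATTERN = re.compile(r"(O|o)(C|c)(S|s)(L|l)ab (A|a)uto(U|u)pdater")

def header_matches(header_value: str) -> bool:
    """Return True if the pattern occurs anywhere in the header value."""
    return PATTERN.search(header_value) is not None

# Each (X|x) group accepts one letter in either case.
mixed_case = header_matches("User-Agent: OCSLab AutoUpdater")
lower_case = header_matches("User-Agent: ocslab autoupdater")
# "ab" is written as simple characters, so "AB" does not match.
upper_ab = header_matches("User-Agent: OCSLAB AutoUpdater")

print(mixed_case, lower_case, upper_ab)  # True True False
```

Only the letters wrapped in alternation groups are case-tolerant; the rest of the pattern is matched literally.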

Metacharacters

Some metacharacters are recognized anywhere in a pattern, except within square brackets; other metacharacters are recognized only in square brackets.

The Check Point set of regular expressions has been enhanced for R70 and higher.

Metacharacter            Meaning                                  Supported in earlier versions?
\ (backslash)            escape character, and other meanings     partial
[ ] (square brackets)    character class definition               yes
( ) (parentheses)        subpattern                               yes
{ } (curly brackets)     min/max quantifier                       no
. (dot)                  match any character                      yes
? (question mark)        zero or one quantifier                   yes
* (asterisk)             zero or more quantifier                  yes
+ (plus)                 one or more quantifier                   yes
| (vertical bar)         start alternative branch                 yes
^ (circumflex anchor)    anchor pattern to beginning of buffer    yes
$ (dollar anchor)        anchor pattern to end of buffer          yes

Backslash

The meaning of the backslash (\) character depends on the context. Not all of the uses described below are supported in earlier versions.

In R70 and above, backslash escapes metacharacters inside and outside character classes.

Escaping Symbols

If the backslash is followed by a non-alphanumeric character, it removes that character's metacharacter function. For example, \* matches a literal asterisk instead of acting as the zero-or-more quantifier.

You cannot use \ to escape a letter that is not a metacharacter. For example, because "g" is not a metacharacter, you cannot use \g.
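A minimal sketch of both rules, assuming Python's re module behaves like the Check Point engine for these cases (Python also rejects unknown letter escapes such as \g, but the engines are not identical):

```python
import re

# Unescaped, * is a quantifier: "ab*" means "a" followed by zero or more "b".
quantifier_hit = re.fullmatch(r"ab*", "abbb") is not None

# Escaped, \* is a literal asterisk.
literal_hit = re.fullmatch(r"ab\*", "ab*") is not None
literal_miss = re.fullmatch(r"ab\*", "abbb") is not None

# Escaping a letter that is not a metacharacter is rejected.
try:
    re.compile(r"\g")
    g_is_valid = True
except re.error:
    g_is_valid = False

print(quantifier_hit, literal_hit, literal_miss, g_is_valid)  # True True False False
```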

Encoding Non-Printable Characters

To use non-printable characters in patterns, use the following backslash escape sequences.

Character    Description
\a           alarm; the BEL character (hex 07)
\cx          "control-x", where x is any character
\e           escape (hex 1B)
\f           formfeed (hex 0C)
\n           newline (hex 0A)
\r           carriage return (hex 0D)
\t           tab (hex 09)
\ddd         character with octal code ddd
\xhh         character with hex code hh
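The sketch below exercises several of these escapes with Python's re module, used here only as an approximation of the Check Point engine. Note that Python's re does not accept \e or \cx, so those two are left out.

```python
import re

# "A", tab, "B", carriage return, newline, written with escapes from the table:
# \x41 is hex 41 ("A"), \102 is octal 102 ("B").
pattern = re.compile(r"\x41\t\102\r\n")
data = "A\tB\r\n"
escapes_match = pattern.fullmatch(data) is not None

# BEL (hex 07) followed by formfeed (hex 0C).
control_match = re.fullmatch(r"\a\f", "\x07\x0c") is not None

print(escapes_match, control_match)  # True True
```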

Specifying Character Types

To specify types of characters in patterns, use the following backslash escape sequences.

Character    Description
\d           any decimal digit [0-9]
\D           any character that is not a decimal digit
\s           any whitespace character
\S           any character that is not whitespace
\w           any word character (underscore or alphanumeric character)
\W           any non-word character (not underscore or alphanumeric)
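As a rough illustration, the character types can be tried with Python's re module. This is an assumption-laden stand-in: the re.ASCII flag is passed so that \d means [0-9] as in the table, and the sample strings are invented.

```python
import re

# \d+ picks out runs of decimal digits.
digits = re.findall(r"\d+", "port 8080, ttl 64", re.ASCII)

# \W matches anything that is not alphanumeric or underscore.
non_word = re.findall(r"\W", "a_b-c d", re.ASCII)

# \s+ matches runs of whitespace (spaces and tabs here).
fields = re.split(r"\s+", "GET  /index.html\tHTTP/1.1")

print(digits)    # ['8080', '64']
print(non_word)  # ['-', ' ']
print(fields)    # ['GET', '/index.html', 'HTTP/1.1']
```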

Square Brackets

Square brackets ([ ]) designate a character class and match a single character in the string. Inside a character class, only the character class metacharacters (backslash, circumflex anchor and hyphen) have special meaning.

Character class metacharacters that are used as literals must be escaped with a backslash, but only inside a character class. Square brackets that are used as literals must always be escaped with a backslash, both inside and outside a character class.

For example, [[abc] should be written: [\[abc]

Metacharacter            Meaning
\ (backslash)            general escape character
^ (circumflex anchor)    negate the class, if this is the first character in the brackets (if ^ is not the first, it is not a metacharacter)
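A small sketch of these character class rules, using Python's re module as an approximation of the R70 behavior (an assumption; the helper name in_class is hypothetical):

```python
import re

def in_class(char_class: str, ch: str) -> bool:
    """Return True if the single character ch is matched by the class."""
    return re.fullmatch(char_class, ch) is not None

# Escaped opening bracket as a literal member of the class.
bracket_ok = in_class(r"[\[abc]", "[")
letter_ok = in_class(r"[\[abc]", "b")
other_rejected = in_class(r"[\[abc]", "d")

# ^ first in the brackets negates the class.
negated_ok = in_class(r"[^abc]", "d")
negated_rejected = in_class(r"[^abc]", "a")

# ^ elsewhere in the brackets is an ordinary member.
caret_member = in_class(r"[a^c]", "^")
```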

Parentheses

Parentheses ( ) designate a subpattern. To match a literal opening or closing parenthesis, escape it with a backslash.

Hyphen

A hyphen '-' indicates a character range inside a character class. When used as a simple character in a character class, it must be escaped by using a backslash '\'.

For example: [a-z] matches the lower-case alphabet.
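To make the difference between a range and an escaped hyphen concrete, here is a minimal sketch in Python's re module (assumed to agree with the rules above for these simple cases):

```python
import re

# [a-z] is a range: any lower-case letter from a to z.
range_hit = re.fullmatch(r"[a-z]", "m") is not None

# [a\-z] is three separate members: "a", "-", and "z".
escaped_hyphen_hit = re.fullmatch(r"[a\-z]", "-") is not None
escaped_letter_miss = re.fullmatch(r"[a\-z]", "m") is not None

print(range_hit, escaped_hyphen_hit, escaped_letter_miss)  # True True False
```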

Dot

Outside a character class, a dot (.) matches any one character in the string.

For example: .* matches zero or more occurrences of any character

Inside a character class, it matches a dot (.).
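The two roles of the dot can be sketched as follows. Python's re module is used as a stand-in, with the re.DOTALL flag set so that the dot matches every character including newline (an assumption made to mirror "any one character" in the text):

```python
import re

# Outside a class, the dot matches any one character.
any_char = re.fullmatch(r"a.c", "abc", re.DOTALL) is not None

# Inside a class, the dot is a literal dot.
literal_dot = re.fullmatch(r"a[.]c", "a.c") is not None
not_any_char = re.fullmatch(r"a[.]c", "abc") is not None

print(any_char, literal_dot, not_any_char)  # True True False
```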

Quantifiers

Various metacharacters indicate how many instances of a character, character set or character class should be matched. A quantifier must not follow another quantifier or an opening parenthesis, and must not be the first character of the expression.

These quantifiers can follow any of the following items:

  • a literal data character
  • an escape such as \d that matches a single character
  • a character class
  • a sub-pattern in parentheses
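The placement rules above can be checked with a short sketch. It relies on Python's re module, which happens to reject the same three misplacements; that it mirrors the Check Point engine here is an assumption, and compiles is a hypothetical helper.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the pattern is accepted by the regex compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Quantifiers after a literal, an escape, a class, and a subpattern.
valid = [compiles(p) for p in (r"a+", r"\d*", r"[ab]{2}", r"(ab)?")]

# Quantifier first in the expression, after another quantifier,
# and directly after an opening parenthesis.
invalid = [compiles(p) for p in (r"*a", r"a**", r"(*a)")]

print(valid)    # [True, True, True, True]
print(invalid)  # [False, False, False]
```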

Curly Brackets

Curly brackets { } are general repetition quantifiers. They specify a minimum and maximum number of permitted matches.

The general form is {n,m}: the preceding item must match at least n times and not more than m times.

For example: a{2,4} matches aa, aaa, or aaaa, but not a or aaaaa

{n} - exactly n times

{n,} - at least n times, with no maximum limit

For example:

  • \d{8} matches exactly 8 digits
  • [aeiou]{3,} matches at least 3 successive vowels, but may match many more

Note - A closing curly bracket '}' that is not preceded by an opening curly bracket '{' is treated as a simple character.

It is good practice to use a backslash, '\}', when using a closing curly bracket as a simple character.
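The curly bracket examples from this section, run through Python's re module as an illustrative stand-in for the R70 engine (an assumption; the sample strings are invented):

```python
import re

# a{2,4}: which run lengths from 1 to 6 are accepted?
accepted_lengths = [n for n in range(1, 7) if re.fullmatch(r"a{2,4}", "a" * n)]

# \d{8}: exactly eight digits.
eight_digits = re.fullmatch(r"\d{8}", "20140131") is not None
seven_digits = re.fullmatch(r"\d{8}", "2014013") is not None

# [aeiou]{3,}: at least three successive vowels, and as many more as available.
vowel_run = re.search(r"[aeiou]{3,}", "queueing").group()

# An escaped closing curly bracket is a simple character.
literal_brace = re.fullmatch(r"a\}", "a}") is not None

print(accepted_lengths)  # [2, 3, 4]
print(vowel_run)         # ueuei
```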

Question Mark

Outside a character class, a question mark (?) matches zero or one occurrence of the preceding item. It is the same as using {0,1}.

For example: c([ab]?)r matches car, cbr, and cr

Inside a character class, it matches a question mark: [?] matches ? (question mark).

Asterisk

Outside a character class, an asterisk (*) matches zero or more occurrences of the preceding item. It is the same as using {0,}.

For example: c([ab]*)r matches car, cbr, cr, cabr, and caaabbbr

Inside a character class, it matches an asterisk: [*] matches * (asterisk).

Plus

Outside a character class, a plus (+) matches one or more occurrences of the preceding item. It is the same as using {1,}.

For example: c([ab]+)r matches character strings such as car, cbr, cabr, caaabbbr; but not cr

Inside a character class, it matches a plus: [+] matches + (plus).
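The ?, * and + quantifiers and their curly bracket equivalents can be compared side by side. This sketch uses Python's re module as an approximation (an assumption), and matching is a hypothetical helper that lists which sample strings a pattern accepts in full.

```python
import re

SAMPLES = ["cr", "car", "cbr", "cabr", "caaabbbr"]

def matching(pattern: str) -> list:
    """Return the sample strings that the pattern matches in full."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]

zero_or_one = matching(r"c([ab]?)r")   # ['cr', 'car', 'cbr']
zero_or_more = matching(r"c([ab]*)r")  # all five samples
one_or_more = matching(r"c([ab]+)r")   # everything except 'cr'

# Each shorthand accepts the same strings as its {min,max} form.
same_as_curly = (
    zero_or_one == matching(r"c([ab]{0,1})r")
    and zero_or_more == matching(r"c([ab]{0,})r")
    and one_or_more == matching(r"c([ab]{1,})r")
)
print(same_as_curly)  # True
```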

Vertical Bar

A vertical bar (|) is used to separate alternative patterns.

If the right side is empty, this symbol indicates the NULL string: a| matches a or empty string.

For example: a|b matches a or b
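A brief sketch of alternation, including the empty right-hand branch, using Python's re module as a stand-in (an assumption):

```python
import re

# a|b matches "a" or "b", nothing else.
a_or_b = [s for s in ("a", "b", "c") if re.fullmatch(r"a|b", s)]

# a| has an empty right side, so it matches "a" or the empty string.
a_or_empty = [s for s in ("a", "", "b") if re.fullmatch(r"a|", s)]

print(a_or_b)      # ['a', 'b']
print(a_or_empty)  # ['a', '']
```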

Circumflex Anchor

A circumflex anchor (^; also known as a caret) matches only at the beginning of a buffer. The circumflex is treated as an anchor only when it is the first character in the pattern. It can also negate a character class, but only if it is the first character of the class.

A circumflex that is used as a literal must always be escaped with a backslash, both inside and outside a character class.

Dollar Anchor

A dollar anchor ($) is used as a metacharacter only if it is the last character of a pattern and only to match the end of a buffer.

A dollar sign that is used as a literal must be escaped with a backslash when it is not inside a character class.

For example: ab$ matches a string that ends in ab
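The anchoring behavior can be sketched with Python's re module, treating each string as one buffer. This is only an approximation: Python treats ^ and $ as anchors wherever they appear in a pattern and lets $ match before a trailing newline, so only the simple leading and trailing cases are shown.

```python
import re

# ^ab: the buffer must begin with "ab".
begins_hit = re.search(r"^ab", "abc") is not None
begins_miss = re.search(r"^ab", "cab") is not None

# ab$: the buffer must end with "ab".
ends_hit = re.search(r"ab$", "cab") is not None
ends_miss = re.search(r"ab$", "abc") is not None

# An escaped dollar sign is a literal.
literal_dollar = re.search(r"\$5", "costs $5") is not None

print(begins_hit, begins_miss, ends_hit, ends_miss, literal_dollar)
# True False True False True
```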

Internal Options

To set compilation options from within the pattern, enclose the option strings in curly brackets, followed by a colon: { }:

To specify multiple option strings, use the semicolon (;) as a separator.

Internal option settings must appear at the beginning of the pattern and are applied to the whole pattern.

For example: {case;literal}:*a matches the string "*a"

The option strings are described in the following table.

Internal Option Strings

Option String    Description
case             Treat all characters in the pattern as case-sensitive
caseless         Treat all characters in the pattern as case-insensitive
literal          Treat all characters in the pattern as literals (metacharacters are treated as regular characters)
LSS(string)      Force string to be the pattern's LSS
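To show how the option prefix is structured, the following sketch splits it off and maps the options onto rough Python equivalents. Everything here is an assumption for illustration: split_internal_options and compile_with_options are hypothetical helpers, not Check Point functions, and the LSS(string) option has no Python counterpart and is ignored.

```python
import re

def split_internal_options(pattern: str):
    """Split '{opt1;opt2}:body' into (['opt1', 'opt2'], 'body')."""
    prefix = re.match(r"\{([^{}]*)\}:", pattern)
    if prefix is None:
        return [], pattern
    options = [opt for opt in prefix.group(1).split(";") if opt]
    return options, pattern[prefix.end():]

def compile_with_options(pattern: str) -> re.Pattern:
    """Approximate the internal options with Python re features."""
    options, body = split_internal_options(pattern)
    flags = 0
    for opt in options:
        if opt == "caseless":
            flags |= re.IGNORECASE
        elif opt == "literal":
            body = re.escape(body)  # metacharacters become simple characters
        # "case" is already Python's default; "LSS(...)" is ignored here.
    return re.compile(body, flags)

print(split_internal_options("{case;literal}:*a"))  # (['case', 'literal'], '*a')
print(bool(compile_with_options("{case;literal}:*a").fullmatch("*a")))  # True
```

With the literal option, the leading asterisk is no longer a misplaced quantifier, which is why the example pattern from the text is valid.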

Earlier Versions

If you have gateways of earlier versions, and you create a regular expression for a protection enabled on such a gateway, IPS checks if the pattern is supported. If a pattern is not supported by both the earlier versions and the new version of Check Point regular expressions, you are notified.

If you have earlier gateways as well as newer ones, and you want to configure a protection against a pattern, you can do one of the following:

  • Change the pattern to use metacharacters that are supported by both the newer version of Check Point software and the earlier versions.
  • Configure GUIDBedit for both patterns.

Support for Internal Option Settings

Internal compilation options are not supported in earlier versions.

Support for Backslash

  • Escaping symbols: In earlier versions, the backslash escapes metacharacters only outside character classes. For example: \* matches *; but inside [\*] the backslash is not an escape, so the class contains both \ and *.
  • Specifying character types: In earlier versions, this usage of backslash is not supported.
  • Encoding non-printable characters: In earlier versions, this usage of backslash is not supported.

Support for Square Brackets

To make a closing square bracket be part of a character class (to match ] as a character in a string):

  • In earlier versions: use the closing bracket as the first character in the class or, if the class is negated, immediately after the circumflex. For example: []] or [^]]
  • In R70 and above: escape the closing bracket with a backslash. [\]]
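Both spellings can be tried in Python's re module, which happens to accept the positional form as well as the escaped form. This is a property of Python, shown only to illustrate what each spelling means; it says nothing about which form a given gateway version accepts.

```python
import re

# Positional form: ] first in the class, or right after the negating ^.
first_in_class = re.fullmatch(r"[]]", "]") is not None
negated_other = re.fullmatch(r"[^]]", "a") is not None
negated_bracket = re.fullmatch(r"[^]]", "]") is not None

# Escaped form.
escaped = re.fullmatch(r"[\]]", "]") is not None

print(first_in_class, negated_other, negated_bracket, escaped)
# True True False True
```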

Support for Quantifiers

In earlier versions, only * (zero or more), + (one or more), and ? (zero or one) are supported. The minimum/maximum quantifiers (using curly brackets) are not supported in earlier versions.
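One way to keep a pattern usable on earlier gateways is to spell out a curly bracket quantifier with ?, * and + only. The sketch below checks such rewrites for equivalence in Python's re module; the rewrites are illustrative suggestions rather than documented Check Point guidance, and same_matches is a hypothetical helper.

```python
import re

def same_matches(p1: str, p2: str, samples) -> bool:
    """Return True if both patterns accept exactly the same samples."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s)) for s in samples
    )

a_runs = ["a" * n for n in range(7)]
vowel_runs = ["", "ae", "aei", "aeiou", "aexi"]

# a{2,4}: two required a's, then up to two optional ones.
bounded_ok = same_matches(r"a{2,4}", r"aaa?a?", a_runs)

# [aeiou]{3,}: two required vowels, then one or more.
open_ended_ok = same_matches(
    r"[aeiou]{3,}", r"[aeiou][aeiou][aeiou]+", vowel_runs
)

print(bounded_ok, open_ended_ok)  # True True
```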

Support for Circumflex and Dollar Anchors

If the protection against the pattern is for earlier gateways as well as for newer ones, do not write a circumflex or dollar in the middle of the pattern.

If you want to specify a literal circumflex or dollar outside square brackets, always add a preceding backslash.

  • In earlier versions, the circumflex or dollar anchor is always a metacharacter (unless preceded by backslash or inside a character class).
  • In R70 and above, the circumflex anchor is a metacharacter only if it is the first character of a pattern; and the dollar anchor is a metacharacter only if it is the last character of a pattern.

Support for Hyphen

A hyphen (-) is used to specify a range of characters in a character class. For example, [a-z] matches the lower-case alphabet.

  • In earlier versions, if a hyphen is required as a character without special meaning, it must be the first or last character in a character class.
  • In R70 and above, if a hyphen is required as a regular character, it must be escaped with a backslash.
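The three ways of writing a literal hyphen in a class look like this. Python's re module accepts all three, which makes it convenient for seeing that they describe the same class; which spelling a gateway accepts depends on its version, as described above.

```python
import re

# Hyphen first, hyphen last, and hyphen escaped: each class is {a, z, -}.
classes = [r"[-az]", r"[az-]", r"[a\-z]"]

hyphen_matches = [re.fullmatch(c, "-") is not None for c in classes]
# None of them is a range, so "m" is not matched.
range_matches = [re.fullmatch(c, "m") is not None for c in classes]

print(hyphen_matches)  # [True, True, True]
print(range_matches)   # [False, False, False]
```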
 
©2014 Check Point Software Technologies Ltd. All rights reserved.