Detector Engine

Query Structure

Detectors are composed of rules, also called queries. The rules are compiled into an efficient detector, which the Code Security engine runs against files.

Each query is a group of patterns, called a pattern_group. Pattern groups are hierarchical: a pattern group can contain more pattern groups, and so on.

A pattern group is a collection of patterns with an aggregate relation.

    pattern_group:
      aggregate: or | and | append
      patterns:
      - pattern: "(?:key|token|secret|password|pwd|passwd)=(.*)" # assignment
        pattern_type: single
      - pattern: "hello" # literal word
        pattern_type: multi
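
The hierarchy and the aggregate relation can be illustrated with a minimal Python sketch. It is not the engine's implementation: the dictionary layout, the `group_matches` helper, and the nested `world` pattern are assumptions made for illustration, and only the `and` and `or` aggregates are modelled (`append` is left out because its semantics are not described here).

```python
import re

def group_matches(group, text):
    """Return True if the (possibly nested) pattern group matches the text."""
    results = []
    for entry in group["patterns"]:
        if "pattern_group" in entry:
            # A nested group is evaluated recursively, then treated like a pattern.
            results.append(group_matches(entry["pattern_group"], text))
        else:
            results.append(re.search(entry["pattern"], text) is not None)
    if group["aggregate"] == "and":
        return all(results)
    if group["aggregate"] == "or":
        return any(results)
    raise ValueError(f"aggregate not modelled in this sketch: {group['aggregate']}")

# Hypothetical group: an assignment pattern OR (hello AND world).
group = {
    "aggregate": "or",
    "patterns": [
        {"pattern": r"(?:key|token|secret|password|pwd|passwd)=(.*)"},
        {"pattern_group": {
            "aggregate": "and",
            "patterns": [{"pattern": "hello"}, {"pattern": "world"}],
        }},
    ],
}

print(group_matches(group, "password=abc"))
print(group_matches(group, "hello there"))
```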

Prematch Testers

A prematch tester is a test that runs before the in-depth matching and detection logic is applied. For example, it is often appropriate to bail out of detection early for a very small file, a binary file, or a file classified as documentation.

test_content_prematch

This meta-tester uses the content classification and inference engine. It is a collection of testers that help decide whether a certain file is worth a deep dive.

By testing for content, you can:

Filter for an unexpected binary file.
Ensure a non-empty file goes through for further detection.
Run only on classes of files that Code Security has classified by the nature of their content.

Content class	Example
`code/infra`	Ruby, Python
`data/infra`	SQL, JSON
`binary`	Binary files
`docs`	Markdown, Text
`tests`	Unit tests, other test code
`examples`	Example code, demo code and others
`vendor`	3rd party code sitting in `node_modules` and others
`files`	A general file class not fitting a single class.

Usage

pattern: ".*"
test_content_prematch:
    binary: false
    minlen: 20
    maxlen: 2000
    content_classes:
    - code/infra # our own classification engine results
    # content_classes_not:
    # - Code # the inverse of content_classes
    content_types:
    - Python # a programming language *name* (if you want an extension, there are ways for that too)
    # content_types_not:
    # - Ruby # the inverse of content_types

Test positive

<a Python file, size at 20-2000 bytes>

Test negative

<an SQL file, or a small file, or a binary file, etc.>

test_regex_prematch

You can test for a specific pre-match structure before Code Security deep dives into further matching.

By testing for Regex prematch, you can:

Make sure a certain file structure exists before applying further testing, such as variable assignments.
Verify that a certain 'sentinel' word exists in a large file by applying a generic word lookup, before applying a more specific matching.

Usage

pattern: "pass:(.*)"
test_regex_prematch:
    - on: 0 # on full text
      pattern: "aws\\.amazon\\.com"

Test positive

<large documentation file>
Here is how to connect to our database
1. Log into AWS console (console.aws.amazon.com)
2. Use following details:
DB pass: shazam123

Test negative

<Big file, not containing any mention of AWS detail>
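
The idea of a cheap sentinel lookup that gates a more specific match can be sketched in Python as follows. This is an illustration only, not how the engine is implemented; the `scan` helper and the sample text are assumptions.

```python
import re

def scan(text, prematch, pattern):
    """Run the main pattern only if the cheap prematch hits the full text."""
    if re.search(prematch, text) is None:
        return []  # bail out early: the sentinel was not found
    return re.findall(pattern, text)

doc = "Log into AWS console (console.aws.amazon.com)\nDB pass: shazam123\n"

# The prematch hits, so the main pattern runs and captures the value.
print(scan(doc, r"aws\.amazon\.com", r"pass:(.*)"))

# The prematch misses, so the main pattern never runs.
print(scan("DB pass: shazam123\n", r"aws\.amazon\.com", r"pass:(.*)"))
```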

Content Testers

test_fingerprints

Code Security can create one-way fingerprints for you to use when you want to detect pieces of information you cannot reveal.

By using test_fingerprints, you can:

Detect credit cards
Find classified or private domains or hosts

First, you must generate your fingerprint. It is done locally on your machine using a secure and salted one-way hash:

$ $HOME/.spectral/spectral fingerprint --text <your private text>
< fingerprint >

Then, copy the resulting fingerprint.

Usage

pattern: "host=([a-zA-Z0-9_.-]+)"
test_fingerprints:
  - on: 1
    with: "<your fingerprint>"
    is: true

Note that by narrowing down the character class, we give useful information to attackers looking to brute-force private information. Always make sure that your character classes and secrets are wide enough.

Test positive

<private host>

Test negative

<any other text>
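
To illustrate what a salted one-way fingerprint means in general, here is a minimal Python sketch. The actual fingerprint format and hash used by the `fingerprint` command are not documented here, so everything below is an assumption: the PBKDF2-SHA256 choice, the `salt:digest` layout, and the `make_fingerprint` and `matches_fingerprint` helpers are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch

def make_fingerprint(text, salt=None):
    """Derive a salted one-way fingerprint; the salt travels with the digest."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", text.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + ":" + digest.hex()

def matches_fingerprint(candidate, fingerprint):
    """Re-derive the digest for a candidate and compare in constant time."""
    salt_hex, digest_hex = fingerprint.split(":")
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest.hex(), digest_hex)

fp = make_fingerprint("db.internal.acme.corp")  # hypothetical private host
print(matches_fingerprint("db.internal.acme.corp", fp))
print(matches_fingerprint("example.com", fp))
```

The private text never appears in the rule, only the fingerprint does, which is why a narrow capture pattern is the main remaining source of leaked information.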

test_from_env

You can collect secrets from your environment variables (ENV) instead of encoding them as fingerprints, and still search for them in your code. Code Security fetches the values from your ENV and relays them to the detector.

By using test_from_env, you can:

Detect secrets that you already have in your environment (local machine or CI) without exposing them.
Find secrets that you do not want to expose in a persistent way.

To test, make sure the variable is set in the environment of the scan:

$ SOME_SECRET_VAR=shazam $HOME/.spectral/spectral scan --nosend

Usage

pattern: "host=(.*)"
test_from_env:
  - on: 1
    with: "SOME_SECRET_VAR"
    is: true

Test positive

shazam

Test negative

foobar

test_luhn

The Luhn algorithm is a checksum used to validate credit card numbers and several national identification numbers, such as Canadian Social Insurance Numbers and Israeli ID numbers.

By testing for Luhn, you can:

Ensure a number is a valid credit card number.
Verify that a matched string passes as a valid identification number, which helps tell real values from fake or test strings.

Usage

pattern: "account=([0-9]+)"
test_luhn:
  - on: 1
    is: true

Test positive

79927398713

Test negative

79927398710
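
For reference, the Luhn check itself is short. The following Python sketch implements the standard algorithm and runs it on the two sample numbers above; the `luhn_valid` name is an assumption and this is not the engine's code.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits right to left and double every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_valid("79927398713"))  # the positive sample
print(luhn_valid("79927398710"))  # the negative sample
```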

References

Luhn algorithm at Wikipedia

test_number

Available from: v1.4.2

Test for a generic representation of a number.

By testing for numbers, you can rule out a numeric value found where a password or token is expected.

Usage

pattern: "key=(.*)"
test_number:
  - on: 1
    is: false

Test positive

key=<random token>

Note that test_number returns false here and the rule sets is: false, so the outcome is positive.

Test negative

key=0.1234

test_base64, test_base64bin

Verify that a text is base64 encoded (test_base64), or that it is base64-encoded binary data (test_base64bin). Supports all common variants of encoding (URL safe and others).

By testing for base64, you can:

Ensure that a match is base64 and fail fast in a sequence of tests when you are looking for a token.
Validate that a string is base64 encoded given you suspect that it may contain sensitive information.

Usage

pattern: "account_encoded='([[:alnum:]/+]+[=]{0,2})'"
test_base64:
  - on: 1
    is: true

Test positive

account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'

Test negative

account_encoded='replace_me'
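
A strict base64 check can be sketched in Python as below. It is a simplification: it covers only the standard alphabet with padding, whereas the tester is described as supporting URL-safe and other variants, and the `is_base64` name is an assumption.

```python
import base64
import binascii

def is_base64(value: str) -> bool:
    """Return True if value is well-formed, padded, standard-alphabet base64."""
    if not value or len(value) % 4 != 0:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True

print(is_base64("eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9"))  # the positive sample
print(is_base64("replace_me"))                                 # the negative sample
```

Short ordinary words whose length is a multiple of four can also pass such a check, which is one reason to combine it with other testers.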

The binary variant first decodes the base64 encoded string, and then tests whether it is binary or not:

pattern: "account_encoded='([0-9]+)'"
test_base64bin:
  - on: 1
    is: true

test_binary

As Code Security detectors are binary-aware, you can test for binary matches in any capturing expression.

By testing for binary data, you can flag and avoid matches that are false and contain no text.

Usage

pattern: "token=(.*)"
test_binary:
  - on: 1
    is: false

Test positive

<BINARY DATA>token=<BINARY_DATA>

Test negative

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

test_maxlen, test_minlen

Test for content size, minimum or maximum.

By testing for content size, you can:

Fail fast on very short strings or very large content, and skip the match.
Validate that, on top of the structural captures you have made, you end up with a reasonably sized match.

Usage

pattern: "account_encoded='(.*)'"
test_minlen:
  - on: 1
    score: 2

Test positive

account_encoded='eyAiYWNjb3VudCI6ICJzZWNyZXQtbnVtYmVyIiB9'

Test negative

account_encoded='XX'

In the same way, you can use maxlen:

pattern: "account_encoded='(.*)'"
test_maxlen:
  - on: 1
    score: 2000

Structural Testers

test_jwt

A JSON Web Token (JWT) is a proposed Internet standard for creating data with an optional signature and/or optional encryption, whose payload holds JSON that asserts claims. JWTs are often used for service-to-service authentication. This tester checks whether a match is a JWT.

By testing for JWT, you can:

Make sure the key structure fits a standard JWT.
Verify that a certain JWT is semantically valid (header is valid).

Usage

pattern: "token=(\\S+)"
test_jwt:
  - on: 1
    is: true

Test positive

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9.gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI

Test negative

bad_token
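
A structural JWT check along these lines can be sketched in Python. The exact validation the tester performs is not specified here, so this is an assumption: the sketch requires three dot-separated segments and a header that decodes to a JSON object with an `alg` field. It does not verify the signature, and `looks_like_jwt` is a hypothetical name.

```python
import base64
import binascii
import json

def _decode_segment(segment: str):
    """Decode one base64url segment (padding restored) into a JSON value."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def looks_like_jwt(token: str) -> bool:
    """Check the three-part structure and that the header is valid JSON."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        header = _decode_segment(parts[0])
    except (binascii.Error, ValueError):
        return False
    return isinstance(header, dict) and "alg" in header

sample = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJsb2dnZWRJbkFzIjoiYWRtaW4iLCJpYXQiOjE0MjI3Nzk2Mzh9."
    "gzSraSYS8EXBxLN_oWnFSRgCzcmJmMjLiuyu5CSpyHI"
)
print(looks_like_jwt(sample))
print(looks_like_jwt("bad_token"))
```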

References

JSON Web Token at Wikipedia

test_uri

A URI or URL parsing test. A given string is tested to be a valid URI.

By testing for URI, you can:

Isolate URLs that are sensitive before applying further matching logic.
Detect various kinds of authentication, such as Bearer, Basic and more, given a URL request structure (for example curl'ing URLs).

Usage

pattern: "curl\\s.*(http.*)"
test_uri:
  - on: 1
    is: true

Test positive

curl -L -o https://dev.acme.corp/secure/credentials.json -H"Authorization: Bearer <token>"

Test negative

sh curl.sh arg1 arg2

test_tvar

Test for various template variables, common in configuration and IaC files.

By testing for template variables, you can filter for legitimate configuration that was built with proper template variables instead of hardcoded secrets.

Usage

pattern: "DB_PASS=(.*)"
test_tvar:
  - on: 1
    is: false

Test positive

Note: with is: false, the outcome is positive if the candidate does not contain a template variable.

DB_PASS=my-secret-password

Test negative

DB_PASS={{.Env.DBPass}}

test_changeme

Available from: v1.4.2

Test for various changeme values. Developers sometimes mark a value to be replaced with commonly known idioms such as fixme, XXX, or changeme.

By testing for changeme:

You can filter for mock values, or "TODO: replace this" values.
Use this in combination with other testers to create a powerful detector.

Usage

pattern: "DB_PASS=(.*)"
test_changeme:
  - on: 1
    is: false

Test positive

Note: with is: false, the outcome is positive if the candidate does not contain a changeme value.

DB_PASS="<real password>"

Test negative

DB_PASS="XXX"

test_assignment

Available from: v1.4.2

Test for an assignment structure.

By testing for assignment:

You can set the scene for detectors which are only interested in one part of an assignment clause.
Combine an expected assignment with another tester to create a more powerful detector.

Usage

pattern: "DB_PASS(.*)"
test_assignment:
  - on: 0 # on the complete expression
    is: true
test_token:
  - on: 1
    is: true

Test positive

DB_PASS=<random token>

Test negative

DB_PASS, foo, bar

test_uuid

Test if a given string is a UUID. It supports all UUID types and formats (with or without hyphens, and with or without a prefix).

By testing for UUID, you can ignore suspect strings that look randomly generated but are in fact IDs (database IDs or other).

Usage

pattern: "key=(.*)"
test_uuid:
  - on: 1
    is: false

Test positive

Note: with is: false, the outcome is positive if the candidate does not contain a UUID.

key=my-secret-key

Test negative

key=<UUID representing a DB table primary key>
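
A comparable check can be sketched with Python's standard `uuid` module, which accepts hyphenated and non-hyphenated forms as well as a `urn:uuid:` prefix. Whether this matches the exact set of formats the tester supports is an assumption, and the sample UUID is a made-up value.

```python
import uuid

def is_uuid(value: str) -> bool:
    """Return True if the standard library can parse value as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True

print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))  # hyphenated
print(is_uuid("123e4567e89b12d3a456426614174000"))      # no hyphens
print(is_uuid("my-secret-key"))                         # not a UUID
```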

test_regex, test_regex_not

test_regex is a tester that verifies a structural form after a match is found, so you can validate the match further.

By using test_regex, you can:

Apply a clearer set of validations, readable and maintainable.
Split verification into stages to bring out a specific use case:
- Capture something vague. For example, Bearer (.*).
- Run a semantic tester. For example, test_token on the token part of the bearer.
- Run a structural tester. For example, "it should look like a curl request" with test_regex.
Apply verification that is beyond the capabilities of a single regex DFA. For example, the effect of a state machine with more aggressive backtracking can be achieved, while staying performant, by running two separate regexes and combining the results.

Because test_regex is an array-based tester, its elements are combined with an AND relation, and short-circuiting (failing fast) is applied.

test_regex - all must apply, fail if one does not apply
test_regex_not - all must not apply, fail if one applies

Usage

pattern: "token=(.+)"
test_regex:
    - on: 1
      pattern: "([0-9].*){2}" # the value includes at least 2 digits
    - on: 1
      pattern: "([a-zA-Z].*){2}" # the value includes at least 2 letters

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=env.get('token')

Usage (test_regex_not)

pattern: "token=(\\S+)"
test_regex_not:
    - on: 1
      pattern: "[$][a-zA-Z0-9_-]+" # the value includes a template variable
    - on: 1
      pattern: "(?i)(example|test|fake|1234|abcde|xxxx|foobar)" # the value includes a word or pattern that marks it as a token placeholder

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=$my_token
token=48SfRa4idxxUVyPAejafXxwjkreyjEXAMPLE
token=testRa4idxxUVyPAejafXxwjkreyj8MoJkjV
token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
token=test1234abcdefoobarfake
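
The AND relation and the fail-fast behavior of both testers can be sketched in Python. The `regex_all` and `regex_none` helpers are hypothetical names, Python's `re` module stands in for the engine's regex implementation, and the placeholder list is a subset of the words used above.

```python
import re

def regex_all(value, patterns):
    """test_regex semantics: every pattern must match; all() stops at the first miss."""
    return all(re.search(p, value) for p in patterns)

def regex_none(value, patterns):
    """test_regex_not semantics: no pattern may match; any() stops at the first hit."""
    return not any(re.search(p, value) for p in patterns)

structure = [r"([0-9].*){2}", r"([a-zA-Z].*){2}"]  # at least 2 digits, 2 letters
placeholders = [r"[$][a-zA-Z0-9_-]+", r"(?i)(test|fake|1234|abcde|xxxx|foobar)"]

token = "48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV"
print(regex_all(token, structure), regex_none(token, placeholders))
print(regex_all("env.get('token')", structure))
print(regex_none("$my_token", placeholders))
```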

Semantic Testers

test_cword

Test for the percentage of common words in a given string. The test is based on a unique, large dictionary model of tech-related common words.

By testing for common words, you can:

Rule out non-machine generated keys.
Validate that a given match passes as a machine generated secret.

Usage

pattern: "pass=(.*)"
test_cword:
  - on: 1
    from: 0.0 # defines a range of accepted percentage
    to: 0.2   # low percentage of common words (up to 20%)

Test positive

zx28821a{_)

Test negative

hello

test_zx

Test for password strength based on the popular zxcvbn library.

By testing for zx (abbreviated), you can:

Detect strong passwords among fake ones.
Apply existing policies for enforcing password strength.

Usage

pattern: "pass=(.*)"
test_zx:
  - on: 1
    score: 4.0 # same standard score scale (0-4) from zxcvbn

Test positive

zxHELLOyw{_)

Test negative

foobar

test_pass

Test for password strength (own model). Pick a threshold on a scale of 0-100.0. A password with strength > 80 is considered strong.

By using test_pass, you can detect strong passwords among fake ones.

Usage

pattern: "pass=(.*)"
test_pass:
  - on: 1
    score: 80.0 # scale: 0-100

Test positive

zxHELLOyw{_)

Test negative

foobar

test_token

Test for tokens, keys, and machine-generated secrets (own model).

By using test_token, you can:

Detect real tokens, keys, and secrets.
Verify that a machine generated token is secret by model attributes.

Usage

    - pattern: "token=(.*)"
      pattern_type: multi 
      test_token:
      - on: 1
        score: 0.6  # True if the score is greater than 0.6
                    # max is 1, min is 0

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=AnotherVariableOfClientData[0];

test_entropy

A normalized entropy test. We do not recommend this test as entropy is a metric not optimized for finding secrets and sensitive information. You can use entropy if you use legacy infrastructure and policies.

Usage

    - pattern: "token=(.*)"
      pattern_type: multi 
      test_entropy:
      - on: 1
        score: 4.0  # True if the entropy of the value is greater than 4
                    # max is 5, min is 0

Test positive

token=48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV

Test negative

token=G6q5oRa4idxxxxxxxxxxxxxwjkreyj8MoJkjV
token=FooBarFooBarFooBarFooBarFooBarFooBar
token=asdfdsafdsfasdfadsfadsfdasfasdfsdafsdf
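
To show what an entropy score measures, here is a Python sketch of plain Shannon entropy in bits per character. The normalization and the 0-5 scale used by test_entropy are not specified here, so the numbers from this sketch are not expected to match the engine's scores; `shannon_entropy` is a hypothetical helper.

```python
import math
from collections import Counter

def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    length = len(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())

# A varied token scores higher than a repetitive one.
print(shannon_entropy("48SfRa4idxxUVyPAejafXxwjkreyj8MoJkjV"))
print(shannon_entropy("FooBarFooBarFooBarFooBarFooBarFooBar"))
```

Entropy only reflects how evenly characters are distributed, so a repetitive real secret scores low and a varied non-secret scores high, which is consistent with the recommendation to prefer other testers.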

Testing Detector

To test, you can selectively include your new detectors by using --just-ids and/or --just-tags. With these you can use any of the common Code Security commands:

If you want to run your new rule on your entire GitHub org:

$HOME/.spectral/spectral github ... --just-ids PRV001

Alternatively, just to scan your current repo:

$HOME/.spectral/spectral run ... --just-tags acme-security

Submit the Detector for Review

Contact Check Point Support Center to review your detector. Make sure to redact sensitive information in the detector before you submit it. Check Point can help you build it and give you a free detector building session.