Codeprinting

Code Security can detect exact, partial, and fuzzy copies of code. If a piece of sensitive code, configuration, or any other textual asset must live only in a specific, predefined place, Code Security can create a custom detector that looks for stray copies of it, including partial or modified copies.

You can use codeprinting to:

Locate configuration sprawl: a sensitive configuration file that is stored securely but that individuals copy and paste into projects that are deemed unsafe.
Trace a file that was included in a mobile app by mistake and is now delivered to many end-user devices as part of an APK build.
Locate a complete codebase that is misplaced on a production server, a sandbox computer, or another unauthorized device.

Creating Effective Codeprints

Code Security helps you create codeprints securely and locally. Your code is never transmitted anywhere, and all codeprint hashes are produced with a local and secure hashing algorithm.

A codeprint hash is a safe one-way hash converted into a text string. It also retains a comparative trait, which Code Security uses to measure how closely a scanned file resembles the original, so it can detect full, partial, or fuzzy copies. You can store these hashes in your custom detectors to detect code copying.
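The comparative trait can be illustrated with a small Python sketch. The Security section refers to the hashes as simhashes, so the sketch uses a textbook SimHash over three-word shingles. The tokenizer, the shingle size, the 64-bit width, the sample texts, and the simhash and hamming_distance helper names are all assumptions made for illustration; this is not Code Security's actual implementation. The idea is that similar inputs tend to produce fingerprints that differ in only a few bits, while unrelated inputs differ in roughly half of them.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a textbook SimHash fingerprint of `text` (illustrative only)."""
    tokens = re.findall(r"\w+", text.lower())
    # Overlapping 3-token shingles, so that token order influences the result.
    shingles = [" ".join(tokens[i:i + 3]) for i in range(max(len(tokens) - 2, 1))]
    weights = [0] * bits
    for shingle in shingles:
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        # Each shingle votes +1 or -1 on every bit position.
        for bit in range(bits):
            weights[bit] += 1 if (value >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(bits):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical sample data: a config file, a copy with one edited line,
# and an unrelated text.
lines = [f"service_{i}.timeout = {i * 7}" for i in range(200)]
original = "\n".join(lines)
modified = "\n".join(lines[:42] + ["service_42.timeout = 9999"] + lines[43:])
unrelated = "\n".join(f"user {i} logged in from host {i % 13}" for i in range(200))

print("original vs modified: ", hamming_distance(simhash(original), simhash(modified)))
print("original vs unrelated:", hamming_distance(simhash(original), simhash(unrelated)))
```

A detector built on this idea would treat a small distance as a likely copy. The threshold that Code Security applies is not stated here, so the sketch does not assume one.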

Warning - Keep your codeprints private to your organization. Codeprints are not secrets and cannot be reversed to the original text, but virtually any hash or one-way function, such as MD5 or SHA-256, can be used to extract indirect knowledge about an organization.

Quick Start

Creating a codeprint essentially means creating a custom detector that contains your codeprints.

To create a codeprint, run:

$ $HOME/.spectral/spectral fingerprint --codeprint [FILE1] [FILE2] ...

Pick files that represent code, configuration, docs, or other pieces of information that are unique to your organization or are deemed sensitive.

Code Security generates a detector for you and prints it as output:

rules:
  - id: CPRT001
    applies_to:
      - ".*$"
    description: Detect code copies via secure codeprinting
    name: Codeprint detector
    severity: info
    tags:
      - base
      - codeprints
    pattern_group:
      patterns:
        - pattern: ".*"
          match_on_path: true
          pattern_type: single
          test_codeprints:
          - print: ".."
          - print: ".."

applies_to - Use this to exclude unwanted files from scanning.
match_on_path - Switches spectral to match on file paths instead of file content. You can use pattern to apply a secondary filtering rule (regex).
test_codeprints - The actual codeprints.

You can copy the output, or pipe it, into your own spectral/rules/rules.yaml file. Store the file in a secure location.

Do's

Code Security compares each file it scans against the codeprints in the list and reports a match if any one of them matches, so you can add more than one codeprint.
If you have a sensitive file that you want to codeprint, you can create a detector just for that one.
If you have a large codebase or a large set of assets to protect, identify the files that are most unique to you and create codeprints for all of them.

Don'ts

Do not use a very small file (smaller than 2 KB), because it might not contain enough data to be unique.
Do not use a public-domain file or a file that is not originally yours, such as a piece of open source code. Its codeprint matches every instance of that open source library, which many codebases may use.
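
The size guideline above can be checked before you run the fingerprint command. The following Python sketch is one possible pre-filter and is not part of Code Security. It keeps only regular files of at least 2 KB (read here as 2048 bytes, which is an assumption) and builds the argument list for the fingerprint command shown in Quick Start without running it. The helper names eligible_for_codeprint and fingerprint_command and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: "2 KB" in the guideline is interpreted as 2048 bytes.
MIN_CODEPRINT_BYTES = 2 * 1024


def eligible_for_codeprint(paths, min_bytes=MIN_CODEPRINT_BYTES):
    """Return the regular files in `paths` that are at least `min_bytes` long."""
    selected = []
    for path in map(Path, paths):
        if path.is_file() and path.stat().st_size >= min_bytes:
            selected.append(path)
    return selected


def fingerprint_command(paths):
    """Build (but do not run) the argument list for the documented command."""
    binary = Path.home() / ".spectral" / "spectral"
    return [str(binary), "fingerprint", "--codeprint", *map(str, paths)]


# Demonstration with throwaway files; the names and contents are placeholders.
with tempfile.TemporaryDirectory() as workdir:
    tiny = Path(workdir) / "tiny.conf"
    tiny.write_text("debug = true\n")
    large = Path(workdir) / "pricing_engine.py"
    large.write_text("# placeholder line\n" * 200)

    selected = eligible_for_codeprint([tiny, large])
    print(" ".join(fingerprint_command(selected)))
```

The filter only addresses file size. Whether a file is original to your organization, rather than public-domain or open source, still has to be judged by hand.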

Security

Codeprints are one-way. Code Security compresses and encrypts them so that the underlying simhashes cannot be looked up by brute force through a search engine.