Protecting Data by Fingerprint

Many Data Types identify data by classifying it according to keywords or file attributes such as document type, name, or size. Classifications and attributes are used to describe the data. The fingerprint Data Type does not rely on a description of the data. Instead, it identifies the data according to a unique signature known as a fingerprint. A fingerprint accurately identifies confidential files or parts of confidential files.

The fingerprint Data Type can accurately identify files that the organization considers confidential. It matches whole files or parts of them.

Generating the unique signature:

First, you identify a repository. A repository is a network location that contains files that must not go outside of the organization. The Data Loss Prevention Software Blade scans these data files and generates a unique signature for each file.
When a file passes through a DLP Gateway, the file is scanned and a signature generated.
When the file passes through the DLP Gateway, the DLP Gateway compares the signature of this file against the signatures of files in the repository. If there is a signature match, the DLP Gateway prevents the scanned file from distribution outside of the organization.
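
The three steps above can be pictured with a minimal sketch. The source does not describe the signature algorithm that the DLP Gateway uses, so this sketch assumes a plain SHA-256 digest of the file contents as a stand-in. The function names (file_signature, scan_repository, is_blocked) are hypothetical and are not part of any Check Point API. A whole-file digest only detects exact copies; partial matching is covered under Granularity.

```python
import hashlib
from pathlib import Path


def file_signature(path: Path) -> str:
    """Return a hex digest that stands in for the file's fingerprint (assumption: SHA-256)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_repository(root: Path) -> dict:
    """Map each signature to the repository file that produced it."""
    return {file_signature(p): p for p in sorted(root.rglob("*")) if p.is_file()}


def is_blocked(outgoing: Path, signatures: dict) -> bool:
    """True when the outgoing file's signature matches a repository file."""
    return file_signature(outgoing) in signatures
```

In this sketch, scan_repository corresponds to the repository scan, and is_blocked corresponds to the comparison that takes place when a file passes through the gateway.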

Repository Scanning

Files in the repository are constantly changing. New files are added, existing files modified or deleted. To keep file signatures up to date, the repository must be scanned on a regular basis. By default, the repository is automatically scanned every day. If a file is added or modified after a scan, the file's signature is not updated until the next scheduled scan.

Supported file shares for repositories:

CIFS
NFS

Note - Scans of a repository that has already been scanned take less time, because unchanged files in the repository are skipped.
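
The source does not say how unchanged files are detected. The sketch below assumes one common approach: remember each file's size and modification time from the previous scan, and rescan only files whose values differ. The cache layout and the function name files_to_rescan are hypothetical.

```python
from pathlib import Path


def files_to_rescan(root: Path, cache: dict) -> list:
    """Return new or modified files, update the cache, and forget deleted files.

    cache maps a file path (str) to (size in bytes, modification time in ns).
    """
    changed = []
    seen = set()
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        key = str(path)
        seen.add(key)
        state = (stat.st_size, stat.st_mtime_ns)
        if cache.get(key) != state:
            # New file, or size/mtime differs from the previous scan.
            changed.append(path)
            cache[key] = state
    for stale in set(cache) - seen:
        # The file was deleted from the repository since the last scan.
        del cache[stale]
    return changed
```

On the first call every file is returned. On later calls only new or modified files are returned, which is why a repeat scan is shorter.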

Filtering the Repository for Efficiency

A large repository might also contain many files that are not confidential and do not need to be scanned.

The scan can be made more efficient by:

- Accurately defining the location of data in the repository.

  Select only those folders that are known to contain confidential files. You may need help from the related department heads to do this. For example, not all the folders in the Finance department may contain confidential information. Folders without confidential information do not have to be included in the scan.
- Only scanning files that match specific Data Types, for example spreadsheet files or credit card numbers.

  If you add Credit Card Numbers as the Data Type in the filter, all the files in the repository that contain credit card numbers are scanned and fingerprinted.

  If Spreadsheet file is selected as the Data Type in the filter, only spreadsheet files in the repository are scanned and fingerprinted.
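
As an illustration of the two example filters, the sketch below uses simplified stand-ins for the built-in Data Types: a file extension check for spreadsheets, and a digit pattern plus the Luhn checksum for credit card numbers. The extension list, the pattern, and the function names are assumptions. The source does not describe how the product's Data Types are implemented.

```python
import re
from pathlib import Path

# Assumed extension list; the product's Spreadsheet Data Type may differ.
SPREADSHEET_SUFFIXES = {".xls", ".xlsx", ".ods", ".csv"}

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum over the digits in number."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            # Double every second digit, counting from the right.
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def contains_card_number(text: str) -> bool:
    """True when the text holds at least one Luhn-valid card-like number."""
    return any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(text))


def is_spreadsheet(path: Path) -> bool:
    """True when the file extension is in the assumed spreadsheet list."""
    return path.suffix.lower() in SPREADSHEET_SUFFIXES
```

A scan filtered on the spreadsheet type would fingerprint only files for which is_spreadsheet returns True. A scan filtered on credit card numbers would fingerprint only files whose text makes contains_card_number return True.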

Granularity

Complete files do not have to go outside of an organization for data to be lost.

Confidential data can be lost if sections from files in the repository are copied into other files, copied to email or posted to the web.

A file in the repository may be saved locally and then modified in a way that it no longer matches the unique fingerprint signature.

To identify such incidents, a partial match between files scanned by the DLP Gateway and files in the repository can be configured.

A partial match can be:

- According to a percentage value

  The number of text segments in the sent file is divided by the number of text segments in the repository file, and the result is expressed as a percentage.

  A match occurs if this percentage is higher than the percentage configured on the General Properties page of the Data Type.
- According to a number of identical text segments

  A match occurs when the number of identical text segments in a scanned file and a file in the repository is higher than the number configured on the General Properties page of the Data Type.
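
The two partial match criteria can be sketched as follows. The source does not define how a file is split into text segments, so the sketch assumes one segment per non-empty line. It also assumes that the percentage is the number of identical segments divided by the number of segments in the repository file, which is one reading of the description above. All function names are hypothetical.

```python
def split_segments(text: str) -> list:
    """Assumed segmentation: one whitespace-normalized segment per non-empty line."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def identical_segments(scanned: str, repository: str) -> int:
    """Count distinct segments that appear in both texts."""
    return len(set(split_segments(scanned)) & set(split_segments(repository)))


def match_percentage(scanned: str, repository: str) -> float:
    """Identical segments as a percentage of the repository file's segments."""
    repo_segments = set(split_segments(repository))
    if not repo_segments:
        return 0.0
    return 100.0 * identical_segments(scanned, repository) / len(repo_segments)


def is_partial_match(scanned, repository, min_percent=None, min_segments=None) -> bool:
    """Apply one criterion; a match requires a value higher than the threshold."""
    if min_percent is not None:
        return match_percentage(scanned, repository) > min_percent
    if min_segments is not None:
        return identical_segments(scanned, repository) > min_segments
    return False
```

For example, if the repository file has four segments and the scanned data repeats two of them, the sketch reports two identical segments and 50 percent. That is a match for a configured value of 40 percent, but not for a configured value of 50 percent, because the result must be higher than the threshold.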

Scan Times

For large repositories, a scan can run all day.

To prevent this, you might want to limit the scan to a specified range of hours.

If a scan does not complete before the time range expires, the scan recommences where it stopped when the next scheduled scan occurs.
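
The resume behavior can be sketched as a loop that stops when the time range expires and returns a checkpoint for the next scheduled scan. The function names, the injected clock, and the index-based checkpoint are assumptions made for illustration. The source does not describe how the DLP Gateway records where a scan stopped.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True when now is inside [start, end), including ranges that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_scan(files, checkpoint, clock, start, end, scan_one):
    """Scan files from the saved index and return the index to resume from."""
    index = checkpoint
    while index < len(files) and in_window(clock(), start, end):
        scan_one(files[index])
        index += 1
    return index
```

If the returned index is smaller than the number of files, the scan did not complete. Passing that index as checkpoint on the next run continues the scan from the same place.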

Generating Logs

Repository scans generate logs that can be viewed in the Logs & Monitor view. In the Logs & Monitor view, the Fingerprint query shows all logs generated by a scan.

Cases when logs are generated

The fingerprint Data Type is matched.

In the log:
- The Matched File field shows which file in the repository matches the scanned data.
- The Matched File Percentage field shows the percentage of segments in the scanned data that match segments from the file in the repository. A 100% match means the scanned data and the file in the repository are identical.
- The Matched File Text Segments field shows how many segments of the scanned data were matched to segments in the repository file.
A Whitelist files scan has been started
A Whitelist repository scan is running
A Whitelist files scan has ended successfully
A repository scan has been started
A repository scan is running
A repository scan has ended successfully

Note - Running logs are generated every two hours. For a scan that lasts less than two hours, only the start and finish logs appear.

Log Details

Fingerprint

Fingerprint scan details

Parameter	Description
Scan ID	A unique scan identification to distinguish between logs
Next Scheduled Scan Date	Time the scan started
Duration	How long the scan lasted
Scan Status	The status can be Running, Paused, Canceled, or Success
Number of errors	Number of errors encountered.

Parameter	Description
Repository root path	The upper level repository
Current directory	Current directory being scanned
Directories	The total number of directories in the repository selected in data locations.
Repository size (MB)	The size of the repository
Repository Files	The number of files in the repository
Directories scanned	The number of directories scanned so far
Scanned size (MB)	The number of MB scanned so far
Scanned files	The number of files scanned so far
Unreachable directories	The number of subdirectories in the repository that could not be opened during the scan.
Fingerprinted files	The number of files with a fingerprint signature
Filtered files	The number of files that were not scanned because they did not meet the criteria set on the Repository Scan Filter page. For example file size, modification date, or Data Type.
Scan speed (KBs)	The speed of the scan
Progress	Percentage of the repository so far scanned
Remaining time	Estimated time to scan completion
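
The relationship between the size, speed, progress, and remaining time fields can be approximated with simple arithmetic. The sketch below is an assumption about how such values relate, not the formula the product uses. It also assumes that the scan speed is measured in KB per second.

```python
def scan_progress(repository_mb: float, scanned_mb: float) -> float:
    """Percentage of the repository scanned so far, capped at 100."""
    if repository_mb <= 0:
        return 0.0
    return min(100.0, 100.0 * scanned_mb / repository_mb)


def remaining_seconds(repository_mb: float, scanned_mb: float, speed_kb_per_s: float):
    """Estimated seconds to completion, or None when the speed is unknown."""
    if speed_kb_per_s <= 0:
        return None
    remaining_kb = max(0.0, repository_mb - scanned_mb) * 1024
    return remaining_kb / speed_kb_per_s
```

With a 2000 MB repository, 500 MB scanned, and an assumed speed of 1024 KB per second, the sketch gives 25 percent progress and 1500 seconds remaining.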

Creating a fingerprint Data Type

1. In the Data Type Wizard, select Fingerprint.
2. Enter a name and informative comments for the Data Type.

   This is the name that appears on the Data Loss Prevention > Repositories page.
3. Click Next.
4. In the Fingerprint window:
   a. Click the Gateways arrow button to select Security Gateways with the Data Loss Prevention Software Blade enabled.

      By default, the DLP Blades object appears. This object represents all DLP Gateways.

      Only Security Gateways selected here scan the repository and enforce the fingerprint Data Type.
   b. Define a network path to the repository.
   c. If the repository defined in the network path requires a username and password to access it, enter the relevant authentication credentials.
5. Click Test Connectivity.

   This tests that the DLP Gateways selected in the list of Security Gateways (step 4a) can access the repository using the (optional) assigned authentication credentials.
6. Click the Match Similarity arrow.

   This option matches similarity between the document in the repository and the document being examined by the DLP Gateway.

   You can specify an exact match with a document in the repository, or a partial match based on one of these:
   - A percentage value
   - A number of matched text segments
7. Click Next.

   Select Configure additional Data Type Properties after clicking Finish if you want to configure more properties.
8. Click Finish.

   The New Data Type wizard closes.

   The Data Type appears in the list of Data Types and also on the Repositories page.
9. Install the Access Control policy.

Configuring more fingerprint properties

In the Data Types window or Repositories window, double-click the fingerprint object to open it for editing. These properties can be configured:

Parameter

Description

General

Change the data entered in the Data Type wizard.

Data Owners

Add users or user groups that own the data.

Data owners can be notified when the fingerprint data type is matched by a rule in the DLP policy.

Advanced Matching

Add CPcode scripts to apply more match criteria after the fingerprint data type is matched by a rule.

Scan Scheduling

Configure when the document repository is scanned to update the fingerprint data type.

The default time object (Every-Day) has no time restrictions configured.

This means that a scan runs without time restrictions after the fingerprint data type is added to a policy rule.

If the DLP Gateway's resources and network bandwidth are an issue, limit the scan to off-peak hours.

Repository Scan Filter

This page offers more scanning criteria:

Scan files matching the following data types

This property lets you scan documents in the repository according to more data types, for example credit card numbers.

If you add Credit Card Numbers as the data type, all the files in the repository that contain credit card numbers are fingerprinted.

If "spreadsheet files" are selected as the data type, only spreadsheet files in the repository are fingerprinted.

Scan files according to size

Only files whose size is within the specified minimum and maximum are included in the fingerprint.

Scan files according to modification date

Only files that match the specified modification dates are included in the fingerprint.

Note - After a change to the filters (adding or removing a data type, selecting a different file size or modification date) the DLP Gateway regards all files in the repository as new. In a large repository, this results in a long scan. The fingerprint is only enforced after the end of this scan.

Data locations

Use the Data Locations tree to include or exclude repository sub-folders.

If you want the fingerprint data type to prevent only one document from leaving the organization, put that document in a folder that contains no other documents.

Select only that folder as the data location.
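
The size and modification date criteria on the Repository Scan Filter page, and the full rescan that follows a filter change, can be pictured with the sketch below. The class RepositoryScanFilter, its field names, and the function needs_full_rescan are hypothetical. They illustrate the behavior described in the table and do not represent a Check Point API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RepositoryScanFilter:
    """Hypothetical size and date criteria; None means no limit."""
    min_bytes: Optional[int] = None
    max_bytes: Optional[int] = None
    modified_from: Optional[datetime] = None
    modified_to: Optional[datetime] = None

    def accepts(self, size_bytes: int, modified: datetime) -> bool:
        """True when a file with this size and modification date is scanned."""
        if self.min_bytes is not None and size_bytes < self.min_bytes:
            return False
        if self.max_bytes is not None and size_bytes > self.max_bytes:
            return False
        if self.modified_from is not None and modified < self.modified_from:
            return False
        if self.modified_to is not None and modified > self.modified_to:
            return False
        return True


def needs_full_rescan(previous: RepositoryScanFilter, current: RepositoryScanFilter) -> bool:
    """Any change to the filter makes every file in the repository count as new."""
    return previous != current
```

Files for which accepts returns False correspond to the Filtered files counter in the scan log.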

NFS Repository scanning in NATed Environments

NAT prevents repository scanning when the repository is located on an NFS server. This occurs, for example, in a clustered environment where each member's connections are translated to the Virtual IP address of the cluster.

To enable repository scanning you must disable Hide NAT on all NFS services.

The members of a cluster must be configured to send NFS related traffic using the member's IP address in the Source field of the packet, and not the Virtual IP of the cluster.

Disabling Hide NAT on NFS services in a cluster

On the Security Management Server, edit the required table.def file (see the R81 Security Management Administration Guide).
Search for the line:

no_hide_services_ports

These are the services and ports not included in Hide NAT.

Enter the required ports and protocols:

no_hide_services_ports = { <111, 17>, <111, 6>, <4046, 17>, <4046, 6> }

Notes:

If a list of services and ports already exists, add these numbers to the end of the list.
New settings in the table.def file apply globally to all Security Gateways and clusters of the applicable version.

Save the changes in the file and exit the editor.
Install the Access Control policy on the ClusterXL object.
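
When a list of services and ports already exists, the required entries have to be appended without duplicating the ones that are already present. The sketch below shows one way to build the merged line from the text of the existing line. In each pair, the first number is the port and the second is the IP protocol number (17 is UDP, 6 is TCP). The helper merge_no_hide_entries is hypothetical. It works on a string only, does not edit table.def, and formats the output exactly as in the example above, so any other syntax around the original line must be preserved by hand.

```python
import re

# The port and protocol pairs required for NFS, as listed in the procedure.
NFS_ENTRIES = [(111, 17), (111, 6), (4046, 17), (4046, 6)]

# Matches one "<port, protocol>" pair.
ENTRY = re.compile(r"<\s*(\d+)\s*,\s*(\d+)\s*>")


def merge_no_hide_entries(line: str, required=NFS_ENTRIES) -> str:
    """Return the line with the required pairs added to the end of the list."""
    existing = [(int(port), int(proto)) for port, proto in ENTRY.findall(line)]
    merged = existing + [entry for entry in required if entry not in existing]
    body = ", ".join(f"<{port}, {proto}>" for port, proto in merged)
    return f"no_hide_services_ports = {{ {body} }}"
```

Existing entries keep their order, and only the missing pairs are added to the end of the list.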