Protecting Data by Fingerprint
Many Data Types identify data by classifying it according to keywords or file attributes such as document type, name, or size. Classifications and attributes are used to describe the data. The fingerprint Data Type does not rely on a description of the data. Instead, it identifies the data according to a unique signature known as a fingerprint. A fingerprint accurately identifies confidential files or parts of confidential files.
The fingerprint Data Type can accurately identify files that the organization considers confidential. It matches complete files or parts of files.
Generating the unique signature:
-
First you identify a repository. A repository is a network location that contains files that must not go outside of the organization. The Data Loss Prevention Software Blade scans these data files and generates a unique signature for each file.
-
When a file passes through a DLP Gateway, the file is scanned and a signature is generated.
-
When the file passes through the DLP Gateway, the DLP Gateway compares the signature of this file against the signatures of files in the repository. If there is a signature match, the DLP Gateway prevents the scanned file from distribution outside of the organization.
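The three steps above can be sketched in Python for the exact-match case. This is only an illustration: the guide does not describe how the Data Loss Prevention Software Blade computes a fingerprint, so a SHA-256 digest of the file bytes stands in for it here, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def file_signature(path: Path) -> str:
    """Return a SHA-256 hex digest of the file bytes (assumed stand-in for a fingerprint)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_repository_index(repository: Path) -> Dict[str, Path]:
    """Step 1: map the signature of each repository file to its path."""
    return {
        file_signature(path): path
        for path in sorted(repository.rglob("*"))
        if path.is_file()
    }


def find_match(scanned: Path, index: Dict[str, Path]) -> Optional[Path]:
    """Steps 2 and 3: sign the scanned file and look the signature up in the index."""
    return index.get(file_signature(scanned))
```

A non-`None` result from `find_match` corresponds to a signature match, at which point the file would be blocked. A whole-file digest only detects identical copies, so it does not cover the partial matching described under Granularity.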
Repository Scanning
Files in the repository are constantly changing. New files are added, and existing files are modified or deleted. To keep file signatures up to date, the repository must be scanned on a regular basis. By default, the repository is automatically scanned every day. If a file is added or modified after a scan, the file's signature is not updated until the next scheduled scan.
Supported file shares for repositories:
-
CIFS
-
NFS
|
Note - Scans of a repository that has already been scanned take less time, because unchanged files in the repository are skipped. |
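The idea of skipping unchanged files on a rescan can be sketched as follows. The guide does not say how the scanner decides that a file is unchanged; this sketch assumes a comparison of file size and modification time against a saved index, and `rescan` and `IndexEntry` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical index entry: (size in bytes, modification time in ns, signature).
IndexEntry = Tuple[int, int, str]


def rescan(repository: Path, index: Dict[str, IndexEntry]) -> List[str]:
    """Refresh `index` in place and return the relative paths that were re-fingerprinted."""
    refreshed = []
    seen = set()
    for path in sorted(repository.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(repository).as_posix()
        seen.add(key)
        stat = path.stat()
        previous = index.get(key)
        if previous and previous[:2] == (stat.st_size, stat.st_mtime_ns):
            continue  # assumed "unchanged" test: same size and modification time
        signature = hashlib.sha256(path.read_bytes()).hexdigest()
        index[key] = (stat.st_size, stat.st_mtime_ns, signature)
        refreshed.append(key)
    for key in set(index) - seen:
        del index[key]  # the file was deleted from the repository
    return refreshed
```

The first call fingerprints every file. Later calls only touch files that were added or modified since the previous scan, which is why they finish faster.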
Filtering the Repository for Efficiency
A large repository might also contain many files that are not confidential and do not need to be scanned.
The scan can be made more efficient by:
-
Accurately defining the location of data in the repository.
Select only those folders that are known to contain confidential files.
You may need help from the related department heads to do this.
For example, some folders in the Finance department might not contain confidential information.
These folders do not have to be included in the scan.
-
Only scanning files that match specific Data Types, for example spreadsheet files or credit card numbers.
If you add Credit Card Numbers as the Data Type in the filter, all the files in the repository that contain credit card numbers are scanned and fingerprinted.
If Spreadsheet file is selected as the Data Type in the filter, only spreadsheet files in the repository are scanned and fingerprinted.
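The two filters above (selected folders and a Data Type) can be pictured with a short Python sketch. It is an assumption-laden illustration: a real Data Type inspects file content, while this sketch approximates the Spreadsheet file Data Type by file extension, and `files_to_fingerprint` and `SPREADSHEET_SUFFIXES` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Assumed stand-in for the Spreadsheet file Data Type: match by extension only.
SPREADSHEET_SUFFIXES = {".xls", ".xlsx", ".ods", ".csv"}


def files_to_fingerprint(repository: Path, folders: Iterable[str],
                         suffixes: Set[str]) -> List[str]:
    """List files under the selected folders whose suffix is in `suffixes`."""
    selected = []
    for folder in folders:
        for path in sorted((repository / folder).rglob("*")):
            if path.is_file() and path.suffix.lower() in suffixes:
                selected.append(path.relative_to(repository).as_posix())
    return selected
```

Only the files returned by the function would be scanned and fingerprinted. Folders that are not listed, and files of other types, are never read.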
Granularity
Complete files do not have to go outside of an organization for data to be lost.
Confidential data can be lost if sections from files in the repository are copied into other files, copied to email or posted to the web.
A file in the repository may be saved locally and then modified so that it no longer matches the unique fingerprint signature.
To identify such incidents, a partial match between files scanned by the DLP Gateway and files in the repository can be configured.
A partial match can be:
-
According to a percentage value
The number of text segments in the sent file is divided by the number of text segments in the repository file, and the result is expressed as a percentage.
A match occurs if this percentage is higher than the percentage configured on the General Properties page of the Data Type.
-
A number of identical text segments
A match occurs when the number of identical text segments in a scanned file and a file in the repository is higher than the number configured on the General Properties page of the Data Type.
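The two partial-match rules above can be made concrete with a small Python sketch. Several points are assumptions, not product behavior: the guide does not define a text segment, so a sentence-like split is used here, and the percentage is read as the shared segments divided by the segments of the repository file. All function and parameter names are hypothetical.

```python
import re
from typing import List, Optional


def text_segments(text: str) -> List[str]:
    """Split text into normalized sentence-like segments (an assumed definition)."""
    parts = re.split(r"[.!?\n]+", text)
    return [" ".join(part.lower().split()) for part in parts if part.strip()]


def identical_segments(scanned: str, repository: str) -> int:
    """Count the distinct segments that appear in both texts."""
    return len(set(text_segments(scanned)) & set(text_segments(repository)))


def match_percentage(scanned: str, repository: str) -> float:
    """Shared segments as a percentage of the repository file's distinct segments."""
    repo_segments = set(text_segments(repository))
    if not repo_segments:
        return 0.0
    return 100.0 * identical_segments(scanned, repository) / len(repo_segments)


def is_partial_match(scanned: str, repository: str,
                     min_percentage: Optional[float] = None,
                     min_segments: Optional[int] = None) -> bool:
    """Apply whichever threshold is configured; a match must be higher than it."""
    if min_percentage is not None:
        return match_percentage(scanned, repository) > min_percentage
    if min_segments is not None:
        return identical_segments(scanned, repository) > min_segments
    raise ValueError("configure min_percentage or min_segments")
```

For example, if an email reuses two of the four sentences of a repository file, the sketch reports two identical segments and 50%. That matches a configured threshold of 40% but not one of 50%, because the value must be higher than the threshold.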
Scan Times
For large repositories, a scan can run all day.
To prevent this, you might want to limit the scan to a specified range of hours.
If a scan does not complete before the time range expires, the scan recommences where it stopped when the next scheduled scan occurs.
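The time-window behavior described above (stop when the window closes, continue from the same place at the next scheduled scan) can be sketched like this. It is a minimal illustration under stated assumptions: the resume position is kept as a simple list index, and `run_scan`, `in_window`, `clock`, and `fingerprint` are hypothetical names.

```python
from datetime import time
from typing import Callable, List


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); also handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_scan(files: List[str], resume_at: int, start: time, end: time,
             clock: Callable[[], time],
             fingerprint: Callable[[str], None]) -> int:
    """Process files from `resume_at` while the window is open.

    Returns the index to resume from at the next scheduled scan,
    or len(files) if the scan completed.
    """
    position = resume_at
    while position < len(files) and in_window(clock(), start, end):
        fingerprint(files[position])
        position += 1
    return position
```

The caller stores the returned position and passes it back as `resume_at` on the next scheduled run. Injecting `clock` as a parameter keeps the sketch easy to exercise without waiting for real time to pass.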
Generating Logs
Repository scans generate logs that can be viewed in the Logs & Monitor view. In the Logs & Monitor view, the Fingerprint query shows all logs generated by a scan.

-
The fingerprint Data Type is matched.
In the log:
-
The Matched File field shows which file in the repository matches the scanned data.
-
The Matched File Percentage field shows the percentage of segments in the scanned data that match segments from the file in the repository. A 100% match means the scanned data and the file in the repository are identical.
-
The Matched File Text Segments field shows how many segments of the scanned data were matched to segments in the repository file.
-
-
A Whitelist files scan has been started
-
A whitelist repository scan is running
-
A Whitelist files scan has ended successfully
-
A repository scan has been started
-
A repository scan is running
-
A repository scan ends successfully
|
Note - Running logs are generated every two hours. For a scan that lasts less than two hours, only the start and finish logs appear. |

-
In the Data Type Wizard, select Fingerprint.
-
Enter a name and informative comments for the Data Type.
This is the name that appears on the Data Loss Prevention > Repositories page.
-
Click Next.
-
In the Fingerprint window:
-
Click the Gateways arrow button to select Security Gateways with the Data Loss Prevention Software Blade enabled.
By default, the DLP Blades object appears.
This object represents all DLP Gateways.
Only Security Gateways selected here scan the repository and enforce the fingerprint data type.
-
Define a network path to the repository.
-
If the repository defined in the network path requires a username and password to access it, enter the relevant authentication credentials.
-
-
Click Test Connectivity.
This tests that the DLP Gateways selected in the Gateways list (step 4a) can access the repository with the assigned authentication credentials, if any.
-
Click the Match Similarity arrow.
This option matches similarity between the document in the repository and the document being examined by the DLP Gateway.
You can specify an exact match with a document in the repository, or a partial match based on one of these:
-
A percentage value
-
Number of matched text segments.
-
-
Click Next.
Select Configure additional Data Type Properties after clicking Finish if you want to configure more properties.
-
Click Finish.
The New data type wizard closes.
The data type appears in the list of data types and also on the Repositories page.
-
Install the Access Control policy.

In the Data Types window or Repositories window, double-click the fingerprint object to open it for editing. These properties can be configured:

To use the fingerprint Data Type, you must:
-
Add the fingerprint Data Type to a DLP rule
-
Install the Access Control policy on the DLP Gateway.
After the fingerprint Data Type is included in a policy, a scheduled scan occurs.
After the scan successfully finishes, the fingerprint Data Type is enforced.
If you want to manually start a scan of the repository:
-
In the Repositories window, select the fingerprint Data Type.
-
In the summary pane for the Data Type, click Start.
NFS Repository scanning in NATed Environments
NAT (for example, in a clustered environment where each member's connections are translated to the Virtual IP address of the cluster) prevents repository scanning when the repository is located on an NFS server.
To enable repository scanning you must disable Hide NAT on all NFS services.
The members of a cluster must be configured to send NFS related traffic using the member's IP address in the Source field of the packet, and not the Virtual IP of the cluster.

-
On the Security Management Server, edit the required table.def file (see the R81 Security Management Administration Guide).
-
Search for the line:
no_hide_services_ports
These are the services and ports not included in Hide NAT.
-
Enter the required ports and protocols:
no_hide_services_ports = { <111, 17>, <111, 6>, <4046, 17>, <4046, 6> }
Notes:
-
If a list of services and ports already exists, add these numbers to the end of the list.
-
New settings in the table.def file apply globally to all Security Gateways and clusters of the applicable version.
-
-
Save the changes in the file and exit the editor.
-
Install the Access Control policy on the ClusterXL object.
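The note about adding the numbers to the end of an existing list can be illustrated with a small Python helper that appends only the missing pairs. This is a sketch, not a Check Point tool: `add_entries` and `NFS_ENTRIES` are hypothetical names, the output reproduces only the line format shown in the procedure, and the exact syntax in your own table.def file should be checked before any change is saved. In each pair the first number is the port, and the second is taken to be the IP protocol number (17 for UDP, 6 for TCP).

```python
import re
from typing import List, Tuple

# Port/protocol pairs from the procedure above.
NFS_ENTRIES = [(111, 17), (111, 6), (4046, 17), (4046, 6)]

# Matches one "<port, protocol>" pair, tolerating extra spaces.
PAIR = re.compile(r"<\s*(\d+)\s*,\s*(\d+)\s*>")


def add_entries(line: str, new_entries: List[Tuple[int, int]]) -> str:
    """Return the no_hide_services_ports line with the missing pairs appended."""
    existing = [(int(port), int(proto)) for port, proto in PAIR.findall(line)]
    merged = existing + [entry for entry in new_entries if entry not in existing]
    body = ", ".join(f"<{port}, {proto}>" for port, proto in merged)
    return f"no_hide_services_ports = {{ {body} }}"
```

Existing pairs keep their order and new pairs are added after them, so pairs that are already present are not duplicated.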