Protecting Data by Fingerprint

Many Data Types identify data by classifying it according to keywords or file attributes such as document type, name, or size. Classifications and attributes are used to describe the data. The fingerprint Data Type does not rely on a description of the data. Instead, it identifies the data according to a unique signature known as a fingerprint. A fingerprint accurately identifies confidential files or parts of confidential files.

The Fingerprint Data Type can accurately identify files that the organization considers confidential. It matches complete files or parts of them.

Generating the unique signature:

Repository Scanning

Files in the repository are constantly changing. New files are added, and existing files are modified or deleted. To keep file signatures up to date, the repository must be scanned on a regular basis. By default, the repository is automatically scanned every day. If a file is added or modified after a scan, the file's signature is not updated until the next scheduled scan.

Supported file shares for repositories:

  • CIFS

  • NFS

Note - Scans of a repository that has already been scanned take less time, because unchanged files in the repository are skipped.
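The skip-unchanged behavior described in the note can be illustrated with a minimal sketch. This is not the product's scanning code: the function names `file_signature` and `scan_repository`, the use of a SHA-256 digest as a stand-in for the fingerprint, and the use of modification time and size to detect changes are all assumptions made for illustration.

```python
import hashlib
import os


def file_signature(path):
    """Return a SHA-256 digest of the file contents (stand-in for a real fingerprint)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_repository(root, index):
    """Update `index` in place and return the sorted paths that were (re)fingerprinted.

    `index` maps a file path to its last known state and signature.
    """
    rescanned = []
    seen = set()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            state = (info.st_mtime_ns, info.st_size)
            seen.add(path)
            entry = index.get(path)
            if entry is not None and entry["state"] == state:
                continue  # unchanged since the previous scan: skip it
            index[path] = {"state": state, "signature": file_signature(path)}
            rescanned.append(path)
    # Forget signatures of files that were deleted from the repository.
    for path in list(index):
        if path not in seen:
            del index[path]
    return sorted(rescanned)
```

In this sketch, a second call to `scan_repository` with the same `index` fingerprints only the files that were added or modified since the first call, which is why a repeat scan takes less time.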

Filtering the Repository for Efficiency

A large repository might also contain many files that are not confidential and do not need to be scanned.

The scan can be made more efficient by:

  • Accurately defining the location of data in the repository.

    Select only those folders that are known to contain confidential files.

    You may need help from the related department heads to do this.

    For example, not all the folders in the Finance department may contain confidential information.

    Folders without confidential information do not have to be included in the scan.

  • Only scanning files that match specific Data Types, for example spreadsheet files or credit card numbers.

    If you add Credit Card Numbers as the Data Type in the filter, only the files in the repository that contain credit card numbers are scanned and fingerprinted.

    If Spreadsheet file is selected as the Data Type in the filter, only spreadsheet files in the repository are scanned and fingerprinted.
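The two filtering approaches above can be sketched as follows. Everything in this example is hypothetical: the helper names (`is_spreadsheet`, `has_credit_card`, `select_for_scan`), the file extensions, and the card-number check (a digit pattern plus a Luhn checksum) are illustrative assumptions, not the way the product's Data Types are implemented.

```python
import re
from pathlib import PurePosixPath

# Hypothetical set of extensions treated as spreadsheet files.
SPREADSHEET_SUFFIXES = {".xls", ".xlsx", ".ods", ".csv"}
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_spreadsheet(path, text):
    """Data Type filter: match on the file extension only."""
    return PurePosixPath(path).suffix.lower() in SPREADSHEET_SUFFIXES


def has_credit_card(path, text):
    """Data Type filter: match files whose text contains a Luhn-valid number."""
    for candidate in CARD_PATTERN.findall(text):
        if luhn_valid(re.sub(r"\D", "", candidate)):
            return True
    return False


def select_for_scan(files, folders, data_type_filter=None):
    """Return the sorted paths to fingerprint.

    `files` maps a path to its text content, `folders` lists the selected
    repository folders, and `data_type_filter` is an optional predicate.
    """
    selected = []
    roots = [PurePosixPath(folder) for folder in folders]
    for path, text in files.items():
        parents = PurePosixPath(path).parents
        if not any(root in parents for root in roots):
            continue  # outside the folders selected for scanning
        if data_type_filter is not None and not data_type_filter(path, text):
            continue  # does not match the Data Type in the filter
        selected.append(path)
    return sorted(selected)
```

Both filters narrow the scan independently: the folder list removes whole areas of the repository, and the Data Type predicate removes individual files inside the folders that remain.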

Granularity

Complete files do not have to go outside of an organization for data to be lost.

Confidential data can be lost if sections from files in the repository are copied into other files, copied to email, or posted to the web.

A file in the repository may be saved locally and then modified so that it no longer matches the unique fingerprint signature.

To identify such incidents, a partial match between files scanned by the DLP Gateway and files in the repository can be configured.

A partial match can be:

  • According to a percentage value

    The number of text segments in the sent file is divided by the number of text segments in the repository file, and the result is expressed as a percentage.

    A match occurs if this percentage is higher than the percentage configured on the General Properties page of the Data Type.

  • A number of identical text segments

    A match occurs when the number of identical text segments in a scanned file and a file in the repository is higher than the number configured on the General Properties page of the Data Type.
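The two partial-match rules above can be sketched as follows. The example makes several assumptions that are not taken from the product: a text segment is treated as one sentence, segments are compared by SHA-256 hash, and the percentage rule is read as counting only the segments of the sent file that also appear in the repository file. The names `text_segments` and `partial_match` are hypothetical.

```python
import hashlib
import re


def text_segments(text):
    """Split text into sentence-like segments and return the set of their hashes."""
    segments = set()
    for piece in re.split(r"[.!?\n]+", text):
        normalized = " ".join(piece.lower().split())
        if normalized:
            segments.add(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    return segments


def partial_match(sent_text, repo_text, min_percent=None, min_segments=None):
    """Return True if the sent text partially matches the repository text.

    Exactly one threshold is expected. As in the rules above, the measured
    value must be higher than the configured threshold, not equal to it.
    """
    sent = text_segments(sent_text)
    repo = text_segments(repo_text)
    shared = len(sent & repo)
    if min_percent is not None:
        percent = 100.0 * shared / len(repo) if repo else 0.0
        return percent > min_percent
    if min_segments is not None:
        return shared > min_segments
    raise ValueError("configure min_percent or min_segments")
```

For example, if two of the four segments in a repository file appear in an outgoing email, the percentage is 50. A configured threshold of 40 produces a match, and a threshold of 50 does not.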

Scan Times

For large repositories, a scan can run all day.

To prevent this, you might want to limit the scan to a specified range of hours.

If a scan does not complete before the time range expires, it resumes from where it stopped when the next scheduled scan occurs.
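The stop-and-resume behavior can be illustrated with a short sketch. It is a simplified model, not the product's scheduler: the names `in_window` and `run_scan`, the list-index checkpoint, and the injected `clock` callable are all assumptions made so the example is self-contained and testable.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` is inside [start, end); the window may cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_scan(paths, checkpoint, clock, start, end, fingerprint):
    """Fingerprint files from `checkpoint` onward while the clock is in the window.

    Returns the index of the next file to process. A return value equal to
    len(paths) means the scan completed.
    """
    position = checkpoint
    while position < len(paths) and in_window(clock(), start, end):
        fingerprint(paths[position])
        position += 1
    return position
```

The caller stores the returned index and passes it as `checkpoint` on the next scheduled run, so files that were already fingerprinted in the previous window are not processed again.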

Generating Logs

Repository scans generate logs that can be viewed in the Logs & Monitor view. In the Logs & Monitor view, the Fingerprint query shows all logs generated by a scan.

Log Details

NFS Repository scanning in NATed Environments

NAT (for example, in a clustered environment where each member's connections are translated to the Virtual IP address of the cluster) prevents repository scanning when the repository is located on an NFS server.

To enable repository scanning, you must disable Hide NAT on all NFS services.

The members of a cluster must be configured to send NFS-related traffic using the member's own IP address in the Source field of the packet, and not the Virtual IP address of the cluster.