Defining Data Types

The optimal method for defining new data type representations is to use the Data Type Wizard.

To add a new data type:

In SmartConsole, select Security Policies > Shared Policies > DLP and click Open DLP Policy in SmartDashboard.
SmartDashboard opens and shows the DLP tab.
From the navigation tree, click Data Types.
Click New.
The Data Type Wizard opens.
Enter a name for the new data type.
Choose an option that defines the type of traffic that will be checked against a rule containing this data type.
Fill in the properties as required in the next step (each step is relevant to the option selected in the previous step).
Click Finish.
Click Save and then close SmartDashboard.
From SmartConsole, Install Policy.

Protecting Data By Keyword

You can create a list of keywords that are matched against data transmissions. Transmissions whose data contains the words in this list are matched. You define whether a match requires all of the words, any one of them, or a specific number of them.

To create a data type representation of specified keywords:

In the Data Type Wizard, select Keywords.
Click Next.
The Specify Keywords window opens.
Enter a keyword to protect.
Click Add.
Enter as many keywords or phrases as you want in this data type.
Decide whether data should be matched if all the keywords in this list are matched, if only one match is necessary, or a specific number should be matched.
For example, if you want to ensure that no one can send an email that contains any of the names of congressmen in a committee, their names would be the keywords and you would set the Threshold to At least 1. (Note that the higher the threshold, the more precise the results will be.)

If you want to allow emails mentioning the congressmen, but decide that all of their names in one email would be suspicious, set Threshold to All words must appear.
Click Next.
Click Finish; or if you want to add more parameters to the data type, select the checkbox and then click Finish.
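The threshold options above (At least 1, All words must appear, or a specific number) can be modeled as a single count comparison. This is a minimal Python sketch, not the DLP engine's implementation; it assumes a keyword counts as present when it appears as a case-insensitive substring, and the function name `keywords_match` is hypothetical.

```python
# Hypothetical model of a keyword Data Type with a threshold.
# Assumption: a keyword is "present" if it occurs as a case-insensitive
# substring; the real engine's tokenization rules may differ.
from typing import Iterable, Optional


def keywords_match(text: str, keywords: Iterable[str],
                   at_least: Optional[int] = None) -> bool:
    """Return True if the text matches the keyword Data Type.

    at_least=None models "All words must appear"; an integer models
    "At least N" (at_least=1 is the ANY case).
    """
    kws = [k.lower() for k in keywords]
    body = text.lower()
    found = sum(1 for k in kws if k in body)
    required = len(kws) if at_least is None else at_least
    return found >= required


committee = ["Smith", "Jones", "Brown"]  # hypothetical committee names
print(keywords_match("Meeting with Senator Jones", committee, at_least=1))  # True
print(keywords_match("Meeting with Senator Jones", committee))              # False
```

Counting distinct keywords, rather than total occurrences, reflects the wording "the keywords in this list are matched"; repeated mentions of one name do not raise the count in this sketch.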

Protecting Documents by Template

Confidential and sensitive documents are often based on templates. A template defines the headers, footers, seals, and formatting of related documents. This is what makes all court orders, for example, look the same.

You can create a Data Type that protects documents based on a specific template. You then add the Data Type to a rule and connections that contain such a document are matched by the policy.

Important - When a template including images is attached to a DLP Template Data Type, the image file format is important. The file format used in the template must match the file format in the user document. If the file formats are different, the rule will not trigger a DLP response.

For example, if the template contains a JPG image and the user document contains the image in GIF format, there is no DLP response.


To create a Data Type representation of documents based on a template:

In the Data Type Wizard, select Documents based on corporate template.
Click Next.
Browse to the template file on your system.
This file does not have to be known as a template in the application: the template for the Data Type may be a *.doc file and does not have to be a *.dot file. Choose any file that is a basic example of documents that might be sent.
Move the Similarity slider to determine how closely a document must match the given template to be considered protected.
Best Practice - Set this slider quite low first. The higher it is, the less the rule will catch. After you complete the wizard, send a test email with such a document, and check the Logs & Monitor Logs view to see if the document was caught. Slowly increase the Similarity level until the rule catches the documents you want. This will be different for each template.
Click Next.

Click Finish.

To configure additional properties for the Data Type, select Configure additional Data Type properties after clicking Finish, and then click Finish.

Property	Description
Match empty templates	Select this option if you want DLP to match the Data Type on an empty template. An empty template is a template that is identical to the uploaded corporate template. If the option is not selected, an empty template is detected but the Data Type is not matched. The template is not considered confidential until it contains inserted private data. Note: the rule is bypassed for this document, but the document may still be matched by another DLP rule in the policy.
Consider template's images	Incorporates a template's graphic images into the matching process. Including template images increases the similarity score calculated between the template and the examined document. The higher the score, the more accurate the match. Select this option if the graphic images used in a template document suggest that the document is confidential.

Alternative to slider testing:

If you want to catch documents that match on different levels with different actions, you may try this procedure:

Create the Data Type for the template, setting the slider to 10%.
In the Policy window, create a Detect rule that tracks matching documents but does not stop them.
Create another Data Type, just like the first, but set the slider to 50%.
Create an Ask User rule that tracks matching documents and holds the transmission until the user decides whether it should be sent or is too sensitive and should be deleted.
Create a third Data Type, with the slider set to 90%.
Create a Prevent rule that tracks matching documents and blocks the transmission.
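The three-tier setup above can be modeled as follows. This is a minimal Python sketch under two assumptions that are not stated in this guide: a template Data Type matches when the document's similarity score is at or above its slider value, and when several rules match one transmission, the most restrictive action is enforced. The names `TIERS` and `enforced_action` are hypothetical.

```python
# Hypothetical sketch of the three Data Types and rules above.
# Assumptions: a Data Type matches when similarity >= its slider value, and
# when several rules match, the most restrictive action is enforced.
from typing import Optional

# (slider value in %, action of the rule that uses that Data Type)
TIERS = [(10, "Detect"), (50, "Ask User"), (90, "Prevent")]
# Actions ordered from least to most restrictive.
RESTRICTIVENESS = ["Detect", "Ask User", "Prevent"]


def enforced_action(similarity: float) -> Optional[str]:
    """Return the action applied to a document with this similarity score."""
    matched = [action for threshold, action in TIERS if similarity >= threshold]
    if not matched:
        return None  # no tier matches, so none of these three rules applies
    return max(matched, key=RESTRICTIVENESS.index)


for score in (5, 30, 70, 95):
    print(score, enforced_action(score))
```

Under these assumptions, a document at 70% similarity matches both the 10% and 50% Data Types, and the Ask User rule decides its fate.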

Protecting Files by Attributes

Create a data type that protects files based on file type, file name, and file size. Transmissions that contain a file that matches the parameters are matched.

To create a data type representation of files:

In the Data Type Wizard, select Files.
Click Next.

Select the appropriate parameters:

Note - A file must match all the parameters that you define here, for it to be matched to the rule. Thus, the more parameters you can set here with assurance, the more accurate the results will be.

The file type is any of these types - Click the add button to select from the Add File Types window.
The file name contains - Enter a string or regular expression to match against file names.
The file size is larger than - Enter the threshold size in KB.

Click Next.
Click Finish, or if you want to add more parameters to the data type, select the checkbox and then click Finish.
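Because a file must satisfy every parameter you set, the Files Data Type behaves like a logical AND over the defined criteria. The following minimal Python sketch illustrates this; it assumes that file type is judged by extension (the product may inspect file content instead), that "larger than" is a strict comparison, and that 1 KB is 1024 bytes. `FileCriteria` and `file_matches` are hypothetical names.

```python
# Hypothetical model of the Files Data Type: a file matches only when it
# satisfies every parameter that is set. Unset parameters (None) are ignored.
# Assumptions: type is judged by extension, "larger than" is strict, 1 KB = 1024 bytes.
import re
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FileCriteria:
    file_types: Optional[Set[str]] = None   # e.g. {"xlsx", "csv"}
    name_pattern: Optional[str] = None      # string or regular expression
    larger_than_kb: Optional[int] = None


def file_matches(name: str, size_bytes: int, c: FileCriteria) -> bool:
    if c.file_types is not None:
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if ext not in c.file_types:
            return False
    if c.name_pattern is not None and re.search(c.name_pattern, name) is None:
        return False
    if c.larger_than_kb is not None and size_bytes <= c.larger_than_kb * 1024:
        return False
    return True


salaries = FileCriteria(file_types={"xlsx"}, name_pattern="salary", larger_than_kb=100)
print(file_matches("salary_2024.xlsx", 200 * 1024, salaries))  # True
print(file_matches("budget.xlsx", 200 * 1024, salaries))       # False: name does not match
```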

Protecting Data by Pattern

You can create a regular expression that will be matched against content in data transmissions. Transmissions that contain strings that match the pattern in their data are matched.

Note - Use the Check Point supported regular expression syntax.

To create a data type representation of a pattern:

In the Data Type Wizard, select Pattern (regular expressions).
Click Next.
Enter a pattern to match against content.
Click Add.
Enter as many regular expressions as you want in this data type.
Decide whether data should match the Data Type if the pattern is matched even once, or only when it is matched a given number of times.
For example, if you want to ensure that no one can send an email that contains a complete price-list of five products, you would set the pattern to "^[0-9]+(\.[0-9]{2})?$" and you would set the Number of occurrences to 5.
Click Next.
Click Finish; or if you want to add more parameters to the data type, select the checkbox and then click Finish.
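The price-list example above can be tried out with Python's `re` module. This is an illustration only: Python's regular expression syntax is not the Check Point supported syntax, and the sketch assumes that the `^` and `$` anchors apply to each line, so `re.MULTILINE` is set. The name `pattern_matches` is hypothetical.

```python
# Illustration only: Python's re module, not the Check Point regex syntax.
# Assumption: ^ and $ apply per line, so re.MULTILINE is set.
import re

PRICE = re.compile(r"^[0-9]+(\.[0-9]{2})?$", re.MULTILINE)


def pattern_matches(text: str, occurrences: int) -> bool:
    """True if the pattern occurs at least `occurrences` times in the text."""
    return sum(1 for _ in PRICE.finditer(text)) >= occurrences


price_list = "19.99\n5.00\n120\n7.50\n42.10\n"
print(pattern_matches(price_list, 5))        # True: five prices, one per line
print(pattern_matches("19.99\n5.00\n", 5))   # False: only two prices
```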

Defining Compound Data Types

You can create a complex data type representation. A compound data type includes multiple Data Types, which are matched on AND (all of the listed Data Types must be present), on NOT (none of the listed Data Types may be present), or both.

For example, you can look for files or emails that contain patient records. You could create a data type that combines documents that match a patient record template with a dictionary data type that contains a group of patient names who have not signed release forms. Now you have a single data type that matches emails or FTP transfers that contain records of patients who have not signed a release form.

To create a compound data type representation:

In the Data Type Wizard, select Compound.
Click Next.
In the first section, click Add and select Data Types to match on AND.
In the second section, click Add and select Data Types to match on NOT.
If a transmission matches all the Data Types in the first section and none of the Data Types in the second section, its data is matched to the compound Data Type.
Click Next.
Click Finish; or if you want to add more parameters to the Data Type, select the checkbox and then click Finish.

Protecting Data by Fingerprint

Many Data Types identify data by classifying it according to keywords or file attributes such as document type, name, or size. These classifications and attributes describe the data. The fingerprint Data Type does not rely on a description of the data. Instead, it identifies the data by a unique signature known as a fingerprint, and can accurately match whole confidential files or parts of them.

Generating the unique signature

First you identify a repository. A repository is a network location that contains files that must not go outside of the organization. The DLP blade scans these data files and generates a unique signature for each file.
When a file passes through a DLP gateway, the file is scanned and a signature generated.
The signature of the file passing through the DLP gateway is compared against the signatures of files in the repository. If there is a signature match, the file scanned by the gateway is prevented from going outside of the organization.

Repository Scanning

Files in the repository are constantly changing. New files are added, existing files modified or deleted. To keep file signatures up to date, the repository must be scanned on a regular basis. By default, the repository is automatically scanned every day. If a file is added or modified after a scan, the file's signature will not be updated until the next scheduled scan occurs.

Supported file shares for repositories:

CIFS
NFS

Note - Rescanning a repository that has already been scanned takes less time, because unchanged files in the repository are skipped.

Filtering for Efficiency

A large repository might also contain many files that are not confidential and do not need to be scanned. The scan can be made more efficient by:

Accurately defining the location of data in the repository
Select only those folders that are known to contain confidential files. You may need help from the related department heads to do this. For example, not all folders in the Finance department contain confidential information, and those folders do not have to be included in the scan.
Only scanning files that match specific Data Types, for example spreadsheet files or credit card numbers.
If you add Credit Card Numbers as the Data Type in the filter, all the files in the repository that contain credit card numbers are scanned and fingerprinted. If Spreadsheet file is selected as the Data Type in the filter, only spreadsheet files in the repository will be scanned and fingerprinted.

Granularity

Complete files do not have to leave the organization for data to be lost. Confidential data can be lost if sections of files in the repository are copied into other files, copied into email, or posted to the web. A file from the repository may also be saved locally and then modified so that it no longer matches its fingerprint signature. To identify such incidents, you can configure a partial match between files scanned by the DLP gateway and files in the repository. A partial match can be:

According to a percentage value
The number of text segments in the sent file is divided by the number of text segments in the repository file, and the result expressed as a percentage. A match occurs if this percentage is higher than the percentage configured on the General Properties page of the Data Type.
A number of identical text segments
A match occurs when the number of identical text segments in a scanned file and a file in the repository is higher than the number configured on the General Properties page of the Data Type.
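The two partial-match options above can be illustrated with a segment-hashing sketch. The product's actual segmentation and signature algorithm are not described in this guide, so this Python example assumes that a segment is a run of five consecutive words hashed with SHA-256, and it reads the percentage as identical segments divided by the repository file's segments. `SEGMENT_WORDS`, `fingerprint`, `compare`, and `is_match` are hypothetical names.

```python
# Hypothetical sketch of segment-based fingerprint matching. Assumptions: a
# segment is a run of SEGMENT_WORDS consecutive words, hashed with SHA-256,
# and the percentage is shared segments / repository file segments.
import hashlib
from typing import Optional, Set, Tuple

SEGMENT_WORDS = 5  # hypothetical segment length


def fingerprint(text: str, k: int = SEGMENT_WORDS) -> Set[str]:
    """Return the SHA-256 hashes of every run of k consecutive words."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:  # a short text forms a single segment
        return {hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()}
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def compare(sent: str, repository_file: str) -> Tuple[int, float]:
    """Return (identical segments, percentage of the repository file's segments)."""
    s, r = fingerprint(sent), fingerprint(repository_file)
    shared = len(s & r)
    percent = 100.0 * shared / len(r) if r else 0.0
    return shared, percent


def is_match(sent: str, repository_file: str,
             min_percent: Optional[float] = None,
             min_segments: Optional[int] = None) -> bool:
    """Partial match: 'higher than' the configured percentage or segment count."""
    if (min_percent is None) == (min_segments is None):
        raise ValueError("set exactly one of min_percent or min_segments")
    shared, percent = compare(sent, repository_file)
    if min_percent is not None:
        return percent > min_percent
    return shared > min_segments
```

Overlapping word windows are used so that text pasted into a new document still shares segments with the original even when surrounding words differ; because segments are stored as a set, a repeated segment counts once.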

Scan Times

Large repositories might cause a scan to run all day. To prevent this, you can limit the scan to a specified range of hours. If a scan does not complete before the time range expires, it resumes from where it stopped at the next scheduled scan.
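The resume behavior above can be pictured as a scan that keeps its position when the allowed time range closes. This is a minimal Python sketch, not the gateway's implementation; it assumes the scanner walks files in a fixed order and stores the index where it stopped. `in_window` and `scan` are hypothetical names, and `fingerprint` is any callable supplied by the caller.

```python
# Hypothetical sketch of a time-limited, resumable repository scan.
# Assumption: files are walked in a fixed order and the stop position is kept
# for the next scheduled scan.
from datetime import datetime, time
from typing import Callable, List


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls inside the allowed range [start, end)."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # range that crosses midnight, e.g. 22:00-06:00


def scan(files: List[str], resume_at: int, clock: Callable[[], datetime],
         start: time, end: time, fingerprint: Callable[[str], None]) -> int:
    """Fingerprint files from `resume_at` while the range is open.

    Returns the index to resume from; len(files) means the scan completed.
    """
    i = resume_at
    while i < len(files) and in_window(clock(), start, end):
        fingerprint(files[i])
        i += 1
    return i
```

Passing the clock in as a function keeps the sketch testable: a caller can simulate the range closing partway through a scan.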

Logging

Repository scans generate logs that can be viewed in the Logs & Monitor view. In the Logs & Monitor view, the Fingerprint query shows all logs generated by a scan.

Logs are generated when:

The fingerprint Data Type is matched.
In the log:
- The Matched File field shows which file in the repository matches the scanned data.
- The Matched File Percentage field shows the percentage of segments in the scanned data that match segments from the file in the repository. A 100% match means the scanned data and the file in the repository are identical.
- The Matched File Text Segments field shows how many segments of the scanned data were matched to segments in the repository file.
A Whitelist files scan has been started
A whitelist repository scan is running
A Whitelist files scan has ended successfully
A repository scan has been started
A repository scan is running
A repository scan has ended successfully

Note - Running logs are generated every two hours. For a scan that lasts less than two hours, you will only see the start and finish logs.

Log Details

Fingerprint

Scan ID	A unique scan identification to distinguish between logs
Next Scheduled Scan Date	Time the scan started
Duration	How long the scan lasted
Scan Status	The status can be Running, Paused, Canceled, or Success
Number of errors	Number of errors encountered.

Fingerprint scan details

Repository root path	The upper level repository
Current directory	Current directory being scanned
Directories	The total number of directories in the repository selected in Data Locations.
Repository size (MB)	The size of the repository
Repository Files	The number of files in the repository
Directories scanned	The number of directories scanned so far
Scanned size (MB)	The number of MB scanned so far
Scanned files	The number of files scanned so far
Unreachable directories	The number of subdirectories in the repository that could not be opened during the scan.
Fingerprinted files	The number of files with a fingerprint signature
Filtered files	The number of files that were not scanned because they did not meet the criteria set on the Repository Scan Filter page, for example file size, modification date, or Data Type.
Scan speed (KBs)	The speed of the scan
Progress	Percentage of the repository so far scanned
Remaining time	Estimated time to scan completion

To create a fingerprint Data Type:

In the Data Type Wizard, select Fingerprint.
Enter a name and informative comments for the Data Type.
This is the name that will show on the Data Loss Prevention > Repositories page.
Click Next.
In the Fingerprint window:
1. Click the Gateways arrow button to select gateways with the DLP blade enabled.
  By default, the DLP Blades object shows. This object represents all gateways that have the DLP blade enabled. Only gateways selected here scan the repository and enforce the fingerprint data type.
2. Define a network path to the repository.
3. If the repository defined in the network path requires a username and password to access it, enter the relevant authentication credentials.
Click Test Connectivity.
This tests that the DLP gateways selected in the Gateways list can access the repository using the (optional) authentication credentials.
Click the Match Similarity arrow.
This option matches similarity between the document in the repository and the document being examined by the DLP gateway. You can specify an exact match with a document in the repository, or a partial match based on:
- A percentage value or
- Number of matched text segments.
Click Next.
Select Configure additional Data Type Properties after clicking Finish if you want to configure more properties.
Click Finish.
The New data type wizard closes. The data type shows in the list of data types and also on the Repositories page.

To configure more fingerprint properties:

In the Data Types window or Repositories window, double-click the fingerprint object to open it for editing. You can configure these properties:

General
Change the data entered in the Data Type wizard.
Data Owners
Add users or user groups that own the data. Data owners can be notified when the fingerprint data type is matched by a rule in the DLP policy.
Advanced Matching
Add CPcode scripts to apply more match criteria after the fingerprint data type is matched by a rule.
Scan Scheduling
Configure when the document repository is scanned to update the fingerprint data type. The default time object (Every-Day) has no time restrictions configured. This means that a scan runs without time restrictions after the fingerprint data type is added to a policy rule. If gateway resources and network bandwidth are an issue, limit the scan to off-peak hours.

Repository Scan Filter

This page offers more scanning criteria:

Scan files matching the following data types
This property lets you scan documents in the repository according to more data types, for example credit card numbers. If you add credit card numbers as the data type, all the files in the repository that contain credit card numbers are fingerprinted. If "spreadsheet files" are selected as the data type, only spreadsheet files in the repository are fingerprinted.
Scan files according to size
Only files whose size is between the specified minimum and maximum are included in the fingerprint.

Scan files according to modification date

Only files that match the specified modification dates are included in the fingerprint.

Note - After a change to the filters (adding or removing a data type, selecting a different file size or modification date) the DLP gateway regards all files in the repository as new. In a large repository, this will result in a long scan. The fingerprint will only be enforced after this scan has ended.

Data locations
Use the Data Locations tree to include or not include repository sub-folders. If you want the fingerprint data type to prevent only one document type from leaving the organization, put that document in a folder that contains no other document. Select only that folder as the data location.

Using the Fingerprint Data Type

To use the fingerprint Data Type, you must:

Add the fingerprint Data Type to a DLP rule
Install a policy on the DLP enabled gateway
After the fingerprint Data Type is included in a policy, a scheduled scan occurs. After the scan successfully finishes, the fingerprint Data Type is enforced.

If you want to manually start a scan of the repository:
1. On the Repositories window, select the fingerprint Data Type.
2. In the summary pane for the Data Type, click Start.

NFS Repository scanning in NATed Environments

NATing, for example in a clustered environment where each member's connections are translated to the Virtual IP address of the cluster, prevents repository scanning when the repository is located on an NFS server. To enable repository scanning you must disable Hide NAT on all NFS services. The members of a cluster must be configured to send NFS related traffic using the member's IP address in the Source field of the packet, and not the Virtual IP of the cluster.

To disable Hide NAT on NFS services:

On the Security Management Server, open $FWDIR/lib/table.def for editing.
Search for the line: no_hide_services_ports.
These are the services and ports not included in Hide NAT.
Enter:
no_hide_services_ports = { <111, 17>, <111, 6>, <4046, 17>, <4046, 6> }

If a list of services and ports already exists, add these numbers to the end of the list.
Save and close the file.
Install the policy onto the ClusterXL object.
Note:
- New settings in table.def apply globally to all gateways.
- For more, see sk31832.

Advanced Data Types

The Data Type Wizard has four advanced Data Types:

Weighted Keywords
Words from a dictionary
Custom CPcode
Message attributes

Protecting Data by Weighted Keyword

If you begin by creating a keyword or pattern Data Type and realize that a simple ALL or ANY match does not fit, because one word is a sign of protected data in itself while another word is suspicious only if it appears many times, define this data as a Weighted Keyword Data Type instead of a simple keyword or pattern.

Transmissions that contain this list of words, in the weight-sum that you define, in their data are handled according to the action of the rules that use this Data Type.

To create a Data Type representation of weighted keywords:

In the Data Type Wizard, select Advanced and from the drop-down list, select Weighted Keywords.
Click Next.
Click the arrow of the Add button and select Word or Phrase, or Regular Expression.
(If you click the Add button itself instead of its sub-menu, the item is added as a keyword, not a pattern.)

The Edit Word window opens for both types of item.
Enter the keyword, phrase, or regular expression.
In the Weight area, set whether each occurrence of matching data content should be counted as 1 (default) or more, and if there is a ceiling to the weight.
- Each appearance of this word contributes the following weight - set to 1 for lowest weight, 2 for double-weight (one instance of this string will be counted as though two), and so on.
- The weight of this word is limited to - set to 0 for no limit, or set to a number higher than the weight in the previous value to set a maximum count (a ceiling) for this one word.
Click OK.
In the Specify Weighted Keywords step, set the Threshold. If data content matches any of the words in this Data Type, with a total weight surpassing this value, the data is matched to the Data Loss Prevention rule.
Click Next.
Click Finish; or if you want to add more parameters to the Data Type, select the checkbox and then click Finish.
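The weight, per-word limit, and threshold settings above combine as a capped, weighted sum. This minimal Python sketch assumes case-insensitive matching, treats words and phrases as literal substrings (so "budget" would also count inside "budgeting"), and reads "surpassing" as strictly greater than the threshold. `WeightedTerm`, `total_weight`, and `weighted_match` are hypothetical names.

```python
# Hypothetical model of the Weighted Keywords Data Type.
# Assumptions: case-insensitive matching, words/phrases matched as literal
# substrings, and "surpassing" the threshold means strictly greater than it.
import re
from dataclasses import dataclass
from typing import List


@dataclass
class WeightedTerm:
    pattern: str          # word, phrase, or regular expression
    weight: int = 1       # weight contributed by each appearance
    limit: int = 0        # ceiling for this term's total weight (0 = no limit)
    is_regex: bool = False


def total_weight(text: str, terms: List[WeightedTerm]) -> int:
    total = 0
    for t in terms:
        pat = t.pattern if t.is_regex else re.escape(t.pattern)
        hits = len(re.findall(pat, text, flags=re.IGNORECASE))
        score = hits * t.weight
        if t.limit > 0:
            score = min(score, t.limit)  # apply the per-term ceiling
        total += score
    return total


def weighted_match(text: str, terms: List[WeightedTerm], threshold: int) -> bool:
    return total_weight(text, terms) > threshold


terms = [WeightedTerm("top secret", weight=10), WeightedTerm("budget", limit=3)]
print(weighted_match("budget " * 5, terms, threshold=5))             # False: capped at 3
print(weighted_match("top secret budget", terms, threshold=5))       # True: 10 + 1
```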

Providing Keywords by Dictionary

If you pre-planned the keywords that should flag data as protected, you do not need to enter them one by one in a keyword data representation. Instead, you can upload the list as a dictionary. You decide how many of the items in the list have to be matched to have the data match the rule.

Best Practice - Dictionary files should be one word or phrase per line. If the file contains non-English words, it is recommended that it be a Word document (*.doc). Dictionaries that are simple text files must be in UTF-8 format.

To create a Data Type representation of dictionary:

In the Data Type Wizard, select Advanced and from the drop-down list, select words from a Dictionary.
Click Next.
Browse to the file containing the list of terms.
In the Threshold area, set the number of terms in this list that must be in the content to have the data matched to the rule.
Best Practice - Set this to the highest reasonable value first, and then lower it after you audit the Logs & Monitor logs.

For example, if the dictionary is a list of employee names, you should not set the threshold to 1, which would catch every email that has a signature. You could set an Employee Name Dictionary Data Type to a threshold of half the number of users and its rule to Detect. If no data is caught by the rule after about a week, lower the threshold and check again. When the rule begins to detect this information being sent out, set it to Ask User, so that users have to explain why they are sending this information outside before it will be sent. With this information on hand, you can create a usable, reasonable and accurate enforcement of corporate policy.
Click Next.
Click Finish; or if you want to add more parameters to the Data Type, select the checkbox and then click Finish.
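The dictionary format and threshold above can be sketched as follows. This minimal Python example reads a plain-text dictionary as UTF-8 with one word or phrase per line, and assumes that each distinct term is counted once and is found as a case-insensitive substring. `parse_dictionary`, `load_dictionary`, and `dictionary_matches` are hypothetical names, and Word (*.doc) dictionaries are not handled.

```python
# Hypothetical sketch of a dictionary Data Type (plain-text dictionaries only).
# Assumptions: each distinct term counts once toward the threshold and is found
# as a case-insensitive substring.
from pathlib import Path
from typing import List


def parse_dictionary(text: str) -> List[str]:
    """One word or phrase per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_dictionary(path: str) -> List[str]:
    """Read a plain-text dictionary, which must be UTF-8 encoded."""
    return parse_dictionary(Path(path).read_text(encoding="utf-8"))


def dictionary_matches(content: str, terms: List[str], threshold: int) -> bool:
    """True if at least `threshold` distinct terms appear in the content."""
    body = content.lower()
    found = sum(1 for term in {t.lower() for t in terms} if term in body)
    return found >= threshold


employees = parse_dictionary("Alice Doe\n\nBob Roe\nCarol Poe\n")  # hypothetical names
print(dictionary_matches("Regards, Alice Doe", employees, threshold=2))  # False
```

With a threshold of 1, the email signature in this example would already match, which is why a higher starting threshold is recommended.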

Protecting Data by CPcode

CPcode is a scripting language, similar to C or Perl, specifically for Intrusion Prevention Systems. If you are familiar with this language, you can create your own complex rules. Use CPcode data types to create dynamic definitions of data to protect, or to create data type representations with custom parameters.

For example, you can create a CPcode that checks for a date that is before a public release, allowing you to create rules that stop price list releases before that date, but pass them afterwards. Other common uses of CPcode include relations between rule parameters, such as recipients (match rule to email if sent to too many domains) and protocols (match rule to HTTP if it looks like web mail).

Note - See the R77 versions CPcode DLP Reference Guide.
If you write a CPcode function yourself, you should test it first before putting it in production.

To create a Data Type representation of CPcode:

In the Data Type Wizard, select Advanced and from the drop-down list, select a Custom CPcode.
Click Next.
Browse to the CPcode script file.
Click Next.
Click Finish; or if you want to add more parameters to the Data Type, select the checkbox and then click Finish.

Example of CPcode function:

    func rule_1 {
        foreach $recipient inside global:DESTS {
            foreach $comp inside CPMPETITORS_DOMAIN {
                if (casesuffix($recipient, $comp)) {
                    set_message_to_user(cat("The mail is sent to ",
                                            $recipient,
                                            " which is a competitor's mail address."));
                    set_track(TRACK_LOG);
                    return quarantine();
                }
            }
        }
    }

Defining the Message Attribute Data Type

In DLP, a message can be sent using the SMTP, HTTP, or FTP protocols.

Message attributes refer to 3 properties of the message:

The total message size in KB
Number of attachments
Total number of words in the message

To create the message attribute Data Type:

Start the Data Type Wizard
Select Advanced and from the drop-down list select Message Attributes.
The Specify Message Attributes window opens.

Configure these message attributes:

Size

The size attribute can have a:

Minimum value	Maximum value	Meaning
Yes	Yes	Messages that fall within the specified range match the message attribute.
Yes	No	A message whose size is greater than the minimum value specified here matches the attribute.
No	Yes	A message whose size is smaller than the maximum value specified here matches the attribute.

Attachments

Define the number of attachments a message can have.

Minimum value	Maximum value	Meaning
Yes	Yes	A message whose number of attachments falls within the specified range matches the message attribute.
Yes	No	A message with more than the minimum number of attachments specified here matches the attribute.
No	Yes	A message with fewer attachments than the maximum value specified here matches the attribute.

Number of words

Scan for a significant amount of text. If an email has a large binary file attached, such as a graphic, and its body contains only the words "your picture", the email might match the Size attribute but contain no text worth scanning. You want the email to match a DLP rule only if it contains enough text that could conceivably result in data loss.

Minimum value	Maximum value	Meaning
Yes	Yes	Messages whose word count falls within the specified range match the message attribute.
Yes	No	A message whose word count is greater than the minimum value specified here matches the attribute.
No	Yes	A message whose word count is lower than the maximum value specified here matches the attribute.

Click Next.

Click Finish.

If you want to add more parameters to the Data Type, select Configure additional Data Type properties after clicking Finish, and then click Finish.

Note - For a message to match the Data Type attribute, it must match the criteria for size and the number of attachments and the number of words. If the message fails to match one of the criteria, it will fail to match the attribute.
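The range tables and the note above can be modeled together: each attribute is a range check, and a message must pass all three. This minimal Python sketch assumes exclusive bounds (following the "greater than" and "smaller than" wording) and that an attribute with neither bound set matches any message. `Range`, `MessageAttributes`, and `message_matches` are hypothetical names.

```python
# Hypothetical sketch of the Message Attributes Data Type. Assumptions: bounds
# are exclusive ("greater than", "smaller than"), and an attribute with neither
# bound set matches any message.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Range:
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def contains(self, value: float) -> bool:
        if self.minimum is not None and value <= self.minimum:
            return False
        if self.maximum is not None and value >= self.maximum:
            return False
        return True


@dataclass
class MessageAttributes:
    size_kb: Range = field(default_factory=Range)
    attachments: Range = field(default_factory=Range)
    words: Range = field(default_factory=Range)


def message_matches(size_kb: float, attachments: int, words: int,
                    attrs: MessageAttributes) -> bool:
    """A message must satisfy the size, attachment and word criteria together."""
    return (attrs.size_kb.contains(size_kb)
            and attrs.attachments.contains(attachments)
            and attrs.words.contains(words))


# Large picture with a two-word body: matches on size but not on word count.
rule = MessageAttributes(size_kb=Range(minimum=1024), words=Range(minimum=50))
print(message_matches(5000, 1, 2, rule))    # False
print(message_matches(5000, 1, 800, rule))  # True
```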

Enhancing Accuracy through Statistical Analysis

A number of Data Types, such as credit card numbers, have an option called Enhance accuracy through statistical analysis on their General Properties page.

Credit cards like Visa and Mastercard have sixteen-digit numbers arranged in four groups of four. While scanning for this Data Type, all sixteen-digit numbers in the data that pass the Luhn checksum are identified as credit card numbers. However, such a number might not be a credit card number at all: it might be a spare part number, or an ordering or sales code.
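To see why a checksum alone produces false positives, here is the standard Luhn algorithm in Python. It is shown only to illustrate the problem described above; it is not Check Point's statistical analysis, and the function name `luhn_valid` is hypothetical.

```python
# Standard Luhn checksum, shown to illustrate why a checksum alone cannot tell
# a card number from other sixteen-digit codes. Not Check Point's statistical analysis.
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True: a well-known test card number
print(luhn_valid("1234 5678 9012 3452"))  # True: a sequential code also passes
print(luhn_valid("4111 1111 1111 1112"))  # False
```

Roughly one in ten arbitrary sixteen-digit numbers passes the Luhn check, so part numbers and order codes are regularly flagged unless additional context is considered.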

The Enhance accuracy option applies statistical analysis to increase the accuracy of identifying specified Data Types, for example credit card numbers.

To enhance accuracy through statistical analysis:

In Data Loss Prevention > Data Types select a Data Type that represents numerical data.
Open the Data Type for editing.
On the General Properties page, select Enhance accuracy through statistical analysis.
Click OK.

Note - Enabling statistical analysis does not impact gateway performance.