The optimal method for defining new data type representations is to use the Data Type Wizard.
To add a new data type:
SmartDashboard opens and shows the DLP tab.
The Data Type Wizard opens.
You can create a list of keywords that will be matched against data transmissions. Transmissions that contain words from this list in their data are matched. You define whether all of the keywords must appear (ALL) or whether any of them is enough (ANY).
To create a data type representation of specified keywords:
The next step is the Specify Keywords window.
For example, if you want to ensure that no one can send an email that contains any of the names of congressmen in a committee, their names would be the keywords and you would set the Threshold to At least 1. (Note that the higher the threshold, the more precise the results will be.)
If you wanted to allow emails mentioning the congressmen, but decided that all of their names in one email would be suspicious, then set Threshold to All words must appear.
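The two threshold settings can be pictured with a short Python sketch. It only illustrates the ALL versus "at least N" logic described above and is not the DLP engine's implementation. The function names, the sample names, and the case-insensitive whole-word matching are assumptions made for the example.

```python
import re

def count_keywords(text, keywords):
    """Count how many distinct keywords appear in text (case-insensitive, whole words)."""
    found = 0
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found += 1
    return found

def matches_keywords(text, keywords, threshold=None):
    """threshold=None models 'All words must appear'; an integer models 'At least N'."""
    needed = len(keywords) if threshold is None else threshold
    return count_keywords(text, keywords) >= needed

# Hypothetical keyword list and message, for illustration only.
names = ["Alice Adams", "Bob Brown", "Carol Clark"]
email = "Meeting notes: Bob Brown will chair the session."

print(matches_keywords(email, names, threshold=1))  # 'At least 1' setting
print(matches_keywords(email, names))               # 'All words must appear' setting
```

With this sample, the "At least 1" setting matches the email because one name appears, while the "All words must appear" setting does not.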
Confidential and sensitive documents are often based on templates. A template defines the headers, footers, seals, and formatting of related documents. This is what makes all court orders, for example, look the same.
You can create a Data Type that protects documents based on a specific template. You then add the Data Type to a rule and connections that contain such a document are matched by the policy.
Important - When a template including images is attached to a DLP Template Data Type, the image file format is important. The file format used in the template must match the file format in the user document. If the file formats are different, the rule will not trigger a DLP response.
For example, if the template contains a JPG image and the user document contains the image in GIF format, there is no DLP response.
To create a Data Type representation of documents based on a template:
This file does not have to be known as a template in the application: the template for the Data Type may be a *.doc file and does not have to be a *.dot file. Choose any file that is a basic example of documents that might be sent.
Best Practice - Set this slider quite low first. The higher it is, the less the rule will catch. After you complete the wizard, send a test email with such a document, and check the Logs & Monitor Logs view to see if the document was caught. Slowly increase the Similarity level until the rule catches the documents you want. This will be different for each template.
To configure additional properties for the Data Type, select Configure additional Data Type properties after clicking Finish.
Property | Description |
---|---|
Match empty templates | |
Consider template's images | |
Alternative to slider testing:
If you want to catch documents that match on different levels with different actions, you may try this procedure:
Create a data type that protects files based on file type, file name, and file size. Transmissions that contain a file that matches the parameters are matched.
To create a data type representation of files:
Note - A file must match all the parameters that you define here, for it to be matched to the rule. Thus, the more parameters you can set here with assurance, the more accurate the results will be.
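As a rough illustration of the "must match all parameters" rule, the Python sketch below combines three optional checks with a logical AND. The `FileRule` class, its field names, and the use of file extensions to stand in for file type are assumptions for this example. The product's own file type detection is not described here and may work differently.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional, Set

@dataclass
class FileRule:
    """Hypothetical container for the parameters set in the wizard."""
    extensions: Optional[Set[str]] = None   # stands in for "file type"
    name_pattern: Optional[str] = None      # shell-style wildcard on the file name
    min_size_kb: Optional[int] = None       # lower bound on file size

def file_matches(rule, name, size_kb):
    """Return True only if the file satisfies every parameter that is set."""
    checks = []
    if rule.extensions is not None:
        checks.append(any(name.lower().endswith(ext) for ext in rule.extensions))
    if rule.name_pattern is not None:
        checks.append(fnmatch(name.lower(), rule.name_pattern.lower()))
    if rule.min_size_kb is not None:
        checks.append(size_kb >= rule.min_size_kb)
    # all() of an empty list is True, so a rule with no parameters matches any file.
    return all(checks)

rule = FileRule(extensions={".xlsx"}, name_pattern="*budget*", min_size_kb=100)
print(file_matches(rule, "Q3_Budget.xlsx", 250))
print(file_matches(rule, "Q3_Budget.xlsx", 50))
```

Each extra parameter can only narrow the set of matched files, which is why setting more parameters tends to give more accurate results.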
You can create a regular expression that will be matched against content in data transmissions. Transmissions that contain strings that match the pattern in their data are matched.
Note - Use the Check Point supported regular expression syntax.
To create a data type representation of a pattern:
For example, if you want to ensure that no one can send an email that contains a complete price-list of five products, you would set the pattern to "^[0-9]+(\.[0-9]{2})?$" and you would set the Number of occurrences to 5.
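To see how the pattern and the Number of occurrences setting interact, here is a small Python sketch. It uses Python's `re` module, whose syntax may differ from the Check Point supported syntax, and it assumes that `^` and `$` anchor to line boundaries (`re.MULTILINE`). The function name and the sample text are hypothetical.

```python
import re

# The pattern from the example: an integer, optionally followed by two decimals.
# re.MULTILINE is an assumption so that ^ and $ apply to each line of the text.
PRICE = re.compile(r"^[0-9]+(\.[0-9]{2})?$", re.MULTILINE)

def matches_pattern(text, min_occurrences):
    """Return (matched, hits): matched is True when the pattern occurs often enough."""
    hits = [m.group(0) for m in PRICE.finditer(text)]
    return len(hits) >= min_occurrences, hits

price_list = "19.99\n5\n120.50\n7.25\n3.00\n"
casual_mail = "Total: 19.99\nThanks\n"

print(matches_pattern(price_list, 5))
print(matches_pattern(casual_mail, 5))
```

Under these assumptions, the five-line price list reaches the threshold, while a message that mentions a single price inside a sentence does not.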
You can create a complex data type representation. A compound data type includes multiple Data Types, which are matched either on AND (a number of Data Types are matched), or NOT (necessary Data Types are not present), or both.
For example, you can look for files or emails that contain patient records. You could create a data type that combines documents that match a patient record template, with a dictionary data type that contains a group of patient names who have not signed release forms. Now you have a single data type that will match emails or FTP that contain patient records of patients who have not signed a release form.
To create a compound data type representation:
If a transmission is sent that matches all the Data Types of the first section and none of the Data Types in the second section, the data of the transmission is matched to the compound Data Types.
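The AND and NOT sections can be modeled as two lists of checks. The Python sketch below is one way to picture that logic. All function names, the sample patient names, and the marker phrases are hypothetical, and each check is a simple stand-in for a real Data Type.

```python
def matches_compound(data, must_match, must_not_match):
    """True when every check in the first section passes and none in the second does."""
    return (all(check(data) for check in must_match)
            and not any(check(data) for check in must_not_match))

# Hypothetical stand-ins for individual Data Types.
def looks_like_patient_record(text):
    return "patient record" in text.lower()

def mentions_unreleased_patient(text):
    unreleased = {"jane roe", "john doe"}  # patients without a signed release form
    lowered = text.lower()
    return any(name in lowered for name in unreleased)

def is_marked_public(text):
    return "approved for release" in text.lower()

first_section = [looks_like_patient_record, mentions_unreleased_patient]
second_section = [is_marked_public]

print(matches_compound("Patient record for Jane Roe, ward 3",
                       first_section, second_section))
print(matches_compound("Patient record for Jane Roe, approved for release",
                       first_section, second_section))
```

The first message matches the compound type. The second does not, because a Data Type from the NOT section is present.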
Many Data Types identify data by classifying it according to keywords or file attributes such as document type, name, or size. Classifications and attributes are used to describe the data. The fingerprint Data Type does not rely on a description of the data. The fingerprint Data Type identifies the data according to a unique signature known as a fingerprint. A fingerprint accurately identifies confidential files or parts of confidential files.
Because it is based on this signature, the fingerprint Data Type matches complete confidential files as well as parts of them.
Generating the unique signature
Repository Scanning
Files in the repository are constantly changing. New files are added, existing files modified or deleted. To keep file signatures up to date, the repository must be scanned on a regular basis. By default, the repository is automatically scanned every day. If a file is added or modified after a scan, the file's signature will not be updated until the next scheduled scan occurs.
Supported file shares for repositories:
Note - Scans of a repository that has already been scanned take less time. Unchanged files in a repository are skipped.
Filtering for Efficiency
A large repository might also contain many files that are not confidential and do not need to be scanned. The scan can be made more efficient by:
Select only those folders that are known to contain confidential files. You may need help from the related department heads to do this. For example, not all the folders in the Finance department may contain confidential information. Folders that do not contain it can be left out of the scan.
If you add Credit Card Numbers as the Data Type in the filter, all the files in the repository that contain credit card numbers are scanned and fingerprinted. If Spreadsheet file is selected as the Data Type in the filter, only spreadsheet files in the repository will be scanned and fingerprinted.
Granularity
Complete files do not have to go outside of an organization for data to be lost. Confidential data can be lost if sections from files in the repository are copied into other files, copied to email or posted to the web. A file in the repository may be saved locally and then modified in a way that it no longer matches the unique fingerprint signature. To identify such incidents, a partial match between files scanned by the DLP gateway and files in the repository can be configured. A partial match can be:
The number of text segments in the sent file is divided by the number of text segments in the repository file, and the result expressed as a percentage. A match occurs if this percentage is higher than the percentage configured on the General Properties page of the Data Type.
A match occurs when the number of identical text segments in a scanned file and a file in the repository is higher than the number configured on the General Properties page of the Data Type.
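The two kinds of partial match can be sketched in Python as follows. This is only an illustration under stated assumptions: the way the gateway splits files into text segments is not described here, so the sketch splits on sentence punctuation, and it reads the percentage as identical segments divided by the segments of the repository file. All function names and sample texts are hypothetical.

```python
import re

def text_segments(text):
    """Hypothetical segmentation: lowercase sentences with whitespace normalized."""
    parts = re.split(r"[.!?]+", text)
    return {" ".join(part.lower().split()) for part in parts if part.strip()}

def shared_segments(sent_text, repo_text):
    """Segments that appear in both the sent file and the repository file."""
    return text_segments(sent_text) & text_segments(repo_text)

def percentage_match(sent_text, repo_text, min_percent):
    """Match when the share of repository segments found in the sent file exceeds min_percent."""
    repo = text_segments(repo_text)
    if not repo:
        return False
    shared = shared_segments(sent_text, repo_text)
    return 100.0 * len(shared) / len(repo) > min_percent

def segment_count_match(sent_text, repo_text, min_segments):
    """Match when the number of identical segments exceeds min_segments."""
    return len(shared_segments(sent_text, repo_text)) > min_segments

repo_file = ("Q3 revenue rose 12 percent. Margins fell in Europe. "
             "The merger closes in May. Do not distribute.")
sent_file = ("Hi Sam. The merger closes in May. Margins fell in Europe. "
             "See you Monday.")

print(percentage_match(sent_file, repo_file, 40))
print(segment_count_match(sent_file, repo_file, 2))
```

In this sample, two of the four repository segments reappear in the sent file. That is 50 percent, so a 40 percent setting matches, while a setting of "more than 2 segments" does not.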
Scan Times
Large repositories might cause a scan to run all day. To prevent this, you might want to limit the scan to a specified range of hours. If a scan does not complete before the time range expires, the scan will recommence where it stopped when the next scheduled scan occurs.
Logging
Repository scans generate logs that can be viewed in the Logs & Monitor view. In the Logs & Monitor view, the Fingerprint query shows all logs generated by a scan.
Logs are generated when:
In the log:
Note - Running logs are generated every two hours. For a scan that lasts less than two hours, you will only see the start and finish logs.
Log Details
Field | Description |
---|---|
Scan ID | A unique scan identification to distinguish between logs |
Next Scheduled Scan Date | Time the scan started |
Duration | How long the scan lasted |
Scan Status | The status can be Running, Paused, Canceled, or Success |
Number of errors | Number of errors encountered. |
Repository root path | The upper level repository |
Current directory | Current directory being scanned |
Directories | The total number of directories in the repository selected in data locations. |
Repository size (MB) | The size of the repository |
Repository Files | The number of files in the repository |
Directories scanned | The number of directories scanned so far |
Scanned size (MB) | The number of MB scanned so far |
Scanned files | The number of files scanned so far |
Unreachable directories | Number of subdirectories in the repository that could not be opened during the scan. |
Fingerprinted files | The number of files with a fingerprint signature |
Filtered files | The number of files that were not scanned because they did not meet the criteria set on the Repository Scan Filter page. For example, file size, modification date, or Data Type. |
Scan speed (KBs) | The speed of the scan |
Progress | Percentage of the repository scanned so far |
Remaining time | Estimated time to scan completion |
To create a fingerprint Data Type:
This is the name that will show on the Data Loss Prevention > Repositories page.
By default, the DLP Blades object shows. This object represents all gateways that have the DLP blade enabled. Only gateways selected here scan the repository and enforce the fingerprint data type.
This tests that DLP gateways defined in the gateways list (step 4a) can access the repository using the (optional) assigned authentication credentials.
This option matches similarity between the document in the repository and the document being examined by the DLP gateway. You can specify an exact match with a document in the repository, or a partial match based on:
Select Configure additional Data Type Properties after clicking Finish if you want to configure more properties.
The New data type wizard closes. The data type shows in the list of data types and also on the Repositories page.
To configure more fingerprint properties:
In the Data Types window or Repositories window, double-click the fingerprint object to open it for editing. These properties can be configured:
Change the data entered in the Data Type wizard.
Add users or user groups that own the data. Data owners can be notified when the fingerprint data type is matched by a rule in the DLP policy.
Add CPcode scripts to apply more match criteria after the fingerprint data type is matched by a rule.
Configure when the document repository is scanned to update the fingerprint data type. The default time object (Every-Day) has no time restrictions configured. This means that a scan runs without time restrictions after the fingerprint data type is added to a policy rule. If gateway resources and network bandwidth are an issue, limit the scan to off-peak hours.
This page offers more scanning criteria:
This property lets you scan documents in the repository according to more data types, for example credit card numbers. If you add credit card numbers as the data type, all the files in the repository that contain credit card numbers are fingerprinted. If "spreadsheet files" are selected as the data type, only spreadsheet files in the repository are fingerprinted.
Only files of the specified maximum and minimum size are included in the fingerprint.
Only files that match the specified modification dates are included in the fingerprint.
Note - After a change to the filters (adding or removing a data type, selecting a different file size or modification date) the DLP gateway regards all files in the repository as new. In a large repository, this will result in a long scan. The fingerprint will only be enforced after this scan has ended.
Use the Data Locations tree to include or not include repository sub-folders. If you want the fingerprint data type to prevent only one document type from leaving the organization, put that document in a folder that contains no other document. Select only that folder as the data location.
Using the Fingerprint Data Type
To use the fingerprint Data Type, you must:
After the fingerprint Data Type is included in a policy, a scheduled scan occurs. After the scan successfully finishes, the fingerprint Data Type is enforced.
If you want to manually start a scan of the repository:
NFS Repository scanning in NATed Environments
NATing, for example in a clustered environment where each member's connections are translated to the Virtual IP address of the cluster, prevents repository scanning when the repository is located on an NFS server. To enable repository scanning you must disable Hide NAT on all NFS services. The members of a cluster must be configured to send NFS related traffic using the member's IP address in the Source field of the packet, and not the Virtual IP of the cluster.
To disable Hide NAT on NFS services:
Open $FWDIR/lib/table.def for editing.
Search for no_hide_services_ports. These are the services and ports not included in Hide NAT.
no_hide_services_ports = { <111, 17>, <111, 6>, <4046, 17>, <4046, 6> }
If a list of services and ports already exists, add these numbers to the end of the list.
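When a list already exists, the new entries go at the end and existing ones stay in place. The Python sketch below shows that merge on a single line of text. It is a hypothetical helper, not a Check Point tool. It works on a string only and does not read or write table.def, and the existing entry `<500, 17>` in the sample is invented for the example. In each pair, the first number is the port and the second is the IP protocol number (17 is UDP, 6 is TCP).

```python
import re

# The NFS-related entries from the line above: (port, IP protocol number).
NFS_ENTRIES = [(111, 17), (111, 6), (4046, 17), (4046, 6)]

def add_nfs_entries(line):
    """Append any missing NFS entries to the { ... } list in a no_hide_services_ports line."""
    match = re.search(r"\{(.*)\}", line)
    if match is None:
        raise ValueError("no { ... } list found in line")
    existing = [(int(port), int(proto))
                for port, proto in re.findall(r"<\s*(\d+)\s*,\s*(\d+)\s*>", match.group(1))]
    # Keep the existing order and add only the entries that are not already present.
    merged = existing + [entry for entry in NFS_ENTRIES if entry not in existing]
    body = ", ".join(f"<{port}, {proto}>" for port, proto in merged)
    return line[:match.start()] + "{ " + body + " }" + line[match.end():]

# Hypothetical existing line, for illustration only.
current = "no_hide_services_ports = { <500, 17>, <111, 17> };"
print(add_nfs_entries(current))
```

Text outside the braces is left untouched, so any trailing characters on the original line are preserved.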
Note: Changes to table.def apply globally to all gateways.
The Data Type Wizard has four advanced Data Types:
If you begin by creating a Data Type for a keyword or pattern, and realize that the match is not a simple ALL or ANY, but that one word is a sign of protected data in itself, while another word would be suspicious only if it appeared numerous times, you can define this complex data representation as a Weighted Keyword rather than a simple keyword or pattern.
Transmissions that contain this list of words, in the weight-sum that you define, in their data are handled according to the action of the rules that use this Data Type.
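The idea of a weight-sum can be sketched in Python as follows. This is a simplified model and not the product's scoring rules: it assumes that every occurrence of a word adds that word's weight to a running total, and that the data matches when the total reaches a threshold. The function names, the weights, and the threshold are hypothetical.

```python
import re

def weighted_score(text, weights):
    """Sum weight * occurrences for each word or phrase (case-insensitive, whole words)."""
    lowered = text.lower()
    total = 0
    for word, weight in weights.items():
        pattern = r"\b" + re.escape(word.lower()) + r"\b"
        total += weight * len(re.findall(pattern, lowered))
    return total

def matches_weighted(text, weights, threshold):
    """Match when the weight-sum reaches the threshold."""
    return weighted_score(text, weights) >= threshold

# Hypothetical weights: one strong indicator and one weak one.
weights = {"top secret": 10, "draft": 2}
threshold = 10

print(matches_weighted("Top secret plan attached", weights, threshold))
print(matches_weighted("This draft is fine", weights, threshold))
print(matches_weighted("draft draft draft draft draft", weights, threshold))
```

A single strong word reaches the threshold on its own, while the weak word has to appear five times to do the same.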
To create a Data Type representation of weighted keywords:
(If you click the Add button instead of its sub-menu, the item will be a keyword, not a pattern.)
The Edit Word window opens, for both types of item.
If you pre-planned the keywords that should flag data as protected, you do not need to enter them one by one in a keyword data representation. Instead, you can upload the list as a dictionary. You decide how many of the items in the list have to be matched to have the data match the rule.
Best Practice - Dictionary files should be one word or phrase per line. If the file contains non-English words, it is recommended that it be a Word document (*.doc). Dictionaries that are simple text files must be in UTF-8 format.
To create a Data Type representation of dictionary:
Best Practice - Set this to the highest reasonable value first, and then lower it after you audit the Logs & Monitor logs.
For example, if the dictionary is a list of employee names, you should not set the threshold to 1, which would catch every email that has a signature. You could set an Employee Name Dictionary Data Type to a threshold of half the number of users and its rule to Detect. If no data is caught by the rule after about a week, lower the threshold and check again. When the rule begins to detect this information being sent out, set it to Ask User, so that users have to explain why they are sending this information outside before it will be sent. With this information on hand, you can create a usable, reasonable and accurate enforcement of corporate policy.
CPcode is a scripting language, similar to C or Perl, specifically for Intrusion Prevention Systems. If you are familiar with this language, you can create your own complex rules. Use CPcode data types to create dynamic definitions of data to protect, or to create data type representations with custom parameters.
For example, you can create a CPcode that checks for a date that is before a public release, allowing you to create rules that stop price list releases before that date, but pass them afterwards. Other common uses of CPcode include relations between rule parameters, such as recipients (match rule to email if sent to too many domains) and protocols (match rule to HTTP if it looks like a web mail).
Note - See the R77 versions CPcode DLP Reference Guide.
To create a Data Type representation of CPcode:
Example of CPcode function:
func rule_1 {
    foreach $recipient inside global:DESTS {
        foreach $comp inside CPMPETITORS_DOMAIN {
            if( casesuffix( $recipient , $comp ) ) {
                set_message_to_user(cat("The mail is sent to " , $recipient , " which is a competitor's mail address."));
                set_track(TRACK_LOG);
                return quarantine();
            }
        }
    }
}
In DLP, a message can be sent using the SMTP, HTTP, or FTP protocols.
Message attributes refer to 3 properties of the message:
To create the message attribute Data Type:
The Specify Message Attributes window opens.
The size attribute can have a:
Minimum value | Maximum value | Meaning |
---|---|---|
Yes | Yes | Messages that fall within the specified range match the message attribute. |
Yes | No | A message whose size is greater than the minimum value specified here matches the attribute. |
No | Yes | A message whose size is smaller than the maximum value specified here matches the attribute. |
Define the number of attachments a message can have.
Minimum value | Maximum value | Meaning |
---|---|---|
Yes | Yes | A message whose number of attachments falls within the specified range matches the message attribute. |
Yes | No | A message with more than the minimum number of attachments specified here matches the attribute. |
No | Yes | A message with fewer attachments than the maximum value specified here matches the attribute. |
Scan for a significant amount of text. If an email has a large binary file attached such as a graphic, and the email contains the words "your picture" the email might match the Size attribute but contain no text worth scanning. You will want the email to match a DLP rule only if the email contains enough text that could conceivably result in data loss.
Minimum value | Maximum value | Meaning |
---|---|---|
Yes | Yes | A message whose word count falls within the specified range matches the message attribute. |
Yes | No | A message whose word count is greater than the minimum value specified here matches the attribute. |
No | Yes | A message whose word count is lower than the maximum value specified here matches the attribute. |
If you want to add more parameters to the Data Type, select Configure additional Data Type properties after clicking Finish, and then click Finish.
Note - For a message to match the Data Type attribute, it must match the criteria for size and the number of attachments and the number of words. If the message fails to match one of the criteria, it will fail to match the attribute.
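The combination of optional minimum and maximum values, and the rule that all three criteria must hold, can be sketched in Python. The function and parameter names are hypothetical. Treating the range limits as inclusive is an assumption, because the tables above do not state how boundary values are handled.

```python
def in_range(value, minimum=None, maximum=None):
    """Check a value against optional limits; a limit of None means 'not set'."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True

def message_matches(size_kb, attachments, words,
                    size=(None, None),
                    attachment_count=(None, None),
                    word_count=(None, None)):
    """All three attributes must be satisfied for the message to match."""
    return (in_range(size_kb, *size)
            and in_range(attachments, *attachment_count)
            and in_range(words, *word_count))

# A large email with one picture attached and almost no text.
print(message_matches(5000, 1, 3, size=(1000, None), word_count=(50, None)))
# The same size, but with enough text to be worth scanning.
print(message_matches(5000, 1, 400, size=(1000, None), word_count=(50, None)))
```

The first message satisfies the size criterion but fails the word count, so it does not match the attribute. The second satisfies every criterion that is set.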
A number of Data Types, such as credit card numbers, have an option called Enhance accuracy through statistical analysis on their General Properties page.
Credit cards like Visa and Mastercard have sixteen digit numbers arranged in four groups of four. While scanning for this Data Type, all sixteen digit numbers in the data that match the Luhn algorithm will be identified as credit card numbers. The sixteen digits might not represent a credit card number. The sixteen digits might represent spare part numbers, an ordering or sales code.
The Enhance accuracy option applies statistical analysis to increase the accuracy of identifying specified Data Types, for example credit card numbers.
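For readers unfamiliar with the Luhn algorithm, the Python sketch below shows the checksum and why it can produce false positives on its own. It is a generic illustration of the algorithm, not the DLP engine's code, and it does not model the statistical analysis. The candidate pattern, the function names, and the sample text are assumptions made for the example.

```python
import re

def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:      # double every second digit, counting from the right
            value *= 2
            if value > 9:       # same as adding the two digits of the product
                value -= 9
        total += value
    return total % 10 == 0

# Sixteen digits, optionally written in four groups separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def find_card_like_numbers(text):
    """Return every sixteen-digit number in text that passes the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if luhn_valid(digits):
            hits.append(digits)
    return hits

sample = "Card 4111 1111 1111 1111, part no. 1234567812345678"
print(find_card_like_numbers(sample))
```

The check only tests arithmetic consistency. Any sixteen-digit code that happens to pass it, such as an order or part number, would be reported as well.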
To enhance accuracy through statistical analysis:
Note - Enabling statistical analysis does not impact gateway performance.