Introduction to Masking Algorithms¶
Overview¶
This article provides a brief outline of the different algorithm options that are available, along with other general algorithm information. More specific algorithm details can be explored in the Out Of The Box Algorithm Instances or Algorithm Frameworks sections.
An algorithm plugin can be configured through the graphical user interface by entering the plugin's required configuration in JSON format. For more information, visit the General UI for Extended Algorithms article.
Algorithm Options¶
Out Of The Box Algorithm Instances¶
Out of the box algorithm instances are pre-configured ready to use algorithms. The out of the box algorithms with related frameworks can be customized using the corresponding extensible frameworks. For more information on algorithm instance extensibility, see Extensible Algorithms.
Algorithm Frameworks¶
Algorithm frameworks allow for creation of algorithm instances with a custom configuration. For more information on algorithm framework extensibility, see Extensible Algorithms. More information on multi-column algorithms can be found at Using Multi-Column Algorithms.
Algorithm Framework | Extensible? | Multi-Column? | Out of the Box Instances |
---|---|---|---|
Binary Lookup | X | ||
Character Mapping | X | dlpx-core:CM Alpha-Numeric dlpx-core:CM Digits |
|
Data Cleansing | X | ||
Date Replacement | X | ||
Date Shift | X | Date Shift Fixed | |
Dependent Date Shift | X | X | |
X | dlpx-core:Email Unique dlpx-core:Email SL |
||
Free Text Redaction | X | ||
Full Name | X | dlpx-core:FullName | |
Mapping | X | ||
Min Max | X | ||
Name | X | dlpx-core:FirstName dlpx-core:LastName |
|
Numeric Expression | X | ||
Payment Card | X | Credit Card | |
Regex Decompose | X | ||
Secure Lookup | X | See Out Of The Box Algorithm Instances > Secure Lookup for all Secure Lookup algorithm instances | |
Segment Mapping | X | ||
Tokenization | X |
Configuring Your Own Algorithms¶
Algorithm Settings¶
The Algorithm tab displays algorithm Names along with Type and Description. This is where you add (create) new algorithms. The default algorithms and any algorithms you have defined appear on this tab.
At the top of the page, Nonconforming Data behavior is displayed to specify how all algorithms should behave if they encounter data values in an unexpected format. Mark job as Failed instructs algorithms to throw an exception that will result in the job failing. Mark job as Succeeded instructs algorithms to ignore the non-conformant data and not throw an exception. Note that Mark job as Succeeded will result in the non-conformant data not being masked should the job succeed, but the Monitor page will display a warning that can be used to report the non-conformant data events.
Creating New Algorithms¶
If none of the default algorithms meet your needs, you might want to create a new algorithm. An algorithm that you create is called a "user-defined algorithm".
Algorithm Frameworks give you the ability to quickly and easily define the algorithms you want, directly on the Settings page. After you create an algorithm, your algorithm will be available to all users.
To add an algorithm:
-
In the upper right-hand corner of the Algorithm settings tab, click Add Algorithm.
-
Select an algorithm type.
-
Complete the form to the right to name and describe your new algorithm.
-
Click Save.
Editing Algorithms¶
Administrators, as well as users with EDIT Algorithm permission assigned in their Role, may edit any user-defined algorithm on the system.
The following algorithm instances cannot be modified:
- Instances that ship with and are defined by the system
- Instances defined by algorithm plugins
Multi-Column Algorithms¶
Overview¶
Multi-column algorithms are a special kind of algorithm that allow a single algorithm assignment to be made spanning multiple columns or fields in inventory. This allows coordinated masking of multiple fields - for example, masking two date-time values while preserving the interval between them.
The Dependent Date Shift algorithm is an example of a multi-column algorithm.
Usage¶
Each multi-column algorithm defines a set of Logical Fields; these logical fields are assigned to the actual fields or columns in inventory, defining how each value will be treated by the algorithm. A particular logical field may be read-only, indicating that it is considered as input but not masked by the multi-column algorithm, and/or optional, meaning the logical field is not required in order for the masking assignment to be complete. Furthermore, the Algorithm Group number allows a multi-column algorithm to be assigned multiple times in the same table or file-format, with the group number indicating which set(s) of logical fields should be processed together as a single assignment.
Note
Incomplete multi-column masking assignments in the inventory may not be detected until such time as a masking job is executed using that inventory. It is important to review each multi-column assignment carefully to ensure that for each Algorithm Group, each non-optional Logical Field is assigned to a column or field in the table or file-format.
Limitations¶
Multi-column algorithms may only be applied in inventories for data connectors where entire rows or records are processed as a unit.
Specific limitations:
- Multi-column algorithms are not supported for XML file masking.
- Multi-column algorithm assignments must be contained with a single Record Type for delimited and fixed-width files.
- Multi-column algorithm assignments must not cross redefines in VSAM copybooks.
- Multi-column algorithms may not be called by other algorithms through the algorithm chaining feature.
Algorithm Frameworks Overview¶
Choosing an Algorithm Framework¶
See the Algorithm Frameworks section for a detailed description of each Algorithm Framework. The algorithm framework you choose will depend on the format of the data and your internal data security guidelines.
Choosing Between Character and Segment Mapping Frameworks¶
The Character Mapping algorithm is intended to replace Segment Mapping for many use cases. That said, it does not replicate every feature of that algorithm, so the specific masking application will determine which one is appropriate.
Reasons to choose Character Mapping over Segment Mapping:
- Character Mapping can mask all characters in the first Unicode plane. Segment Mapping can only mask "[a-zA-Z]" + "[0-9]"
- Character Mapping automatically preserves all non-masked characters. Segment Mapping requires configuration of preserve characters. Character Mapping is much easier to use when the data is potentially "dirty" or not consistently formatted.
- Character Mapping can process preserve ranges in reverse, allowing the last positions of an input to be preserved when inputs have different lengths. Segment Mapping preserve ranges are always processed from the beginning of input.
- Character Mapping uses a more complex masking computation, so that every maskable position influences every other position in the masked value. Segment Mapping pre-computes the permutations for each segment independently.
Reasons to choose Segment Mapping over Character Mapping:
- Segment mapping can mask different parts of the input, determined by position, differently. Character Mapping always masks the same groups of characters regardless of position.
- Segment mapping can map inputs to different outputs at a position, like { A, B, C, D } -> { W, X, Y, Z } by specifying different Input and Mask values. This is not possible with Character Mapping.
- Segment mapping supports numeric segments, with up to 6-digit segments masked to a specific range. Character Mapping doesn't allow this kind of range limiting.