Regex Decompose
The Regex Decompose framework masks values that match specified Java 8 regular expressions. The algorithm attempts to match the algorithm input against each regular expression, and once a match is found, the associated action is applied to transform either the entire input, or each capturing group (parts of the input) defined by the expression. A fallback action may be provided for use when none of the defined regular expressions match the input. If no fallback action is defined and an input fails to match any of the defined regular expressions, the algorithm may be configured to generate a non-conformant data exception.
Capturing groups divide a regular expression into subgroups and are written by enclosing part of the expression in parentheses. This algorithm allows different capturing groups to be assigned different mask actions. Nested capturing groups are unsupported and may lead to unpredictable behavior. If no capturing groups are defined, the first action is applied to the entire match. In this case, the action list should contain only one action.
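To make the idea of capturing groups concrete, the short sketch below shows which slices of an input each group covers. It uses Python's `re` module as a stand-in for Java regular expressions (the two behave the same for patterns this simple) and is only an illustration, not part of the product. The nested pattern is a hypothetical example: its outer group encloses an inner one, so their spans overlap and it is no longer obvious which action should apply to the shared characters. This is one plausible reason nested groups are unsupported.

```python
import re

# Flat groups: each group covers its own, non-overlapping slice of the input.
flat = re.fullmatch(r"([1-9]*)-([a-z]*)", "12345-abc")
flat_spans = [flat.span(i) for i in range(1, flat.re.groups + 1)]
print(flat.groups())  # ('12345', 'abc')
print(flat_spans)     # [(0, 5), (6, 9)]

# Nested groups (hypothetical pattern): group 1 encloses group 2,
# so characters 0-5 belong to two groups at once.
nested = re.fullmatch(r"(([1-9]*)-)([a-z]*)", "12345-abc")
nested_spans = [nested.span(i) for i in range(1, nested.re.groups + 1)]
print(nested_spans)   # [(0, 6), (0, 5), (6, 9)]
```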
Regex Decompose algorithms can only be created through the API; see API Calls for Creating Algorithms - Regex Decompose.
Examples
As an example, a Regex Decompose algorithm with the following configuration:
Mask Pattern:
Regular Expression: "[0-9]*"
Action: Redact
Redact String: "redacted"
Require Mask: false
Trim Input: true
Maximum Input Length: 10
Will produce masked results as follows:
- "12345" → "redacted"
- " 6789 " → " redacted "
- "12345678901" → non-conformant data
  - exceeds maximum input length
- "abc123" → "abc123"
  - remains unmasked since it does not match the regex pattern
The provided regular expression matches any input consisting solely of 0 or more digits in the range [0-9], and any input that matches is replaced with the string "redacted". Any input that contains characters outside of the range [0-9] will not be masked. If Require Mask were set to true, the last example "abc123" would trigger a non-conformant data event, as the value would not be masked by the algorithm.
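The behavior listed for this first configuration can be approximated with a small simulation. This is a minimal sketch, not the product's implementation: `mask_example_one` and `NonConformantData` are hypothetical names, Python's `re` stands in for Java regular expressions, and three behaviors are assumptions inferred from the example results. Those are that the pattern must match the whole input, that Trim Input matches against the trimmed value and then restores the surrounding whitespace, and that the length limit is checked before trimming.

```python
import re

class NonConformantData(Exception):
    """Hypothetical stand-in for a non-conformant data event."""

def mask_example_one(value, require_mask=False, max_len=10):
    # Assumption: the length limit applies to the raw, untrimmed input.
    if len(value) > max_len:
        raise NonConformantData("exceeds maximum input length")

    # Trim Input: split off leading/trailing whitespace and match on the core.
    core = value.strip()
    lead_len = len(value) - len(value.lstrip())
    leading = value[:lead_len]
    trailing = value[lead_len + len(core):]

    # Assumption: the expression must match the entire (trimmed) input.
    if re.fullmatch(r"[0-9]*", core):
        masked = leading + "redacted" + trailing
    else:
        masked = value  # no match and no fallback action: left unmasked

    if require_mask and masked == value:
        raise NonConformantData("value was not masked")
    return masked

print(repr(mask_example_one("12345")))   # 'redacted'
print(repr(mask_example_one(" 6789 ")))  # ' redacted '
print(repr(mask_example_one("abc123")))  # 'abc123'
```

Calling `mask_example_one("12345678901")`, or `mask_example_one("abc123", require_mask=True)`, raises `NonConformantData` in this sketch, mirroring the non-conformant cases described above.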
Another example that includes capturing groups with the following configuration:
Mask Pattern:
Regular Expression: "([1-9]*)-([a-z]*)"
Action 1: Redact
Redact Character: 'X'
Action 2: Preserve
Require Mask: true
Trim Input: true
Maximum Input Length: 10
Fallback Action: Redact
Redact String: "redacted"
Will produce masked results as follows:
- "12345-abc" → "XXXXX-abc"
- "abc-123" → "redacted"
  - does not match the pattern, so the fallback action is applied
- "1-a" → "X-a"
- "-" → "redacted"
  - matches the pattern, but the masked output would be "-", which breaks the requirement that the output must differ from the input, so the fallback action is applied
- "redacted" → non-conformant data
  - does not match the pattern, so the fallback action is applied, but the fallback action does not change the value, so it fails the requirement that the input must be masked
The provided regular expression matches any input consisting of 0 or more digits in the range [1-9], followed by a dash and 0 or more characters in the range [a-z]. Any input that does not match that pattern will be masked by the fallback action. If the fallback action fails to change the input, a non-conformant data event will occur.
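The per-group actions and the fallback in this second configuration can be sketched the same way. Again this is only an illustration under stated assumptions: `mask_example_two`, `apply_group_actions`, `redact_chars`, `preserve`, and `NonConformantData` are hypothetical names, Python's `re` stands in for Java regular expressions, and the pattern is assumed to match the whole input. Trim Input handling is left out because none of the listed inputs carry surrounding whitespace.

```python
import re

class NonConformantData(Exception):
    """Hypothetical stand-in for a non-conformant data event."""

PATTERN = re.compile(r"([1-9]*)-([a-z]*)")

def redact_chars(text):
    return "X" * len(text)  # Action 1: Redact with character 'X'

def preserve(text):
    return text             # Action 2: Preserve

GROUP_ACTIONS = [redact_chars, preserve]

def apply_group_actions(match, actions):
    # Rebuild the input, transforming each group and copying the text between groups.
    value, out, cursor = match.string, [], 0
    for index, action in enumerate(actions, start=1):
        start, end = match.span(index)
        out.append(value[cursor:start])
        out.append(action(match.group(index)))
        cursor = end
    out.append(value[cursor:])
    return "".join(out)

def mask_example_two(value, max_len=10):
    if len(value) > max_len:
        raise NonConformantData("exceeds maximum input length")
    masked = value
    match = PATTERN.fullmatch(value)
    if match:
        masked = apply_group_actions(match, GROUP_ACTIONS)
    if masked == value:
        masked = "redacted"  # Fallback Action: no match, or the match changed nothing
    if masked == value:
        raise NonConformantData("fallback did not change the value")  # Require Mask: true
    return masked

print(mask_example_two("12345-abc"))  # XXXXX-abc
print(mask_example_two("abc-123"))    # redacted
print(mask_example_two("1-a"))        # X-a
print(mask_example_two("-"))          # redacted
```

Passing the literal input "redacted" raises `NonConformantData` in this sketch, since the fallback output equals the input.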
Masking is deterministic: identical input values masked with the same algorithm configuration always produce the same output values.