Segment Mapping

Extensible Algorithm Framework

Segment Mapping algorithms produce no overlaps or repetitions in the masked data. They let you create unique masked values by dividing a target value into separate segments and masking each segment individually.

You might use this method if you need columns with unique values, such as Social Security Numbers, primary key columns, or foreign key columns. When you mask primary and foreign keys, use the same Segment Mapping algorithm for both so that the masked keys still match. You can set the algorithm to produce alphanumeric results (letters and numbers) or only numbers.

With Segment Mapping, you can set the algorithm to ignore specific characters. For example, you can choose to ignore dashes [-] so that the same Social Security Number will be identified no matter how it is formatted. You can also preserve certain values. For example, to increase the randomness of masked values, you can preserve a single number such as 5 wherever it occurs. Or if you want to leave some information unmasked, such as the last four digits of Social Security numbers, you can preserve that information.
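The ignore-character behavior described above can be sketched in Python. This is an illustration only, not the product's implementation: the function names `strip_ignored` and `restore_ignored` are hypothetical, and the "masked" value passed in the usage note is a made-up stand-in for real algorithm output.

```python
def strip_ignored(value: str, ignored: set[str]) -> tuple[str, list[tuple[int, str]]]:
    """Remove ignored characters, remembering where each one was."""
    kept = []
    removed = []  # (original index, character) pairs, in ascending index order
    for index, char in enumerate(value):
        if char in ignored:
            removed.append((index, char))
        else:
            kept.append(char)
    return "".join(kept), removed


def restore_ignored(masked: str, removed: list[tuple[int, str]]) -> str:
    """Re-insert ignored characters at their original positions."""
    chars = list(masked)
    for index, char in removed:  # ascending order keeps later positions valid
        chars.insert(index, char)
    return "".join(chars)
```

With dashes ignored, `strip_ignored("123-45-6789", {"-"})` and `strip_ignored("123456789", {"-"})` both yield the same string to mask, which is why differently formatted copies of one Social Security Number are treated as the same value.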

This algorithm can be used for tokenization and re-identification jobs if the following conditions are met:

  • All alpha-numeric and numeric segments have Value Ranges with "Mask values with: The same ranges"
  • There are no segments with "Segment Treatment: Mask with a constant value"
  • If a numeric segment is defined, "Short Numeric Segment Handling: Report nonconforming data" is selected
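The three conditions above can be expressed as a small check. This is a sketch under stated assumptions: the `Segment` class, its field names, and the `"report"` and `"mask_partial"` strings are hypothetical stand-ins for the UI settings, not part of any product API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical treatment names: "alphanumeric", "numeric", "preserve", "constant"
    treatment: str
    # Stands in for 'Mask values with: The same ranges'
    same_ranges: bool = True


def is_reversible(segments: list[Segment], short_numeric_handling: str) -> bool:
    """Return True when the configuration meets all three listed conditions."""
    for segment in segments:
        if segment.treatment == "constant":
            return False  # constant-value segments discard the original value
        if segment.treatment in ("alphanumeric", "numeric") and not segment.same_ranges:
            return False  # original and mask ranges must be the same
    has_numeric = any(segment.treatment == "numeric" for segment in segments)
    if has_numeric and short_numeric_handling != "report":
        return False  # numeric segments require 'Report nonconforming data'
    return True
```

For example, `is_reversible([Segment("alphanumeric"), Segment("numeric")], "report")` returns `True`, while switching the handling to `"mask_partial"` or adding a `Segment("constant")` returns `False`.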

To decide whether Character Mapping or Segment Mapping is the correct option for your use case, see Choosing Between Character and Segment Mapping Frameworks.

Creating a Segment Mapping Algorithm via UI

  1. In the upper right-hand region of the Algorithms tab, click Add Algorithm.

  2. Select Segment Mapping. The "Create Segment Mapping Algorithm" pane appears.

  3. Enter an Algorithm Name.

    Info

    This MUST be unique.

  4. Enter a Description (optional).

  5. Click the Segment 1 tab to open the pane for the first segment. Use the plus (+) button to add as many segments as you need (maximum of 10). Use the tabs to navigate between segments.

  6. For each segment, select its:

    • Length (number of characters). The maximum is 6.
    • Segment Treatment: Mask alpha-numeric, Mask numeric, Preserve, or Mask with a constant value.
    • Value Ranges. Optional for alpha-numeric and numeric, required for constant. See Specifying Value Ranges.

    Info

    Numeric segments are masked as whole segments. Alpha-numeric segments are masked by individual characters.

  7. To allow masking of short numeric segments, set the Short Numeric Segment Handling drop-down to Mask partial segments. This option lets masking proceed when an input string is truncated mid-segment. For example, you define a numeric segment of length 4, but the input string ends mid-segment, leaving a 2-digit number instead of 4 digits.

    Info

    This only applies to Mask numeric segments. Other segment treatments always apply to partial segments.

    Info

    If Mask partial segments is selected AND a Mask numeric segment is defined, the algorithm is not reversible and cannot be used for tokenization/re-identification.

    By default, the algorithm uses Report nonconforming data for short numeric segments. The non-conformant data is left unmasked, and the Monitor page displays a warning that you can use to report the non-conformant data events.

    Example

    Segment 1: length 2, mask alpha-numeric. Segment 2: length 4, mask numeric.

    Input    Output   Short Numeric Segment Handling
    AB1234   DL9148   Either
    AB12     AB12     Report nonconforming data (reported)
    AB0012   DL3619   Report nonconforming data (not reported)
    AB12     DL3619   Mask partial segments

  8. Select the appropriate Ignore Characters handling. Ignored characters are removed from the input value before masking and restored to their original positions after masking.

    When Automatically ignore special characters is selected, all non-maskable characters are ignored. When Ignore specific characters is selected, only the characters you specify are ignored. Enter the characters you wish to ignore in the Specific Characters box, separated by commas. To ignore the comma character (,) itself, check the Ignore Commas checkbox. To ignore control characters, check the Add Control Characters checkbox and select the desired characters to ignore.

  9. Lastly, use the Process Preserve Segments Before Ignore Characters checkbox to control the processing order. When it is checked, segments with "Segment Treatment: Preserve" are processed first, before ignore characters are removed, so ignore characters count toward length when locating preserve segments in the input. The remaining segments are processed afterward.

    By default, this checkbox is unchecked: ignore characters are removed first, and then the segments are processed in order.

    Warning

    This option exists to support backwards compatibility with the legacy Segment Mapping algorithm configuration and is not recommended for newly created algorithms, as it may cause some segments to be processed out of order.

  10. When you are finished, click Save.

  11. Before you can use the algorithm in a profiling job, you must add it to a domain. If you are not using the Masking Engine Profiler to create your inventory, you do not need to associate the algorithm with a domain.
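The Short Numeric Segment Handling choice in step 7 can be sketched as a decision function. This is a minimal illustration, not the product's logic: the function name, the `(length, treatment)` tuples, and the `"report"` and `"mask_partial"` strings are hypothetical, and treating a completely missing segment as "nothing to do" is an assumption.

```python
def short_numeric_action(value: str, segments: list[tuple[int, str]], handling: str) -> str:
    """Decide what happens to a value: 'mask', 'mask_partial', or 'report'.

    segments is a list of (length, treatment) pairs in order;
    handling is 'report' (the default setting) or 'mask_partial'.
    """
    position = 0
    for length, treatment in segments:
        text = value[position:position + length]
        position += length
        # A numeric segment is "short" when the input ends part-way through it.
        if treatment == "numeric" and 0 < len(text) < length:
            return "mask_partial" if handling == "mask_partial" else "report"
    return "mask"


# Segment 1: length 2, mask alpha-numeric. Segment 2: length 4, mask numeric.
EXAMPLE_SEGMENTS = [(2, "alphanumeric"), (4, "numeric")]
```

Applied to the inputs in the example table, `AB1234` and `AB0012` are masked under either setting, while `AB12` is reported (and left unmasked) under the default and masked as a partial segment otherwise.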

Specifying Value Ranges

You can specify value ranges for each segment based on the Segment Treatment.

For Mask alpha-numeric, you can specify an original value range and a mask value range. If either field is left blank, the algorithm uses the default value range, which is 0-9,A-Z. Use the value range fields to specify individual values and ranges, for example 'A-F,P,R,1-5,7,9'.

Info

Only values in the specified ranges are masked; all other values are preserved. Letters are masked to letters and digits to digits.
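To make the alpha-numeric range syntax concrete, the following sketch expands a specification such as 'A-F,P,R,1-5,7,9' into the set of characters it covers. The parser is an assumption about how the syntax reads, not the product's implementation; `parse_alnum_range` and `DEFAULT_ALNUM_RANGE` are hypothetical names.

```python
DEFAULT_ALNUM_RANGE = "0-9,A-Z"  # default stated in the text above


def parse_alnum_range(spec: str = "") -> set[str]:
    """Expand a spec like 'A-F,P,R,1-5,7,9' into individual characters."""
    spec = spec.strip() or DEFAULT_ALNUM_RANGE  # blank field falls back to the default
    values = set()
    for item in spec.split(","):
        item = item.strip()
        if len(item) == 3 and item[1] == "-":
            start, end = item[0], item[2]
            # Reject reversed ranges and ranges that mix letters with digits.
            if start > end or start.isdigit() != end.isdigit():
                raise ValueError(f"invalid range: {item!r}")
            values.update(chr(code) for code in range(ord(start), ord(end) + 1))
        elif len(item) == 1:
            values.add(item)
        else:
            raise ValueError(f"invalid item: {item!r}")
    return values
```

A character outside the returned set would be preserved rather than masked, so `"G" in parse_alnum_range("A-F,P,R,1-5,7,9")` being `False` means a `G` in that segment passes through unchanged.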

Info

If the original and replacement values and ranges are not the same, the algorithm is not reversible and cannot be used for tokenization/re-identification.

For Mask numeric, you can specify an original value range and a mask value range. If either field is left blank, the algorithm uses the default value range, which runs from 0 to the largest integer that fits in the segment length (for example, 000-999 for a segment of length 3). Use the value range fields to specify integer values and ranges, for example '10,30,50-875'.

Info

Only values in the specified ranges are masked; all other values are preserved.
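The numeric range syntax and its default can be sketched the same way. This is an illustrative reading of the syntax, not the product's parser; `parse_numeric_range` is a hypothetical name, and rejecting values too large for the segment is an assumption.

```python
def parse_numeric_range(spec: str, segment_length: int) -> list[int]:
    """Expand a spec like '10,30,50-875' into the integers it covers.

    A blank spec falls back to the default: 0 through the largest
    integer that fits in segment_length digits.
    """
    limit = 10 ** segment_length - 1  # e.g. 999 for a segment of length 3
    if not spec.strip():
        return list(range(0, limit + 1))
    values = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            low_text, high_text = item.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(item)
        if low > high or high > limit:
            raise ValueError(f"invalid range for length {segment_length}: {item!r}")
        values.extend(range(low, high + 1))
    return values
```

For a segment of length 3, `parse_numeric_range("10,30,50-875", 3)` covers 828 integers (10, 30, and 50 through 875), and a blank spec covers all 1000 values from 000 to 999.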

For Preserve, you cannot specify any value ranges, because whatever is encountered in this segment is preserved.

For Mask with a constant value, you cannot specify an original value range, and the replacement value must be a single value of the same length as the segment (for example, if the segment length is 3, 'ABC' is a valid replacement).

Warning

The Segment Mapping pattern and sub-patterns must match the data for it to be masked. If the data is longer than the defined pattern, it is passed through unmasked. To avoid this, set the patterns (segments) and Ignore Characters to match the data.
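A quick pre-check for the situation in this warning might look like the sketch below. It assumes that length is compared after ignore characters are removed, which the source does not state explicitly; `exceeds_pattern` is a hypothetical helper, not a product function.

```python
def exceeds_pattern(value: str, segment_lengths: list[int], ignored: str = "") -> bool:
    """True when the value has more characters than the segments cover.

    Assumption: ignored characters are dropped before the comparison.
    """
    remaining = [char for char in value if char not in ignored]
    return len(remaining) > sum(segment_lengths)
```

With segments of length 3, 2, and 4 and the dash ignored, `exceeds_pattern("123-45-6789", [3, 2, 4], "-")` is `False`, but the same call without the ignore setting is `True`, because the two dashes push the data past the nine characters the pattern defines.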

For information on creating Segment Mapping algorithms through the API, see API Calls for Creating Algorithms - Segment Mapping.

Examples

Suppose you have an account number for which you need to create a segment mapping algorithm. You can separate the account number into segments, preserving the first two-character segment, replacing a segment with a specific value, and preserving a hyphen. The following is a sample value for this account number:

NM831026-04

Where:

  • NM is a plan code that you want to preserve. It is always a two-character alphanumeric code.

  • 831026 is the uniquely identifiable account number. To ensure that you do not inadvertently create actual account numbers, you can replace the first two digits with a sequence that never appears in your account numbers in that location. (For example, you can replace the first two digits with 98 because 98 is never used as the first two digits of an account number.) To do that, split these six digits into two segments. The first is a 2-character constant segment that maps to 98. The second is a 4-character numeric segment.

  • -04 is a location code. You want to preserve the hyphen and you can replace the two digits with a number within a range (in this case, a range of 1 to 77).
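The breakdown above could be written down as a segment list and checked against the limits given in the UI steps (at most 10 segments, each at most 6 characters long, and a constant value as long as its segment). This is a sketch only: the dictionary keys, treatment names, and helper functions are hypothetical, and modeling the hyphen as a one-character Preserve segment is one possible reading of the example.

```python
MAX_SEGMENTS = 10        # limit from the UI steps
MAX_SEGMENT_LENGTH = 6   # limit from the UI steps

ACCOUNT_SEGMENTS = [
    {"length": 2, "treatment": "preserve"},                       # NM (plan code)
    {"length": 2, "treatment": "constant", "mask_values": "98"},  # never a real prefix
    {"length": 4, "treatment": "numeric"},                        # rest of account number
    {"length": 1, "treatment": "preserve"},                       # the hyphen
    {"length": 2, "treatment": "numeric", "mask_values": "1-77"}, # location code
]


def validate(segments: list[dict]) -> None:
    """Raise ValueError if the segment list breaks a documented limit."""
    if not 1 <= len(segments) <= MAX_SEGMENTS:
        raise ValueError("between 1 and 10 segments are required")
    for number, segment in enumerate(segments, start=1):
        if not 1 <= segment["length"] <= MAX_SEGMENT_LENGTH:
            raise ValueError(f"segment {number}: length must be 1 to 6")
        if segment["treatment"] == "constant":
            if len(segment.get("mask_values", "")) != segment["length"]:
                raise ValueError(f"segment {number}: constant must match segment length")


def split_value(value: str, segments: list[dict]) -> list[str]:
    """Cut a value into the pieces each segment would see."""
    pieces = []
    position = 0
    for segment in segments:
        pieces.append(value[position:position + segment["length"]])
        position += segment["length"]
    return pieces
```

Calling `validate(ACCOUNT_SEGMENTS)` raises nothing, and `split_value("NM831026-04", ACCOUNT_SEGMENTS)` yields the pieces `NM`, `83`, `1026`, `-`, and `04`, one per segment.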