Using Multi-Column Algorithms

To configure and use Multi-Column (MC) algorithms, one should be familiar with the following topics:

Logical Fields

The Masking SDK distribution includes a sample MC algorithm instance named "MultiColumnDateAlgorithm". The framework that this instance is based on defines two fields:

    @Override
    public List<AlgorithmLogicalField> listMultiColumnFields() {
        /*
         *  Here we define the column names to be used in the algorithm. These names are only used to reference the
         *  columns within the algorithm and do not need to correspond to the names of the columns on the data source.
         *  For example, our data source may call these 2 fields "dateOfBirth" and "dateOfDeath", however within the
         *  algorithm implementation they will be referenced as "startDate" and "endDate" (see mask method to see how
         *  this is used).
         */
        return ImmutableList.of(
                new AlgorithmLogicalField("startDate", MaskingType.LOCAL_DATE_TIME),
                new AlgorithmLogicalField("endDate", MaskingType.LOCAL_DATE_TIME));
    }

In that example, "startDate" and "endDate" are logical fields defined by the framework. If one doesn't have access to the framework's source code, the logical field names (and their types) can be found through the Masking API's GET /algorithms/{algorithmName} endpoint.
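As a minimal sketch of that lookup, the Java snippet below builds (but does not send) the GET /algorithms/{algorithmName} request. The base URL, the token value, and the use of an Authorization header are assumptions made for illustration; check them against your own Masking Engine. The one detail worth noting is that an algorithm name such as "Sample Plugin:MultiColumnDateAlgorithm" contains a space and a colon, so it is percent-encoded before being placed in the path.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class AlgorithmLookup {
    // Hypothetical base URL; substitute the address and API version of your engine.
    static final String BASE_URL = "http://masking.example.com/masking/api";

    /** Percent-encodes a value for use as a single URL path segment. */
    static String encodePathSegment(String value) {
        // URLEncoder produces form encoding, where a space becomes '+'; a path needs "%20".
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds the GET /algorithms/{algorithmName} request without sending it. */
    static HttpRequest buildRequest(String algorithmName, String authToken) {
        URI uri = URI.create(BASE_URL + "/algorithms/" + encodePathSegment(algorithmName));
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .header("Authorization", authToken) // assumed header for the login token
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                buildRequest("Sample Plugin:MultiColumnDateAlgorithm", "example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with java.net.http.HttpClient and reading the body as a string would then yield the JSON description of the algorithm, including its logical fields.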

The API provides a five-argument constructor for AlgorithmLogicalField that allows fields to be marked as read-only and/or optional, and to carry a short documentation string describing the field's usage. The Extensibility SDK demonstrates this in an example algorithm, MultiColumnRedaction.java.

Let's suppose you already have an instance of a multi-column algorithm installed. That can happen in either of the following two cases:

  • The Plugin you've installed contains a default instance for MC algorithms.
  • The Plugin you've installed contains only a framework for configurable MC algorithms. In that case, you've configured an instance of the algorithm.

Let's take as an example the "MultiColumnDateAlgorithm" algorithm mentioned above (the plugin is named "sample" in that example). Retrieving its info using the GET /algorithms/{algorithmName} endpoint returns:

  {
    "algorithmName": "Sample Plugin:MultiColumnDateAlgorithm",
    "algorithmType": "COMPONENT",
    "isTokenizationSupported": false,
    "pluginId": 11,
    "fields": [
        {
          "fieldId": 5,
          "name": "startDate",
          "type": "LOCAL_DATE_TIME",
          "isReadOnly": false,
          "isOptional": false
        },
        {
          "fieldId": 6,
          "name": "endDate",
          "type": "LOCAL_DATE_TIME",
          "isReadOnly": false,
          "isOptional": false
        }
    ],
    "algorithmExtension": {}
  }

Here we can see the information structure for the logical fields, defined by the current framework. We will use that data when configuring the Inventory fields.
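To illustrate how the fieldId values can be picked out of such a response, here is a minimal Java sketch. It uses a regular expression only to stay free of dependencies and assumes that "fieldId" is immediately followed by "name", as in the response above; a real client would more likely use a JSON library. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogicalFieldIds {
    // Matches a "fieldId" entry immediately followed by its "name" entry.
    private static final Pattern FIELD = Pattern.compile(
            "\"fieldId\"\\s*:\\s*(\\d+)\\s*,\\s*\"name\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns logical field name -> fieldId, in the order the fields appear. */
    static Map<String, Integer> fieldIdsByName(String algorithmJson) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(algorithmJson);
        while (m.find()) {
            ids.put(m.group(2), Integer.parseInt(m.group(1)));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the "fields" array from the response above.
        String json = "{\"fields\": ["
                + "{\"fieldId\": 5, \"name\": \"startDate\", \"type\": \"LOCAL_DATE_TIME\"},"
                + "{\"fieldId\": 6, \"name\": \"endDate\", \"type\": \"LOCAL_DATE_TIME\"}]}";
        System.out.println(fieldIdsByName(json)); // prints {startDate=5, endDate=6}
    }
}
```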

INFO

Previous versions of the Extensibility API required two methods, listMaskedFields and listReadOnlyFields, to be implemented when creating a multi-column algorithm. These methods are now deprecated, and listMultiColumnFields is the preferred way for multi-column algorithms to define their fields. However, existing algorithms that use the old methods should continue to function normally.

Configuring columnMetadata for MC algorithm

To configure the columns involved (i.e. the masked and read-only columns), we should update each column's metadata with the following information:

"algorithmFieldId"
"algorithmGroupNo"
"algorithmName"
"domainName"

The last two are the regular configuration fields for masked columns. Let's look more closely at the fields newly introduced for MC:

  • algorithmFieldId is the fieldId of the corresponding logical field. For example, for "startDate" from the example above, its value is 5.
  • algorithmGroupNo is an integer group number shared by the columns that are treated by the same algorithm instance. It is introduced for cases where we might have multiple columns of a similar type that are masked by different Masking Jobs using the same algorithm. In such cases it is important to unite the columns for each algorithm run by assigning them the same group number.

There are two supported methods to configure the columnMetadata for the masked table inventory:

  • Via API
  • Via UI

Configuring columnMetadata for MC algorithms via API

Below is an example of the column metadata before it is configured for an MC algorithm:

Let's associate that field with the logical field startDate (fieldId=5) from the snapshot above, by adding the mentioned fields:

INFO

For a masked column, the isMasked field must be manually set to true, while for a read-only field it stays false.
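As a rough illustration of the properties discussed in this section, the Java sketch below assembles the MC-related part of a column's metadata and renders it as JSON. Only the property names come from this page; the domain name "DATE_DOMAIN", the group number 1, and the helper names are placeholders, and a real columnMetadata object contains further properties that are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ColumnMetadataPatch {
    /** Collects the MC-related columnMetadata properties for one column. */
    static Map<String, Object> mcFields(String algorithmName, String domainName,
                                        int algorithmFieldId, int algorithmGroupNo,
                                        boolean isMasked) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("algorithmName", algorithmName);
        fields.put("domainName", domainName);
        fields.put("algorithmFieldId", algorithmFieldId);
        fields.put("algorithmGroupNo", algorithmGroupNo);
        fields.put("isMasked", isMasked); // true for a masked column, false for read-only
        return fields;
    }

    /** Renders a flat map as JSON; strings are quoted, numbers and booleans are not. */
    static String toJson(Map<String, Object> fields) {
        StringJoiner json = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Object v = e.getValue();
            String rendered = (v instanceof String)
                    ? "\"" + escape((String) v) + "\""
                    : String.valueOf(v);
            json.add("\"" + e.getKey() + "\": " + rendered);
        }
        return json.toString();
    }

    // Minimal escaping (backslash and double quote only), enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // "DATE_DOMAIN" and group number 1 are placeholder values.
        System.out.println(toJson(
                mcFields("Sample Plugin:MultiColumnDateAlgorithm", "DATE_DOMAIN", 5, 1, true)));
    }
}
```

For the column mapped to "endDate", the same call would use fieldId 6 and the same group number, so that both columns are handled in one algorithm run.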

If the inventory for the masked table is checked in the UI at this point, the inventory configured via the API is displayed there:

Configuring columnMetadata for MC algorithms via UI

The same columnMetadata configuration can also be made via the UI. As with other algorithms, one has to choose the Domain and Algorithm values applied to the current column. If a Multi-Column algorithm has been chosen, the following two additional fields need to be filled out:

  • Select Logical Field drop-down, where the corresponding logical field is selected.
  • Algorithm Group field, where the algorithmGroupNo value is entered.

INFO

In the UI configuration for columnMetadata, the user does not need to set the isMasked field (as was done via the API in the example above). It is handled automatically, since ME knows whether the associated logical field is masked or used as read-only.

Error Management

Various configuration errors are possible while setting up MC algorithms. The configuration process prevents as many misconfigurations as possible, but some configuration errors can only be detected when a job is executed. For example, trying to associate a second column with the same (already assigned) logical field results in a configuration error similar to:

If an association with a required logical field is missing, that type of error is not recognized during configuration, but only during job execution (which fails because of the misconfiguration).
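Both kinds of mistake can be caught on the client side before a job is started. The following Java sketch is a hypothetical pre-flight check, not part of the Masking Engine: given the algorithmFieldId values assigned to the columns of one algorithm group and the set of required (non-optional) fieldIds, it reports logical fields assigned more than once and required fields with no column.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class McGroupCheck {
    /**
     * Checks the logical field IDs assigned within one algorithm group.
     * assignedFieldIds has one entry per configured column;
     * requiredFieldIds lists the fields that are not optional.
     */
    static List<String> findProblems(List<Integer> assignedFieldIds,
                                     Set<Integer> requiredFieldIds) {
        List<String> problems = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (Integer fieldId : assignedFieldIds) {
            // Set.add returns false when this logical field already has a column.
            if (!seen.add(fieldId)) {
                problems.add("logical field " + fieldId
                        + " is assigned to more than one column");
            }
        }
        for (Integer fieldId : requiredFieldIds) {
            if (!seen.contains(fieldId)) {
                problems.add("required logical field " + fieldId + " has no column");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // fieldIds 5 ("startDate") and 6 ("endDate") are both required in the sample.
        Set<Integer> required = new TreeSet<>(List.of(5, 6));
        // Two columns mapped to field 5, none mapped to field 6.
        findProblems(List.of(5, 5), required).forEach(System.out::println);
    }
}
```

With the sample values, the check reports one duplicate assignment for field 5 and one missing column for field 6; a correct configuration such as columns mapped to 5 and 6 yields an empty list.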

Please find below an example of the monitor job error report:

Limitations for the MC Algorithms

  1. Currently, MC Algorithms can only be run on a single table. Masking columns across multiple tables with MC Algorithms is not supported.
  2. XML File masking does not support MC algorithms.
  3. VSAM File masking does not support MC algorithms. The only exception is VSAM files which don't redefine record types.
  4. Some types of misconfiguration errors (as described above) are only detected during job execution.