The MaskingAlgorithm Java Interface

Any Java class that should be recognized as a masking algorithm (whether stand-alone or configurable) must implement the MaskingAlgorithm interface. This interface is parameterized with the data type the algorithm masks, which defines the input and output data type of the mask method. The full details of this interface are described in the Masking Plugin API Javadoc.

Core Data Types

The Delphix Masking Engine is designed to support a wide and extensible set of data sources, which naturally encode data in a variety of different formats. In order to simplify algorithm development, while maintaining the ability to mask data from many sources, we've identified a core set of data formats which are likely to require different masking treatment and ensured that the Extensible Algorithm framework converts all data to/from these types as needed. These types define the allowed parameterization of the MaskingAlgorithm Java interface.

Each masking algorithm class is defined to mask exactly one of the following data types:

  • Binary data - java.nio.ByteBuffer
  • String data - java.lang.String
  • Numeric data - java.math.BigDecimal
  • Date time data - java.time.LocalDateTime
  • Multi-column data - com.delphix.masking.api.plugin.utils.GenericDataRow (See Multi-Column Masking section)

Each algorithm is expected to input, process, and emit objects of one of the above Java types, but is free to use any intermediate types as needed to access library methods. Data of one type is frequently stored in databases or documents in a type other than its most natural native type (e.g., dates stored in VARCHAR fields, or numbers stored as text in a CSV file). For this reason, the masking framework that executes these algorithms can perform a number of automatic type conversions, detailed in the next section. This allows algorithms written to process one data type to handle data of other types, with no additional work required of the algorithm author.

Supported Automatic Type Conversions

| Algorithm Native Type | Supported Type | Notes |
|---|---|---|
| ByteBuffer | String | Algorithm receives the UTF-8 encoded value of the String and is expected to return a valid UTF-8 ByteBuffer. |
| LocalDateTime | String | The correct date format must be assigned to the field or column in the masking inventory. |
| LocalDateTime | Compatible numeric types | A compatible date format, such as yyyyMMdd, must be assigned to the column in inventory. |
| BigDecimal | All numeric types | Upconverted to BigDecimal. Out of range values after masking are truncated to fit the range of the underlying type. |
| BigDecimal | String | String value is converted to a number. |
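To make the conversions in the table above concrete, the following sketch reproduces comparable conversions with plain JDK calls. It is illustrative only: the class and method names (ConversionSketch, toUtf8, fromUtf8, fromNumericDate, fromString) are hypothetical and are not part of the Masking Plugin API, and the framework's own conversion logic may differ in details such as whitespace handling or time-of-day defaults.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers that mimic the table rows using only the JDK.
// They are NOT the framework's implementation.
final class ConversionSketch {

    // String -> ByteBuffer: a ByteBuffer algorithm sees the UTF-8 bytes of the text.
    static ByteBuffer toUtf8(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    // ByteBuffer -> String: the masked bytes must decode as valid UTF-8.
    static String fromUtf8(ByteBuffer bytes) {
        // duplicate() leaves the caller's buffer position untouched
        return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
    }

    // Numeric value + date format (e.g. 20240131 with yyyyMMdd) -> LocalDateTime.
    // Assumption: a date-only pattern maps to midnight of that day.
    static LocalDateTime fromNumericDate(long value, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        return LocalDate.parse(Long.toString(value), formatter).atStartOfDay();
    }

    // String -> BigDecimal: the text is parsed as a number.
    // Assumption: surrounding whitespace is ignored.
    static BigDecimal fromString(String text) {
        return new BigDecimal(text.trim());
    }
}
```

An algorithm author does not write this code; the point is only that an algorithm declared for LocalDateTime or BigDecimal can be assigned to a text or numeric column once the inventory carries a matching format.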

Special Case Values

In order to allow algorithms to implement special handling for null, empty, and special case values, these values are presented to the masking algorithm unmodified. Algorithms should be prepared to process the full range of input values possible for the input type. In practice, this means that most mask method implementations will begin with a null check on the input value, prior to attempting to use the input - for example, by calling input.length() or similar. It is perfectly acceptable and commonplace to return null in the case where the mask input is null.
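A minimal sketch of the null-first pattern described above. To keep the example compilable on its own, it uses a one-method stand-in interface (MaskSketch) instead of the real MaskingAlgorithm interface, which declares more methods; the stand-in, the class name FirstCharKeeper, and the masking rule itself are all hypothetical.

```java
// Hypothetical stand-in so the sketch compiles without the SDK on the classpath.
// The real MaskingAlgorithm interface has additional methods (see the Javadoc).
interface MaskSketch<T> {
    T mask(T input);
}

// Keeps the first character and replaces the rest with 'X'.
final class FirstCharKeeper implements MaskSketch<String> {
    @Override
    public String mask(String input) {
        // Null arrives unmodified, so check before calling any method on it.
        if (input == null) {
            return null;
        }
        // Empty strings also arrive unmodified; here they pass through as-is.
        if (input.isEmpty()) {
            return input;
        }
        StringBuilder masked = new StringBuilder(input.length());
        masked.append(input.charAt(0));
        for (int i = 1; i < input.length(); i++) {
            masked.append('X');
        }
        return masked.toString();
    }
}
```

Returning null for null input is one reasonable policy; an algorithm could equally substitute a fixed value if the use case calls for it.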

Method Overview

This section provides a high-level overview of the methods in the MaskingAlgorithm interface. For complete details, consult the Masking Plugin API Javadoc included in the Algorithm SDK archive.

  • getName and getDescription - These methods are used to determine the name and description of frameworks and algorithm instances included in the plugin. For user-created instances, these methods are never called.
  • getDefaultInstances and getAllowFurtherInstances - These methods control the set of instances of the algorithm framework that are defined by the plugin, and whether the user should be allowed to create additional instances.
  • validate - This method is called after configuration is applied to allow the algorithm class to check whether the injected configuration is valid.
  • setup and tearDown - These methods are called before the algorithm object is used for masking, and after, respectively. Typically any resources, such as input files, are acquired during setup and released during tearDown.
  • mask - This is the method that does the actual data masking in the algorithm class. The input and output values are parameterized for type safety as described above.
  • listMaskedFields - Required only for Multi-Column Algorithms. It returns a map of field names (String) to the Core Data Type of each field. Other algorithms do not need to implement it.
  • listReadOnlyFields - Similar to listMaskedFields, but optional for Multi-Column Algorithms. Fields returned by this method are read-only and cannot be changed.

The Life Cycles of Algorithm Objects

The Extensibility framework uses objects of classes implementing the MaskingAlgorithm interface for several distinct purposes. The resulting object life cycles are as follows:

Plugin Discovery

This occurs when the extensibility framework evaluates the capabilities present in a MaskingAlgorithm class.

  1. Java object creation - an object of the algorithm class is created
  2. getName - determines framework name
  3. getDescription - determines framework description
  4. getDefaultInstances - determines all plugin-provided algorithm instances. For each instance:
    1. getName - determines instance name
    2. getDescription - determines instance description
    3. validate - ensures the object passes validation
    4. Serialize configurable fields - these are saved as a JSON document defining the instance's configuration
    5. Disposal - the Java object is discarded
  5. getAllowFurtherInstances - determines whether the framework is visible in the algorithm/framework API endpoint
  6. Disposal - the Java object is discarded

User Algorithm Creation

This life cycle occurs whenever a user attempts to create a new instance of a plugin algorithm framework. The algorithm definition is saved only if each step succeeds.

  1. Java object creation - an object of the algorithm class is created
  2. Configuration injection - the values in the user-provided JSON document are injected into the object
  3. validate - the object's validate method is called
  4. Disposal - the Java object is discarded

Note

The setup method is not executed when a user-defined instance is created.
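The creation steps above can be sketched roughly as follows. Everything here is a hypothetical stand-in: the real framework injects values from a JSON document rather than a Map, and the exception type a real validate method should throw is defined by the Masking Plugin API Javadoc, not by this example.

```java
import java.util.Map;

// Hypothetical configurable algorithm with one injected field.
final class ReplaceCharAlgorithm {
    String replacement; // configurable field, populated by "injection"

    // Rejects configurations the algorithm cannot work with.
    // Assumption: IllegalArgumentException stands in for the SDK's own exception type.
    void validate() {
        if (replacement == null || replacement.length() != 1) {
            throw new IllegalArgumentException("replacement must be exactly one character");
        }
    }
}

final class CreationSketch {
    // Returns true when every step succeeds, i.e. when the definition would be saved.
    static boolean tryCreate(Map<String, String> config) {
        try {
            ReplaceCharAlgorithm algorithm = new ReplaceCharAlgorithm(); // 1. object creation
            algorithm.replacement = config.get("replacement");           // 2. configuration injection
            algorithm.validate();                                        // 3. validate
            return true;
        } catch (RuntimeException e) {
            return false;                                                // definition is not saved
        }
        // 4. the object is simply dropped; note that setup is never called here
    }
}
```

Because setup does not run in this life cycle, checks that need external resources (for example, opening an input file) cannot rely on work done in setup.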

Algorithm Use

This is the life cycle of an algorithm object when used to mask data.

  1. Java object creation - an object of the algorithm class is created
  2. Configuration injection - the saved JSON document defining this instance is injected in the object
  3. setup - the setup method is called once
  4. mask - the mask method is called on each value to be masked
  5. tearDown - the tearDown method is called once
  6. Disposal - the Java object is discarded
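As a rough illustration of the ordering above, the sketch below drives a stand-in algorithm through setup, repeated mask calls, and tearDown. The LifecycleSketch interface and JobSketch driver are hypothetical: the real setup method has access to a ComponentService object, which is omitted here, and wrapping the mask loop in try/finally is an assumption about failure handling rather than documented framework behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the life-cycle methods of MaskingAlgorithm.
interface LifecycleSketch<T> {
    void setup();
    T mask(T input);
    void tearDown();
}

// Records each call so the call order can be inspected.
final class CountingMasker implements LifecycleSketch<String> {
    final List<String> events = new ArrayList<>();

    @Override
    public void setup() {
        events.add("setup"); // acquire resources here
    }

    @Override
    public String mask(String input) {
        events.add("mask");
        return input == null ? null : "***";
    }

    @Override
    public void tearDown() {
        events.add("tearDown"); // release resources here
    }
}

final class JobSketch {
    // Mimics steps 3-5: setup once, mask every value, tearDown once.
    static <T> List<T> run(LifecycleSketch<T> algorithm, List<T> values) {
        List<T> masked = new ArrayList<>();
        algorithm.setup();
        try {
            for (T value : values) {
                masked.add(algorithm.mask(value));
            }
        } finally {
            algorithm.tearDown(); // assumption: released even if masking fails
        }
        return masked;
    }
}
```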

Tip

A distinct Java object is created for each application of a masking algorithm during Job execution. For algorithms that create or load a large amount of state, this can result in significant memory usage, because each object stores its own redundant copy of the data. This can be avoided by using a class-level static cache to store the data; the instance name, which can be retrieved during setup from the ComponentService interface object, can be used as the access key for data cached in this way.
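One way to build such a cache is a static ConcurrentHashMap keyed by instance name, as in the sketch below. The class name, the lookup data, and the LOADS counter are hypothetical, and the instance name is passed to setup as a plain String; in a real plugin it would be obtained from the ComponentService object, whose exact accessor is documented in the Javadoc and is not assumed here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class CachedLookupSketch {
    // Shared by every object of this class: one lookup table per instance name.
    private static final Map<String, Map<String, String>> CACHE = new ConcurrentHashMap<>();

    // Counts expensive loads, only so the effect of the cache is observable.
    static final AtomicInteger LOADS = new AtomicInteger();

    private Map<String, String> table;

    // Hypothetical signature: the instance name is supplied directly as a String.
    void setup(String instanceName) {
        // computeIfAbsent runs the loader at most once per key, even across threads.
        table = CACHE.computeIfAbsent(instanceName, CachedLookupSketch::loadTable);
    }

    String mask(String input) {
        if (input == null) {
            return null;
        }
        return table.getOrDefault(input, "UNKNOWN");
    }

    // Stands in for an expensive load, such as parsing a large lookup file.
    private static Map<String, String> loadTable(String instanceName) {
        LOADS.incrementAndGet();
        Map<String, String> loaded = new HashMap<>();
        loaded.put("Alice", "Person-1");
        loaded.put("Bob", "Person-2");
        return Collections.unmodifiableMap(loaded);
    }
}
```

The cached table is made unmodifiable because it is shared between objects that may run on different threads. A cache like this is never emptied in the sketch, so a real plugin would also need to decide when entries can be evicted.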

Multi-Column Masking

It is possible to write an algorithm whose masking of one column depends on the values of other columns. To account for the different possible data types of those columns, such algorithms operate on an object called a GenericDataRow.

Generic Data

A GenericDataRow is a map of field names (String) to GenericData objects. Each GenericData object contains the value, along with methods to return the respective typed object. To access the value of a GenericData object, read it into a Core Data Type using one of the following methods:

  • getStringValue()
  • getBigDecimalValue()
  • getLocalDateTimeValue()
  • getByteBufferValue()

Once the value has been masked, write it back by calling setValue, passing the masked value as a Core Data Type.
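The read, mask, and write-back flow can be sketched as follows. To stay self-contained, the example uses a hypothetical CellSketch class in place of GenericData and a plain Map in place of GenericDataRow; only the method names getStringValue and setValue are taken from the description above, and their real signatures should be checked against the Javadoc. The field names first_name and email are likewise invented for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for GenericData: holds one value and exposes
// the accessor and setter named in the text.
final class CellSketch {
    private Object value;

    CellSketch(Object value) {
        this.value = value;
    }

    String getStringValue() {
        return value == null ? null : value.toString();
    }

    void setValue(Object newValue) {
        this.value = newValue;
    }
}

final class EmailFromNameSketch {
    // The Map stands in for GenericDataRow: field name -> value holder.
    // Masks "email" so that it stays consistent with the "first_name" column.
    static Map<String, CellSketch> mask(Map<String, CellSketch> row) {
        if (row == null) {
            return null;
        }
        CellSketch email = row.get("email");
        if (email == null || email.getStringValue() == null) {
            return row; // nothing to mask; null values pass through
        }
        CellSketch first = row.get("first_name");
        String name = first == null ? null : first.getStringValue();
        String localPart = (name == null || name.isEmpty())
                ? "user"
                : name.toLowerCase(Locale.ROOT);
        // Write the masked value back as a Core Data Type (String).
        email.setValue(localPart + "@example.com");
        return row;
    }
}
```

In this sketch first_name is only read and email is the only field written, which corresponds to the split between listReadOnlyFields and listMaskedFields.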