Data Cleansing

See Data Cleansing for more information about this algorithm framework.

Creating a Data Cleansing Algorithm via API

  1. Retrieve the frameworkId for the Data Cleansing Framework. This can be done via the following endpoint:

    algorithm   GET /algorithm/frameworks
    

    The framework information should look similar to the following:

    {
        "frameworkId": 24,
        "frameworkName": "Data Cleansing",
        "frameworkType": "STRING",
        "plugin": {
            "pluginId": 7,
            "pluginName": "dlpx-core",
            "pluginAuthor": "Delphix Engineering",
            "pluginType": "EXTENDED_ALGORITHM"
        }
    }
    
  2. Upload a lookup file via the following endpoint:

    fileUpload   POST /file-uploads
    

    Copy the fileReferenceId value returned in the Response Body.

  3. Create a Data Cleansing algorithm via the following endpoint:

    algorithm   POST /algorithms
    

    Use JSON-formatted input similar to the following example:

    {
        "algorithmName": "demoDataCleansing",
        "algorithmType": "COMPONENT",
        "frameworkId": 24,
        "algorithmExtension": {
            "lookupFile": {
                "uri": "delphix-file://upload/f_52b19f8a9125435a83a1237fa53aeaf5/sample.txt"
            },
            "delimiter": "=",
            "caseSensitive": false,
            "trimWhitespace": true
        }
    }
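
The three steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the helper names `find_framework_id` and `build_algorithm_payload` are hypothetical, the HTTP calls themselves (base URL, authentication, file upload) are omitted because they depend on your engine, and the code assumes you already hold the framework entries as a list of dictionaries shaped like the sample in step 1. The `lookupFile` value is sent as an object with a `uri` field, mirroring the example request in step 3.

```python
import json


def find_framework_id(frameworks, name="Data Cleansing"):
    """Return the frameworkId of the entry whose frameworkName equals `name`.

    `frameworks` is assumed to be an iterable of dicts shaped like the
    sample framework shown in step 1.
    """
    for framework in frameworks:
        if framework.get("frameworkName") == name:
            return framework["frameworkId"]
    raise LookupError(f"framework {name!r} not found")


def build_algorithm_payload(algorithm_name, framework_id, file_reference_id,
                            delimiter="=", case_sensitive=True,
                            trim_whitespace=True):
    """Build the JSON body for POST /algorithms (hypothetical helper)."""
    # The documented constraint on delimiter is minLength=1, maxLength=50.
    if not 1 <= len(delimiter) <= 50:
        raise ValueError("delimiter must be 1 to 50 characters long")
    return {
        "algorithmName": algorithm_name,
        "algorithmType": "COMPONENT",
        "frameworkId": framework_id,
        "algorithmExtension": {
            # fileReferenceId copied from the POST /file-uploads response
            "lookupFile": {"uri": file_reference_id},
            "delimiter": delimiter,
            "caseSensitive": case_sensitive,
            "trimWhitespace": trim_whitespace,
        },
    }


if __name__ == "__main__":
    # Framework entry copied from the sample response in step 1.
    sample_frameworks = [
        {"frameworkId": 24, "frameworkName": "Data Cleansing",
         "frameworkType": "STRING"},
    ]
    payload = build_algorithm_payload(
        "demoDataCleansing",
        find_framework_id(sample_frameworks),
        "delphix-file://upload/f_52b19f8a9125435a83a1237fa53aeaf5/sample.txt",
        case_sensitive=False,
    )
    print(json.dumps(payload, indent=4))
```

Looking up the frameworkId by name rather than hard-coding 24 is a deliberate choice here, since the identifier may differ between engines.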
    

Data Cleansing Algorithm Extension

  • lookupFile (required)

    String
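    The fileReferenceId value returned from the fileUpload endpoint when the file is uploaded to the Masking Engine. In the example request above, it is supplied as the uri field of the lookupFile object. The file should contain a newline-separated list of {value, replacement} pairs, with the value and replacement in each pair separated by the delimiter. No extraneous whitespace should be present.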

  • delimiter (required, minLength=1, maxLength=50, default="=")

    String
    The delimiter string used to separate {value, replacement} pairs in the lookup file.

  • caseSensitive (optional, default=true)

    Boolean
    Whether the input string must match the case of the values in the lookup file in order to be replaced.

  • trimWhitespace (optional, default=true)

    Boolean
    Whether to trim leading and trailing whitespace from the input string.
    Note: This must be true to cleanse fixed-width files and fixed-length database data types such as CHAR and NCHAR.
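
To make the lookup file format and the two Boolean options concrete, here is a small local approximation in Python. It is a sketch and not the Masking Engine's implementation: the function names `parse_lookup` and `cleanse` are hypothetical, the sample pairs are invented for illustration, and three behaviors are assumptions rather than documented facts: each line is split on the first occurrence of the delimiter, case-insensitive matching is done by lowercasing, and an input with no match is returned unchanged.

```python
def parse_lookup(text, delimiter="="):
    """Parse newline-separated value<delimiter>replacement pairs into a dict."""
    pairs = {}
    for line in text.splitlines():
        if not line:
            continue  # ignore empty lines, such as a trailing newline
        value, sep, replacement = line.partition(delimiter)
        if not sep:
            raise ValueError(f"missing delimiter in line: {line!r}")
        pairs[value] = replacement
    return pairs


def cleanse(value, pairs, case_sensitive=True, trim_whitespace=True):
    """Return the replacement for `value`, or `value` itself if none matches."""
    key = value.strip() if trim_whitespace else value
    if case_sensitive:
        return pairs.get(key, value)
    # Assumption: case-insensitive matching compares lowercased strings.
    lowered = {k.lower(): v for k, v in pairs.items()}
    return lowered.get(key.lower(), value)


if __name__ == "__main__":
    # Invented sample lookup file contents, one pair per line.
    lookup_text = "Ca=CA\nCalif=CA\nN.Y.=NY\n"
    pairs = parse_lookup(lookup_text)

    # A space-padded value, as a fixed-length CHAR column would return it.
    padded = "Calif     "
    print(repr(cleanse(padded, pairs)))                         # 'CA'
    print(repr(cleanse(padded, pairs, trim_whitespace=False)))  # 'Calif     '
    print(repr(cleanse("calif", pairs, case_sensitive=False)))  # 'CA'
```

The padded example shows why trimWhitespace needs to be true for fixed-length types: without trimming, the padded input never equals the unpadded value in the lookup file.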