
Free Text Redaction

Extensible Algorithm Framework

A Free Text Redaction Algorithm Framework helps you remove sensitive data that appears in free-text columns such as “Notes.” This type of algorithm requires some expertise to use because you must configure it to recognize sensitive data within a block of text.

The algorithm uses a list of lookup words to determine what information it needs to mask. You can decide which words the algorithm uses to search for material such as addresses. For example, you can set the algorithm to look for “St,” “Cir,” “Blvd,” and other words that suggest an address. You can also use pattern matching to identify potentially sensitive information. For example, a number that takes the form 123-45-6789 is likely to be a Social Security Number. Lookup words and regular expressions will match individual words within the input text, rather than phrases.
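The word-level matching described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the tokenization (split on whitespace, strip surrounding punctuation), the whole-word regular expression match, and the names `split_words` and `find_matches` are all assumptions made for the example.

```python
import re

# Lookup words and pattern taken from the examples in the text.
LOOKUP_WORDS = {"St", "Cir", "Blvd"}
SSN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

def split_words(text):
    """Split on whitespace and strip surrounding punctuation (assumed tokenization)."""
    return [w.strip(".,;:!?\"'()") for w in text.split()]

def find_matches(text):
    """Return the individual words flagged by a lookup word or by the pattern."""
    hits = []
    for word in split_words(text):
        # Each word is tested on its own, so a multi-word phrase can never match.
        if word in LOOKUP_WORDS or SSN_PATTERN.fullmatch(word):
            hits.append(word)
    return hits

print(find_matches("Lives at 12 Elm St, SSN 123-45-6789."))
```

Because every word is compared separately, a lookup entry such as "Elm St" would never match in this sketch, which mirrors the statement that phrases are not matched.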

You can use a Free Text Redaction Algorithm Framework to hide or show information by selecting either a “DenyList” or an “AllowList” redact type.

DenyList – Designated material will be redacted (removed). For example, you can set a deny list to hide patient names and addresses. The deny list feature will match the data in the lookup file to the input.

AllowList – ONLY designated material will be visible. For example, if a drug company wants to assess how often a particular drug is being prescribed, you can use an allow list so that only the name of the drug will appear in the notes.
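The difference between the two redact types can be sketched as below. This is an illustrative approximation only: the function name `redact`, the punctuation handling, and the choice to replace every non-listed word with the redaction value in AllowList mode are assumptions, not documented behavior.

```python
def redact(text, lookup_words, redact_type, redaction_value="XXXX"):
    """Word-level redaction sketch; redact_type is "DenyList" or "AllowList"."""
    out = []
    for token in text.split():
        core = token.strip(".,;:!?")  # compare the word without trailing punctuation
        listed = core in lookup_words
        # DenyList hides listed words; AllowList hides everything that is not listed.
        hide = listed if redact_type == "DenyList" else not listed
        if hide and core:
            token = token.replace(core, redaction_value, 1)
        out.append(token)
    return " ".join(out)

notes = "Patient Ann Lee was prescribed Aspirin daily."
print(redact(notes, {"Ann", "Lee"}, "DenyList"))
print(redact(notes, {"Aspirin"}, "AllowList"))
```

With the deny list, only the two name words are replaced; with the allow list, only the drug name remains readable.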

Creating a Free Text Redaction Algorithm via UI

  1. Enter an Algorithm Name.

  2. Enter a Description.

  3. Select a Redact Type: DenyList or AllowList.

  4. Select a Lookup File and enter its Redaction Value, and/or

  5. Enter Regular Expressions, one per line, and enter their Redaction Value.

  6. Click Save.

Existing limitations:
  1. The maximum number of supported Regular Expressions is 50. Exceeding this number will lead to a Component Configuration exception.
  2. The maximum number of supported words in the Lookup File is 1000. Exceeding this number may affect the algorithm's performance.
  3. The Lookup File format must be txt.
  4. Each entry in the Lookup File must be on its own line. Phrases are not supported. Entries are case sensitive.
  5. The maximum length of an input text to mask is 32768. Exceeding this number will lead to a Non-Conformant data exception.
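The limits above can be checked before a configuration is submitted. The sketch below is a hypothetical pre-flight check, not part of the product: the names `check_config` and `check_input` are invented for illustration, and measuring the input length in characters is an assumption, since the unit is not stated.

```python
from pathlib import Path

MAX_REGEXES = 50
MAX_LOOKUP_WORDS = 1000
MAX_INPUT_LENGTH = 32768  # unit assumed to be characters

def check_config(lookup_file_name, lookup_entries, regexes):
    """Return (errors, warnings) for a hypothetical algorithm configuration."""
    errors, warnings = [], []
    if Path(lookup_file_name).suffix.lower() != ".txt":
        errors.append("lookup file must be a .txt file")
    if any(len(entry.split()) != 1 for entry in lookup_entries):
        errors.append("each lookup entry must be a single word on its own line")
    if len(lookup_entries) > MAX_LOOKUP_WORDS:
        # The text says this only affects performance, so it is a warning here.
        warnings.append("more than 1000 lookup words may affect performance")
    if len(regexes) > MAX_REGEXES:
        errors.append("more than 50 regular expressions are not supported")
    return errors, warnings

def check_input(text):
    """True if the input text is within the assumed length limit."""
    return len(text) <= MAX_INPUT_LENGTH

print(check_config("names.txt", ["Bob", "Jones"], ["[0-9]{3}-[0-9]{3}-[0-9]{4}"]))
```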

For information on creating Free Text Redaction algorithms through the API, see API Calls for Creating Algorithms - Free Text Redaction.

Examples

Input:

The customer Bob Jones is satisfied with the terms of the sales
agreement. Please call to confirm at 718-223-7896.

Algorithm configuration:

  1. The Redact Type is DenyList

  2. Lookup File entries:

    Bob
    Jones
    agreement

  3. The Lookup File Redaction Value is XXXX

  4. Regular Expressions entry:

    [0-9]{3}-[0-9]{3}-[0-9]{4}

  5. The Regular Expression Redaction Value is YYYY

Masking result:

The customer XXXX XXXX is satisfied with the terms
of the sales XXXX. Please call to confirm at YYYY.
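
The result above can be reproduced with a short sketch that applies the lookup words and the regular expression with their separate redaction values. The function name `mask` and the tokenization are assumptions for illustration; the sketch also joins the output with single spaces, so the original line breaks are not preserved.

```python
import re

# Configuration copied from the example.
LOOKUP_WORDS = {"Bob", "Jones", "agreement"}
LOOKUP_VALUE = "XXXX"
PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
REGEX_VALUE = "YYYY"

def mask(text):
    """DenyList sketch: replace lookup words and pattern matches word by word."""
    out = []
    for token in text.split():
        core = token.strip(".,;:!?")  # ignore trailing punctuation when matching
        if core in LOOKUP_WORDS:
            token = token.replace(core, LOOKUP_VALUE, 1)
        elif PHONE.fullmatch(core):
            token = token.replace(core, REGEX_VALUE, 1)
        out.append(token)
    return " ".join(out)

text = ("The customer Bob Jones is satisfied with the terms of the sales "
        "agreement. Please call to confirm at 718-223-7896.")
print(mask(text))
```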

"Bob", "Jones", "agreement" and the phone number are redacted.