
Numeric Expression

Extensible Algorithm Framework

Numeric Expression algorithms mask numeric input by evaluating a one-line mathematical expression that the user writes in the Java programming language. The expression can reference the current unmasked value through an implicit variable named input.

For example, to mask a numeric column by multiplying every value by 0.5 (that is, by 50%), you could use the following expression:

 input * 0.5

In addition to input, the expression can reference user-defined constant variables whose values are determined at the beginning of a masking job and remain fixed for the life of the masking job.

See below for examples of expressions and constants.

Creating a Numeric Expression Algorithm via UI

  1. In the upper right-hand corner of the Algorithms tab, click Add Algorithm.
  2. Select Numeric Expression Framework. The Create Numeric Expression Algorithm pane appears.
  3. Enter an Algorithm Name. (Required)

    Info

    This MUST be unique on the Masking Engine.

  4. Enter an optional Description.
  5. Enter an Expression. This must be a one-line, mathematical expression written in the Java programming language that references input (the current unmasked value), e.g. input * 0.5 or input + Math.random(). See below for more examples of expressions.
  6. Choose the Input Type. This is the Java data type of input within the expression. The default, double, treats input as a double-precision floating-point variable in expressions such as:
    input * 0.5
    or
    input + Math.random()
    Input Type can also be set to long, which causes input to be treated as a long integer variable in expressions such as:
    Long.sum(input, 50L)
    The final Input Type option is BigDecimal, which causes input to be treated as a java.math.BigDecimal variable in expressions such as:
    input.scaleByPowerOfTen(3)
  7. Optionally, enter a Replacement Value for Nonconforming Data. This is the default masked value used when the unmasked input is not a numeric data type and can't be converted to one automatically.
  8. Optional: define any constants used by the expression. Constants are variables that the expression can reference by name and whose values remain fixed for the life of a masking job. For example, to mask every column value in a masking job by multiplying it by the same random number, you could use an expression such as:
    input * theSameRandomNumber
    but theSameRandomNumber would need to be defined as a constant whose Name is theSameRandomNumber and whose Value is something like new java.util.Random().nextDouble(). See below for more examples of constants.
  9. When you are finished, click Save.
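To see how each Input Type changes what input is, the three example expressions from step 6 can be placed in ordinary Java methods whose parameter is named input and typed to match. This is a minimal sketch for experimenting in an IDE; the class and method names are hypothetical, and it makes no claim about how the Masking Engine evaluates expressions internally.

```java
// Hypothetical staging class: each method returns one of the example
// expressions from step 6, with input declared as the matching Input Type.
// Classes outside java.lang are fully qualified, as they must be in an expression.
class InputTypeExamples {

    // Input Type = double (the default)
    static double halve(double input) {
        return input * 0.5;
    }

    // Input Type = long; Long.sum(a, b) simply returns a + b
    static long addFifty(long input) {
        return Long.sum(input, 50L);
    }

    // Input Type = BigDecimal; scaleByPowerOfTen(3) multiplies by 10^3 exactly
    static java.math.BigDecimal timesOneThousand(java.math.BigDecimal input) {
        return input.scaleByPowerOfTen(3);
    }

    public static void main(String[] args) {
        System.out.println(halve(84.0));                                          // 42.0
        System.out.println(addFifty(100L));                                       // 150
        System.out.println(timesOneThousand(new java.math.BigDecimal("1.234")));  // 1234
    }
}
```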

For information on creating Numeric Expression algorithms through the API, see API Calls for Creating Algorithms - Numeric Expression.

Writing Good Expressions & Constants

Expressions and the Java programming language are powerful, so take care to avoid writing bad expressions: a bad expression shows up as a failed masking job. It is highly recommended to test complex expressions in a Java IDE such as Eclipse or IntelliJ IDEA before using them in a masking job.
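One way to stage an expression in an IDE is to paste it after return in a method whose parameter is named input, and to declare each constant as a field initialized with its Value. The sketch below does this for the theSameRandomNumber constant from step 8. The harness (ExpressionStage, mask, and the sample values) is hypothetical and only approximates the engine. Constants are declared with wrapper types such as Double so that object method calls on a constant, like lastDayOfMonth.longValue() in Example 3, would also compile. Note that the expression itself is not assigned to a variable, and the trailing semicolon belongs to the harness's return statement, not to the expression.

```java
// Hypothetical harness for trying an expression in an IDE. Only the text after
// "return" (without the trailing semicolon, which belongs to the harness) is
// what would go into the Expression field.
class ExpressionStage {

    // Constant: Name = theSameRandomNumber, Value = new java.util.Random().nextDouble()
    // Declared with a wrapper type so object method calls on constants compile.
    static final Double theSameRandomNumber = new java.util.Random().nextDouble();

    // Expression, with Input Type = double
    static double mask(double input) {
        return input * theSameRandomNumber;
    }

    public static void main(String[] args) {
        // Try edge cases as well as typical values before using the expression.
        double[] samples = {0.0, 1.0, 250.75, -3.5, Double.MAX_VALUE};
        for (double sample : samples) {
            System.out.println(sample + " -> " + mask(sample));
        }
    }
}
```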

The requirement that expressions be written in Java might intimidate non-programmers, but simple mathematical equations in Java look much like simple mathematical equations in general. The four most common arithmetic operators are supported: addition (+), subtraction (-), multiplication (*), and division (/). For operations that Java has no operator for, use methods of the java.lang.Math class. For example, one might expect input ^ 5 to mean "take input to the fifth power," but in Java ^ is the bitwise exclusive-OR operator, not a power operator. Instead, use Math.pow(input, 5.0).

To group parts of the expression for clarity or to enforce the order of operations, use opening and closing parentheses () only. Do not use square brackets [] or curly braces {}.
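The following sketch shows, in plain Java, what ^ actually does and how parentheses change evaluation order. The class and method names are hypothetical.

```java
// Shows why ^ must not be used for exponentiation, and how parentheses
// change the order of operations.
class PowerAndParentheses {

    // With a long input, ^ compiles but means bitwise exclusive OR.
    // (With a double input, input ^ 5 does not compile at all.)
    static long xorFive(long input) {
        return input ^ 5;
    }

    // Math.pow is the correct way to take input to the fifth power.
    static double fifthPower(double input) {
        return Math.pow(input, 5.0);
    }

    public static void main(String[] args) {
        System.out.println(xorFive(2L));           // 7 (binary 010 ^ 101 = 111), not 32
        System.out.println(fifthPower(2.0));       // 32.0

        double input = 10.0;
        System.out.println(input + 2.0 * 0.5);     // 11.0: multiplication happens first
        System.out.println((input + 2.0) * 0.5);   // 6.0: parentheses force the addition first
    }
}
```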

Expression Do's and Don'ts

Do use an Input Type (explained above) that corresponds to the data type of the column being masked. For columns whose values are floating-point numbers (numbers with digits to the right of the decimal point), set Input Type to double (the default), or to BigDecimal if the expression needs to treat the input as a java.math.BigDecimal object to perform more complex math. For columns whose values are integers (whole numbers), set Input Type to long.
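The Input Type matters because Java evaluates the same expression text differently for each type. A minimal sketch, with hypothetical names, of input / 2 under long and double, plus an exact division with BigDecimal:

```java
// Same expression idea, different Input Types.
class InputTypeDivision {

    // Input Type = long: integer division drops the fractional part,
    // truncating toward zero.
    static long halveLong(long input) {
        return input / 2;
    }

    // Input Type = double: floating-point division keeps the fraction.
    static double halveDouble(double input) {
        return input / 2;
    }

    // Input Type = BigDecimal: exact decimal division. divide(BigDecimal)
    // throws ArithmeticException if the quotient does not terminate (e.g. / 3).
    static java.math.BigDecimal halveBigDecimal(java.math.BigDecimal input) {
        return input.divide(java.math.BigDecimal.valueOf(2));
    }

    public static void main(String[] args) {
        System.out.println(halveLong(7L));                                    // 3
        System.out.println(halveLong(-7L));                                   // -3
        System.out.println(halveDouble(7.0));                                 // 3.5
        System.out.println(halveBigDecimal(new java.math.BigDecimal("7.25"))); // 3.625
    }
}
```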

Don't write expressions that do mathematically impossible things (e.g. divide by zero), that cause numeric overflow, or that produce values too large or too small to fit in the database column being masked.
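In plain Java, these problem cases don't all fail the same way: some throw an exception, while others quietly produce a value that may not fit the column. The sketch below, with hypothetical names, shows standard Java behavior only; how the Masking Engine reports each case is not covered here.

```java
// Plain-Java behavior of the problem cases named above.
class RiskyExpressions {

    // Input Type = long: division by zero throws ArithmeticException.
    static long divideLong(long input, long divisor) {
        return input / divisor;
    }

    // Input Type = double: division by zero does not throw; it yields
    // Infinity, -Infinity, or NaN (for 0.0 / 0.0).
    static double divideDouble(double input, double divisor) {
        return input / divisor;
    }

    // long multiplication silently wraps around on overflow...
    static long doubleIt(long input) {
        return input * 2;
    }

    // ...while Math.multiplyExact throws ArithmeticException instead.
    static long doubleItExact(long input) {
        return Math.multiplyExact(input, 2L);
    }

    public static void main(String[] args) {
        System.out.println(divideDouble(1.0, 0.0));    // Infinity
        System.out.println(divideDouble(0.0, 0.0));    // NaN
        System.out.println(doubleIt(Long.MAX_VALUE));  // -2
        try {
            divideLong(1L, 0L);
        } catch (ArithmeticException e) {
            System.out.println("divideLong threw ArithmeticException");
        }
        try {
            doubleItExact(Long.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("doubleItExact threw ArithmeticException");
        }
    }
}
```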

Don't split an expression across multiple lines with line breaks or other whitespace; it must fit on a single line.

Don't attempt to assign an expression to a variable. For example, this won't work:

 output = input * 0.5

but this will:

 input * 0.5

The result of the expression is automatically assigned as the masked value. Assigning it to anything else is neither necessary nor allowed.

Don't use the return keyword or end the expression with a semicolon.

Don't write expressions that return a non-numeric value, e.g.

 java.util.Arrays.asList(input)

The above expression would return a List object, which can't be converted into a numeric value. Expressions must return a numeric value: a Java int, short, long, float, or double primitive (or its object wrapper), a java.math.BigDecimal, or a java.math.BigInteger. Returning a String or char[] (character array) is also acceptable, as long as its contents can be converted into a numeric value.
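The return-type rule above can be summarized as a small check. This is a hypothetical helper written only to illustrate the rule as stated; it is not the Masking Engine's actual conversion logic, and using new java.math.BigDecimal(text) to decide whether a String or char[] "can be converted" is an assumption.

```java
// Hypothetical checker that encodes the return-type rule described above.
// Illustration only; parsing text with java.math.BigDecimal is an assumption.
class ResultTypeCheck {

    static boolean isAcceptable(Object result) {
        // Primitive results arrive here autoboxed to their wrapper classes.
        if (result instanceof Integer || result instanceof Short || result instanceof Long
                || result instanceof Float || result instanceof Double
                || result instanceof java.math.BigDecimal
                || result instanceof java.math.BigInteger) {
            return true;
        }
        // String and char[] results are acceptable only if they hold a number.
        String text;
        if (result instanceof String) {
            text = (String) result;
        } else if (result instanceof char[]) {
            text = new String((char[]) result);
        } else {
            return false; // e.g. the List returned by java.util.Arrays.asList(input)
        }
        try {
            new java.math.BigDecimal(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        double input = 12.5;
        System.out.println(isAcceptable(input * 0.5));                     // true
        System.out.println(isAcceptable(java.util.Arrays.asList(input)));  // false
        System.out.println(isAcceptable("12.5"));                          // true
        System.out.println(isAcceptable("twelve"));                        // false
    }
}
```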

Do fully qualify any Java class the expression references that isn't in the java.lang package, e.g.

 input * new java.util.Random().nextDouble()

This won't work:

 input * new Random().nextDouble()

because Java's Random class is in the java.util package rather than java.lang.

Don't use the import keyword to import non-java.lang classes, even ones that the expression or constants reference frequently. Fully qualify such classes every time they're referenced.
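A staging class written without any import statements is a quick way to check that every non-java.lang class in an expression is fully qualified: if the class compiles with no imports, the references are complete. A minimal sketch with hypothetical names:

```java
// No import statements: every class outside java.lang is fully qualified,
// as it would have to be inside an expression or a constant's Value.
class FullyQualifiedExample {

    static double mask(double input) {
        // java.util.Random must be written out in full; Math needs no prefix
        // because it lives in java.lang.
        return Math.abs(input) * new java.util.Random().nextDouble();
    }

    public static void main(String[] args) {
        System.out.println(mask(100.0)); // some value from 0.0 up to 100.0
    }
}
```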

Constants

Constants are variables that the expression can reference by name and whose values remain fixed for the life of a masking job. Constant names must be valid Java variable names. No two constants can have the same name, nor can "input" or "seed" be used as a constant name.

Constant values are very much like the expression: one-line Java expressions that must return a value. However, unlike the algorithm's main expression, constant values aren't required to be numeric.

A constant can reference, by name, any constant defined before it.

seed

There is a built-in constant named seed. Its value is a long integer derived from the algorithm key, so seed is guaranteed to have the same value across masking jobs as long as the algorithm key stays the same. A common use of seed is to seed a random number generator so that it produces the same (i.e. predictable) "random" numbers in different masking jobs.
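Seeding makes "random" values repeatable because java.util.Random is deterministic for a given seed: two generators created with the same seed return the same sequence. In the sketch below the seed value is a made-up stand-in, since how the engine derives seed from the algorithm key isn't documented here; the class and method names are hypothetical.

```java
// Two java.util.Random instances created with the same seed produce the same
// sequence of values, so anything derived from seed repeats from job to job.
class SeedReproducibility {

    // Returns the first n values (each 0 to 99) from a generator seeded with seed.
    static int[] firstInts(long seed, int n) {
        java.util.Random random = new java.util.Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextInt(100);
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in value: the real seed is derived from the algorithm key.
        long seed = 123456789L;
        int[] firstJob = firstInts(seed, 5);
        int[] secondJob = firstInts(seed, 5);
        System.out.println(java.util.Arrays.equals(firstJob, secondJob)); // true
    }
}
```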

Numeric Expression Examples

Example 1

A numeric column must be masked by multiplying all of its values by the same random percentage. The random percentage must remain the same across every masking job.

Solution:

A single constant is required for the random percentage:

Name              Value
randomPercentage  new java.util.Random(seed).nextDouble()

Note that the built-in seed constant seeds the random number generator, an instance of java.util.Random, which then produces a single random number between 0.0 (inclusive) and 1.0 (exclusive).

The expression can then reference randomPercentage like this:

 input * randomPercentage
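The following sketch simulates Example 1 for two masking jobs that share the same seed. It assumes, for illustration only, that the constant is evaluated once per job and the expression once per row; maskColumn and the seed value are hypothetical stand-ins.

```java
// Simulates Example 1 for two masking jobs that share the same seed.
class Example1 {

    // Hypothetical: masks one column, evaluating the constant once per job.
    static double[] maskColumn(double[] column, long seed) {
        // Constant: randomPercentage = new java.util.Random(seed).nextDouble()
        Double randomPercentage = new java.util.Random(seed).nextDouble();
        double[] masked = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            double input = column[i];
            masked[i] = input * randomPercentage; // the expression
        }
        return masked;
    }

    public static void main(String[] args) {
        double[] column = {100.0, 250.0, 42.5};
        long seed = 987654321L; // stand-in for the built-in seed constant
        double[] firstJob = maskColumn(column, seed);
        double[] secondJob = maskColumn(column, seed);
        System.out.println(java.util.Arrays.equals(firstJob, secondJob)); // true
    }
}
```

Because nextDouble() never returns 1.0 or more, every masked value here is no larger in magnitude than its input.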

Example 2

A numeric column must be masked by taking the square root of each value, then rounding it to a certain number of decimal places. Initially, it will be rounded to two decimal places, but the number of decimal places will be changed frequently, so it should be easily adjustable by the user.

Solution:

We'll define two constants this time:

Name           Value
decimalPlaces  2
multiplier     Math.pow(10.0, decimalPlaces)

then use this expression:

 Math.floor(Math.sqrt(input) * multiplier + 0.5) / multiplier

The main expression does the heavy lifting, using the multiplier constant. Because multiplier references decimalPlaces, someone who is not mathematically inclined, and doesn't follow how the expression rounds numbers, can still adjust the precision simply by changing decimalPlaces.
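A sketch of Example 2 in plain Java, with the two constants declared in order as fields: multiplier depends on decimalPlaces, so it comes second. The class name is hypothetical.

```java
// Example 2: square root, then round to decimalPlaces digits.
class Example2 {

    // Constants, in definition order.
    static final Integer decimalPlaces = 2;
    static final Double multiplier = Math.pow(10.0, decimalPlaces); // 100.0

    // Expression (Input Type = double): Math.floor(x + 0.5) rounds halves upward.
    static double mask(double input) {
        return Math.floor(Math.sqrt(input) * multiplier + 0.5) / multiplier;
    }

    public static void main(String[] args) {
        System.out.println(mask(2.0));   // 1.41 (sqrt(2) = 1.41421...)
        System.out.println(mask(10.0));  // 3.16 (sqrt(10) = 3.16227...)
        System.out.println(mask(81.0));  // 9.0
    }
}
```

Math.sqrt returns NaN for a negative input, so this expression assumes the column holds non-negative values.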

Example 3

We must mask a numeric column that represents a day of the current month, from 1 up to 28, 29, 30, or 31 depending on the month. Each value will be masked by adding a random number of days between 1 and the last day of the current month, inclusive. If the masked value exceeds the last day of the current month, it will simply be set to that last day.

Solution:

First, since the day of the current month is an integer (whole number), set the algorithm's Input Type to long (integer) instead of the default double (floating point).

Then define three constants:

Name            Value
calendar        java.util.Calendar.getInstance()
lastDayOfMonth  calendar.getActualMaximum(java.util.Calendar.DAY_OF_MONTH)
randomDays      new java.util.Random().ints(1, lastDayOfMonth + 1).iterator().nextInt()

calendar is a new instance of java.util.Calendar set to the current date and time.

lastDayOfMonth uses calendar to determine the last day of the current month.

randomDays uses lastDayOfMonth to generate a random number between 1 and lastDayOfMonth (inclusive).

The expression will then look like this:

 (input + randomDays > lastDayOfMonth) ? lastDayOfMonth.longValue() : input + randomDays

This expression uses Java's ternary (conditional) operator to mask conditionally. If the unmasked input plus randomDays exceeds lastDayOfMonth, the masked value is simply lastDayOfMonth. Otherwise, the masked value is the unmasked input plus randomDays.
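The sketch below evaluates Example 3's constants as written and checks the expression's capping behavior with fixed values for a 30-day month. Passing the constants as method parameters is only a convenience for testing; the class and method names are hypothetical.

```java
// Exercises Example 3: the constants as defined, plus the expression with
// fixed values that are easy to check by hand.
class Example3 {

    // The expression (Input Type = long), with the constants as parameters.
    static long mask(long input, Integer randomDays, Integer lastDayOfMonth) {
        return (input + randomDays > lastDayOfMonth) ? lastDayOfMonth.longValue() : input + randomDays;
    }

    // The randomDays constant: a random int from 1 to lastDayOfMonth, inclusive.
    static int pickRandomDays(int lastDayOfMonth) {
        return new java.util.Random().ints(1, lastDayOfMonth + 1).iterator().nextInt();
    }

    public static void main(String[] args) {
        java.util.Calendar calendar = java.util.Calendar.getInstance();
        Integer lastDayOfMonth = calendar.getActualMaximum(java.util.Calendar.DAY_OF_MONTH);
        Integer randomDays = pickRandomDays(lastDayOfMonth);
        System.out.println("lastDayOfMonth=" + lastDayOfMonth + ", randomDays=" + randomDays);

        // Fixed values for a 30-day month with randomDays = 5:
        System.out.println(mask(10L, 5, 30)); // 15: no capping needed
        System.out.println(mask(28L, 5, 30)); // 30: 33 exceeds the last day, so it is capped
    }
}
```

As a design note, in plain Java the ternary could also be written as Math.min(input + randomDays, lastDayOfMonth), which returns the same long value; the ternary form makes the capping condition explicit.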