Discovering Your Sensitive Data

Overview

After connecting data to the masking service, the next step is to discover which of that data is sensitive and should be secured. Once a rule set has been created, this is done by creating and running a profiling job that uses that rule set. A profiling job examines the metadata, such as column names and types, and potentially the data itself, to determine which columns or fields contain sensitive information. When the profiler determines that a data item is sensitive, it assigns the matching domain and its associated masking algorithm to the column or field.

When a profiling job is created, the Profile Set chosen defines the logic that will be used to determine which columns or fields contain sensitive information.

Concepts

Column Level Profiling

Column level profiling uses regular expressions (regex) to scan the metadata (column names) of the selected data sources. There are several dozen pre-configured profile expressions (like the one below) designed to identify common sensitive data types (SSN, Name, Addresses, etc.). You can also write your own profile expressions.

Column level profiling also supports constraining matches by the type of column the data is stored in. For example, it can be configured to only match a First Name expression if the column is some kind of string type, rather than a date or a number.
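As a rough illustration of the idea described above, the following Python sketch matches column names against regular expressions and optionally restricts a match to certain column types. It is not the product's implementation: the ProfileExpression class, the profile_column function, the regex patterns, the domain names, and the type names are all hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Pattern


@dataclass(frozen=True)
class ProfileExpression:
    """Hypothetical stand-in for a column level profile expression."""
    domain: str
    pattern: Pattern[str]
    # None means "match any column type"; otherwise only these types qualify.
    allowed_types: Optional[FrozenSet[str]] = None


# Illustrative expressions only; real pre-configured expressions will differ.
EXPRESSIONS = [
    ProfileExpression(
        domain="FIRST_NAME",
        pattern=re.compile(r"first_?name|fname", re.IGNORECASE),
        allowed_types=frozenset({"VARCHAR", "CHAR", "TEXT"}),
    ),
    ProfileExpression(
        domain="SSN",
        pattern=re.compile(r"ssn|social_?sec", re.IGNORECASE),
    ),
]


def profile_column(name: str, column_type: str) -> Optional[str]:
    """Return the domain of the first expression matching the column, if any."""
    for expr in EXPRESSIONS:
        # Skip expressions whose type constraint excludes this column.
        if expr.allowed_types is not None and column_type.upper() not in expr.allowed_types:
            continue
        if expr.pattern.search(name):
            return expr.domain
    return None
```

With these assumed expressions, a column named first_name is only flagged when it is stored as a string type, while the SSN expression matches regardless of type because it has no type constraint.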

Data Level Profiling

Data level profiling also uses regular expressions, but to scan the actual data instead of the metadata.

The pre-built profile sets included with the product do not use any data level expressions by default. However, the product does ship with some data level expressions, which may be added to user-created profile sets.
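To make the contrast with column level profiling concrete, here is a minimal Python sketch of scanning sampled values rather than column names. It is an assumption-laden illustration, not the product's logic: the SSN_VALUE pattern, the profile_values function, and the min_match_ratio threshold (the fraction of sampled values that must match) are all hypothetical.

```python
import re
from typing import Iterable, Optional, Pattern

# Illustrative data level pattern for values shaped like 123-45-6789.
SSN_VALUE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def profile_values(
    values: Iterable[Optional[object]],
    pattern: Pattern[str],
    min_match_ratio: float = 0.8,  # assumed threshold, not a product setting
) -> bool:
    """Return True if enough non-empty sampled values match the pattern."""
    # Ignore NULLs and blank strings so sparse columns are judged fairly.
    non_empty = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if pattern.match(v))
    return hits / len(non_empty) >= min_match_ratio
```

A column with an uninformative name such as col_17 would be missed by a name-based expression, but a value scan like this one could still flag it if most of its sampled values look like SSNs.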