Basics of Data Masking

By Laura Case, Director of Product Management | March 14, 2023

Ask different people in software organizations, or even within the security field, to define “data masking” and you will often get different, and sometimes complex, answers. There are so many terms, synonyms, and technologies related to data masking that we wanted to get back to the basics. In this article, I’ll discuss the fundamental concepts of data masking, explain various masking techniques, delve into the difference between static data masking and dynamic data masking, and talk about Baffle’s Data Masking services.

What is Data Masking

In data security, data masking is generally used as the catch-all term for the process of altering data to protect sensitive information. It is also known as data obfuscation, data de-identification, data anonymization, and data sanitization. Additionally, the General Data Protection Regulation (GDPR) introduced the term “pseudonymisation”. It covers processing personal data so that it can no longer be attributed to a specific person without additional information that is kept separately.

The goal of data masking is to allow sensitive data to be used while keeping it secure. To meet this goal, sensitive data must be masked with a method that gives unauthorized users no way to reverse engineer the process and recover the original data.

Data Masking Techniques

There are many different ways to mask sensitive data, and there are pros and cons to each type of masking. To make the decision on how to mask your data, it’s important to understand the various techniques, how you want to use the data after it’s masked, and who will have access to the data.

Data encryption – Even though encryption is often viewed as an alternative to masking, at its core it is a way of altering data to protect sensitive information. What distinguishes encryption from the other masking techniques is that anyone holding the encryption key can recover the actual data values. That makes encryption the natural choice when reversibility is required. However, depending on the type of encryption (a topic we’ll save for another article), the data may not be usable once it’s encrypted.
Two tables comparing unmasked credit card data and encrypted credit card data.
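The following toy Python sketch makes the reversibility point concrete. It uses a one-time pad (XOR with a random key as long as the data) only because that needs nothing beyond the standard library. This is not how production systems encrypt data; they would use a vetted cipher such as AES through an established library. The names `encrypt` and `decrypt` are illustrative.

```python
import secrets


def encrypt(plaintext: str) -> tuple[bytes, bytes]:
    """Toy one-time pad: XOR the data with a random key of the same length."""
    data = plaintext.encode("utf-8")
    key = secrets.token_bytes(len(data))  # must be random, kept secret, and never reused
    ciphertext = bytes(d ^ k for d, k in zip(data, key))
    return ciphertext, key


def decrypt(ciphertext: bytes, key: bytes) -> str:
    """XOR with the same key again to recover the original value."""
    return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")


card = "4111 1111 1111 1111"
ciphertext, key = encrypt(card)
print(ciphertext.hex())          # random-looking bytes, different on every run
print(decrypt(ciphertext, key))  # 4111 1111 1111 1111
```

The ciphertext is arbitrary bytes and no longer looks like a card number. This is one way encrypted data can stop being useful to applications that expect the original format.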
Partial masking – This means replacing part of the data value, such as replacing every digit of a credit card number with “X” except the last four. It is useful when you need only part of the value rather than the whole thing but, unlike encrypted data, it cannot be unmasked.
Two tables comparing unmasked credit card data and partially masked credit card data.
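Here is a minimal Python sketch of partial masking. It assumes the value is a string in which separators such as dashes or spaces should be kept. The function name `partial_mask` and its parameters are hypothetical.

```python
def partial_mask(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace every digit except the last `visible` digits, keeping separators as-is."""
    digits_to_mask = sum(ch.isdigit() for ch in value) - visible
    masked = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(partial_mask("4111-1111-1111-1234"))  # XXXX-XXXX-XXXX-1234
```

Once the leading digits are replaced, nothing in the output can restore them. This is why partial masking is one-way.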

Redaction – This means replacing the data value entirely with something else. For example, personally identifiable information such as first and last names could be changed to “CONFIDENTIAL” in every record. Like partial masking, redaction cannot be undone.
Two tables comparing unmasked credit card data and redacted credit card data.
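Here is a minimal sketch of redaction over a single record, represented as a Python dict. The field names and the `redact` helper are assumptions for illustration. The placeholder matches the “CONFIDENTIAL” example in the text.

```python
# Hypothetical field names chosen for illustration.
SENSITIVE_FIELDS = frozenset({"first_name", "last_name"})


def redact(record: dict, fields=SENSITIVE_FIELDS, placeholder: str = "CONFIDENTIAL") -> dict:
    """Return a copy of the record with every sensitive field replaced by a fixed placeholder."""
    return {key: placeholder if key in fields else value for key, value in record.items()}


row = {"id": 7, "first_name": "John", "last_name": "Smith", "city": "Boston"}
print(redact(row))
# {'id': 7, 'first_name': 'CONFIDENTIAL', 'last_name': 'CONFIDENTIAL', 'city': 'Boston'}
```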
Nulling the data – This means setting the value to blank or null. It is not a recommended method of masking, because it makes it unclear whether the value was never available or was masked.
Two tables comparing unmasked credit card data and null credit card data.
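The ambiguity of nulling can be shown in a few lines of Python. Below are two hypothetical customer records: one has a phone number and one never had one. After the field is nulled, the two records are indistinguishable. The helper `null_out` is illustrative.

```python
def null_out(record: dict, field: str) -> dict:
    """Return a copy of the record with one field set to None (the equivalent of SQL NULL)."""
    return {**record, field: None}


customers = [
    {"id": 1, "phone": "555-0100"},  # had a phone number, which we mask
    {"id": 2, "phone": None},        # never had a phone number
]
masked = [null_out(row, "phone") for row in customers]
print(masked)
# [{'id': 1, 'phone': None}, {'id': 2, 'phone': None}]
```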
Data substitution (or tokenization) – This means replacing each value with another realistic value, often using lookup tables. For example, every occurrence of the first name “John” could be replaced with “Alex.” This works well for credit card numbers, because the substitutes look realistic and can pass application validation logic such as Luhn checks. It can be less secure for data like names, where some values are much more common than others. A consistent substitution preserves those frequencies, so the most common substitute likely stands for the most common real name.
Two tables comparing unmasked credit card data and tokenized credit card data.
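The sketch below shows both sides of substitution, under stated assumptions.

* `tokenize_card` swaps a card number for a random number of the same length. It picks the last digit so the result passes the Luhn check.
* The card-to-token pairs are recorded in a dict, which stands in for a lookup table (often called a token vault).
* The name table at the end is hypothetical. It shows how consistent substitution preserves how often each name appears.

All function and variable names are illustrative; this is not Baffle’s implementation.

```python
import secrets
from collections import Counter


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(body: str) -> str:
    """Find the single final digit that makes body + digit Luhn-valid."""
    for candidate in "0123456789":
        if luhn_valid(body + candidate):
            return candidate
    raise ValueError("body must be a string of digits")


def tokenize_card(card: str, vault: dict[str, str]) -> str:
    """Map a card number to a random, Luhn-valid substitute of the same length.

    The same card always gets the same token because the pair is stored in `vault`.
    """
    if card not in vault:
        used = set(vault.values())
        while True:
            body = "".join(secrets.choice("0123456789") for _ in range(len(card) - 1))
            token = body + luhn_check_digit(body)
            if token != card and token not in used:
                break
        vault[card] = token
    return vault[card]


vault: dict[str, str] = {}
token = tokenize_card("4111111111111111", vault)
print(len(token), luhn_valid(token))  # 16 True

# Name substitution through a fixed (hypothetical) lookup table preserves frequencies.
NAME_TABLE = {"John": "Alex", "Mary": "Priya", "Wei": "Sam"}
names = ["John", "John", "John", "Mary", "Wei"]
print(Counter(NAME_TABLE[n] for n in names).most_common(1))  # [('Alex', 3)]
```

Because the lookup table maps tokens back to real card numbers, it must be protected as carefully as the original data.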

Static versus Dynamic Masking

Now that we’ve discussed masking techniques, it’s important to discuss how the masking is applied. Any of the techniques above can be applied through either static data masking or dynamic data masking.

Static data masking permanently replaces the stored data with masked values. This is typically done on a copy of your data (a replica) so that the original values are not lost, and that replica can then be used elsewhere safely. The biggest advantage of static data masking is that the masked data can never expose the sensitive value. The biggest disadvantage is that you have to maintain both a masked and an unmasked version of your data. With ever-increasing amounts of data in organizations, storing both versions and keeping them in sync can be very costly.
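Here is a minimal Python sketch of static masking. It assumes the data is a list of dicts standing in for a table. The masked replica is produced once by altering a copy, and the original stays untouched. That is exactly why two versions must then be maintained. The records and the `mask_card` helper are hypothetical.

```python
import copy


def mask_card(card: str) -> str:
    """Keep only the last four characters visible (assumes a plain digit string)."""
    return "X" * (len(card) - 4) + card[-4:]


production = [
    {"id": 1, "name": "John Smith", "card": "4111111111111111"},
    {"id": 2, "name": "Mary Jones", "card": "5500005555555559"},
]

# Static masking: alter a copy once; the masked replica is what gets shared.
replica = copy.deepcopy(production)
for row in replica:
    row["name"] = "CONFIDENTIAL"
    row["card"] = mask_card(row["card"])

print(replica[0])              # {'id': 1, 'name': 'CONFIDENTIAL', 'card': 'XXXXXXXXXXXX1111'}
print(production[0]["card"])   # 4111111111111111
```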

Dynamic data masking applies masking only at the moment the data is viewed. It is powerful because it allows role-based access controls to be implemented and does not destroy the original data value. This means that, with appropriately configured security policies, a supervisor can see the data in cleartext while all other users see only masked data. Dynamic masking gives you greater control over access to the data and expands the use cases your data can serve.
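The sketch below shows the idea behind dynamic masking. The stored record is never changed, and the mask is applied at read time based on the caller’s role. The `User` type, the role name "supervisor", and `read_card` are assumptions for illustration. In a real deployment, the policy would typically be enforced outside the application (for example, by the database or a proxy) rather than in a helper function like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str  # hypothetical role names, e.g. "supervisor" or "analyst"


# The stored record keeps the cleartext value; dynamic masking never rewrites it.
stored = {"id": 1, "card": "4111111111111111"}


def read_card(record: dict, user: User) -> str:
    """Return the card number as this user is allowed to see it, masking at read time."""
    card = record["card"]
    if user.role == "supervisor":
        return card
    return "X" * (len(card) - 4) + card[-4:]


print(read_card(stored, User("Dana", "supervisor")))  # 4111111111111111
print(read_card(stored, User("Lee", "analyst")))      # XXXXXXXXXXXX1111
```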

Data Masking with Baffle

As you can see, there are many different ways you can mask sensitive data. Baffle’s Data Protection Services give you the ability to choose any of these data masking methods, or a combination of them. While we believe encrypted, dynamic data masking to be one of the best and most secure options available, we also understand that it may not be desired for every data use in every situation. Our flexible platform gives you robust options to mask your sensitive data the way you want to mask it, and it supports both field-level and record-level data masking. We do all of this with no application code changes, helping you avoid lengthy projects to protect your data.

Learn More

To see a demo of data masking and discuss your data protection concerns, please schedule a meeting with Baffle.