Is Tokenization a Good Way to Protect Your Data?

By Harold Byun, VP Products | April 22, 2021

This blog looks at the functionality of vault-based data tokenization methods and some key data protection challenges in using such approaches in cloud security and modern compute environments.

Data tokenization is a method that service providers use to transform data values into token values. It is often used to meet data security, regulatory, and compliance requirements such as the Payment Card Industry Data Security Standard (PCI DSS), the General Data Protection Regulation (GDPR), and HIPAA. There are basically three approaches to protecting data through tokenization:

  1. Vault-based Tokenization
  2. Vaultless Tokenization
  3. Format Preserving Encryption (FPE)

Vault-based tokenization operates by creating a dictionary of sensitive data values and mapping them to token values, which replace the original data values in a database or data store. Whereas encryption derives ciphertext from plaintext using a key, tokenization uses an algorithm to generate random characters in the same format and substitutes them for the original data.

A token vault stores the relationships between the original values and the token values. Both are kept in a hardened vault, and when an application or user needs access to the original value, a lookup is performed against the dictionary and the token is reversed.
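The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `TokenVault` class and its method names are hypothetical, the vault is an in-memory dictionary rather than a hardened store, and format preservation is assumed to mean that digits map to random digits, letters map to random letters, and separators are kept.

```python
import secrets
import string


class TokenVault:
    """Toy in-memory token vault (hypothetical, for illustration only)."""

    def __init__(self):
        # Two dictionaries: one for tokenizing, one for reversing a token.
        self._value_to_token = {}
        self._token_to_value = {}

    def _random_token(self, value):
        # Generate random characters in the same format as the input.
        chars = []
        for ch in value:
            if ch.isdigit():
                chars.append(secrets.choice(string.digits))
            elif ch.isalpha():
                chars.append(secrets.choice(string.ascii_letters))
            else:
                chars.append(ch)  # keep separators such as "-"
        return "".join(chars)

    def tokenize(self, value):
        # Reuse the existing token so a value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Retry a bounded number of times if a token is already in use.
        for _ in range(100):
            token = self._random_token(value)
            if token not in self._token_to_value:
                self._value_to_token[value] = token
                self._token_to_value[token] = value
                return token
        raise RuntimeError("could not generate a unique token")

    def detokenize(self, token):
        # Reverse the token with a dictionary lookup.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
original = vault.detokenize(token)
```

Note that the vault itself holds every original value alongside its token, so it has to be protected at least as carefully as the data store it replaces.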

The benefit of using tokenization, as with other data-centric security measures, is that the original data values or dataset are not stored in clear text in a data store. While this method provides a viable means of data masking and addressing compliance, there are some key issues to consider related to the actual data privacy and security provided, and to the operationalization of the solution.

Threat Mitigation and Risk Transfer

One key challenge with a vault-based tokenization system is that the solution is simply moving sensitive or personal data from one data store to another. This may protect against a breach of the primary data repository, but in reality it creates a mapping of your sensitive information to token values that simply lives in another location. This may help transfer some risk or address a specific data storage environment, but it is ultimately replicating the sensitive data issue in another place. Further, if high availability needs to be implemented, the token vault will need to be replicated as well.

Application Changes Required

To use a token vault, one will also need to alter the applications and queries that access tokenized data. This requires application changes across the environment, which can be costly and time-consuming. Furthermore, in cloud-native, distributed data, and microservices environments, instrumenting these changes can slow down application deployments because of the number of touchpoints involved.
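As a rough sketch of what such a change can look like, the Python below contrasts a read path before and after tokenization. Everything here is a hypothetical stand-in: the vault is a pair of dictionaries, the table is a list of dictionaries, and the function names are invented for illustration.

```python
# Hypothetical stand-ins: dictionaries as the token vault, a list as the table.
vault_value_to_token = {"123-45-6789": "804-19-3327"}
vault_token_to_value = {t: v for v, t in vault_value_to_token.items()}

# After tokenization, the "ssn" column holds tokens, not real values.
customers = [{"id": 1, "name": "Ada", "ssn": "804-19-3327"}]


def read_ssn_before(row):
    # Original application code: read the column and use it directly.
    # Against tokenized data this now returns the token.
    return row["ssn"]


def read_ssn_after(row):
    # Changed application code: every read goes through the vault.
    return vault_token_to_value[row["ssn"]]


def find_by_ssn_after(ssn):
    # Changed query: the search term must be tokenized before matching,
    # because the stored column no longer contains the real value.
    token = vault_value_to_token.get(ssn)
    if token is None:
        return []
    return [row for row in customers if row["ssn"] == token]
```

Each place in the codebase that reads or filters on the sensitive column needs a similar change, which is where the number of touchpoints comes from.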

Performance and Scalability Challenges

Think for a moment about how much data your organization has. Then, consider what it would mean if every time you needed to ask for a piece of information, you needed to walk down the hall and talk to your work buddy to get an answer. While this may exaggerate the architecture of a token vault call, this is effectively what’s happening every time an application or user is asking for a piece of sensitive data.

They need to call out to the vault and wait for an answer. In other words, there are two extra hops every time this lookup needs to occur. In today’s world, very few organizations are collecting less data. This means that a vault-based token service is an inherently unscalable approach to data security in the modern world, given big data use cases and the need for high-performance transaction processing.
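A back-of-the-envelope calculation makes the cost concrete. The sketch below assumes one sequential vault call per sensitive field, with no batching or caching, and the figures passed in are hypothetical rather than measurements.

```python
def added_latency_seconds(rows, sensitive_fields_per_row, round_trip_ms):
    """Estimate extra time spent waiting on vault lookups.

    Assumes one sequential vault round trip per sensitive field.
    """
    lookups = rows * sensitive_fields_per_row
    return lookups * round_trip_ms / 1000.0


# Hypothetical workload: 1,000,000 rows, 2 sensitive fields each,
# and a 1 ms round trip to the vault.
print(added_latency_seconds(1_000_000, 2, 1.0))  # prints 2000.0
```

Under those assumptions, a job that reads a million rows spends about 33 minutes waiting on the vault alone, and the figure grows linearly with data volume.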

Data Security Methodology

Another challenge with data tokenization is that the method uses a proprietary encryption or data transform method. In general, proprietary, non-vetted methods to protect data are frowned upon in the cryptography world, especially without proper validation. The general principle that many cryptographers and security analysts operate under, known as Kerckhoffs's principle, is that attackers should be assumed to know the cryptographic method, and that, in spite of knowing it, they still cannot break the encryption or transform mechanism to decrypt or detokenize sensitive data. With proprietary techniques, there is no such guarantee and no peer review of the method.

Summary

There are several methods available to protect your data, whether on-premises or in a cloud-native environment, and there is no panacea for data security. Security challenges for structured and unstructured data can be addressed in a variety of ways, but access to the data is at the heart of the problem. The various tokenization, de-identification, and encryption techniques each have distinct advantages and disadvantages. When evaluating solutions, consider the performance and scaling impact, the openness and transparency of the security method, and the ease of integration and operationalization.

