
The Ultimate Guide to Data Tokenization (and Encryption)

Introduction

Most organizations collect and store vast amounts of sensitive data, including credit card numbers, social security numbers, and medical records. This data is a prime target for cybercriminals, and a data breach can have devastating consequences.

Data tokenization is a security technique that helps protect sensitive data by replacing it with a non-sensitive substitute, called a token. Tokens are unique identifiers that have no intrinsic value and cannot be used to recreate the original data. By tokenizing sensitive data, organizations can reduce the risk of data breaches and comply with data security regulations.

Tokenization is widely used to meet the requirements of major regulations such as PCI DSS and HIPAA.

What is Data Tokenization?

Data tokenization is a data security process that replaces sensitive data with a non-sensitive value, called a token. Tokens can be random numbers, strings of characters, or any other non-identifiable value. When sensitive data is tokenized, the original data is stored securely in a token vault. The tokens are then used to represent the sensitive data in applications and databases.
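To make this concrete, here is a minimal sketch of vault-based tokenization in Python. The in-memory dictionary stands in for the token vault; a real deployment would use a hardened, access-controlled data store, and the function names here are illustrative only.

```python
import secrets

# The "token vault": maps each token back to the original sensitive value.
# In production this is a hardened, access-controlled database.
vault = {}

def tokenize(value: str) -> str:
    """Replace a sensitive value with a random digit token of the same length."""
    token = "".join(secrets.choice("0123456789") for _ in range(len(value)))
    while token in vault:  # regenerate on the (rare) collision
        token = "".join(secrets.choice("0123456789") for _ in range(len(value)))
    vault[token] = value
    return token

def detokenize(token: str) -> str:
    """Recover the original value; only authorized callers should reach this."""
    return vault[token]

t = tokenize("4111111111111111")
print(t)              # e.g. 8203941175320648 -- no mathematical link to the original
print(detokenize(t))  # 4111111111111111
```

Because the token is generated randomly, it reveals nothing about the original value; security rests entirely on protecting the vault.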

What Data Types Is Tokenization Used For?

Tokenization is most commonly used to de-identify sensitive data such as:

  • Credit card numbers
  • Social Security numbers
  • Medical records
  • Phone numbers
  • Passport numbers
  • Bank account numbers
  • Email addresses
  • Birth dates

Benefits of Data Tokenization

There are many benefits to data tokenization, including:


Enhanced security:

By replacing sensitive data with tokens, organizations can reduce the risk of data breaches. Even if a hacker gains access to a database of tokens, they cannot reverse the tokens to recover and steal the original data.


Improved compliance:

Data tokenization can help organizations comply with data security regulations such as PCI DSS, GDPR, and CCPA. These regulations require organizations to protect sensitive data, and tokenization is an effective way to achieve this.


Reduced risk of human error:

Data tokenization can help to reduce the risk of human error by eliminating the need for employees to handle sensitive data.


Increased operational efficiency:

Data tokenization can improve operational efficiency by removing sensitive data from systems that do not need it, which streamlines security processes and simplifies audits.

How Data Tokenization Works

Data tokenization typically involves the following steps:

  1. Sensitive data identification: The first step is to identify the sensitive data that needs to be tokenized. This may include credit card numbers, Social Security numbers, bank account numbers, and other personally identifiable information (PII).
  2. Tokenization process: The sensitive data is then replaced with tokens. The tokenization process can be performed using a variety of methods, such as substitution, encryption, or a combination of both.
  3. Token storage: Legacy solutions store tokens and their original values in a token vault, a secure repository designed to protect the mappings from unauthorized access. Modern solutions instead derive the token by encrypting the original value, eliminating the need for a token vault.
  4. Token management: The tokens are managed throughout their lifecycle. This includes provisioning, rotating, and decommissioning tokens.
  5. Detokenization (when needed): When authorized applications need to access the original data, the token is used to retrieve the corresponding original value. In legacy tokenization solutions, the data is retrieved from the token vault. In modern solutions, the original value is decrypted from the encrypted token and made available to the application, subject to role-based access control policies (a minimal sketch of such a policy gate follows this list).
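Step 5 deserves illustration: the sensitive value should only ever be released to callers that pass an access-control check. The sketch below shows a simplified, hypothetical policy gate; the role names and policy shape are invented for illustration, not taken from any particular product.

```python
# Hypothetical policy: only these roles may see original values.
ALLOWED_ROLES = {"payments-service", "fraud-analyst"}

def detokenize_with_policy(vault: dict, token: str, caller_role: str) -> str:
    """Release the original value only to authorized roles (step 5)."""
    if caller_role not in ALLOWED_ROLES:
        raise PermissionError(f"role {caller_role!r} may not detokenize")
    # Legacy: vault lookup. Modern: decrypt the token instead of looking it up.
    return vault[token]

vault = {"8203941175320648": "4111111111111111"}
print(detokenize_with_policy(vault, "8203941175320648", "payments-service"))
```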

Types of Data Tokenization

There are two primary methods of data tokenization:

  • Vaulted Tokenization: This traditional method involves replacing sensitive data with a token and storing the mapping between the original data and the token in a secure database, often referred to as a "token vault." While effective, vaulted tokenization can introduce additional management overhead and potential vulnerabilities due to the need to protect the token vault.
  • Vaultless Tokenization: This modern approach eliminates the need for a token vault by using complex algorithms to generate tokens directly from the sensitive data. Format Preserving Encryption (FPE) is a common method used in vaultless tokenization.

Advantages of Format Preserving Encryption (FPE)

FPE is a cryptographic technique that encrypts data while preserving the original data format. This makes it ideal for tokenization because it ensures that the token can be used seamlessly in existing systems without requiring modifications. Key benefits of FPE include:

  • Format Preservation: Tokens maintain the original data format, ensuring compatibility with existing applications and systems.
  • Strong Security: FPE offers robust encryption, protecting sensitive data from unauthorized access.
  • Efficiency: Vaultless tokenization using FPE can be more efficient than traditional vaulted tokenization due to the elimination of the token vault.
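As an example, the open-source pyffx library implements a generic Feistel-based FPE in Python (a demonstration-grade cipher, not the NIST-standardized FF1/FF3-1 modes that a production system should prefer). A 16-digit number encrypts to another 16-digit number:

```python
import pyffx  # pip install pyffx -- Feistel-based FPE, not NIST FF1/FF3-1

cipher = pyffx.Integer(b"secret-key", length=16)

card = 4111111111111111
token = cipher.encrypt(card)           # deterministic given the key
print(f"{token:016d}")                 # still exactly 16 digits
assert cipher.decrypt(token) == card   # reversible only with the key
```

Because the token is itself a 16-digit number, it fits the same column types and length checks as the original value, which is exactly the compatibility benefit described above.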

Data Tokenization vs. Data Encryption

Data tokenization and data encryption are both techniques for protecting sensitive data, and the line between them has blurred: in vaultless tokenization, encryption is the mechanism that generates the token.

  • Data tokenization replaces sensitive data with a non-sensitive, pseudonymous value. In vaulted implementations, the original value is stored in a token vault.
  • Data encryption scrambles the sensitive data using a cryptographic key. The encrypted data can be decrypted back to the original data using the same key.

Encryption has emerged as a more secure and easier-to-implement option than token vaults, especially now that format-preserving algorithms can maintain the format of the original data value.
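The format difference is easy to demonstrate. Conventional authenticated encryption, here Fernet from the widely used cryptography package, turns a 16-digit number into a long opaque blob that would not fit a 16-character database column, while format-preserving encryption (as sketched in the previous section) keeps the original shape:

```python
from cryptography.fernet import Fernet  # pip install cryptography

f = Fernet(Fernet.generate_key())
ciphertext = f.encrypt(b"4111111111111111")
print(len(ciphertext), ciphertext)  # ~120 base64 characters, opaque to applications
print(f.decrypt(ciphertext))        # b'4111111111111111'
```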

Data Masking vs. Data Tokenization

Data masking and data tokenization are both techniques used to protect sensitive data, but they differ in their approach and purpose.

Data Masking

Replaces sensitive data with non-sensitive, but realistic-looking data.

  • Purpose: To protect sensitive data while allowing for data usage in non-production environments like testing, development, and training.
  • Irreversible: The original data cannot be recovered from masked data.
  • Examples: Replacing a credit card number with a valid-looking but fake number, or obscuring a name with random characters while maintaining the same format.

Data Tokenization

Replaces sensitive data with a random, meaningless value called a token.

  • Purpose: To protect sensitive data while allowing for authorized access and processing of the tokenized data.
  • Reversible: The original data can be recovered from the token.
  • Examples: Replacing a credit card number with a random string of characters, or substituting a social security number with a token.
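The reversibility difference is easy to see in code. In this minimal sketch, masking keeps only the last four digits and destroys the rest, while tokenization records a mapping so authorized code can recover the value later:

```python
import secrets

_vault: dict[str, str] = {}

def mask_card(card: str) -> str:
    """Irreversible: keep only the last four digits for display."""
    return "*" * (len(card) - 4) + card[-4:]

def tokenize(value: str) -> str:
    """Reversible: store the mapping so the original can be recovered."""
    token = "".join(secrets.choice("0123456789") for _ in range(len(value)))
    _vault[token] = value
    return token

print(mask_card("4111111111111111"))  # ************1111 -- cannot be undone
t = tokenize("4111111111111111")
print(_vault[t])                      # original value recoverable from the vault
```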

Data Tokenization and Compliance

Data tokenization can help organizations comply with a variety of data security regulations, including:


PCI DSS (Payment Card Industry Data Security Standard): PCI DSS requires organizations that store, process, or transmit credit card data to implement a number of security controls. Data tokenization can help organizations comply with PCI DSS by protecting credit card numbers from unauthorized access.


GDPR (General Data Protection Regulation): GDPR is a regulation that applies to the processing of personal data of individuals in the European Union (EU). GDPR requires organizations to implement a number of security controls to protect personal data. Data tokenization can help organizations comply with GDPR by protecting personal data from unauthorized access.


CCPA (California Consumer Privacy Act): CCPA is a regulation that gives consumers in California the right to know what personal data is being collected about them, to delete their personal data, and to opt out of the sale of their personal data. CCPA requires organizations to implement reasonable security procedures to protect personal data, and data tokenization can help meet that requirement by rendering stolen data unusable.

Modern Data Protection vs Legacy Tokenization Solutions

Each row below compares legacy tokenization with a modern data protection service:

Data Transformation
  • Legacy tokenization: Data is replaced with a token.
  • Modern data protection: Data is tokenized with AES-256 keys using Format-Preserving Encryption (FPE).

Security
  • Legacy tokenization: Vaulted tokenization is vulnerable to frequency attacks such as the Chosen Plaintext Attack (CPA), and its strength depends heavily on the cardinality of the data fields.
  • Modern data protection: Baffle offers FPE, a mathematical transformation accelerated by the AES-NI instruction set that is proven to be cryptographically secure with no dependence on the data.

Application Impact
  • Legacy tokenization: Requires applications to call a cloud-based API, which means source code must be available or developed in-house, and applications must manage the encryption keys.
  • Modern data protection: Requires only a network-layer connection, so existing applications need no changes and developers are relieved of integrating a service or managing keys.

Storage Requirements
  • Legacy tokenization: A vaulted lookup table is the same size as the database, doubling storage needs; vaultless tokenization uses a smaller table but sacrifices security.
  • Modern data protection: FPE adds no storage overhead because it preserves the format of the original data.

Performance
  • Legacy tokenization: Every entry into the data store requires a lookup to ensure each token is unique, and the same process repeats for every incremental addition.
  • Modern data protection: FPE is a mathematical transformation accelerated by AES-NI processor instructions, so it executes an order of magnitude faster than vaulted or vaultless tokenization.