A Case Study for Protecting Files with Sensitive Data in the Cloud

By Sumandra Majee, Chief Product Architect | April 8, 2024

Financial institutions and healthcare providers handle vast amounts of sensitive data, including personally identifiable information (PII) and protected health information (PHI), from Social Security numbers (SSNs) to credit card information and medical records. Often, this data resides in files of various formats, such as Excel and CSV, stored on-premises or in cloud object storage. As companies migrate these files to the cloud, ensuring their security and compliance becomes paramount.

This blog post explores a real-world scenario in which a large financial institution used the Baffle Data Protection platform to address PCI compliance challenges during its cloud migration. We’ll delve into the problems they faced, how Baffle’s unique features helped them achieve compliance without workflow disruptions, and the key factors that influenced their decision.

Customer Challenge

In this case, the financial institution had millions of customer transaction records stored on file servers, mostly as CSV files, with a sizable number of unstructured files as well. New files were constantly generated, and credit card information needed to be protected before the files moved to the cloud, so that they arrived ready for downstream consumption. The requirements were:

  • No workflow or application disruption. Many software-based encryption libraries require application modification.
  • The security team required complete control over data security.
  • The security team required strong, fine-grained authorization and access controls to limit visibility of data.
  • The infrastructure team required a high-performance, elastic solution that remains under their control.

The Baffle Solution: Transparent Data Security

Baffle emerged as the ideal solution for this financial institution due to its set of unique capabilities:

  • Seamless Integration: Unlike SaaS solutions that require API integration and workflow modifications, Baffle Data Protection acts as a transparent proxy. The institution simply pointed their Managed File Transfer (MFT) tool to Baffle’s built-in SFTP interface, eliminating the need for complex application or infrastructure changes.
  • Field-Level Data Protection: The financial institution created a set of policies to tokenize credit card information in specific columns, ensuring only sensitive data is protected.

In this case, the security team tokenized the credit card information using Format-Preserving Encryption (FPE). The encrypted values look like valid credit card numbers and pass the Luhn check, so existing applications and human users can keep working with them unchanged. Authorized users can quickly decrypt them to recover the original data.

  • Advanced Data Protection Policy: The ability to apply several data-specific forms of format-preserving encryption, masking, and strong field-level access control.
  • Scriptable Extensions: The institution used Baffle Shield’s scripting engine to implement custom logic that processes credit card information and inserts the result as a separate column, all without affecting any application.
  • Ability to understand complex file formats: The institution had a variety of CSV files with different column delimiters, some of them multi-character. Many files were completely unstructured. Some files contained two or more CSV tables, so credit card information appeared in several different places within a single file. These formats are non-standard and complex, but Baffle’s policy engine is built to describe complex unstructured and semi-structured formats easily.
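The Luhn check mentioned above is a public checksum over the digits of a card number. The Python sketch below shows only what a format-preserving token has to satisfy to pass that check; it says nothing about how Baffle generates its tokens, and the function names are illustrative.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits right to left and double every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn check."""
    for candidate in "0123456789":
        if luhn_is_valid(partial + candidate):
            return candidate
    raise ValueError("partial must contain only ASCII digits")
```

A token that keeps the 16-digit shape of a card number but has the wrong final digit would be rejected by any downstream system that runs this check, which is why passing it matters for transparent integration.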

The figure below captures the overall flow.

The original cleartext files were first stored on a secure on-premises file server. Managed File Transfer (MFT) is a tool that automates file transfers over protocols such as SFTP and FTP. The MFT tool picks up files from the server and pushes them through Baffle Shield, which acts as an SFTP proxy. The file name is used to select a data protection policy managed by the security team. The result is a transformed file in which the credit card information is tokenized, and the Shield then pushes it to another, cloud-based MFT. Once the transformed files are available, application software can use the fields and files safely for a variety of tasks, without worrying about compliance requirements, as data moves between applications, the data lake, and the analytics tier.
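To make the flow concrete, here is a minimal Python sketch of the two steps described above: choosing a policy from the file name, then transforming one column of a delimited file. The policy fields (`pattern`, `delimiter`, `card_column`), the second file pattern, and the `tokenize` callback are all assumptions made for illustration; they are not Baffle’s policy schema or API. The sketch also assumes files without a header row or quoted fields.

```python
from fnmatch import fnmatchcase
from typing import Callable, Optional

# Hypothetical policy table: file-name pattern -> where the card number lives.
# The field names are invented for this example.
POLICIES = [
    {"pattern": "Transaction-record*.csv", "delimiter": "||", "card_column": 2},
    {"pattern": "settlement-*.txt", "delimiter": "\t", "card_column": 0},
]


def select_policy(file_name: str) -> Optional[dict]:
    """Return the first policy whose pattern matches the file name."""
    for policy in POLICIES:
        if fnmatchcase(file_name, policy["pattern"]):
            return policy
    return None


def protect_file(file_name: str, text: str, tokenize: Callable[[str], str]) -> str:
    """Apply tokenize() to the card column of every row in the file."""
    policy = select_policy(file_name)
    if policy is None:
        raise LookupError(f"no policy for {file_name}")
    delimiter, column = policy["delimiter"], policy["card_column"]
    out = []
    for line in text.splitlines():
        # str.split accepts multi-character delimiters such as "||".
        fields = line.split(delimiter)
        if len(fields) > column:
            fields[column] = tokenize(fields[column])
        out.append(delimiter.join(fields))
    return "\n".join(out)
```

Passing the tokenizer in as a callback keeps the sketch independent of any particular protection method; a real deployment would use format-preserving encryption at that point rather than a placeholder.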

Baffle Data Protection Solution: Protecting your data on your terms

Baffle is a powerful multi-tenant containerized proxy. The proxy enforces data-centric protection driven by a sophisticated policy engine. Because the software is containerized, customers can deploy it directly in a Virtual Private Cloud (VPC) environment or on-premises.

It can intercept (that is, proxy) or emulate the following:

  • SFTP flows
  • HTTP(S) flows
  • REST APIs (emulated)

Baffle also provides a set of REST APIs so that any application can easily integrate with the service. It integrates with a wide variety of storage services, such as AWS S3, Azure Blob Storage, and classic SFTP servers.

It can parse a wide variety of semi-structured and unstructured payloads and file types, such as CSV, JSON, XML, and Parquet.

It provides several techniques for data protection:

  • Full-file encryption: Uses industry-standard AES encryption.
  • Format-preserving encryption (FPE): Protects data while maintaining its original format, ensuring seamless integration with existing applications.
  • Static and dynamic masking: Redacts sensitive data as necessary.
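In the usual sense of the terms, static masking rewrites the stored value, while dynamic masking leaves the stored value alone and redacts it at read time depending on who is asking. A short Python sketch of both follows; the function names and the `fraud-analyst` role are hypothetical and are not taken from Baffle’s product.

```python
def mask_card(pan: str, visible: int = 4, fill: str = "*") -> str:
    """Static masking: replace all but the last `visible` characters."""
    if visible <= 0:
        return fill * len(pan)
    return fill * max(len(pan) - visible, 0) + pan[-visible:]


def read_card(pan: str, role: str) -> str:
    """Dynamic masking: the stored value is unchanged; the view depends on the reader."""
    # "fraud-analyst" is a hypothetical role name used only for this example.
    return pan if role == "fraud-analyst" else mask_card(pan)
```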

Baffle integrates with various Key Management Services (KMS) such as AWS KMS, Azure Key Vault, and HashiCorp Vault. It uses envelope encryption, in which a random Data Encryption Key (DEK) encrypts your data and the DEK is in turn encrypted by a separate Key Encryption Key (KEK) stored securely within your chosen KMS. With this two-layer approach, rotating the KEK only requires re-encrypting the DEK, so keys can be rotated without touching or exposing the encrypted data itself.
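The structure of envelope encryption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Baffle’s implementation: `toy_cipher` is a deliberately insecure stand-in for a real cipher such as AES so that the example runs with only the standard library, and the KEK is a local byte string here, whereas in practice wrapping and unwrapping the DEK would be a call to the KMS.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from key and nonce.

    Stand-in for a real authenticated cipher such as AES-GCM. NOT secure.
    Applying it twice with the same key and nonce returns the input.
    """
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(kek: bytes, plaintext: bytes) -> dict:
    """Encrypt data under a fresh random DEK, then wrap the DEK under the KEK."""
    dek = secrets.token_bytes(32)
    data_nonce, wrap_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    return {
        "ciphertext": toy_cipher(dek, data_nonce, plaintext),
        "data_nonce": data_nonce,
        "wrapped_dek": toy_cipher(kek, wrap_nonce, dek),
        "wrap_nonce": wrap_nonce,
    }


def envelope_decrypt(kek: bytes, envelope: dict) -> bytes:
    """Unwrap the DEK with the KEK, then decrypt the data with the DEK."""
    dek = toy_cipher(kek, envelope["wrap_nonce"], envelope["wrapped_dek"])
    return toy_cipher(dek, envelope["data_nonce"], envelope["ciphertext"])


def rotate_kek(old_kek: bytes, new_kek: bytes, envelope: dict) -> dict:
    """Re-wrap the DEK under a new KEK; the data ciphertext is left untouched."""
    dek = toy_cipher(old_kek, envelope["wrap_nonce"], envelope["wrapped_dek"])
    wrap_nonce = secrets.token_bytes(16)
    return {
        **envelope,
        "wrapped_dek": toy_cipher(new_kek, wrap_nonce, dek),
        "wrap_nonce": wrap_nonce,
    }
```

Note that `rotate_kek` never reads or rewrites `ciphertext`: only the small wrapped DEK changes, which is what makes rotation cheap even for very large files.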

Baffle also provides sophisticated access control mechanisms that extend down to the field level. This granular control lets you define exactly who can access which data and when, and it comes with audit capability.

For example, imagine granting a user, Joe, who holds a production role, the ability to decrypt credit card information only from files matching the pattern “Transaction-record*.csv” and only between 9 AM and 5 PM.
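A rule like the one above could be modeled as follows. This Python sketch is an assumption about shape only: the `DecryptRule` class, its field names, and the `credit_card` field label are hypothetical and do not reflect Baffle’s actual policy format.

```python
from dataclasses import dataclass
from datetime import time
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class DecryptRule:
    """Hypothetical rule shape for field-level, time-bounded decryption."""
    user: str
    role: str
    file_pattern: str
    field_name: str
    start: time
    end: time

    def allows(self, user: str, role: str, file_name: str,
               field_name: str, now: time) -> bool:
        """Every condition must hold for decryption to be permitted."""
        return (
            user == self.user
            and role == self.role
            and field_name == self.field_name
            and fnmatchcase(file_name, self.file_pattern)
            and self.start <= now < self.end
        )


# Joe, production role, credit card field, matching files, 9 AM to 5 PM.
RULE = DecryptRule("joe", "production", "Transaction-record*.csv",
                   "credit_card", time(9, 0), time(17, 0))
```

A request from Joe at 10:30 for `Transaction-record-0401.csv` satisfies the rule, while the same request at 18:00, or for a file that does not match the pattern, does not.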

Why Baffle?

Several factors influenced the financial institution’s decision to choose Baffle:

  • Software, not SaaS: Maintaining control over data security within their own environment was crucial. Baffle offered on-premises and VPC deployment options, ideal for hybrid cloud strategies.
  • Seamless Integration: Avoiding workflow disruptions was paramount. Baffle’s transparent proxy approach eliminated the need for complex API integrations and workflow changes.
  • Support for a wide variety of data formats: The ability to handle various file formats, including custom structures, ensured protection both today and in the future.
  • Policy Engine and Scriptable Extensions: Granular policy control and custom scripting capabilities addressed the institution’s unique requirements.


The financial institution successfully achieved PCI compliance with Baffle, protecting sensitive data stored in files during their cloud migration. Baffle’s unique features ensured a seamless integration without disrupting existing workflows. This case study exemplifies how Baffle empowers organizations to navigate complex data security challenges in today’s hybrid cloud environments.