The Data Journey

By Ameesh Divatia, CEO and co-founder | January 18, 2022

Protecting Data Pipelines End to End, Making Breaches Irrelevant

Organizations increasingly want to derive intelligence from enterprise data using Artificial Intelligence and Machine Learning. Cloud data volumes and footprints are exploding as workloads continue to move to the cloud, with Gartner estimating that 75% of databases and storage will be in the cloud by 2022.

What’s more, a huge portion of that data remains unencrypted. Enterprise data analysts and business managers need data that they can utilize at any moment, batched or in real-time, transformed and then processed into meaningful information.

Meanwhile, organizations are facing ongoing security and regulatory challenges.

Snowflake and the Economist Intelligence Unit recently shared a new report on trends in cloud and data sharing:

  • 42% of businesses fear data leaks from sharing with or sourcing data to external sources
  • 64% struggle to integrate data from multiple sources
  • 41% fear that the shared data will be used for purposes other than intended

Baffle and Snowflake

Modern data pipeline architectures allow the flow of information between sources and the data’s transformation safely and securely. Monetizing data securely requires organizations to ensure all data moves to the cloud in a de-identified form, can be easily accessed and operated on by business analysts, and is highly performant so as not to hinder productivity.

With Baffle and Snowflake, data is protected as it moves to the cloud and as it is consumed by the business via a “no code” or “low code” data security mesh. The mesh enables reporting and operations on encrypted data without breaking application functionality.

Key Encryption and De-identification

Having both security and processing power is what makes Snowflake and Baffle so powerful. Most encryption solutions require a tremendous overhead, as both symmetric and asymmetric encryption methods are heavy users of compute cycles.

Baffle Data Protection Services seamlessly de-identifies data as it is ingested and staged in the cloud and subsequently migrated to Snowflake.

This encryption works by deploying a transparent data security mesh that encrypts data migrated to cloud storage or staging environments while supporting masking, tokenization, field-level encryption, and role-based access control for Snowflake data.
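To make the difference between masking and tokenization concrete, here is a minimal Python sketch. It is not Baffle's implementation: the field names, the `DEMO_KEY` value, and the helper functions are hypothetical, and the keyed-hash token is a simple stand-in for a production tokenization or format-preserving encryption scheme.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would use a
# customer-held key (BYOK/HYOK) from a key manager, never a literal in code.
DEMO_KEY = b"demo-key-not-for-production"


def mask_ssn(ssn: str) -> str:
    """Masking: hide every digit except the last four, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def tokenize(value: str, key: bytes = DEMO_KEY) -> str:
    """Tokenization stand-in: a deterministic keyed hash, truncated to 16 hex
    characters. The same input and key always yield the same token, so
    equality joins still work on the protected column."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Apply field-level protection and leave non-sensitive fields untouched."""
    protected = dict(record)
    protected["ssn"] = mask_ssn(record["ssn"])
    protected["email"] = tokenize(record["email"])
    return protected


row = {"ssn": "123-45-6789", "email": "pat@example.com", "sales": 75}
safe_row = deidentify(row)
print(safe_row)
```

Masking is one-way and keeps the value readable in part, while the token hides the value entirely but stays consistent across rows. The non-sensitive `sales` field passes through unchanged and remains available for analytics.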

The solution also provides customers with Bring Your Own Key (BYOK) or Hold Your Own Key (HYOK) private key capabilities so that neither Baffle nor any other third party has access to your data.

Baffle makes it easy for customers to go from any source to any destination and land the data in a de-identified state for any data schema. There is no need to create clones or modify them into a fixed schema or transform the data before moving to the cloud – it is all done on the fly.

Baffle’s transparent, no-code data security mesh allows applications and SQL-type queries to function without any code modifications while securing access and controlling decryption and re-identification of data stored in Snowflake.
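The idea of controlling re-identification by role can be sketched in a few lines of Python. This is an illustrative toy, not how Baffle or Snowflake implement it: the `TokenVault` class, the role names, and the in-memory vault are all assumptions made for the example.

```python
import secrets


class TokenVault:
    """Toy vault-based tokenization with a role check on re-identification."""

    def __init__(self, authorized_roles):
        self._authorized = set(authorized_roles)
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for the value, reusing it on repeat calls."""
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def read(self, stored: str, role: str) -> str:
        """Authorized roles get the original value; others see the token."""
        if role in self._authorized:
            return self._token_to_value.get(stored, stored)
        return stored


vault = TokenVault(authorized_roles={"analyst"})
stored_phone = vault.tokenize("555-0100")

as_analyst = vault.read(stored_phone, role="analyst")  # original value
as_intern = vault.read(stored_phone, role="intern")    # token only
print(as_analyst, as_intern)
```

The caller issues the same read in both cases and only the role changes, which mirrors the claim that queries keep working without code modifications.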

“The technology we’ve developed preserves data privacy while simultaneously opening the door for aggregate data analytics and data management models that were previously not thought possible,” said Priyadarshan “PD” Kolte, co-founder and CTO of Baffle.

Baffle’s Data Protection Services enables data-centric protection of sensitive information inside unstructured files or object source data. Baffle’s same field-level encryption and tokenization capabilities for structured data can be applied via file-level encryption to data inside files and object storage to ensure data privacy. Data-Centric File Protection (DFP) simplifies data protection and compliance as part of the business intelligence data pipeline.

Steps in The Data Journey

Here is a simple explanation of the journey your data takes in the Baffle process:

  • On-Premises Database – In this example, we have potentially sensitive data in the form of direct identifiers or personal identifiers such as sales or social security numbers, date of birth, telephone numbers, biometric identifiers, protected health information based on HIPAA privacy rules, or even Internet Protocol (IP) address numbers saved in an on-premises database.
  • AWS DMS – The data then passes through Amazon Web Services Database Migration Service as the original data.
  • Baffle Shield – The data set then hits Baffle Shield, the core technology we use in the de-identification process, and the de-identified data set passes to the S3 bucket.
  • S3 Bucket – Data set lands here in the S3 bucket in a de-identified state ready for staging.
  • Snowflake – Data is piped into Snowflake, where it can be used for analytics based on authorized roles. The de-identified information is stored in a format-preserving encryption mode, which keeps it safe.
  • Once the data from staging is copied to Snowflake, we can attempt to view the data with an unauthorized role, and only the de-identified data will be seen. If we run the same query with an authorized role, the core Baffle technology decrypts the data to re-identify it.
  • Aggregate procedures can be performed here. For example, a user can query the total sum of sales or a range-bound query of sales between 50 and 100. Or we can see the top 10 sellers in descending order. We can even execute a wildcard search on the data set within this environment.
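The query shapes mentioned in the last step can be sketched as follows. This example uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for Snowflake, and it runs on plain values with no encryption involved. The `sales` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the analytics warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (seller TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Alice", 120), ("Bob", 75), ("Alina", 60), ("Carol", 40)],
)

# Total sum of sales.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Range-bound query: sales between 50 and 100 (inclusive).
in_range = conn.execute(
    "SELECT seller, amount FROM sales "
    "WHERE amount BETWEEN 50 AND 100 ORDER BY amount"
).fetchall()

# Top 10 sellers in descending order of total sales.
top_sellers = conn.execute(
    "SELECT seller, SUM(amount) AS total FROM sales "
    "GROUP BY seller ORDER BY total DESC LIMIT 10"
).fetchall()

# Wildcard search on seller names.
wildcard = conn.execute(
    "SELECT seller FROM sales WHERE seller LIKE 'Al%' ORDER BY seller"
).fetchall()

print(total, in_range, top_sellers, wildcard)
```

In the pipeline described above, the analyst would write the same SQL against Snowflake, and what each role sees in the results would depend on its authorization.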

The 5-minute video below demonstrates how we do this on the fly.

Baffle & Snowflake: Making Data Breaches Irrelevant

Baffle’s approach assumes that breaches will happen, whether through negligence or a brute force attack, and ensures that unprotected data is never exposed to an attacker. Furthermore, our goal is to encrypt the data as soon as it is produced and maintain encryption even when it is processed in Snowflake. This provides a powerful, easy-to-use, and efficient way to always protect data.

Baffle was founded to battle the increasing threats to enterprise assets in public and private clouds, making data breaches irrelevant. The combination of Snowflake and Baffle data encryption software helps data make this journey in a zero-trust environment.

The combined power of Baffle & Snowflake allows the fastest and safest way to store, protect, and process your cloud data. This partnership will enable companies to confidently race to the cloud and responsibly harness more data faster for business benefit while safeguarding information.