Redshift Encryption, Security, and Three Questions that Need Answering

By Ameesh Divatia, CEO and co-founder | November 29, 2022

Tens of thousands of customers use Amazon Redshift to run complex analytical queries on data ranging from terabytes to petabytes. Amazon Redshift also gives you access to a wide variety of data analytics tools, compliance capabilities, and even applications for artificial intelligence and machine learning.

With all this data, and the powerful insights that can be drawn from analyzing it, security must be a top priority.

Companies of every size commonly neglect the security of their information technology resources. They tend to focus on performance and implementation instead, because those factors have a more immediate impact. This approach is not advisable, particularly since solutions are available.

As enterprises move from on-premises data to cloud data warehouses, security is often overlooked or handled as an afterthought. Data protection measures are frequently seen as too disruptive to apps and business analytics initiatives, or too hard to implement. Encryption in a data warehouse typically adds significant overhead or requires application code changes to handle de-identifying and re-identifying data for analysis, querying, or reporting.

Security Questions that Need Answering

Security should be one of the foundations supporting every IT initiative, especially those involving cloud data warehousing and analytics, so this article goes over Amazon Redshift security. Security matters all the more for a data warehouse, which stores enormous volumes of information. We will dive into the following three questions most often asked about Redshift data security.

  • How secure is data stored in Amazon Redshift?
  • Is Redshift encrypted by default?
  • Does Redshift offer field-level encryption, and what solutions are available if not?

To begin, we should outline the security controls available to users within the Amazon Web Services (AWS) environment. First, AWS provides SSL, IAM roles, and several other network and data access restrictions, both across the platform and for each database that is part of a Redshift cluster. Second, Amazon Redshift gives you a valuable collection of tools for governing who can access your data warehouse clusters and keeping them secure. Finally, Redshift offers basic security settings similar to those of traditional databases. In essence, these settings encrypt data on disk, but that does not protect you from modern attack vectors. For example, an attacker who accesses the database using stolen or compromised credentials can see all the data in plain text.

This creates a challenge: in some use cases, such as those requiring granular or dynamic access restrictions, it is difficult to meet business objectives using Redshift by itself.

How secure is data stored in Amazon Redshift?

With AWS, you manage the privacy controls of your data, control how your data is used, who has access to it, and how it is encrypted. These capabilities are underpinned by the most flexible and secure cloud computing environment available today.

Redshift uses Amazon S3 (Simple Storage Service) as its persistent storage tier. Data stored in S3 is in the clear until an additional configuration step enables ‘at rest’ encryption. That step protects the data in the S3 bucket from an unauthorized user who does not have access to the encryption keys. As soon as an authorized user accesses the data through Redshift, however, it is decrypted and available in the clear in Redshift memory. If the Redshift administrator’s credentials are compromised, the data is accessible in the clear irrespective of whether ‘at rest’ S3 bucket encryption is enabled.
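To make that limitation concrete, here is a minimal Python sketch of a store that encrypts at rest. It is a toy model, not how Redshift or S3 is implemented: the class name, the credential string, and the SHA-256-based keystream are all assumptions chosen for illustration, and the cipher is not real cryptography.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


class AtRestEncryptedStore:
    """Hypothetical stand-in for a warehouse with at-rest encryption enabled."""

    def __init__(self, valid_credentials):
        self._key = secrets.token_bytes(32)  # held by the service, not the user
        self._disk = {}                      # what a reader of raw storage sees
        self._valid_credentials = set(valid_credentials)

    def write(self, name: str, value: str) -> None:
        self._disk[name] = keystream_xor(self._key, value.encode())

    def raw_storage(self, name: str) -> bytes:
        return self._disk[name]

    def query(self, credential: str, name: str) -> str:
        if credential not in self._valid_credentials:
            raise PermissionError("invalid credential")
        # Decryption is transparent: the caller never needs the key.
        return keystream_xor(self._key, self._disk[name]).decode()
```

Someone reading `raw_storage("ssn")` sees only ciphertext, but anyone who presents a valid credential to `query`, including an attacker holding a stolen one, gets the value back in the clear.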

Is Redshift Data in the S3 bucket Encrypted by Default?

Although Redshift offers robust security features to help protect your data, data in the S3 bucket is not encrypted by default. Encrypting that data at rest requires an explicit configuration step. Encrypting Amazon Redshift clusters, even those holding sensitive data, is optional. However, standards and regulations such as PCI DSS, SOX, HIPAA, and others set requirements for managing certain data types and may compel you to encrypt your data. You can use AWS Key Management Service (AWS KMS), with either an AWS-managed key or a customer-managed key (CMK), when you launch your cluster or when you modify an unencrypted cluster. When you enable AWS KMS encryption on an existing cluster, Amazon Redshift automatically migrates your data.
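As a rough sketch of that explicit step, the Python below builds a request to turn on encryption for an existing cluster and sends it with boto3's Redshift `modify_cluster` call. The helper names, the cluster identifier, and the key alias are placeholders rather than values from this article, and the call needs AWS credentials with permission to modify the cluster.

```python
from typing import Optional


def encryption_request(cluster_id: str, kms_key_id: Optional[str] = None) -> dict:
    """Build the parameters for enabling encryption on an existing cluster."""
    request = {"ClusterIdentifier": cluster_id, "Encrypted": True}
    if kms_key_id is not None:
        # Customer-managed key. When omitted, Redshift is expected to fall
        # back to the account's default key (check the AWS docs to confirm).
        request["KmsKeyId"] = kms_key_id
    return request


def enable_encryption(cluster_id: str, kms_key_id: Optional[str] = None) -> dict:
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("redshift")
    return client.modify_cluster(**encryption_request(cluster_id, kms_key_id))
```

For example, `enable_encryption("analytics-cluster", "alias/my-redshift-key")` would request encryption with a customer-managed key, where both names are hypothetical.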

Does Redshift offer field-level encryption, and what solutions are available if not?

Although the AWS environment is safe, field-level encryption is not included in the offering. Businesses considering a migration to Redshift should keep this in mind, because critical and sensitive information may be exposed if there is a breach. The benefits of Redshift, such as storage scalability and cloud analytics, may well outweigh that danger, but it is necessary to understand the risk before deciding.

Baffle offers encryption at the database field level that also protects data in use, along with fine-grained role-based controls to better protect sensitive data from breaches. Baffle also offers features that can mask data for legitimate use in test and dev environments. This means data engineers and developers have copies of production data that look like actual data, but the values have been obfuscated. Encrypting data on disk is much easier to do, which is why it is a commodity feature offered by most database vendors and cloud providers. Field-level encryption is more advanced and offers the necessary protection against modern attack vectors. Other third-party solutions often require developers to modify their code and call APIs. Baffle provides a simpler way to do it without any code changes, which is the game-changer we’ve been hoping for.
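To illustrate the masking idea, here is a minimal Python sketch that produces an obfuscated copy of rows for test and dev use. It is not Baffle's implementation. The function names, the `dev-copy` seed, and the column names are assumptions made for illustration, and the seeded random generator is not cryptographically secure.

```python
import random
import string


def mask_value(value: str, seed: str = "dev-copy") -> str:
    """Swap each letter or digit for a random one of the same kind.

    Length and punctuation are kept, so the result still looks realistic.
    """
    rng = random.Random(seed + ":" + value)  # same input masks the same way
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            masked.append(ch)  # keep separators such as "-" or "@"
    return "".join(masked)


def mask_rows(rows, sensitive_columns):
    """Return a copy of the rows with only the sensitive columns masked."""
    return [
        {col: mask_value(val) if col in sensitive_columns else val
         for col, val in row.items()}
        for row in rows
    ]
```

Calling `mask_rows(rows, {"ssn"})` leaves non-sensitive columns untouched and returns values in the `ssn` column that keep the familiar digits-and-dashes shape without exposing the originals.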

Amazon Redshift and Baffle DPS

Baffle DPS is the only solution that offers end-to-end protection of the modern data pipeline, with easy connectivity to AWS Database Migration Service, AWS Glue, Amazon S3, and Redshift without requiring any code changes.

Tens of thousands of Redshift customers can now employ Baffle’s transparent data security mesh to safeguard every stage of the data pipeline as source data is moved to Redshift and used for data analytics. Moreover, this can be done without any noticeable performance impact on the user experience.

Baffle DPS seamlessly de-identifies data as it is transferred to the Redshift environment and provides selective re-identification for authorized users while enabling a customer-owned BYOK or HYOK model. This allows businesses to derive the maximum value and insights from sensitive data stored in the cloud for their operational processes while maintaining compliance with relevant industry standards and regulations.

Baffle Solution Features

Baffle’s Redshift database protection solution is a high-performance product that optimizes and secures data in the pipeline during any work performed in a data warehouse. In addition, Baffle offers the following features:

  • Integrates seamlessly with AWS Database Migration Service (AWS DMS), AWS Glue, or other ETL solutions to encrypt, decrypt, or tokenize your data on the fly as it migrates from on-premises to the cloud.
  • Supports multiple modes of encryption, tokenization, or format-preserving encryption (FPE) to simplify cloud data warehouse protection during user activity at the field level.
  • Provides a transparent, no-code data security mesh that allows applications and SQL-type queries to function without any code modifications while securing access and controlling the re-identification of data in a Redshift database.
  • Lets you use as many Amazon Redshift clusters as you need to query your Amazon S3 data, and use Amazon Redshift as part of your VPC configuration, at any scale.
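As a rough picture of field-level tokenization during an ETL step, the Python sketch below swaps sensitive fields for random tokens before a row is loaded and keeps the mapping in a vault so authorized users can re-identify values later. The `TokenVault` class and `protect_row` function are hypothetical names for illustration. This is simple vault tokenization, not format-preserving encryption, and not a description of how Baffle implements it.

```python
import secrets


class TokenVault:
    """In-memory token vault. A real one would be a secured, durable service."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so equal values still join and group.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


def protect_row(row: dict, vault: TokenVault, sensitive_fields: set) -> dict:
    """Tokenize only the sensitive fields and pass the rest through."""
    return {
        field: vault.tokenize(value) if field in sensitive_fields else value
        for field, value in row.items()
    }
```

A row such as `{"id": "42", "email": "ana@example.com"}` passed through `protect_row(row, vault, {"email"})` reaches the warehouse with the email replaced by a token, and only a caller with access to the vault can recover the original.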

The widespread adoption of cloud data warehouses like Amazon Redshift has only just begun. As more data becomes available and the need for fast analysis grows, security concerns will grow with it. It is good to know that Redshift has protocols to keep much of your data safe and secure. Thankfully, there are resources and services such as Baffle Data Protection Services (DPS) that go the extra mile. Baffle DPS for AWS Redshift protects data in the cloud via a “no code” and “low code” data security mesh. Baffle can:

  • De-identify the data pipeline end-to-end “on the fly.”
  • Selectively re-identify data without any application changes.
  • Conduct operations on encrypted data, such as mathematical operations and wildcard search.
  • Enable Adaptive Access Control and Dynamic Data Masking to control who can view what data.
  • And provide blazing-fast performance that does not slow down applications.
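One way to picture dynamic data masking and selective re-identification is as a policy applied when results are returned, so the same stored row looks different depending on who asks. The Python sketch below is a simplified assumption of that idea, with hypothetical role names and policy shape. It does not describe Baffle's actual policy engine.

```python
def mask_keep_last(value: str, keep: int = 4) -> str:
    """Hide everything except the last `keep` characters."""
    visible = value[-keep:] if keep > 0 else ""
    return "*" * (len(value) - len(visible)) + visible


def apply_policy(row: dict, clear_roles_by_column: dict, role: str) -> dict:
    """Return the row as the given role is allowed to see it.

    Columns missing from the policy are treated as non-sensitive.
    """
    result = {}
    for column, value in row.items():
        allowed = clear_roles_by_column.get(column)
        if allowed is None or role in allowed:
            result[column] = value  # re-identified, or never sensitive
        else:
            result[column] = mask_keep_last(str(value))
    return result
```

With a policy of `{"card": {"fraud_analyst"}}`, a fraud analyst sees the full card number while any other role sees only its last four digits, and neither the stored data nor the query changes.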

Baffle Data Protection Services for AWS Redshift is the only purpose-built software that offers a transparent, easily deployable way to deliver security in lockstep with the needs of your business.

If you have questions about how Baffle DPS can work for you, schedule a live demo with one of our experts today.