RFC: Sensitive data masking utility #1858

Closed
@seshubaws

Is this related to an existing feature request or issue?

#1173

Which AWS Lambda Powertools utility does this relate to?

Other

Summary

Customers would like to obfuscate incoming data in known fields that contain PII, so that it is not passed downstream or accidentally logged. With the growing use of batch processing utilities and regulations such as GDPR, this is one of the hardest tasks customers face today, especially for multi-account users.

The AWS Encryption SDK is a good starting point, but it can be too complex for the average developer, data engineer, or DevOps persona to use. Customers have therefore frequently requested a Powertools utility to easily mask and/or encrypt sensitive data.

Use case

This utility is for developers who want to mask or encrypt sensitive data such as names, addresses, and SSNs, so that the data is not logged in CloudWatch where it could be compromised, and so that downstream systems like S3, DynamoDB, or relational databases need no additional work to handle PII.

Additionally, developers should be able to decrypt encrypted sensitive data back to its original form so that they can handle sensitive requests around that data on an as-needed basis.

Proposal

The data masking utility should allow users to mask data, or to encrypt and decrypt it. For encryption, customers should be able to choose which encryption provider to use, though we will provide an out-of-the-box integration with the AWS Encryption SDK. The snippet below gives a rudimentary look at how this utility could be used and how it would function.

Usage

from aws_lambda_powertools.utilities.data_masking.constants import KMS_KEY_ARN
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AwsEncryptionSdkProvider


def lambda_handler(event, context):
    # Masking is irreversible and needs no encryption provider
    data_masker = DataMasking()
    data = {
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "email": "johndoe@example.com",
        "address": {
            "street": "123 Main St",
            "city": "Anytown",
            "state": "CA",
            "zip": "12345",
        },
    }

    masked = data_masker.mask(data=data, fields=["email", "address.street"])
    # masked == {
    #     "id": 1,
    #     "name": "John Doe",
    #     "age": 30,
    #     "email": "*****",
    #     "address": {
    #         "street": "*****",
    #         "city": "Anytown",
    #         "state": "CA",
    #         "zip": "12345",
    #     },
    # }

    # Encryption is reversible; this provider is backed by the AWS Encryption SDK and a KMS key
    encryption_provider = AwsEncryptionSdkProvider(keys=[KMS_KEY_ARN])
    data_masker = DataMasking(provider=encryption_provider)

    encrypted = data_masker.encrypt(data=data, fields=["email", "address.street"])
    # encrypted == {  (ciphertext values are illustrative)
    #     "id": 1,
    #     "name": "John Doe",
    #     "age": 30,
    #     "email": "InRoaXMgaXMgYSBzdHJpbmciHsLZGx2na-XzP_TB5Bf2LNU1bLc",
    #     "address": {
    #         "street": "XMgYSB_KDddaDJYMb-JpbmGnagTklwQ-msdaDLP",
    #         "city": "Anytown",
    #         "state": "CA",
    #         "zip": "12345",
    #     },
    # }

    decrypted = data_masker.decrypt(data=encrypted, fields=["email", "address.street"])
    # decrypted == {
    #     "id": 1,
    #     "name": "John Doe",
    #     "age": 30,
    #     "email": "johndoe@example.com",
    #     "address": {
    #         "street": "123 Main St",
    #         "city": "Anytown",
    #         "state": "CA",
    #         "zip": "12345",
    #     },
    # }

AWS Encryption SDK

The AWS Encryption SDK is a client-side encryption library that makes it easier to encrypt and decrypt data of any type in your application. It is available in all the languages that Powertools supports. You can use it with AWS KMS keys (formerly called customer master keys), though the library does not require any AWS service. When you encrypt data, the SDK returns a single, portable encrypted message that contains the encrypted data and the encrypted data keys. This message is designed to work in many different types of applications. You can also configure many encryption options, including the algorithm suite used for encryption and signing.

Latencies
Response times for this utility with the AWS Encryption SDK, measured against Lambda functions configured with 128 MB, 1024 MB, and 1769 MB of memory (all values in milliseconds):

  Endpoint             Memory     min    max     median   p95     p99
  /Prod/function128    128 MB     0      5373    561.2    713.5   1620
  /Prod/function1024   1024 MB    0      5039    89.1     144     232.8
  /Prod/function1769   1769 MB    0      1726    82.3     133     183.1

Custom encryption
If customers would like to use another encryption provider, or define their own encrypt and decrypt functions, we will define an interface that they can implement and pass in to the DataMasking class.

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider import BaseProvider
from itsdangerous.url_safe import URLSafeSerializer


class MyCustomEncryption(BaseProvider):
    # Note: itsdangerous signs and encodes values but does not encrypt them, so the
    # output is still readable by anyone. It only illustrates the provider interface.
    def __init__(self, secret):
        self.secret = URLSafeSerializer(secret)

    def encrypt(self, value: str) -> str:
        if value is None:
            return value
        return self.secret.dumps(value)

    def decrypt(self, value: str) -> str:
        if value is None:
            return value
        return self.secret.loads(value)


def lambda_handler(event, context):
    data = {
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "email": "johndoe@example.com",
        "address": {
            "street": "123 Main St",
            "city": "Anytown",
            "state": "CA",
            "zip": "12345",
        },
    }

    encryption_provider = MyCustomEncryption(secret="secret-key")
    data_masker = DataMasking(provider=encryption_provider)

    encrypted = data_masker.encrypt(data=data, fields=["email", "address.street"])
    # encrypted == {  (encoded values are illustrative)
    #     "id": 1,
    #     "name": "John Doe",
    #     "age": 30,
    #     "email": "InRoaXMgaXMgYSBzdHJpbmciHsLZGx2na-XzP_TB5Bf2LNU1bLc",
    #     "address": {
    #         "street": "XMgYSB_KDddaDJYMb-JpbmGnagTklwQ-msdaDLP",
    #         "city": "Anytown",
    #         "state": "CA",
    #         "zip": "12345",
    #     },
    # }

    decrypted = data_masker.decrypt(data=encrypted, fields=["email", "address.street"])
    # decrypted == {
    #     "id": 1,
    #     "name": "John Doe",
    #     "age": 30,
    #     "email": "johndoe@example.com",
    #     "address": {
    #         "street": "123 Main St",
    #         "city": "Anytown",
    #         "state": "CA",
    #         "zip": "12345",
    #     },
    # }

Out of scope

Automatically traversing an arbitrary dictionary is out of scope for the initial launch of this tool. Instead, the utility will receive instructions about where in the given dictionary it should mask or unmask data.

We still need to determine the most efficient way to take a path as input and mask or encrypt the value at that path. JMESPath or JSONPath could work for simple use cases, but we need to find the fastest method.
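The path-handling question can be made concrete with a minimal sketch, assuming fields use the dot notation from the examples above ("address.street") and that every segment is a dictionary key (no list indices or wildcards). apply_to_fields is a hypothetical helper, not part of Powertools. It deep-copies the input so the original payload is never modified, then applies a caller-supplied function at each path; here that function masks, but it could equally call a provider's encrypt or decrypt.

```python
import copy
from typing import Any, Callable, Dict, List


def apply_to_fields(data: Dict[str, Any], fields: List[str], fn: Callable[[Any], Any]) -> Dict[str, Any]:
    """Return a deep copy of `data` with `fn` applied at each dot-separated path.

    Hypothetical helper: "address.street" is split on "." and walked key by key
    through nested dicts. A path that does not exist raises KeyError.
    """
    result = copy.deepcopy(data)  # leave the caller's payload untouched
    for path in fields:
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node[key]  # descend one level per path segment
        node[leaf] = fn(node[leaf])  # replace the value in the copy
    return result


if __name__ == "__main__":
    data = {
        "id": 1,
        "email": "johndoe@example.com",
        "address": {"street": "123 Main St", "city": "Anytown"},
    }
    masked = apply_to_fields(data, ["email", "address.street"], lambda _: "*****")
    print(masked)
    # {'id': 1, 'email': '*****', 'address': {'street': '*****', 'city': 'Anytown'}}
```

Plain key lookups keep the cost proportional to the number of fields and path segments. JMESPath would add richer selection, but it only queries data, so writing the masked or encrypted value back would still need custom code like this.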

Potential challenges

  • We will need to discuss the design and implementation with a security expert to ensure the utility is safe.
  • We also need to determine whether to support envelope encryption for customers using the AWS Encryption SDK.
    • Envelope encryption is the practice of encrypting plaintext data with a data key, and then encrypting the data key under another key. Eventually, however, one key must remain in plaintext so that you can decrypt the keys and your data. This top-level plaintext key-encryption key is known as the root key. You can store root keys in AWS KMS, where they are known as AWS KMS keys.
  • We need to decide on the best names for the methods. Should they still be called encrypt and decrypt even when the customer only masks data and won't be able to decrypt it later? Should the interface have a mask method that is always available, so that users can irretrievably mask some data and also encrypt other data?
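Envelope encryption can be illustrated locally with a minimal sketch. It assumes the third-party cryptography package and uses AES-GCM for both layers. The locally generated root_key is only a stand-in for a root key held in AWS KMS; with KMS, the data key would instead come from the GenerateDataKey API, which returns both a plaintext and an encrypted copy, and the root key would never leave the service. envelope_encrypt and envelope_decrypt are hypothetical names, and this is not the message format the AWS Encryption SDK produces.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def envelope_encrypt(root_key: bytes, plaintext: bytes) -> dict:
    # 1. Generate a fresh data key for this message
    data_key = AESGCM.generate_key(bit_length=256)
    # 2. Encrypt the data with the data key (96-bit random nonce)
    data_nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(data_nonce, plaintext, None)
    # 3. Encrypt (wrap) the data key under the root key
    key_nonce = os.urandom(12)
    encrypted_data_key = AESGCM(root_key).encrypt(key_nonce, data_key, None)
    # Only the wrapped data key is stored; the plaintext data key is discarded
    return {
        "ciphertext": ciphertext,
        "data_nonce": data_nonce,
        "encrypted_data_key": encrypted_data_key,
        "key_nonce": key_nonce,
    }


def envelope_decrypt(root_key: bytes, message: dict) -> bytes:
    # Unwrap the data key with the root key, then decrypt the data with it
    data_key = AESGCM(root_key).decrypt(message["key_nonce"], message["encrypted_data_key"], None)
    return AESGCM(data_key).decrypt(message["data_nonce"], message["ciphertext"], None)


if __name__ == "__main__":
    # Stand-in for a root key that, in practice, would stay inside AWS KMS
    root_key = AESGCM.generate_key(bit_length=256)
    message = envelope_encrypt(root_key, b"123 Main St")
    assert envelope_decrypt(root_key, message) == b"123 Main St"
```

Only the wrapped data key travels with the ciphertext, so anyone holding the stored message alone cannot recover the data without access to the root key.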

Dependencies and Integrations

Integration with the AWS Encryption SDK.

Alternative solutions

No response

Labels: RFC, data-masking (Sensitive Data Masking feature)

Status: Shipped
