
RFC: Add support for Pydantic v2 #2672

Closed
@leandrodamascena

Description


Is this related to an existing feature request or issue?

#2427

Which Powertools for AWS Lambda (Python) utility does this relate to?

Parser

Summary

This RFC proposes changes to Parser’s Pydantic Models to support both Pydantic V2 and V1 without breaking changes.

In Powertools v3 (~EOL, date not settled yet), we will remove support for Pydantic V1 and update Parser's dependency to Pydantic v2.

stateDiagram-v2
    Models: Update Parser Pydantic models
    DocsV2: Document how to bring Pydantic v2
    Compatibility: Ensure Pydantic v1 and v2 can coexist
    POC: Make POC available for beta testers
    Compatibility --> Models
    Models --> DocsV2
    DocsV2 --> POC
    POC --> PR


Compatibility table

Pydantic v1 | Pydantic v2 | V2 deprecation | V2 removed | In use? | Code change required
@validator | @field_validator | ✔️ | | ✔️ | ✔️
@root_validator | @model_validator | ✔️ | | ✔️ | ✔️
.parse_obj() | .model_validate() | ✔️ | | ✔️ |
.json() | .model_dump_json() | ✔️ | | ✔️ |
.dict() | .model_dump() | ✔️ | | ✔️ |
.parse_raw() | .model_validate_json() | ✔️ | | ✔️ |

Use case

Pydantic V2, the latest major version of Pydantic, has been released with new features and better performance. Customers using Powertools for AWS Lambda (Python) in their AWS Lambda functions have expressed interest in upgrading to Pydantic V2. To meet their needs, Powertools must support Pydantic V2.

This integration enables customers to update their workloads to use Pydantic V2 while ensuring a seamless transition for existing customers using Pydantic V1.


Proposal

To accommodate both Pydantic v1 and Pydantic v2, the proposed design is to refactor the parser models with minimal changes. These changes should work in both versions of Pydantic and be transparent to users.

TL;DR: Proposed actions summary

  • Set default value of None for optional fields
  • Keep the deprecated @validator and @root_validator decorators, with a note to remove them in V3
  • Keep the deprecated .parse_obj() method, with a note to remove it in V3
  • Investigate why empty dicts/lists fail validation
  • Investigate the removal of .dict() and .json()
  • Handle TypeError raised in validators
  • Investigate datetime coercion
  • Investigate development dependency conflicts
  • Document how to bring Pydantic v2 with Powertools
  • Document how to disable deprecation warnings for Pydantic V2
  • Create PR with POC

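Optional fields must have a default value

In Pydantic v1, a field annotated as Optional[str] implicitly defaults to None and can be omitted. In Pydantic v2, Optional[str] only means the value may be None: the field is still required unless it declares a default, so every optional field needs an explicit = None.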

Pydantic v1

from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str]
    MessageGroupId: Optional[str]
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str]
    AWSTraceHeader: Optional[str]

Pydantic v2

from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str] = None
    MessageGroupId: Optional[str] = None
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str] = None
    AWSTraceHeader: Optional[str] = None
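A minimal sketch, assuming only that Pydantic v1 or v2 is installed, that exercises the v2-style model above: because every Optional field has an explicit None default, a payload that omits them validates under both major versions. The payload values are illustrative only.

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel


class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str] = None
    MessageGroupId: Optional[str] = None
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str] = None
    AWSTraceHeader: Optional[str] = None


# Illustrative payload: only the required attributes are present
payload = {
    "ApproximateReceiveCount": "1",
    "ApproximateFirstReceiveTimestamp": "1545082649185",
    "SenderId": "AIDAIENQZJOLO23YVJ4VO",
    "SentTimestamp": "1545082649183",
}

# The constructor works in both versions, unlike parse_obj() (v1) vs model_validate() (v2)
attributes = SqsAttributesModel(**payload)
print(attributes.MessageGroupId)  # None
```

Calling the model constructor directly sidesteps the deprecated-method question entirely, which keeps such a check version-agnostic.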

Validators are deprecated

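Both @root_validator and @validator are deprecated in Pydantic V2 and will be removed in Pydantic V3. Pydantic recommends using the new @model_validator and @field_validator decorators instead. However, we can keep using the deprecated validators in Powertools to avoid breaking changes, and plan their removal for Powertools v3. Two adjustments are still needed for them to work under v2: @root_validator must be called with skip_on_failure=True (Pydantic v2 rejects a post root validator without it, and v1 also accepts the flag), and validators must raise ValueError instead of TypeError, because Pydantic v2 no longer converts a TypeError raised inside a validator into a ValidationError.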

Pydantic v1

@root_validator(allow_reuse=True)
def check_message_id(cls, values):
    message_id, event_type = values.get("messageId"), values.get("eventType")
    if message_id is not None and event_type != "MESSAGE":
        raise TypeError("messageId is available only when the `eventType` is `MESSAGE`")
    return values

Pydantic v2

@root_validator(allow_reuse=True, skip_on_failure=True)
def check_message_id(cls, values):
    message_id, event_type = values.get("messageId"), values.get("eventType")
    if message_id is not None and event_type != "MESSAGE":
        raise ValueError("messageId is available only when the `eventType` is `MESSAGE`")
    return values
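To check that the v2-compatible validator above behaves the same under both major versions, here is a small self-contained model around it. The model name and its two fields are hypothetical; only the validator body comes from the snippet above. Under Pydantic v2 the deprecated decorator emits a deprecation warning when the class is defined, but validation still works.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, root_validator


class HypotheticalEventModel(BaseModel):
    eventType: str
    messageId: Optional[str] = None

    # skip_on_failure=True is mandatory for post root validators in v2 and accepted by v1
    @root_validator(allow_reuse=True, skip_on_failure=True)
    def check_message_id(cls, values):
        message_id, event_type = values.get("messageId"), values.get("eventType")
        if message_id is not None and event_type != "MESSAGE":
            # ValueError (not TypeError) becomes a ValidationError in both v1 and v2
            raise ValueError("messageId is available only when the `eventType` is `MESSAGE`")
        return values


def is_valid(data: dict) -> bool:
    """Return True if data passes validation, False if Pydantic rejects it."""
    try:
        HypotheticalEventModel(**data)
    except ValidationError:
        return False
    return True


print(is_valid({"eventType": "MESSAGE", "messageId": "abc"}))  # True
print(is_valid({"eventType": "SUBSCRIBE", "messageId": "abc"}))  # False
```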

An alternative is to check the installed Pydantic version and import the matching decorators conditionally. This would also require a wrapper function around the validator decorators that picks the right one for each version. This workaround may make the models harder to read and maintain.
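A minimal sketch of that alternative, assuming the installed distribution is named pydantic and that only field validators need a shim. compat_validator and HypotheticalModel are hypothetical names, not Powertools APIs. The version is read with importlib.metadata, and the shim picks field_validator on v2 or validator on v1.

```python
from importlib.metadata import version

from pydantic import BaseModel

# Major version of the installed Pydantic distribution, e.g. "2.0.1" -> 2
PYDANTIC_MAJOR = int(version("pydantic").split(".")[0])

if PYDANTIC_MAJOR >= 2:
    from pydantic import field_validator

    def compat_validator(*fields):
        """Hypothetical shim: delegate to the v2 field_validator decorator."""
        return field_validator(*fields)

else:
    from pydantic import validator

    def compat_validator(*fields):
        """Hypothetical shim: delegate to the v1 validator decorator."""
        return validator(*fields, allow_reuse=True)


class HypotheticalModel(BaseModel):
    name: str

    @compat_validator("name")
    def strip_name(cls, value):
        # Runs after type validation in both versions
        return value.strip()


print(HypotheticalModel(name="  powertools  ").name)  # powertools
```

Even in this small form, every model must route through the shim, which illustrates the readability cost mentioned above.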

Powertools Layer

The Powertools Layer is built with Pydantic v1, which can be a problem if a customer uses our Layer and brings Pydantic v2 as an external dependency.

In tests with the Powertools Layer plus Pydantic v2 installed as an external dependency, Lambda places /var/task (the function package) first in sys.path. The function's own dependency therefore takes precedence over the one bundled in the Layer (/opt/python), which allows customers to bring their preferred Pydantic version.


Path

{"level":"INFO","location":"<module>:8","message":["/var/task","/opt/python/lib/python3.10/site-packages","/opt/python","/var/runtime","/var/lang/lib/python310.zip","/var/lang/lib/python3.10","/var/lang/lib/python3.10/lib-dynload","/var/lang/lib/python3.10/site-packages","/opt/python/lib/python3.10/site-packages"],"timestamp":"2023-07-05 09:04:52,691+0000","service":"service_undefined"}

Pydantic Version

{"level":"INFO","location":"lambda_handler:11","message":"Pydantic version -> 2.0.1","timestamp":"2023-07-05 09:04:52,694+0000","service":"service_undefined","xray_trace_id":"1-64a53234-0ca06edc18c523524775237c"}
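Python resolves an import by scanning sys.path in order and using the first match, which is why a package in /var/task shadows the copy in /opt/python. The stdlib-only sketch below reproduces that precedence locally, with two temporary directories standing in for the function package and the Layer; demo_pkg and its version strings are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_fake_package(root: Path, version: str) -> None:
    """Create root/demo_pkg/__init__.py that exposes a __version__ string."""
    package_dir = root / "demo_pkg"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f'__version__ = "{version}"\n')


def resolved_version(first: Path, second: Path) -> str:
    """Import demo_pkg with `first` ahead of `second` on sys.path and return its version."""
    saved_path = list(sys.path)
    sys.modules.pop("demo_pkg", None)
    try:
        # Mirror the Lambda ordering: earlier sys.path entries win
        sys.path[:0] = [str(first), str(second)]
        importlib.invalidate_caches()
        return importlib.import_module("demo_pkg").__version__
    finally:
        # Restore interpreter state so the demo leaves no trace
        sys.path[:] = saved_path
        sys.modules.pop("demo_pkg", None)


with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "task"    # stands in for /var/task (function package)
    layer_dir = Path(tmp) / "layer"  # stands in for /opt/python (the Layer)
    make_fake_package(task_dir, "2.0.1")
    make_fake_package(layer_dir, "1.10.11")
    winner = resolved_version(task_dir, layer_dir)

print(winner)  # 2.0.1
```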

Warnings

  • To ensure a smooth transition and minimize disruption for our users, we have temporarily suppressed the PydanticDeprecatedSince20 and PydanticDeprecationWarning warnings emitted by the deprecated validators and methods we keep using. This allows existing applications to keep working as expected without printing warnings.

  • If needed, you can enable the warnings yourself with something like the code below. Reference: https://docs.python.org/3/library/warnings.html

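from aws_lambda_powertools.utilities.parser import event_parser, BaseModel, envelopes
from aws_lambda_powertools.utilities.parser.models import (
    SqsModel,
)

from aws_lambda_powertools import Logger
import pydantic

import warnings

# Call this after the Powertools imports above: simplefilter() inserts its
# filter at the front of the filter list, so it overrides any "ignore"
# filters registered earlier and re-enables Pydantic's deprecation warnings
warnings.simplefilter('default')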
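Re-enabling works because of filter precedence: warnings.simplefilter() and warnings.filterwarnings() insert new filters at the front of the list, so the most recently added matching filter wins. The stdlib-only sketch below uses a hypothetical warning class as a stand-in for PydanticDeprecatedSince20, and a hypothetical "ignore" filter to mimic the suppression described above.

```python
import warnings


class StandInDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for Pydantic's PydanticDeprecatedSince20."""


def call_deprecated_api() -> str:
    """Pretend to be a deprecated Pydantic method such as .parse_obj()."""
    warnings.warn("parse_obj is deprecated", StandInDeprecationWarning, stacklevel=2)
    return "parsed"


def count_emitted_warnings(reenable: bool) -> int:
    """Count warnings that reach the user, with or without re-enabling them."""
    with warnings.catch_warnings(record=True) as caught:
        # Library side: suppress the deprecation warning
        warnings.filterwarnings("ignore", category=StandInDeprecationWarning)
        if reenable:
            # User side: inserted at the front, so it overrides the "ignore" filter
            warnings.simplefilter("default")
        call_deprecated_api()
    return len(caught)


print(count_emitted_warnings(reenable=False))  # 0
print(count_emitted_warnings(reenable=True))   # 1
```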

Out of scope

Refactorings that introduce breaking changes for customers who want to keep using v1. Anything that involves a breaking change will be left out of this change.

We could also use this opportunity to evaluate the performance of Pydantic V2 and potentially improve the performance of the Parser utility, though this is not required.


Potential challenges

Most of the challenges have been addressed, and I was able to use the Powertools for AWS Lambda (Python) Parser utility with Pydantic v2 across several models. For the remaining challenges, we still need to determine whether they constitute a breaking change.

Working with datetime fields

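In Pydantic v1, parsing an epoch timestamp into a datetime field produces a timezone-aware value in UTC (note the +00:00 offset), and our tests pass. In Pydantic v2 (2.0.1), the same input produces a naive datetime without a UTC offset. datetime.timestamp() then interprets that value as local time, so the round trip back to epoch milliseconds is off and our tests fail.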

Codebase

from datetime import datetime

from pydantic import BaseModel
import pydantic


class Model(BaseModel):
    datefield: datetime = None

epoch_time = 1659687279885

m = Model(
    datefield=epoch_time,
)

print(f"Pydantic version -> {pydantic.__version__}")
print(f"Raw epoch time -> {epoch_time}")
print(f"Raw pydantic field -> {m.datefield}")
print(f"Pydantic converted epoch time -> {int(round(m.datefield.timestamp() * 1000))}")

assert epoch_time == int(round(m.datefield.timestamp() * 1000))

Pydantic v1

/tmp/pydantic2 via 🐍 v3.10.6 (.env) on ☁️  (us-east-1) 
❯ python v1.py
Pydantic version -> 1.10.11
Raw epoch time -> 1659687279885
Raw pydantic field -> 2022-08-05 08:14:39.885000+00:00
Pydantic converted epoch time -> 1659687279885

Pydantic v2

/tmp/pydantic2 via 🐍 v3.10.6 (.env) on ☁️  (us-east-1) 
❯ python v2.py
Pydantic version -> 2.0.1
Raw epoch time -> 1659687279885
Raw pydantic field -> 2022-08-05 08:14:39.885000
Pydantic converted epoch time -> 1659683679885
Traceback (most recent call last):
  File "/tmp/pydantic2/v2.py", line 21, in <module>
    assert epoch_time == int(round(m.datefield.timestamp() * 1000))
AssertionError
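The one-hour gap in the v2 output (1659687279885 vs 1659683679885, i.e. 3,600,000 ms) is consistent with datetime.timestamp() treating a naive datetime as local time on a machine running at UTC+1. Below is a stdlib-only sketch of the effect and of one possible workaround, assuming the naive value is meant to be UTC: attach timezone.utc before converting back to epoch milliseconds. to_epoch_ms is a hypothetical helper, not part of Pydantic or Powertools.

```python
from datetime import datetime, timezone

EPOCH_MS = 1659687279885


def to_epoch_ms(value: datetime) -> int:
    """Hypothetical helper: convert a datetime to epoch milliseconds.

    Naive values are assumed to be UTC instead of local time, which is
    what datetime.timestamp() would otherwise assume for them.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return int(round(value.timestamp() * 1000))


# Aware value, like the one Pydantic v1 produced (2022-08-05 08:14:39.885000+00:00)
aware = datetime.fromtimestamp(EPOCH_MS / 1000, tz=timezone.utc)
# Naive value with the same wall-clock time, like the Pydantic 2.0.1 output above
naive = aware.replace(tzinfo=None)

print(naive)  # 2022-08-05 08:14:39.885000
print(to_epoch_ms(aware) == EPOCH_MS)  # True
print(to_epoch_ms(naive) == EPOCH_MS)  # True
```

Calling naive.timestamp() directly would give a different result depending on the machine's time zone, which is why the original assertion only fails on hosts not running at UTC.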

Batch processing

Some Batch processing tests are failing and I need to investigate why.

FAILED tests/functional/test_utilities_batch.py::test_batch_processor_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'message_id'
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_dynamodb_context_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'dynamodb'
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_kinesis_context_parser_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'kinesis'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'message_id'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_dynamodb_context_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'dynamodb'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_kinesis_context_parser_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'kinesis'

Developer environment

Since some of our dependencies (cfn-lint / aws-sam-translator) have a requirement of Pydantic v1, we'll need to remove them from our development environment in order to accommodate Pydantic v2.

❯ poetry add "pydantic>=2.0"                       

Updating dependencies
Resolving dependencies... (0.6s)

Because no versions of aws-sam-translator match >1.68.0,<1.69.0 || >1.69.0,<1.70.0 || >1.70.0
 and aws-sam-translator (1.68.0) depends on pydantic (>=1.8,<2.0), aws-sam-translator (>=1.68.0,<1.69.0 || >1.69.0,<1.70.0 || >1.70.0) requires pydantic (>=1.8,<2.0).
And because aws-sam-translator (1.69.0) depends on pydantic (>=1.8,<2.0), aws-sam-translator (>=1.68.0,<1.70.0 || >1.70.0) requires pydantic (>=1.8,<2.0).
And because aws-sam-translator (1.70.0) depends on pydantic (>=1.8,<2.0)
 and cfn-lint (0.77.10) depends on aws-sam-translator (>=1.68.0), cfn-lint (0.77.10) requires pydantic (>=1.8,<2.0).
So, because aws-lambda-powertools depends on both pydantic (>=2.0) and cfn-lint (0.77.10), version solving failed.

Dependency resolution for Pydantic v1 and v2

Customers will run into a conflict with either of the following requirements.txt files:

All Powertools features and Pydantic v2

aws-lambda-powertools[all]
pydantic # or pydantic>=2

Powertools parser feature and Pydantic v2

aws-lambda-powertools[parser]
pydantic # or pydantic>=2

Recommendation

We should include a new section in the documentation to explain how to use Pydantic v2 with Parser.

For example, the docs should tell customers not to use the [all] or [parser] extras when bringing Pydantic v2 as part of their dependencies:

  • aws-lambda-powertools[all] becomes aws-lambda-powertools[validation,tracer,aws-sdk]

Because of the Optional[str] = None breaking change in v2, we should keep pinning Pydantic to v1 until we launch Powertools v3 and move to v2. We cannot know whether customers use additional Pydantic v1 features through Powertools, or whether they followed our docs to the letter.

This also gives us room to recommend disabling the warnings for deprecated features we keep for backwards compatibility (e.g., validators). This prevents Pydantic from littering customers' logs, and adding to their logging costs, when they bring Pydantic v2.


Ad-hoc test for pydantic v2 dep

Until Powertools v3 is released, we should guard against backwards incompatibilities in new Pydantic models, or in changes to existing models that a Pydantic v2 customer might contribute externally.

Recommendation

Set up a new temporary GitHub Actions workflow that triggers on changes to Parser's models. We can use Nox to streamline dependencies and easily trigger Parser's unit tests.

Alternatively, within this temporary workflow, we could call make dev, remove Pydantic, install Pydantic v2, and run Parser's unit tests.

Nox's benefit is that it's more stable and easier to reason about, and it will lead us to address an unrelated area we aren't testing today: tests without optional dependencies (e.g., Batch without Parser code).
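One possible shape for such a Nox session, as a sketch only: the session name, the tests/functional/parser path, and the Pydantic version specifiers are assumptions, not the project's actual configuration. nox.parametrize runs the same session once per Pydantic major version.

```python
# noxfile.py (sketch; session name and test path are assumptions)
import nox


@nox.session
@nox.parametrize("pydantic", ["pydantic<2", "pydantic>=2,<3"])
def test_parser_with_pydantic(session, pydantic):
    """Run Parser's unit tests against one Pydantic major version."""
    # Install Powertools from the local checkout, plus the test runner
    session.install(".", "pytest")
    # Install the requested Pydantic version last so it replaces any earlier pin
    session.install(pydantic)
    session.run("pytest", "tests/functional/parser")
```

A real workflow would also need Parser's other test dependencies; the point here is that a single parametrized session covers both Pydantic versions without touching the main development environment.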



Alternative solutions

No response

Acknowledgment

Metadata


Labels

Type

No type

Projects

Status

Shipped

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
