Description
Is this related to an existing feature request or issue?
Which Powertools for AWS Lambda (Python) utility does this relate to?
Parser
Summary
This RFC proposes changes to Parser’s Pydantic Models to support both Pydantic V2 and V1 without breaking changes.
In Powertools v3 (~EOL, date not settled yet), we will remove support for Pydantic V1 and update Parser's dependency to Pydantic v2.
stateDiagram-v2
Models: Update Parser Pydantic models
DocsV2: Document how to bring Pydantic v2
Compatibility: Ensure Pydantic v1 and v2 can coexist
POC: Make POC available for beta testers
Compatibility --> Models
Models --> DocsV2
DocsV2 --> POC
POC --> PR
Compatibility table
Pydantic v1 | Pydantic v2 | V2 deprecation | V2 removed | In use? | Code change required |
---|---|---|---|---|---|
@validator | @field_validator | ✔️ | ✔️ | ✔️ | |
@root_validator | @model_validator | ✔️ | ✔️ | ✔️ | |
.parse_obj() | model_validate() | ✔️ | ✔️ | | |
.json() | model_dump_json() | ✔️ | ✔️ | | |
.dict() | model_dump() | ✔️ | ✔️ | | |
.parse_raw() | model_validate_json() | ✔️ | ✔️ | | |
Use case
Pydantic V2, the latest version of Pydantic, has been launched with enhanced features and better performance. Customers using Powertools for AWS Lambda (Python) in their AWS Lambda functions have expressed interest in updating to Pydantic V2. To meet this need, Powertools must integrate with Pydantic V2.
This integration enables customers to update their workloads to use Pydantic V2 while ensuring a seamless transition for existing customers using Pydantic V1.
Proposal
To accommodate both Pydantic v1 and Pydantic v2, the proposed design is to refactor the parser models with minimal changes. These changes should work in both versions of Pydantic and be transparent to users.
TL;DR: Proposed actions summary
- Set default value of `None` for optional fields
- Keep `@validator` and `@root_validator` deprecated features with a note to remove in V3
- Keep `.parse_obj()` deprecated feature with a note to remove in V3
- Investigate why empty dicts/lists fail validation
- Investigate `.dict()` and `.json()` removal
- Handle `TypeError` validation
- Investigate `datetime` coercion
- Investigate development dependencies conflict
- Document how to bring Pydantic v2 with Powertools
- Document how to disable deprecation warnings for Pydantic V2
- Create PR with POC
Optional fields must have default value
Pydantic v1
class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str]
    MessageGroupId: Optional[str]
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str]
    AWSTraceHeader: Optional[str]
Pydantic v2
class SqsAttributesModel(BaseModel):
    ApproximateReceiveCount: str
    ApproximateFirstReceiveTimestamp: datetime
    MessageDeduplicationId: Optional[str] = None
    MessageGroupId: Optional[str] = None
    SenderId: str
    SentTimestamp: datetime
    SequenceNumber: Optional[str] = None
    AWSTraceHeader: Optional[str] = None
Validators are deprecated
Both `@root_validator` and `@validator` are deprecated in Pydantic V2 and will be removed in Pydantic V3. Pydantic recommends using the new `@model_validator` and `@field_validator` decorators. However, we can continue using the deprecated validators in Powertools to avoid breaking changes and plan their removal for Powertools v3.
Pydantic v1
@root_validator(allow_reuse=True)
def check_message_id(cls, values):
    message_id, event_type = values.get("messageId"), values.get("eventType")
    if message_id is not None and event_type != "MESSAGE":
        raise TypeError("messageId is available only when the `eventType` is `MESSAGE`")
    return values
Pydantic v2
# Pydantic v2 requires skip_on_failure=True when pre=False (the default)
@root_validator(allow_reuse=True, skip_on_failure=True)
def check_message_id(cls, values):
    message_id, event_type = values.get("messageId"), values.get("eventType")
    if message_id is not None and event_type != "MESSAGE":
        # Pydantic v2 no longer converts TypeError into a ValidationError
        raise ValueError("messageId is available only when the `eventType` is `MESSAGE`")
    return values
Another alternative is to check the Pydantic version and add the conditional import. We will also need to create a function to wrap the validator decorators and check the Pydantic version. This workaround may make the models harder to read and understand for maintenance purposes.
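A minimal sketch of what such a version check could look like, assuming the installed version is read through `importlib.metadata`. The helper names (`major_version`, `installed_pydantic_major`, `compat_field_validator`) are hypothetical and are not part of Powertools or Pydantic. The Pydantic imports are deferred so the module still loads when Pydantic is absent.

```python
from importlib import metadata
from typing import Callable, Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.1'."""
    return int(version.split(".", 1)[0])


def installed_pydantic_major() -> Optional[int]:
    """Major version of the installed pydantic distribution, or None if absent."""
    try:
        return major_version(metadata.version("pydantic"))
    except metadata.PackageNotFoundError:
        return None


def compat_field_validator(*fields: str) -> Callable:
    """Hypothetical wrapper: pick the field-level validator decorator by version."""
    major = installed_pydantic_major()
    if major is None:
        raise ImportError("pydantic is not installed")
    if major >= 2:
        from pydantic import field_validator  # Pydantic v2 name

        return field_validator(*fields)
    from pydantic import validator  # Pydantic v1 name

    return validator(*fields, allow_reuse=True)
```

A model would then decorate its method with `@compat_field_validator("messageId")` rather than importing either decorator directly, which is the extra indirection that makes the models harder to read.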
Powertools Layer
The Powertools Layer is built with Pydantic v1, which can be a problem if the customer uses our Layer and brings Pydantic v2 as an external dependency.
In tests with the Powertools Layer plus Pydantic v2 installed as an external dependency, Lambda places `/var/task` first on the import path, so the external dependency takes precedence over the copy in the Layer. This allows customers to bring their preferred Pydantic version.
Path
{"level":"INFO","location":"<module>:8","message":["/var/task","/opt/python/lib/python3.10/site-packages","/opt/python","/var/runtime","/var/lang/lib/python310.zip","/var/lang/lib/python3.10","/var/lang/lib/python3.10/lib-dynload","/var/lang/lib/python3.10/site-packages","/opt/python/lib/python3.10/site-packages"],"timestamp":"2023-07-05 09:04:52,691+0000","service":"service_undefined"}
Pydantic Version
{"level":"INFO","location":"lambda_handler:11","message":"Pydantic version -> 2.0.1","timestamp":"2023-07-05 09:04:52,694+0000","service":"service_undefined","xray_trace_id":"1-64a53234-0ca06edc18c523524775237c"}
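To confirm which copy of a package wins on a given function, a check along these lines may help. It is only a sketch using the standard library; `module_origin` is a hypothetical helper name, and the `/var/task` and `/opt/python` prefixes are taken from the path output above.

```python
import importlib.util
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# sys.path is searched in order, so earlier entries shadow later ones.
for position, entry in enumerate(sys.path):
    print(position, entry)

# Inside Lambda, an origin under /var/task would indicate the function's own
# dependency, while one under /opt/python would indicate the Layer's copy.
print(module_origin("json"))
```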
Warnings
- To ensure a smooth transition and minimize disruptions for our users, we have temporarily suppressed the `PydanticDeprecatedSince20` and `PydanticDeprecationWarning` warnings (related to these functions). This allows existing applications to continue functioning as expected without outputting warnings.
- If needed, you can enable the warnings yourself with something like the code below. Reference: https://docs.python.org/3/library/warnings.html
from aws_lambda_powertools.utilities.parser import event_parser, BaseModel, envelopes
from aws_lambda_powertools.utilities.parser.models import (
SqsModel,
)
from aws_lambda_powertools import Logger
import pydantic
import warnings
warnings.simplefilter('default')
Out of scope
Refactorings involving breaking change for customers who want to use v1. If there is something that involves breaking, it will be left out of this change.
We could use this opportunity to evaluate the performance of Pydantic V2 and potentially enhance the performance of the Parser utility - not required though.
Potential challenges
Most of the challenges were addressed and I was able to use the Powertools for AWS Lambda (Python) Parser utility with Pydantic v2 with several models. For the challenges below, we still need to determine whether they amount to a breaking change.
Working with datetime fields
In Pydantic v1, when using `datetime` fields, the UTC offset is included and the tests work fine. However, in Pydantic v2, the UTC offset is not included, causing our tests to fail.
Codebase
from datetime import datetime
from pydantic import BaseModel
import pydantic
class Model(BaseModel):
    datefield: datetime = None
epoch_time = 1659687279885
m = Model(
    datefield=epoch_time,
)
print(f"Pydantic version -> {pydantic.__version__}")
print(f"Raw epoch time -> {epoch_time}")
print(f"Raw pydantic field -> {m.datefield}")
print(f"Pydantic converted epoch time -> {int(round(m.datefield.timestamp() * 1000))}")
assert epoch_time == int(round(m.datefield.timestamp() * 1000))
Pydantic v1
/tmp/pydantic2 via 🐍 v3.10.6 (.env) on ☁️ (us-east-1)
❯ python v1.py
Pydantic version -> 1.10.11
Raw epoch time -> 1659687279885
Raw pydantic field -> 2022-08-05 08:14:39.885000+00:00
Pydantic converted epoch time -> 1659687279885
Pydantic v2
/tmp/pydantic2 via 🐍 v3.10.6 (.env) on ☁️ (us-east-1)
❯ python v2.py
Pydantic version -> 2.0.1
Raw epoch time -> 1659687279885
Raw pydantic field -> 2022-08-05 08:14:39.885000
Pydantic converted epoch time -> 1659683679885
Traceback (most recent call last):
File "/tmp/pydantic2/v2.py", line 21, in <module>
assert epoch_time == int(round(m.datefield.timestamp() * 1000))
AssertionError
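The failing assertion can be reproduced without Pydantic. For a naive `datetime`, `timestamp()` interprets the value as local time, so the result depends on the machine's timezone. The 3,600,000 ms gap in the output above would be consistent with a local zone one hour ahead of UTC, though that is an inference. The sketch below uses only the standard library. `from_epoch_ms` and `to_epoch_ms` are hypothetical helpers, and treating a naive value as UTC is an assumption rather than a decided fix.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms: int) -> datetime:
    """Build a timezone-aware UTC datetime from epoch milliseconds."""
    return EPOCH + timedelta(milliseconds=ms)


def to_epoch_ms(value: datetime) -> int:
    """Convert back to epoch milliseconds using integer arithmetic."""
    if value.tzinfo is None:
        # Assumption: a naive value is meant to be UTC.
        value = value.replace(tzinfo=timezone.utc)
    return (value - EPOCH) // timedelta(milliseconds=1)


epoch_time = 1659687279885
aware = from_epoch_ms(epoch_time)   # carries +00:00, like the v1 output
naive = aware.replace(tzinfo=None)  # no offset, like the v2 output

print(aware, to_epoch_ms(aware))
print(naive, to_epoch_ms(naive))
```

Both conversions round-trip to the original epoch value regardless of the local timezone, which is the property the test asserts.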
Batch processing
Some Batch processing tests are failing and I need to investigate why.
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'message_id'
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_dynamodb_context_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'dynamodb'
FAILED tests/functional/test_utilities_batch.py::test_batch_processor_kinesis_context_parser_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'kinesis'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'message_id'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_dynamodb_context_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'dynamodb'
FAILED tests/functional/test_utilities_batch.py::test_async_batch_processor_kinesis_context_parser_model_with_partial_validation_error - AttributeError: 'NoneType' object has no attribute 'kinesis'
Developer environment
Since some of our dependencies (`cfn-lint` / `aws-sam-translator`) require Pydantic v1, we'll need to remove them from our development environment in order to accommodate Pydantic v2.
❯ poetry add "pydantic>=2.0"
Updating dependencies
Resolving dependencies... (0.6s)
Because no versions of aws-sam-translator match >1.68.0,<1.69.0 || >1.69.0,<1.70.0 || >1.70.0
and aws-sam-translator (1.68.0) depends on pydantic (>=1.8,<2.0), aws-sam-translator (>=1.68.0,<1.69.0 || >1.69.0,<1.70.0 || >1.70.0) requires pydantic (>=1.8,<2.0).
And because aws-sam-translator (1.69.0) depends on pydantic (>=1.8,<2.0), aws-sam-translator (>=1.68.0,<1.70.0 || >1.70.0) requires pydantic (>=1.8,<2.0).
And because aws-sam-translator (1.70.0) depends on pydantic (>=1.8,<2.0)
and cfn-lint (0.77.10) depends on aws-sam-translator (>=1.68.0), cfn-lint (0.77.10) requires pydantic (>=1.8,<2.0).
So, because aws-lambda-powertools depends on both pydantic (>=2.0) and cfn-lint (0.77.10), version solving failed.
Dependency resolution for Pydantic v1 and v2
Customers will run into a conflict with either of the following `requirements.txt` files:
All Powertools features and Pydantic v2
aws-lambda-powertools[all]
pydantic # or pydantic>=2
Powertools parser feature and Pydantic v2
aws-lambda-powertools[parser]
pydantic # or pydantic>=2
Recommendation
We should include a new section in the documentation to explain how to use Pydantic v2 with Parser.
For example, customers should refrain from using `[all]` or `[parser]` when bringing Pydantic v2 as part of their dependencies.
`aws-lambda-powertools[all]` becomes `aws-lambda-powertools[validation,tracer,aws-sdk]`
Because of the `Optional[str] = None` breaking change in v2, we should keep our pydantic pinning to v1 until we launch v3 and move away. We cannot guarantee a customer is using additional Pydantic v1 features through Powertools - or followed our docs to the letter.
This also gives us room to recommend disabling warnings for deprecated features we're keeping for backwards compatibility (e.g., validators). This prevents Pydantic littering customers' logs ($$) when bringing Pydantic v2.
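As a rough illustration of what that documentation might recommend, the sketch below silences one warning category with the standard `warnings` module. `SketchDeprecationWarning` and `legacy_call` are hypothetical stand-ins. With Pydantic v2 installed, the category would presumably be one of the Pydantic warning classes named earlier in this RFC.

```python
import warnings


class SketchDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for a library-specific deprecation category."""


def legacy_call() -> str:
    """Emit a deprecation warning the way a deprecated API would."""
    warnings.warn("legacy_call is deprecated", SketchDeprecationWarning, stacklevel=2)
    return "ok"


def count_warnings(suppress: bool) -> int:
    """Count warnings raised by legacy_call, optionally ignoring the category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if suppress:
            # Filters added later are checked first, so this overrides "always".
            warnings.filterwarnings("ignore", category=SketchDeprecationWarning)
        legacy_call()
    return len(caught)


print(count_warnings(suppress=False))
print(count_warnings(suppress=True))
```

Filtering by category rather than calling `simplefilter("ignore")` keeps unrelated warnings visible in the logs.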
Ad-hoc test for pydantic v2 dep
Until Powertools V3 is a thing, we should guard against backwards incompatibility for newer Pydantic models, or changes to existing models that might be contributed externally from a Pydantic v2 customer.
Recommendation
Set up a new temporary GitHub Action workflow to trigger on changes to Parser's models. We can use Nox to streamline dependencies and easily trigger Parser's unit tests.
Alternatively, within this temporary workflow, we could call `make dev`, remove Pydantic, install Pydantic v2, and run Parser's unit tests.
Nox's benefit is that it's more stable, easier to reason about, and it will lead us to address an unrelated area we aren't testing today -- tests w/o optional dependencies (e.g., Batch without parser code).
Alternative solutions
No response
Acknowledgment
- This feature request meets Powertools for AWS Lambda (Python) Tenets
- Should this be considered in other Powertools for AWS Lambda languages? i.e. Java, TypeScript, and .NET