Description
Pytest assertion rewriting causes a serialization issue with dill. Apparently it makes AssertionRewritingHook (including its output redirection) reachable from the code being serialized by Airflow; the same code is serialized properly when assert rewriting is disabled.
Context: When we added python_files = "*.py" to Apache Airflow in order not to accidentally skip some of our tests (apache/airflow#30315), at some point one of our tests, tests.operators.test_python.TestPythonVirtualenvOperator.test_airflow_context, started to fail with a mysterious error:
# Check for a __reduce_ex__ method, fall back to __reduce__
reduce = getattr(obj, "__reduce_ex__", None)
if reduce is not None:
> rv = reduce(self.proto)
E TypeError: cannot pickle 'EncodedFile' object
../../../../.pyenv/versions/3.9.9/lib/python3.9/pickle.py:578: TypeError
When stepping through with a debugger, you can see that dill is attempting to serialize AssertionRewritingHook, and since the hook contains references to some files (captured stdout/stderr), those cannot be serialized by dill.
ModuleSpec(name='airflow.macros', loader=<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x1289c0a90>, origin='/Users/jarek/IdeaProjects/airflow/airflow/macros/__init__.py', submodule_search_locations=['/Users/jarek/IdeaProjects/airflow/airflow/macros'])
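The failure mode can be illustrated without pytest or dill. The sketch below is only an analogy, using the standard-library pickle and a hypothetical namespace dict standing in for module globals: as soon as the dict can reach a text stream object (here an io.TextIOWrapper, a stand-in for the captured-output file), serialization fails with a TypeError, much like the EncodedFile error above. The names try_pickle, captured, module_globals and plain_globals are illustrative assumptions, not part of pytest, dill, or Airflow.

```python
import io
import pickle


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical stand-in for a captured stdout/stderr stream.
captured = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

# Hypothetical stand-in for module globals that can reach that stream
# through a loader-like object (modelled here as a nested dict).
module_globals = {"name": "example.macros", "loader": {"stream": captured}}
plain_globals = {"name": "example.macros", "loader": None}

print(try_pickle(plain_globals))   # no stream reachable, pickles cleanly
print(try_pickle(module_globals))  # the reachable stream makes pickling fail
```

The exact error text depends on the Python version and on the type of the stream, so the sketch reports the message rather than assuming it.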
This does not happen when assert rewriting is turned off. Unfortunately, I have not found a good workaround to exclude something via a PYTEST_DONT_REWRITE comment, so I had to separate out the test and run it separately from the other tests with --assert=plain as a workaround.
Rewriting is the most likely cause, because adding --assert=plain solves the problem.
Interesting clue: I could also fix the problem by adding an exclusion via python_files. Initially we had python_files = ["test_*.py"] and the problem did not appear. The problem started to appear when we changed it to python_files = ["*.py"]. I performed trial-and-error bisecting on the name that causes the problem, and it seems that it is yaml.py that causes the problem (I tried adding PYTEST_DONT_REWRITE to the yaml.py files we have in the system and it did not solve the problem). Reproduction steps showing that are also added.
I've opened a PR to cassandra to include PYTEST_DONT_REWRITE (datastax/python-driver#1142), and in Apache Airflow we have a PR to automatically patch the cassandra driver with it (apache/airflow#30315), but those are merely workarounds for the problem.
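For reference, pytest's documentation describes opting a module out of assertion rewriting by putting the string PYTEST_DONT_REWRITE in its docstring. The sketch below approximates that check using the standard library only; has_dont_rewrite_marker is a hypothetical helper, not pytest's API, and pytest's actual logic may differ in detail. The two source strings are made-up examples, not the contents of any real driver file.

```python
import ast

MARKER = "PYTEST_DONT_REWRITE"


def has_dont_rewrite_marker(source):
    """Return True if the module docstring of `source` contains the marker."""
    docstring = ast.get_docstring(ast.parse(source), clean=False)
    return docstring is not None and MARKER in docstring


# Made-up module sources: one carries the marker in its docstring, one does not.
patched = '"""Driver types.\n\nPYTEST_DONT_REWRITE\n"""\nVALUE = 1\n'
unpatched = '"""Driver types."""\nVALUE = 1\n'

print(has_dont_rewrite_marker(patched))
print(has_dont_rewrite_marker(unpatched))
```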
Reproduction:
An easy way to reproduce it:
- Pull the CI image of Airflow that contains the workaround and all the Airflow dependencies (it contains the patched types_code.py):
docker pull ghcr.io/apache/airflow/main/ci/python3.10:8580edf1cb0e67efdf45e6686d2f0239bc8f1ebb
- Enter the image (you will be dropped into a shell with everything ready to run the tests):
docker run -it ghcr.io/apache/airflow/main/ci/python3.10:8580edf1cb0e67efdf45e6686d2f0239bc8f1ebb
- Run the test in question:
pytest tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_cont
We have the test currently skipped so this results in:
SKIPPED [1] tests/operators/test_python.py:789: assertion rewriting breaks this test because dill will try to serialize AssertRewritingHook including captured stdout and we need to run it with
--assert=plain
pytest option and PYTEST_PLAIN_ASSERTS=true
===================================================================== 1 skipped in 0.03s ======================================================================
- Run the test with PYTEST_PLAIN_ASSERTS=true:
PYTEST_PLAIN_ASSERTS=true pytest tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context
This test fails with a long exception and stack trace:
File "/usr/local/lib/python3.10/site-packages/dill/_dill.py", line 912, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/usr/local/lib/python3.10/pickle.py", line 972, in save_dict
self._batch_setitems(obj.items())
File "/usr/local/lib/python3.10/pickle.py", line 998, in _batch_setitems
save(v)
File "/usr/local/lib/python3.10/pickle.py", line 578, in save
rv = reduce(self.proto)
TypeError: cannot pickle 'EncodedFile' object
If you debug it, you will see that AssertionRewritingHook is the object contributing the file that dill tries to serialize.
- Run the same test with assert rewrite disabled:
PYTEST_PLAIN_ASSERTS=true pytest tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context --assert=plain
Result - the test succeeds:
tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context PASSED [100%]
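One way to approach the debugging step mentioned above is to list the modules whose import spec points at a given loader class. This is a minimal sketch using only sys.modules; modules_loaded_by is a hypothetical helper, and the class name passed in is taken from the ModuleSpec output shown earlier.

```python
import sys


def modules_loaded_by(loader_class_name):
    """Return sorted names of imported modules whose spec loader has this class name."""
    names = []
    # Copy the items first, since sys.modules can change during iteration.
    for name, module in list(sys.modules.items()):
        spec = getattr(module, "__spec__", None)
        loader = getattr(spec, "loader", None)
        if loader is not None and type(loader).__name__ == loader_class_name:
            names.append(name)
    return sorted(names)


# Inside a pytest session this lists modules imported through the rewriting hook;
# what it prints depends entirely on the environment it runs in.
print(modules_loaded_by("AssertionRewritingHook"))
```

Running it from within the failing test (for example from a debugger prompt) should show whether airflow.macros and the yaml.py modules are among the modules loaded through the hook.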
Why does yaml.py seem to be the problem?
- Modify pyproject.toml (in the current working directory) to change:
python_files = [
"*.py",
]
into
python_files = [
"yaml.py",
]
- Run the same test as in 4) above:
PYTEST_PLAIN_ASSERTS=true pytest tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context
It will fail again with TypeError: cannot pickle 'EncodedFile' object
- Change python_files in pyproject.toml to anything other than yaml.py, for example t*.py:
python_files = [
"t*.py",
]
- Repeat the command from 7) - it should succeed this time
PYTEST_PLAIN_ASSERTS=true pytest tests/operators/test_python.py::TestPythonVirtualenvOperator::test_airflow_context
I tried to bisect the name: it continues to fail as you add y*.py, ya*.py, etc., up to yaml.py. It seems that including yaml.py in the list of python_files triggers the error.
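The bisecting result is consistent with plain glob matching on the file name. The sketch below uses the standard-library fnmatch as an approximation of how python_files patterns are matched (pytest's own matching may differ in detail) to show which of the patterns tried above match yaml.py. matching_patterns is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matching_patterns(filename, patterns):
    """Return the glob patterns from `patterns` that match `filename`."""
    return [pattern for pattern in patterns if fnmatchcase(filename, pattern)]


# Patterns mentioned in this report, in the order they were tried.
tried = ["test_*.py", "t*.py", "y*.py", "ya*.py", "yaml.py", "*.py"]
print(matching_patterns("yaml.py", tried))
# → ['y*.py', 'ya*.py', 'yaml.py', '*.py']
```

Every pattern that matches yaml.py is one for which the failure was observed, while test_*.py and t*.py, which do not match it, are the ones for which the test passed.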
Mandatory information:
Versions
- Pytest: 7.2.2
- OS: docker container based on debian buster (official Python 3.10 image - same for other python versions)
Linux 209653871bc9 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 GNU/Linux
The output of pip list:
Package Version Editable project location
-------------------------------------- ----------- -------------------------
adal 1.2.7
aiobotocore 2.5.0
aiofiles 22.1.0
aiohttp 3.8.4
aioitertools 0.11.0
aioresponses 0.7.4
aiosignal 1.3.1
alabaster 0.7.13
alembic 1.10.2
aliyun-python-sdk-core 2.13.36
aliyun-python-sdk-kms 2.16.0
amqp 5.1.1
analytics-python 1.4.post1
ansiwrap 0.8.4
anyio 3.6.2
apache-airflow 2.6.0.dev0 /opt/airflow
apache-beam 2.46.0
apispec 5.2.2
appdirs 1.4.4
argcomplete 3.0.5
arrow 1.2.3
asana 3.2.0
asgiref 3.6.0
asn1crypto 1.5.1
astroid 2.15.1
asttokens 2.2.1
async-timeout 4.0.2
asynctest 0.13.0
atlasclient 1.0.0
atlassian-python-api 3.35.0
attrs 22.2.0
Authlib 1.2.0
aws-sam-translator 1.63.0
aws-xray-sdk 2.11.0
azure-batch 13.0.0
azure-common 1.1.28
azure-core 1.26.3
azure-cosmos 4.3.1
azure-datalake-store 0.0.52
azure-identity 1.12.0
azure-keyvault-secrets 4.7.0
azure-kusto-data 0.0.45
azure-mgmt-containerinstance 1.5.0
azure-mgmt-core 1.3.2
azure-mgmt-datafactory 1.1.0
azure-mgmt-datalake-nspkg 3.0.1
azure-mgmt-datalake-store 0.5.0
azure-mgmt-nspkg 3.0.2
azure-mgmt-resource 23.0.0
azure-nspkg 3.0.2
azure-servicebus 7.8.3
azure-storage-blob 12.15.0
azure-storage-common 2.1.0
azure-storage-file 2.1.0
azure-storage-file-datalake 12.10.1
azure-synapse-spark 0.7.0
Babel 2.12.1
backcall 0.2.0
backoff 1.10.0
bcrypt 4.0.1
beautifulsoup4 4.12.0
billiard 3.6.4.0
bitarray 2.7.3
black 23.1a1
bleach 6.0.0
blinker 1.5
boto 2.49.0
boto3 1.26.76
botocore 1.29.76
bowler 0.9.0
cachelib 0.9.0
cachetools 5.3.0
cassandra-driver 3.25.0
cattrs 22.2.0
celery 5.2.7
certifi 2022.12.7
cffi 1.15.1
cfgv 3.3.1
cfn-lint 0.76.1
cgroupspy 0.2.2
chardet 4.0.0
charset-normalizer 2.1.1
checksumdir 1.2.0
ciso8601 2.3.0
click 8.1.3
click-default-group 1.2.2
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.2.0
clickclick 20.10.2
cloudant 2.15.0
cloudpickle 2.2.1
colorama 0.4.6
colorlog 4.8.0
ConfigUpdater 3.1.1
connexion 2.14.2
coverage 7.2.2
crcmod 1.7
cron-descriptor 1.2.35
croniter 1.3.8
cryptography 39.0.2
curlify 2.2.1
dask 2023.3.2
databricks-sql-connector 2.4.1
datadog 0.45.0
db-dtypes 1.0.5
decorator 5.1.1
defusedxml 0.7.1
Deprecated 1.2.13
dill 0.3.1.1
distlib 0.3.6
distributed 2023.3.2
dnspython 2.3.0
docker 6.0.1
docopt 0.6.2
docutils 0.16
ecdsa 0.18.0
elasticsearch 7.13.4
elasticsearch-dbapi 0.2.10
elasticsearch-dsl 7.4.1
email-validator 1.3.1
entrypoints 0.4
eralchemy2 1.3.7
et-xmlfile 1.1.0
eventlet 0.33.3
exceptiongroup 1.1.1
execnet 1.9.0
executing 1.2.0
facebook-business 16.0.1
fastavro 1.7.3
fasteners 0.18
fastjsonschema 2.16.3
filelock 3.10.7
fissix 21.11.13
Flask 2.2.3
Flask-AppBuilder 4.3.0
Flask-Babel 2.0.0
Flask-Bcrypt 1.0.1
Flask-Caching 2.0.2
Flask-JWT-Extended 4.4.4
Flask-Limiter 3.3.0
Flask-Login 0.6.2
Flask-Session 0.4.0
Flask-SQLAlchemy 2.5.1
Flask-WTF 1.1.1
flower 1.2.0
frozenlist 1.3.3
fsspec 2023.3.0
future 0.18.3
gcloud-aio-auth 4.2.0
gcloud-aio-bigquery 6.3.0
gcloud-aio-storage 8.1.0
gcsfs 2023.3.0
geomet 0.2.1.post1
gevent 22.10.2
gitdb 4.0.10
GitPython 3.1.31
google-ads 18.0.0
google-api-core 2.8.2
google-api-python-client 1.12.11
google-auth 2.16.3
google-auth-httplib2 0.1.0
google-auth-oauthlib 0.8.0
google-cloud-aiplatform 1.16.1
google-cloud-appengine-logging 1.1.3
google-cloud-audit-log 0.2.4
google-cloud-automl 2.8.0
google-cloud-bigquery 2.34.4
google-cloud-bigquery-datatransfer 3.7.0
google-cloud-bigquery-storage 2.14.1
google-cloud-bigtable 2.11.1
google-cloud-build 3.9.0
google-cloud-compute 0.7.0
google-cloud-container 2.11.1
google-cloud-core 2.3.2
google-cloud-datacatalog 3.9.0
google-cloud-dataflow-client 0.5.4
google-cloud-dataform 0.2.0
google-cloud-dataplex 1.1.0
google-cloud-dataproc 5.0.0
google-cloud-dataproc-metastore 1.6.0
google-cloud-dlp 3.8.0
google-cloud-kms 2.12.0
google-cloud-language 1.3.2
google-cloud-logging 3.2.1
google-cloud-memcache 1.4.1
google-cloud-monitoring 2.11.0
google-cloud-orchestration-airflow 1.4.1
google-cloud-os-login 2.7.1
google-cloud-pubsub 2.13.5
google-cloud-redis 2.9.0
google-cloud-resource-manager 1.6.0
google-cloud-secret-manager 1.0.2
google-cloud-spanner 1.19.3
google-cloud-speech 1.3.4
google-cloud-storage 2.7.0
google-cloud-tasks 2.10.1
google-cloud-texttospeech 1.0.3
google-cloud-translate 1.7.2
google-cloud-videointelligence 1.16.3
google-cloud-vision 1.0.2
google-cloud-workflows 1.7.1
google-crc32c 1.5.0
google-resumable-media 2.4.1
googleapis-common-protos 1.56.4
graphql-core 3.2.3
graphviz 0.20.1
greenlet 2.0.2
grpc-google-iam-v1 0.12.4
grpcio 1.53.0
grpcio-gcp 0.2.2
grpcio-status 1.48.2
gssapi 1.8.2
gunicorn 20.1.0
h11 0.14.0
hdfs 2.7.0
HeapDict 1.0.1
hmsclient 0.1.1
httpcore 0.16.3
httplib2 0.21.0
httpx 0.23.3
humanize 4.6.0
hvac 1.1.0
identify 2.5.22
idna 3.4
ijson 3.2.0.post0
imagesize 1.4.1
importlib-metadata 6.1.0
importlib-resources 5.12.0
impyla 0.18.0
incremental 22.10.0
inflection 0.5.1
influxdb-client 1.36.1
iniconfig 2.0.0
ipdb 0.13.13
ipython 8.11.0
isodate 0.6.1
itsdangerous 2.1.2
jaraco.classes 3.2.3
JayDeBeApi 1.2.3
jedi 0.18.2
jeepney 0.8.0
Jinja2 3.1.2
jira 3.5.0
jmespath 0.10.0
JPype1 1.4.1
jschema-to-python 1.2.3
json-merge-patch 0.2
jsondiff 2.0.0
jsonpatch 1.32
jsonpath-ng 1.5.3
jsonpickle 3.0.1
jsonpointer 2.3
jsonschema 4.17.3
jsonschema-spec 0.1.4
junit-xml 1.9
jupyter_client 8.1.0
jupyter_core 5.3.0
keyring 23.13.1
kombu 5.2.4
krb5 0.5.0
kubernetes 23.6.0
kubernetes-asyncio 24.2.2
kylinpy 2.8.4
lazy-object-proxy 1.9.0
ldap3 2.9.1
limits 3.3.1
linkify-it-py 2.0.0
locket 1.0.0
lockfile 0.12.2
looker-sdk 23.2.0
lxml 4.9.2
lz4 4.3.2
Mako 1.2.4
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
marshmallow-oneofschema 3.0.1
marshmallow-sqlalchemy 0.26.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.5
mdurl 0.1.2
mongomock 4.1.2
monotonic 1.6
more-itertools 9.1.0
moreorless 0.4.0
moto 4.1.6
mpmath 1.3.0
msal 1.21.0
msal-extensions 1.0.0
msgpack 1.0.5
msrest 0.7.1
msrestazure 0.6.4
multi-key-dict 2.0.3
multidict 6.0.4
mypy 1.0.0
mypy-boto3-appflow 1.26.78
mypy-boto3-rds 1.26.99
mypy-boto3-redshift-data 1.26.88
mypy-extensions 1.0.0
mysql-connector-python 8.0.32
mysqlclient 2.1.1
nbclient 0.7.2
nbformat 5.8.0
neo4j 5.6.0
networkx 3.0
nodeenv 1.7.0
numpy 1.24.2
oauthlib 3.2.2
objsize 0.6.1
openapi-schema-validator 0.4.4
openapi-spec-validator 0.5.6
openpyxl 3.1.2
opentelemetry-api 1.15.0
opentelemetry-exporter-otlp 1.15.0
opentelemetry-exporter-otlp-proto-grpc 1.15.0
opentelemetry-exporter-otlp-proto-http 1.15.0
opentelemetry-exporter-prometheus 1.12.0rc1
opentelemetry-proto 1.15.0
opentelemetry-sdk 1.15.0
opentelemetry-semantic-conventions 0.36b0
opsgenie-sdk 2.1.5
oracledb 1.2.2
ordered-set 4.1.0
orjson 3.8.8
oscrypto 1.3.0
oss2 2.17.0
packaging 21.3
pandas 1.5.3
pandas-gbq 0.17.9
papermill 2.4.0
paramiko 3.1.0
parso 0.8.3
partd 1.3.0
pathable 0.4.3
pathspec 0.9.0
pbr 5.11.1
pdpyras 4.5.2
pendulum 2.1.2
pexpect 4.8.0
pickleshare 0.7.5
pinotdb 0.4.14
pip 23.0.1
pipdeptree 2.7.0
pipx 1.2.0
pkginfo 1.9.6
platformdirs 3.2.0
pluggy 1.0.0
ply 3.11
plyvel 1.5.0
portalocker 2.7.0
pre-commit 3.2.1
presto-python-client 0.8.3
prison 0.2.1
prometheus-client 0.16.0
prompt-toolkit 3.0.38
proto-plus 1.19.6
protobuf 3.20.0
psutil 5.9.4
psycopg2-binary 2.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pure-sasl 0.6.2
py-partiql-parser 0.1.0
py4j 0.10.9.5
pyarrow 9.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycountry 22.3.5
pycparser 2.21
pycryptodome 3.17
pycryptodomex 3.17
pydantic 1.10.7
pydata-google-auth 1.7.0
pydot 1.4.2
pydruid 0.6.5
pyenchant 3.2.2
pyexasol 0.25.2
PyGithub 1.58.1
Pygments 2.14.0
pygraphviz 1.10
pyhcl 0.4.4
PyHive 0.6.5
PyJWT 2.6.0
pykerberos 1.2.4
pymongo 3.13.0
pymssql 2.2.7
PyNaCl 1.5.0
pyodbc 4.0.35
pyOpenSSL 23.1.1
pyparsing 3.0.9
pypsrp 0.8.1
pyrsistent 0.19.3
pyspark 3.3.2
pyspnego 0.8.0
pytest 7.2.2
pytest-asyncio 0.21.0
pytest-capture-warnings 0.0.4
pytest-cov 4.0.0
pytest-httpx 0.21.3
pytest-instafail 0.4.2
pytest-rerunfailures 11.1.2
pytest-timeouts 1.2.1
pytest-xdist 3.2.1
python-arango 7.5.7
python-daemon 3.0.1
python-dateutil 2.8.2
python-dotenv 1.0.0
python-http-client 3.3.7
python-jenkins 1.7.0
python-jose 3.3.0
python-ldap 3.4.3
python-nvd3 0.15.0
python-slugify 8.0.1
python-telegram-bot 20.2
pytz 2023.2
pytz-deprecation-shim 0.1.0.post0
pytzdata 2020.1
pywinrm 0.4.3
PyYAML 6.0
pyzmq 25.0.2
qds-sdk 1.16.1
reactivex 4.0.4
readme-renderer 37.3
redis 3.5.3
redshift-connector 2.0.910
regex 2023.3.23
requests 2.28.2
requests-file 1.5.1
requests-kerberos 0.14.0
requests-mock 1.10.0
requests-ntlm 1.2.0
requests-oauthlib 1.3.1
requests-toolbelt 0.10.1
responses 0.23.1
rfc3339-validator 0.1.4
rfc3986 1.5.0
rich 13.3.3
rich_argparse 1.1.0
rich-click 1.6.1
rsa 4.9
ruff 0.0.259
s3transfer 0.6.0
sarif-om 1.0.4
sasl 0.3.1
scramp 1.4.4
scrapbook 0.5.0
SecretStorage 3.3.3
semver 2.13.0
sendgrid 6.10.0
sentinels 1.0.0
sentry-sdk 1.17.0
setproctitle 1.3.2
setuptools 66.1.1
simple-salesforce 1.12.3
six 1.16.0
slack-sdk 3.20.2
smbprotocol 1.10.1
smmap 5.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
snowflake-connector-python 3.0.2
snowflake-sqlalchemy 1.4.7
sortedcontainers 2.4.0
soupsieve 2.4
Sphinx 5.3.0
sphinx-airflow-theme 0.0.11
sphinx-argparse 0.4.0
sphinx-autoapi 2.0.1
sphinx-copybutton 0.5.1
sphinx-jinja 2.0.2
sphinx-rtd-theme 1.2.0
sphinxcontrib-applehelp 1.0.4
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.1
sphinxcontrib-httpdomain 1.8.1
sphinxcontrib-jquery 4.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-redoc 1.6.0
sphinxcontrib-serializinghtml 1.1.5
sphinxcontrib-spelling 8.0.0
spython 0.3.0
SQLAlchemy 1.4.47
sqlalchemy-bigquery 1.6.1
sqlalchemy-drill 1.1.2
SQLAlchemy-JSONField 1.0.1.post0
sqlalchemy-redshift 0.8.12
SQLAlchemy-Utils 0.40.0
sqlparse 0.4.3
sshpubkeys 3.3.1
sshtunnel 0.4.0
stack-data 0.6.2
starkbank-ecdsa 2.2.0
statsd 4.0.1
sympy 1.11.1
tableauserverclient 0.24
tabulate 0.9.0
tblib 1.7.0
tenacity 8.2.2
termcolor 2.2.0
text-unidecode 1.3
textwrap3 0.9.2
thrift 0.16.0
thrift-sasl 0.4.3
time-machine 2.9.0
tomli 2.0.1
toolz 0.12.0
tornado 6.2
towncrier 22.12.0
tqdm 4.65.0
traitlets 5.9.0
trino 0.322.0
twine 4.0.2
types-boto 2.49.18.7
types-certifi 2021.10.8.3
types-croniter 1.3.2.7
types-Deprecated 1.2.9.2
types-docutils 0.19.1.7
types-Markdown 3.4.2.6
types-paramiko 3.0.0.5
types-protobuf 4.22.0.0
types-PyMySQL 1.0.19.6
types-pyOpenSSL 23.1.0.1
types-python-dateutil 2.8.19.11
types-python-slugify 8.0.0.2
types-pytz 2023.2.0.1
types-PyYAML 6.0.12.9
types-redis 4.5.3.0
types-requests 2.28.11.16
types-setuptools 67.6.0.5
types-tabulate 0.9.0.1
types-termcolor 1.1.6.2
types-toml 0.10.8.5
types-urllib3 1.26.25.9
typing_extensions 4.5.0
tzdata 2023.2
tzlocal 4.3
uamqp 1.6.4
uc-micro-py 1.0.1
unicodecsv 0.14.1
Unidecode 1.3.6
uritemplate 3.0.1
urllib3 1.26.15
userpath 1.8.0
vertica-python 1.3.1
vine 5.0.0
virtualenv 20.21.0
volatile 2.1.0
watchtower 2.0.1
wcwidth 0.2.6
webencodings 0.5.1
websocket-client 1.5.1
Werkzeug 2.2.3
wheel 0.40.0
wrapt 1.15.0
WTForms 3.0.1
xmltodict 0.13.0
yamllint 1.30.0
yandexcloud 0.206.0
yarl 1.8.2
zeep 4.2.1
zenpy 2.0.25
zict 2.2.0
zipp 3.15.0
zope.event 4.6
zope.interface 6.0
zstandard 0.20.0