Skip to content

Add Pyodide support and CI jobs for Zarr #1903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 62 commits into
base: main
Choose a base branch
from

Conversation

agriyakhetarpal
Copy link

@agriyakhetarpal agriyakhetarpal commented May 23, 2024

Description

This description is a stub and will be updated in due course.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Additional context

Here are a few xrefs to some other PRs that have been implemented for packages in the Scientific Python ecosystem:

The greater, long-term goal towards this change is to implement Sphinx-based interactive documentation via JupyterLite and WASM-powered, in-browser kernels, as referenced in Quansight-Labs/czi-scientific-python-mgmt#19, see also: Quansight-Labs/czi-scientific-python-mgmt#18. A pilot that can be readily tried out is available for the "API reference" pages under the PyWavelets documentation. This will be preceded by configuring a place to publish these WebAssembly/Emscripten wheels nightly or on a scheduled cadence for Zarr (which could be a third-party, PyPI-like index such as Anaconda.org) and then integrating Sphinx extensions such as jupyterlite-sphinx for hosted documentation.

@agriyakhetarpal
Copy link
Author

Zarr v3 has crc32c and zstandard as dependencies, which were not present for v2. These packages contain compiled code and are not available in Pyodide yet (based on the logs from the workflow run). I shall try to add recipes for them Pyodide, but I am afraid I shall not have an answer for how long that can take.

@agriyakhetarpal
Copy link
Author

Update: we added both crc32c and zstandard (please see the issues/PRs linked above). Pyodide 0.26.0 should be out very soon containing both packages plus with a bump to the Python version (to 3.12.1) and an ABI-breaking change (via Emscripten 3.1.58). I shall revisit this PR in a few days.

Next, when Zarr v3 (stable) comes out on PyPI in the next month or so, I shall bump the version/recipe for the in-tree Pyodide build as well. It would be great to have the out-of-tree build, i.e., this PR, ready for review/merging by then, and I will be committed to putting my best efforts in doing so – thanks!

@agriyakhetarpal
Copy link
Author

The Pyodide 0.26.0 release is out, and we have been lucky to have got both crc32c and zstandard in time (but in any case, they would have got into a patch release anyway). Let me update this PR in a short moment.

Copy link
Author

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I apologise for the radio silence on this PR. I've recently been exploring Pyodide/WASM support for Zarr again, as we aim to reintroduce Zarr to https://github.com/pyodide/pyodide-recipes. We will utilise that repository in the next release of Pyodide, namely version 0.28, scheduled for release in a few weeks – please see the tracking issue at pyodide/pyodide-recipes#99 for a subset of our remaining work. We currently use Python 3.13, and the first version of Zarr that supports it is v3, which meant that we needed to get this working – we had disabled Zarr due to the use of threads. I've tried to revisit support for Pyodide through the recent commits I've pushed, and I've been able to get it to work.

If I understand correctly, Zarr's inherent threading model in v3 can be bypassed in WebAssembly through the browser's/Node.js's event loop, as demonstrated in my changes to src/zarr/core/sync.py. The only route it disables is zarr.sync, and tests in test_sync.py are skipped as a result. If I understand correctly, zarr.sync is going away anyway with a later version of Zarr.

This PR depends on its sister PR that I created at zarr-developers/numcodecs#529 where we're hoping to add Pyodide/WASM support for numcodecs. All tests have been passing there and it has been ready for further review. Here, I'm temporarily building numcodecs from my PR's branch there, as the version of numcodecs in the Pyodide distribution itself is old does not satisfy the constraint here (>=0.16). I still haven't been able to figure out why the VCS system is not picking up the right version in CI – it looks like a bug where actions/checkout@v4 does not fetch tags and does not perform an unshallow clone when checking out to a directory other than {{ github.workspace }}. Either way, I've temporarily skipped the version check for now so that we can let the tests run till the end.

Hence, while this is obviously not in a mergeable state – I've been able to get several tests passing and Zarr seems to be working well. I believe this PR is ready for review and further inspection, and it should enable us to integrate a portion of this PR as a patch for Pyodide, allowing Zarr to be re-enabled once the idea receives peer review here. A full review can proceed in the meantime as I work through this PR.

In other respects, what I'm finding a bit concerning is the slowness of the test suite in CI. The CI job took 5 hours to run, which seems like an anomaly – in a Pyodide virtual environment on my macOS machine locally, I've been able to consistently run the tests until completion in ~11 minutes on multiple runs, rather than hours. test_set_orthogonal_selection_3d takes 6034 seconds here, which is more than 150x of what it takes outside of CI. Would it make sense to add an additional slow_in_wasm marker so that pytest can ignore the tests shown below?

============================= slowest 10 durations =============================
6034.84s call     tests/test_indexing.py::test_set_orthogonal_selection_3d
4101.22s call     tests/test_properties.py::test_array_creates_implicit_groups
3878.97s call     tests/test_properties.py::test_array_roundtrip
924.30s call     tests/test_properties.py::test_roundtrip_array_metadata_from_store
714.00s call     tests/test_indexing.py::test_set_orthogonal_selection_2d
397.94s call     tests/test_indexing.py::test_set_coordinate_selection_2d
242.29s call     tests/test_indexing.py::test_set_orthogonal_selection_1d
150.35s call     tests/test_indexing.py::test_set_block_selection_2d
121.05s call     tests/test_indexing.py::test_set_block_selection_1d
95.83s call     tests/test_indexing.py::test_get_orthogonal_selection_3d

I've also xfailed some flaky tests arising from Hypothesis, which I haven't been able to figure out the cause for, and marked a Blosc test where I get mismatched arrays (which will need further probing into how Blosc was compiled to WASM) as a known failure case – I hope that is alright!

Out of the two failing tests in the CI job:

  • test_array_roundtrip xfails on my local setup on several reruns due to Hypothesis having degraded performance when generating test cases, and it looks like it strict-xpassed here. It is likely a flake, too.
  • I had introduced a lenience factor for tests/test_group.py::test_group_members_performance[memory] to accommodate the fact that it takes a little longer to run, likely due to overhead accumulated by running in WASM and the latency coming from the Node.js event loop. The lenience factor/tolerance of 1.1 is apparently good enough for my machine, but we know CI machines are quite unreliable, and the required lenience factor to make it pass here would be 4. I'm open to receiving suggestions about this test. My suggestion would be to rework the test method to test versus multiple group counts—parametrised—and verify that the time taken doesn't scale linearly with the group count (i.e., times should be roughly similar if $O(1)$, later times should be much larger if $O(n)$; and the time ratio should ideally be much less than the linear scaling ratio).

Thank you for your time!

Comment on lines +80 to +83
# For some reason fetch-depth: 0 and fetch-tags: true aren't working...
- name: Manually fetch tags for numcodecs
working-directory: numcodecs-wasm
run: git fetch --tags
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unexplained reasons, even this doesn't help.

Comment on lines +45 to +50
# _numcodecs_version = Version(numcodecs.__version__)
# if _numcodecs_version < Version("0.13.0"):
# raise RuntimeError(
# "numcodecs version >= 0.13.0 is required to use the zstd codec. "
# f"Version {_numcodecs_version} is currently installed."
# )
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary change; these lines will be uncommented when we either have a solution for the numcodecs version being fetched in the CI job or we create yet another alpha with an updated version of numcodecs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about skipping tests that require newer version of numcodecs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, no, this is only to disable the version check. numcodecs is being compiled from zarr-developers/numcodecs#529 with the latest version, just that the version is incorrect.

Comment on lines +111 to +112
"async": {"concurrency": 1 if IS_WASM else 10, "timeout": None},
"threading": {"max_workers": 1 if IS_WASM else None},
Copy link
Author

@agriyakhetarpal agriyakhetarpal May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to block spawning any new threads on any occasion, hence max_workers is set to 1. Also, I'm not sure if concurrency should be set to 1 or if we can safely use 10.

Comment on lines +42 to +58
def _make_shutdown_asyncgens_noop_for_pyodide() -> None:
"""
Patch Pyodide's WebLoop to fix interoperability with pytest-asyncio.

WebLoop.shutdown_asyncgens() raises NotImplementedError, which causes
pytest-asyncio to issue warnings during test cleanup and potentially
cause resource leaks that make tests hang. This is a bit of a
hack, but it allows us to run tests that use pytest-asyncio.

This is necessary because pytest-asyncio tries to clean up async generators
when tearing down test event loops, but Pyodide's WebLoop doesn't support
this as it integrates with the browser's event loop rather than managing
its own lifecycle.
"""
try:
if not IS_WASM and "pyodide" not in sys.modules:
return
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The missing coverage annotations for this test are because Codecov does not know about this WASM CI job where these code paths are hit. I can mark them with a # pragma: no cover comment closer to the time of merging this PR.

Comment on lines +86 to +87
"async": {"concurrency": 1 if IS_WASM else 10, "timeout": None},
"threading": {"max_workers": 1 if IS_WASM else None},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question as above around what an appropriate async.concurrency value would be

@agriyakhetarpal agriyakhetarpal changed the title [WIP]: Add Pyodide support and CI jobs for Zarr Add Pyodide support and CI jobs for Zarr May 28, 2025
@agriyakhetarpal
Copy link
Author

Marking this as ready for review based on the above notes.

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review May 28, 2025 11:16
@agriyakhetarpal
Copy link
Author

I can't request a review from you as you haven't been previously involved in this PR, @ryanking13, but it would be nice if you could take a look at it as well – thank you!

Comment on lines +45 to +50
# _numcodecs_version = Version(numcodecs.__version__)
# if _numcodecs_version < Version("0.13.0"):
# raise RuntimeError(
# "numcodecs version >= 0.13.0 is required to use the zstd codec. "
# f"Version {_numcodecs_version} is currently installed."
# )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about skipping tests that require newer version of numcodecs?

Comment on lines +1981 to +1982
# Performance for Pyodide is only marginally worse than for Python,
# usually in the 1.012~1.018 range for a latency of 0.1). So we add

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather skip this test as this can be very quite flaky.

# https://developer.mozilla.org/en-US/docs/Web/API/setTimeout
if IS_WASM:
current_loop = asyncio.get_running_loop()
return current_loop.run_until_complete(coro)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_until_complete is no-op in Pyodide unless JSPI is enabled. So I wonder whether if it is better to use run_until_complete here or just raise an error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; do we know of a way to check if JSPI is enabled? We don't have --experimental-wasm-jspi in https://github.com/pyodide/pyodide/blob/adfa092b26745a16127ce2ad680f81407f07738d/src/templates/python#L24-L26 so I thought it isn't enabled, but it seems to be enabled – I added a few debug print statements, and the currently running loop is indeed the WebLoop, and run_until_complete doesn't seem to be a no-op. I'm able to get an AsyncArray after sync() completes.

Comment on lines +44 to +46
Patch Pyodide's WebLoop to fix interoperability with pytest-asyncio.

WebLoop.shutdown_asyncgens() raises NotImplementedError, which causes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should implement it in Pyodide?

If this is only problematic in testing environment, how about moving this into conftest.py?

Copy link
Author

@agriyakhetarpal agriyakhetarpal May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should implement it in Pyodide?

I haven't thought about it, but yes, I think this should be helpful to put somewhere in pytest-pyodide, if I understand you correctly. I haven't come across this case/problem in any other downstream package, so I'm not sure (but perhaps Zarr is among the first).

If this is only problematic in testing environment, how about moving this into conftest.py?

Yes, makes sense to me! I'll move it to conftest.py and make it run only when PYTEST_CURRENT_TEST is active.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs release notes Automatically applied to PRs which haven't added release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants