Skip to content

Implement gmt xarray BackendEntrypoint #3919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Apr 26, 2025
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
b81aa06
Implement gmtread xarray BackendEntrypoint
weiji14 Apr 17, 2025
2b98a65
Add source encoding to raster xarray.DataArray
weiji14 Apr 17, 2025
f41a812
Set load_dataarray's default engine to gmtread
weiji14 Apr 17, 2025
b71dcf5
Support reading GeoTIFF images via the kind='image' argument
weiji14 Apr 17, 2025
60c6726
[skip ci] Satisfy type checking
weiji14 Apr 17, 2025
bf09a39
Make 'kind' a required kwarg with no default value
weiji14 Apr 17, 2025
e622878
Add .grd to set of supported extensions in guess_can_open
weiji14 Apr 17, 2025
b9e3eb4
Rename 'kind' parameter to 'decode_kind' and mypy fix
weiji14 Apr 17, 2025
3a109ac
Rename parameter kind to decode_kind in load_dataarray
weiji14 Apr 17, 2025
d764f78
Specify decode_kind="grid" in load_ functions
weiji14 Apr 17, 2025
98910f8
[skip ci] format
weiji14 Apr 17, 2025
126560e
Use type: ignore[override]
weiji14 Apr 17, 2025
436f0fc
Don't set default decode_kind in open_dataset
weiji14 Apr 17, 2025
328b6ab
Rename engine="gmtread" to engine="gmt"
weiji14 Apr 17, 2025
06aa39d
Add GMTBackendEntrypoint to doc/api/index.rst
weiji14 Apr 17, 2025
c79e50a
Apply suggestions from code review
weiji14 Apr 17, 2025
c0ccd4b
Add test for xr.load_dataarray with a GRD grid
weiji14 Apr 19, 2025
c5ebf9f
Update GMTBackendEntrypoint description to say NetCDF C library is used
weiji14 Apr 19, 2025
17fd9d7
lint
weiji14 Apr 19, 2025
1c8af3a
Move GMTBackendEntrypoint to a new "Xarray Integration" API section
weiji14 Apr 19, 2025
1d9f21a
Revert changes to datasets/samples.py and helpers/testing.py
weiji14 Apr 19, 2025
ca9132f
Apply suggestions from code review
weiji14 Apr 22, 2025
9669bcf
lint
weiji14 Apr 22, 2025
f722b47
Revert changes in pygmt/io.py
weiji14 Apr 22, 2025
a952df9
Improve docstring of guess_can_open
weiji14 Apr 22, 2025
0c6b8a3
Add example usage to docstring of GMTBackendEntrypoint
weiji14 Apr 22, 2025
3a256a6
Apply suggestions from code review
weiji14 Apr 22, 2025
64c50c9
Merge branch 'main' into backendentrypoint/gmtread
weiji14 Apr 22, 2025
e6ee6fa
Change type-hint from str | os.PathLike to pygmt._typing.PathLike
weiji14 Apr 22, 2025
5b90a8a
Use numpy.testing.assert_allclose to check min value
weiji14 Apr 22, 2025
31d1999
Use doctest: ELLIPSIS to represent byte size of xarray coordinates
weiji14 Apr 22, 2025
6a4e1a4
Rename decode_kind to raster_kind
weiji14 Apr 23, 2025
91df0ee
Put url to GMTBackendEntrypoint docs
weiji14 Apr 23, 2025
c97bd82
Remove guess_can_open method
weiji14 Apr 23, 2025
2e5fb6b
[skip ci] lint
weiji14 Apr 23, 2025
b136e26
Merge branch 'main' into backendentrypoint/gmtread
weiji14 Apr 24, 2025
19c5252
NetCDF to netCDF
weiji14 Apr 24, 2025
3bb1f44
Add Parameters section to docs of open_dataset
weiji14 Apr 24, 2025
8984c1c
Move pygmt/xarray_backend.py to pygmt/xarray/backend.py
weiji14 Apr 24, 2025
f2446fc
Workaround sphinx Attributes ivar using ellipsis
weiji14 Apr 24, 2025
52c0b7e
Remove extra blankspace
weiji14 Apr 24, 2025
143a375
Apply suggestions from code review
weiji14 Apr 24, 2025
acab8b2
Describe more on why one should choose the gmt engine
weiji14 Apr 24, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/api/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,7 @@ Input/output
.. autosummary::
:toctree: generated

GMTBackendEntrypoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes more sense to have a separate category like "Xarray Integration" and put both GMTBackendEntrypoint and GMTDataArrayAccessor in the category. Also, if #3854 is implemented, GMTDataArrayAccessor no longer makes sense in the "Metadata" category.

Copy link
Member Author

@weiji14 weiji14 Apr 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was wondering about this too. I can put GMTBackendEntrypoint in a new "Xarray Integration" section as you suggested, and then we'll remove the I/O section in a separate PR.

Copy link
Member

@seisman seisman Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move GMTDataArrayAccessor to the "Xarray Integration" section in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move GMTDataArrayAccessor to the "Xarray Integration" section in this PR?

I think do that in #3854. Hopefully this PR can be done before that.

load_dataarray

GMT Defaults
Expand Down
1 change: 1 addition & 0 deletions pygmt/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,7 @@
x2sys_init,
xyz2grd,
)
from pygmt.xarray_backend import GMTBackendEntrypoint

# Start our global modern mode session
_begin()
Expand Down
2 changes: 1 addition & 1 deletion pygmt/datasets/samples.py
Original file line number Diff line number Diff line change
Expand Up @@ -204,7 +204,7 @@ def _load_earth_relief_holes() -> xr.DataArray:
is in meters.
"""
fname = which("@earth_relief_20m_holes.grd", download="c")
return load_dataarray(fname, engine="netcdf4")
return load_dataarray(fname, engine="gmt", decode_kind="grid")


class GMTSampleData(NamedTuple):
Expand Down
2 changes: 1 addition & 1 deletion pygmt/helpers/testing.py
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ def load_static_earth_relief():
A grid of Earth relief for internal tests.
"""
fname = which("@static_earth_relief.nc", download="c")
return load_dataarray(fname)
return load_dataarray(fname, decode_kind="grid")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, with load_dataarray deprecated, we should use xr.load_dataarray(fname, engine="gmt", decode_kind="grid") instead, but it can be done in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started draft PR at #3922, will cherrypick things across.



def skip_if_no(package):
Expand Down
14 changes: 12 additions & 2 deletions pygmt/io.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,14 @@
PyGMT input/output (I/O) utilities.
"""

from typing import Literal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're going to deprecate load_dataarray(xref: #3919 (comment)), we should revert the changes in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to change the default engine in load_dataarray to engine="gmt" or no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default engine to engine="gmt" means that users have to set decode_kind, so it's a breaking change. It makes little sense to introduce a breaking change to a deprecated function, so my answer is no.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've reverted changes to io.py in commit f722b47. So keeping default engine behaviour as before for the load_dataarray function that will be deprecated in #3922.


import xarray as xr


def load_dataarray(filename_or_obj, **kwargs):
def load_dataarray(
filename_or_obj, *, engine="gmt", decode_kind: Literal["grid", "image"], **kwargs
):
"""
Open, load into memory, and close a DataArray from a file or file-like object
containing a single data variable.
Expand All @@ -32,6 +36,10 @@ def load_dataarray(filename_or_obj, **kwargs):
-------
datarray : xarray.DataArray
The newly created DataArray.
engine : str
Engine to use when reading files. Default engine is 'gmt'.
decode_kind
The kind of data to read. Valid values are ``"grid"`` or ``"image"``.

See Also
--------
Expand All @@ -41,7 +49,9 @@ def load_dataarray(filename_or_obj, **kwargs):
msg = "'cache' has no effect in this context."
raise TypeError(msg)

with xr.open_dataarray(filename_or_obj, **kwargs) as dataarray:
with xr.open_dataarray(
filename_or_obj, engine=engine, decode_kind=decode_kind, **kwargs
) as dataarray:
result = dataarray.load()
_ = result.gmt # load GMTDataArray accessor information

Expand Down
57 changes: 57 additions & 0 deletions pygmt/tests/test_xarray_backend.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
"""
Tests for xarray 'gmt' backend engine.
"""

import re

import pytest
import xarray as xr
from pygmt.enums import GridRegistration, GridType
from pygmt.exceptions import GMTInvalidInput


def test_xarray_backend_gmt_read_grid():
"""
Ensure that passing engine='gmt' to xarray.open_dataarray works for reading
NetCDF grids.
"""
with xr.open_dataarray(
filename_or_obj="@static_earth_relief.nc", engine="gmt", decode_kind="grid"
) as da:
assert da.sizes == {"lat": 14, "lon": 8}
assert da.dtype == "float32"
assert da.gmt.registration == GridRegistration.PIXEL
assert da.gmt.gtype == GridType.GEOGRAPHIC


def test_xarray_backend_gmt_read_image():
"""
Ensure that passing engine='gmt' to xarray.open_dataarray works for reading
GeoTIFF images.
"""
with xr.open_dataarray(
filename_or_obj="@earth_day_01d", engine="gmt", decode_kind="image"
) as da:
assert da.sizes == {"band": 3, "y": 180, "x": 360}
assert da.dtype == "uint8"
assert da.gmt.registration == GridRegistration.PIXEL
assert da.gmt.gtype == GridType.GEOGRAPHIC


def test_xarray_backend_gmt_read_invalid_kind():
"""
Check that xarray.open_dataarray(..., engine="gmt") fails with missing or incorrect
'decode_kind'.
"""
with pytest.raises(
TypeError,
match=re.escape(
"GMTBackendEntrypoint.open_dataset() missing 1 required keyword-only argument: 'decode_kind'"
),
):
xr.open_dataarray("nokind.nc", engine="gmt")

with pytest.raises(GMTInvalidInput):
xr.open_dataarray(
filename_or_obj="invalid.tif", engine="gmt", decode_kind="invalid"
)
72 changes: 72 additions & 0 deletions pygmt/xarray_backend.py
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering moving this file to pygmt/xarray/backend.py, and then potentially in #3854, we can move pygmt/accessors.py to pygmt/xarray/accessors.py? And if we do pandas accessors (xref #3854 (comment)) in the future, we could put it under pygmt/pandas/accessor.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
"""
An xarray backend for reading raster grid/image files using the 'gmt' engine.
"""

import os
from pathlib import Path
from typing import Literal

import xarray as xr
from pygmt.clib import Session
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import build_arg_list
from pygmt.src.which import which
from xarray.backends import BackendEntrypoint


class GMTBackendEntrypoint(BackendEntrypoint):
"""
Xarray backend to read raster grid/image files using 'gmt' engine.

Relies on the libgdal-netcdf driver used by GMT C for NetCDF files, and libgdal for
GeoTIFF files.
"""

description = "Open raster (.grd, .nc or .tif) files in Xarray via GMT read."
open_dataset_parameters = ("filename_or_obj", "kind")
url = "https://github.com/GenericMappingTools/pygmt"

def open_dataset( # type: ignore[override]
self,
filename_or_obj: str | os.PathLike,
*,
drop_variables=None, # noqa: ARG002
decode_kind: Literal["grid", "image"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose the name decode_kind? The original xr.open_dataarray method has parameters like decode_cf/decode_times/decode_timedelta, but here decode implies reading metadata from netCDF and converting to meaningful Python objects (e.g., datetime64). Maybe raster_kind is a better name?

Copy link
Member Author

@weiji14 weiji14 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does sound like decode_ is more decoding CF metadata variables, especially after reading pydata/xarray#8548 where they're talking about encode and decode methods. Lemme rename decode_kind to raster_kind then. Edit: done at 6a4e1a4

# other backend specific keyword arguments
# `chunks` and `cache` DO NOT go here, they are handled by xarray
) -> xr.Dataset:
"""
Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`.
Copy link
Member

@seisman seisman Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the fix in #3927, sphinx-autogen will generate a stub file for this method. The method documentation is not ideal, at least one "Parameters" section is needed.

"""
if decode_kind not in {"grid", "image"}:
msg = f"Invalid raster kind: '{decode_kind}'. Valid values are 'grid' or 'image'."
raise GMTInvalidInput(msg)

with Session() as lib:
with lib.virtualfile_out(kind=decode_kind) as voutfile:
kwdict = {"T": {"grid": "g", "image": "i"}[decode_kind]}
lib.call_module(
module="read",
args=[filename_or_obj, voutfile, *build_arg_list(kwdict)],
)

raster: xr.DataArray = lib.virtualfile_to_raster(
vfname=voutfile, kind=decode_kind
)
# Add "source" encoding
source = which(fname=filename_or_obj)
raster.encoding["source"] = (
source[0] if isinstance(source, list) else source
)
_ = raster.gmt # Load GMTDataArray accessor information
return raster.to_dataset()

def guess_can_open(self, filename_or_obj) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add tests to ensure the guess_can_open function works?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess_can_open is used to identify the proper engine to open your data file automatically in case the engine is not specified explicitly.

I was expecting that the following code should work:

import pygmt
import xarray as xr 

xr.load_dataarray("earth_relief_01d_g.grd", decode_kind="grid")

but I got the following error:

NetCDF4BackendEntrypoint.open_dataset() got an unexpected keyword argument 'decode_kind'

It's likely that the built-in NetCDF4 backend has a higher priority than other 3rd-party engines, so the guess_can_open function doesn't work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely that the built-in NetCDF4 backend has a higher priority than other 3rd-party engines, so the guess_can_open function doesn't work.

Yes, according to the docs at https://github.com/pydata/xarray/blob/ee862fe12163b728a638216195217aa935238dc3/xarray/backends/api.py#L750-L752 engine="netcdf4" will be preferred (if installed) over other engines. Either use xr.load_dataarray("earth_relief_01d_g.grd", engine="gmt", decode_kind="grid"), or uninstall netcdf4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it means we can remove the guess_can_open function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep guess_can_open (even if it is optional), if we remove the NetCDF4 dependency (#3643). Or we can simply do def guess_can_open: return False following this so that engine='gmt' always has to be declared explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if we remove .nc from the guess_can_open list, and just keep .grd, .tif, etc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pydata/xarray/blob/ee862fe12163b728a638216195217aa935238dc3/xarray/backends/netCDF4_.py#L627

The netCDF engine reads the file's magic number to determine its format. This allows it to recognize netCDF files regardless of their file extension.

The "rasterio" engine relies on file extensions and supports extensions like .tif and .tiff . You can see the list of supported extensions at
https://github.com/corteva/rioxarray/blob/2a1f01e6fd141bdb1f87f4222165cf34f517dd97/rioxarray/xarray_plugin.py#L13-L26.

As I understand it, the actual engine to use is determined by the order of the available engines. I have the following engines locally:

>>> xr.backends.list_engines()
{'netcdf4': <NetCDF4BackendEntrypoint>
   Open netCDF (.nc, .nc4 and .cdf) and most HDF5 files using netCDF4 in Xarray
   Learn more at https://docs.xarray.dev/en/stable/generated/xarray.backends.NetCDF4BackendEntrypoint.html,
 'scipy': <ScipyBackendEntrypoint>
   Open netCDF files (.nc, .nc4, .cdf and .gz) using scipy in Xarray
   Learn more at https://docs.xarray.dev/en/stable/generated/xarray.backends.ScipyBackendEntrypoint.html,
 'gmt': <GMTBackendEntrypoint>
   Open raster (.grd, .nc or .tif) files in Xarray via GMT.
   Learn more at https://github.com/GenericMappingTools/pygmt,
 'rasterio': <RasterioBackend>,
 'store': <StoreBackendEntrypoint>
   Open AbstractDataStore instances in Xarray
   Learn more at https://docs.xarray.dev/en/stable/generated/xarray.backends.StoreBackendEntrypoint.html}

So, when reading netCDF files, the default engine is always netcdf4; but when reading .tif files, the default engine is "gmt", not "rasterio". This can be verified with:

>>> xr.open_dataarray("earth_day_01d_p.tif")
...
TypeError: GMTBackendEntrypoint.open_dataset() missing 1 required keyword-only argument: 'decode_kind'

For the list of engines, standard engines like "netcdf4", "h5netcdf", and "scipy" are prioritized first. Any additional engines (like "gmt" and "rasterio") are then sorted alphabetically. This explains why "gmt" comes before "rasterio" and ends up being selected as the default for .tif files.

That raises the question: can we really trust the engine order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the engine order doesn't look very deterministic, with ones like netcdf4 and scipy given higher priority based on historical reasons. Ideal case would be for backend-specific arguments to be passed into the engine parameter as suggested at pydata/xarray#8447, with proof of concept PR at pydata/xarray#8520. Then the code would look something like:

engine = GMTBackendEntrypoint(decode_kind="grid")
ds = xr.open_dataset("some_file.nc", engine=engine)

But that PR doesn't look like it will be merged anytime soon, so we still need to pass decode_kind via backend_kwargs. In that case, I suppose we should just remove the guess_can_open method from GMTBackendEntrypoint so that engine="gmt" always needs to be passed in explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I suppose we should just remove the guess_can_open method from GMTBackendEntrypoint so that engine="gmt" always needs to be passed in explicitly.

Yes, I feel it's always good to explicitly specify the engine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guess_can_open method has been removed in commit c97bd82.

(I had an idea that we can try detecting if the file starts with @, since only GMT supports it as far as I know, but let's not go there 😆)

"""
Backend open_dataset method used by Xarray in :py:func:`~xarray.open_dataset`.
"""
try:
ext = Path(filename_or_obj).suffix
except TypeError:
return False
return ext in {".grd", ".nc", ".tif"}

Check warning on line 72 in pygmt/xarray_backend.py

View check run for this annotation

Codecov / codecov/patch

pygmt/xarray_backend.py#L68-L72

Added lines #L68 - L72 were not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about .tiff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ext in {".grd", ".nc", ".tif"}
return ext in {".grd", ".nc", ".tif", ".tiff"}

Copy link
Member Author

@weiji14 weiji14 Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added .tiff. Any other known extensions to add? E.g. based on https://docs.generic-mapping-tools.org/6.5/reference/file-formats.html#grid-files?

3 changes: 3 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,9 @@ all = [
"rioxarray",
]

[project.entry-points."xarray.backends"]
gmt = "pygmt.xarray_backend:GMTBackendEntrypoint"

[project.urls]
"Homepage" = "https://www.pygmt.org"
"Documentation" = "https://www.pygmt.org"
Expand Down
Loading