Description
If you're unclear about how PyGMT tests images, please read the "Testing plots" section in the contributing guides first.
In short, for image-based tests, we need to specify a baseline (reference) image. Whenever we change the code, we generate a new "test" image and compare it with the "baseline" image. If the two images differ, we know the change has altered the plot output and the test fails. The most important thing is to ensure that the "baseline" images are correct.
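The comparison step can be sketched in a few lines. This is only a toy model, not the actual implementation used by PyGMT or pytest-mpl: images are plain nested lists of grayscale values rather than PNG files, and `rms_difference`, `images_match`, and the tolerance value are hypothetical names and numbers chosen for illustration.

```python
import math


def rms_difference(baseline, test):
    """Return the root-mean-square difference between two grayscale images.

    Images are nested lists of pixel values, a toy stand-in for PNG data.
    """
    if len(baseline) != len(test) or any(
        len(row_b) != len(row_t) for row_b, row_t in zip(baseline, test)
    ):
        raise ValueError("images have different shapes")
    squared = [
        (b - t) ** 2
        for row_b, row_t in zip(baseline, test)
        for b, t in zip(row_b, row_t)
    ]
    return math.sqrt(sum(squared) / len(squared))


def images_match(baseline, test, tolerance=2.0):
    """Report whether the test image is within tolerance of the baseline."""
    return rms_difference(baseline, test) <= tolerance


baseline = [[0, 0, 255], [0, 255, 0]]   # stored, visually checked image
unchanged = [[0, 0, 255], [0, 255, 0]]  # output after a harmless change
changed = [[255, 0, 0], [0, 255, 0]]    # output after a breaking change

print(images_match(baseline, unchanged))  # True
print(images_match(baseline, changed))    # False
```

The check is only as trustworthy as `baseline` itself: if the stored image is wrong, a correct change will fail and a wrong one may pass.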
Currently, we have two different methods to generate the "baseline" images and compare the test images against them:
- using the decorator @pytest.mark.mpl_image_compare
- using the decorator @check_figures_equal()
The @pytest.mark.mpl_image_compare method is the most straightforward way to do image testing. With this decorator, we need to generate baseline images, check their correctness, and store them in the repository (https://github.com/GenericMappingTools/pygmt/tree/master/pygmt/tests/baseline).
Pros:
- We can visually check the baseline images to make sure they are correct
Cons:
- We have to store the static PNG images in the repository, so the repository size grows quickly.
To avoid storing many large static images in the repository, we (mainly @weiji14 and @seisman) had some discussions (in #451, #522) and developed the @check_figures_equal decorator (#555, #590, #600).
An example test using the @check_figures_equal() decorator is in pygmt/pygmt/tests/test_basemap.py (lines 67 to 77 in e057927).
In this example, the baseline/reference image fig_ref is generated using basemap(R="0/360/0/1000", J="P6i", B="afg"), while the test image fig_test is generated using basemap(region=[0, 360, 0, 1000], projection="P6i", frame="afg"). We can't see what the baseline image looks like, but we're reasonably confident that it is correct, because the basemap wrapper is very simple.
Pros:
- We don't need to store static images in the repository, which keeps the repository size small
Cons:
- For each test, we have to generate two images (baseline and test images), which doubles the execution time
- We can't visually check the correctness of the baseline images
- If we decide to disallow single-character parameters (e.g., J="X10c/10c") as proposed in #262 ("Disallow single character arguments"; also related to #256, "Fail for invalid input arguments"), then most of the code for generating reference images will become invalid.
For some complicated wrappers, we can't even easily tell whether the reference image is correct. For example, see pygmt/pygmt/tests/test_subplot.py (lines 30 to 42 in e057927).
In this test, we expect the baseline image to have a 2-row-by-1-column subplot layout. However, if we make a silly mistake in Figure.subplot that results in a 1-row-by-2-column layout, the test still passes, because both the baseline and test images have the same "wrong" layout. The test is then useless to us.
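The false pass described above can be reproduced with a toy model. Nothing here uses PyGMT: `subplot_layout` is a hypothetical stand-in for a plotting wrapper, it returns a grid of panel labels instead of an image, and the bug (swapped row and column arguments) is planted deliberately for illustration.

```python
def subplot_layout(nrows, ncols):
    """Toy stand-in for a plotting wrapper with a deliberate bug.

    It should return an nrows-by-ncols grid of panel labels, but it
    swaps the two arguments.
    """
    rows, cols = ncols, nrows  # BUG: arguments swapped
    return [[f"panel{r * cols + c}" for c in range(cols)] for r in range(rows)]


# Strategy 1: compare against a stored, visually checked baseline.
stored_baseline = [["panel0"], ["panel1"]]  # 2 rows by 1 column
test_output = subplot_layout(nrows=2, ncols=1)
baseline_test_passes = test_output == stored_baseline

# Strategy 2: generate the reference with the same wrapper at test time.
reference_output = subplot_layout(2, 1)
equal_figures_test_passes = test_output == reference_output

print(baseline_test_passes)       # False: the bug is caught
print(equal_figures_test_passes)  # True: the bug goes unnoticed
```

The second strategy only detects differences between two code paths. When both paths run through the same faulty wrapper, they agree with each other and the error stays hidden.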
Almost every plotting tool has to decide whether to store static images in its repository. There are similar discussions in the upstream GMT project (GenericMappingTools/gmt#3470) and the matplotlib project (matplotlib/matplotlib#16447).
As we have more active developers now, I think we should rethink how we want to test PyGMT.