Rethink the testing mechanism for images #963

Closed
@seisman

If you're unclear about how PyGMT tests images, please read the "Testing plots" section in the contributing guides first.


In short, image-based tests need a baseline (reference) image. Whenever we change the code, we generate a new "test" image and compare it with the "baseline" image. If the two images differ, we know the change breaks the test. The most important thing is to ensure that the "baseline" images are correct.
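The comparison step described above can be sketched in a few lines. This is only an illustration, not the code PyGMT or pytest-mpl actually runs: the function names, the nested-list "images", and the use of a root-mean-square (RMS) difference with a tolerance are all assumptions made for the example.

```python
import math


def rms_difference(baseline, test):
    """Return the RMS pixel difference between two equally sized grayscale images.

    Both images are hypothetical nested lists of pixel values (rows of ints).
    """
    same_shape = len(baseline) == len(test) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(baseline, test)
    )
    if not same_shape:
        raise ValueError("images have different dimensions")
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(baseline, test)
        for a, b in zip(row_a, row_b)
    ]
    if not squared:
        raise ValueError("images are empty")
    return math.sqrt(sum(squared) / len(squared))


def images_match(baseline, test, tol=0.0):
    """Treat the test image as passing when its RMS difference is within tol."""
    return rms_difference(baseline, test) <= tol


# Made-up 2x3 grayscale images with pixel values in 0-255
baseline = [[0, 128, 255], [255, 128, 0]]
unchanged = [[0, 128, 255], [255, 128, 0]]
changed = [[0, 128, 255], [255, 128, 10]]

print(images_match(baseline, unchanged))  # identical pixels, so this matches
print(images_match(baseline, changed))    # one pixel differs, so this does not
```

Whatever metric is used, the check is only as trustworthy as the baseline image it compares against.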

Currently, we have two different methods to generate the "baseline" images and compare them with the "test" images:

  1. using the decorator @pytest.mark.mpl_image_compare
  2. using the decorator @check_figures_equal()

The @pytest.mark.mpl_image_compare method is the most straightforward way to do image testing. With this decorator, we need to generate the baseline images, check their correctness, and store them in the repository (https://github.com/GenericMappingTools/pygmt/tree/master/pygmt/tests/baseline).

Pros:

  1. We can visually check the baseline images to make sure they are correct

Cons:

  1. We have to store the static PNG images in the repository, so the repository size grows quickly.

To avoid storing many large static images in the repository, we (mainly @weiji14 and @seisman) had some discussions (in #451, #522) and developed the @check_figures_equal decorator (#555, #590, #600).

Below is an example test using the @check_figures_equal() decorator:

from pygmt import Figure
from pygmt.helpers.testing import check_figures_equal


@check_figures_equal()
def test_basemap_polar():
    """
    Create a polar basemap plot.
    """
    fig_ref, fig_test = Figure(), Figure()
    # Use single-character arguments for the reference image
    fig_ref.basemap(R="0/360/0/1000", J="P6i", B="afg")
    fig_test.basemap(region=[0, 360, 0, 1000], projection="P6i", frame="afg")
    return fig_ref, fig_test

In this example, the baseline/reference image fig_ref is generated using basemap(R="0/360/0/1000", J="P6i", B="afg"), while the test image fig_test is generated using basemap(region=[0, 360, 0, 1000], projection="P6i", frame="afg"). We can't see what the baseline image looks like, but we can be reasonably confident that it is correct, because the basemap wrapper is very simple.
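To make the mechanism concrete, here is a toy version of such a decorator. It is a simplified stand-in, not PyGMT's implementation: `ToyFigure` and `toy_check_figures_equal` are hypothetical names, and "rendering" here just means recording the drawing commands rather than producing a PNG.

```python
import functools


class ToyFigure:
    """Hypothetical stand-in for a figure: records commands instead of drawing."""

    def __init__(self):
        self.commands = []

    def basemap(self, region, projection, frame):
        # Normalize a list region to the "w/e/s/n" string form so that both
        # spellings of the same request produce the same recorded command.
        if not isinstance(region, str):
            region = "/".join(str(value) for value in region)
        self.commands.append(("basemap", region, projection, frame))

    def render(self):
        """Return a comparable snapshot of everything drawn so far."""
        return tuple(self.commands)


def toy_check_figures_equal(func):
    """Run the test function and fail if its two returned figures differ."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        fig_ref, fig_test = func(*args, **kwargs)
        if fig_ref.render() != fig_test.render():
            raise AssertionError("reference and test figures differ")

    return wrapper


@toy_check_figures_equal
def test_basemap_polar():
    fig_ref, fig_test = ToyFigure(), ToyFigure()
    fig_ref.basemap("0/360/0/1000", "P6i", "afg")
    fig_test.basemap(region=[0, 360, 0, 1000], projection="P6i", frame="afg")
    return fig_ref, fig_test


test_basemap_polar()  # raises AssertionError only if the two figures differ
```

Note that nothing in this scheme checks the reference figure against an independent source of truth; it only checks that the two code paths agree.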

Pros:

  1. We don't need to store static images in the repository, which keeps the repository size small

Cons:

  1. For each test, we have to generate two images (baseline and test images), which doubles the execution time
  2. We can't visually check the correctness of the baseline images
  3. If we decide to disallow single-character parameters (e.g., J="X10c/10c") as proposed in #262 "Disallow single character arguments" (also related to #256 "Fail for invalid input arguments"), most of the code for generating reference images will become invalid.

For some complicated wrappers, we can't even easily tell whether the reference image is correct. For example,

from pygmt import Figure
from pygmt.helpers.testing import check_figures_equal


@check_figures_equal()
def test_subplot_direct():
    """
    Plot map elements to subplot directly using the panel parameter.
    """
    fig_ref, fig_test = Figure(), Figure()
    with fig_ref.subplot(nrows=2, ncols=1, Fs="3c/3c"):
        fig_ref.basemap(region=[0, 3, 0, 3], frame="af", panel=0)
        fig_ref.basemap(region=[0, 3, 0, 3], frame="af", panel=1)
    with fig_test.subplot(nrows=2, ncols=1, subsize=("3c", "3c")):
        fig_test.basemap(region=[0, 3, 0, 3], frame="af", panel=[0, 0])
        fig_test.basemap(region=[0, 3, 0, 3], frame="af", panel=[1, 0])
    return fig_ref, fig_test

In this test, we expect that the baseline image has a 2-row-by-1-column subplot layout. However, if we make a silly mistake in Figure.subplot, resulting in a 1-row-by-2-column layout, the test still passes, because both the baseline and test images have the same "wrong" layout. Then the test is useless to us.
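The blind spot can be reproduced with a toy example. Everything here is hypothetical: `buggy_grid_shape` stands in for a wrapper with a deliberate row/column mix-up, and a plain tuple stands in for an independently verified baseline image.

```python
def buggy_grid_shape(nrows, ncols):
    """Hypothetical subplot helper that returns the (rows, columns) it lays out."""
    # Deliberate bug: rows and columns are swapped.
    return (ncols, nrows)


def render_reference():
    # Positional arguments stand in for the short-form parameters.
    return buggy_grid_shape(2, 1)


def render_test():
    # Keyword arguments stand in for the long-form parameters.
    return buggy_grid_shape(nrows=2, ncols=1)


# What a stored, visually checked baseline would show: 2 rows, 1 column.
stored_baseline = (2, 1)

# Reference and test share the bug, so comparing them to each other passes.
self_comparison_passes = render_reference() == render_test()

# Comparing against the stored baseline exposes the bug.
baseline_comparison_passes = render_test() == stored_baseline

print(self_comparison_passes)      # True
print(baseline_comparison_passes)  # False
```

Both code paths go through the same faulty function, so agreement between them says nothing about whether either one is right.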


Almost every plotting tool has to decide whether to store static images in its repository. There are similar discussions in the upstream GMT project (GenericMappingTools/gmt#3470) and the matplotlib project (matplotlib/matplotlib#16447).


Now that we have more active developers, I think we should rethink how we want to test PyGMT.
