Description
What happens is that the mean is computed on each axis in turn (a mean of means). When no nans are involved, this gives theoretically the same result; in practice we lose some precision, but that was deemed acceptable so far. However, when nans are involved, the result is significantly wrong.
>>> from larray import Array, Axis, isnan, nan
>>> arr = Array([[1, 3], [4, nan]], [Axis('a=a0,a1'), Axis('b=b0,b1')])
>>> arr
a\b   b0   b1
 a0  1.0  3.0
 a1  4.0  nan
>>> arr.mean("a0,a1 >> a01", "b0,b1 >> b01")
2.75
while it should be 2.6666666666666665. What happens is that it computes:
>>> (((1 + 4) / 2) + ((3 + 0) / 1)) / 2
2.75
>>> 1/4 + 4/4 + 3/2
2.75
Instead of:
>>> (1 + 4 + 3) / 3
2.6666666666666665
>>> 1/3 + 4/3 + 3/3
2.6666666666666665
As a workaround until larray 0.35 is released, I have recommended using:
>>> # TODO: do not use this function anymore once larray 0.35 is available
>>> def nd_mean(array, axes_or_groups):
...     """
...     Compute the mean of `array` over `axes_or_groups`.
...
...     This function is temporarily necessary because larray versions up to (and
...     including) 0.34.x compute wrong means on groups over several dimensions
...     when some values are nans. See https://github.com/larray-project/larray/issues/1118
...     """
...     # sum skips nans by default (skipna=True), so dividing by the number of
...     # non-nan cells yields the mean over all dimensions at once
...     return array.sum(*axes_or_groups) / (~isnan(array)).sum(*axes_or_groups)
>>> nd_mean(arr, ("a0,a1 >> a01", "b0,b1 >> b01"))
2.6666666666666665
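The workaround is correct because summing the non-nan values and dividing by the number of non-nan cells, both over all dimensions at once, is exactly a single global mean. A quick numpy sanity check of that identity (reusing `data` from the sketch above):
>>> float(np.nansum(data) / np.count_nonzero(~np.isnan(data)))
2.6666666666666665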