Skip to content

TST: test_tofile_fromfile now uses initialized memory #12330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 7, 2018

Conversation

tylerjereddy
Copy link
Contributor

Try removing pathlib from Shippable CI config since apparently no other NumPy CI config uses this, it is apparently prone to causing issues, and depending on the backport maintenance of a Python 3 standard library module may be a bad idea anyway.

An alternative may be to switch to the newer pathlib2 2.x backport, but I don't think we've done that for any other CI either.

Maybe good to check the test skip counts for shippable in this PR -- I think for 2.x quite a few more show up without pathlib so that was my original reasoning for using it.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

@tylerjereddy sometimes I feel like I work for the CIs....

@charris
Copy link
Member

charris commented Nov 5, 2018

ci/circleci complains about using stretch. I don't know if that is something for them to fix, or for us.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

I think the debian archives are just down http://deb.debian.org/debian

@eric-wieser
Copy link
Member

apparently prone to causing issues

Can you link to such an issue?

@tylerjereddy
Copy link
Contributor Author

@eric-wieser
Copy link
Member

eric-wieser commented Nov 5, 2018

It's not obvious to me that the failure is caused by the backport - it might be that we actually have a real problem on python 2, but our tests that expose it happen to use pathlib.

If nothing else, I was trying to add explicit backport support in #12157, so it's nice to have CI that tests it somewhere.

It's pretty unfortunate that the CI output is truncated to the point of not actually showing the diff. I'd prefer that we work out why the test is failing rather than finding an easy way to disable it.

@charris
Copy link
Member

charris commented Nov 5, 2018

Agree with @eric-wieser that it would be good to understand why there is the failure. The fact that it is not reliably repeatable is a complication there, but many of us have Python 2.7 available, so it might be good to test. I would also be inclined to try pathlib2 if that is better maintained. What do we need to do to test, just install a backport?

@charris charris added this to the 1.16.0 release milestone Nov 5, 2018
@charris
Copy link
Member

charris commented Nov 5, 2018

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

If pathlib is inconsistent (the last commit was 2015), then why try importing it at all in python2.

What would the logic look like if you try to support pathlib2?

if sys.version_info[0] >= 3:
    from pathlib import Path, PurePath
else:
    try:
        from pathlib2 import Path, PurePath
    except ImportError:
        try:
            from pathlib import Path, PurePath  # <- inconsistent behaviour expected?
        except ImportError:
            Path = PurePath = None

@tylerjereddy
Copy link
Contributor Author

@hmaarrfk I think the backport behavior is to just automagically show-up in the namespace where the Python 3 module normally is, so if it is a truly reliable backport then you don't need conditionals.

I've noted the feedback requesting investigation of the issue -- I'll try to take a look tomorrow.

@tylerjereddy tylerjereddy changed the title TST: remove pathlib from shippable CI WIP, TST: pathlib / shippable CI Nov 5, 2018
@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

nevermind

is it because origin pos is never set?

    /* For Python 2 PyFileObject, use PyFile_AsFile */
#if !defined(NPY_PY3K)
    if (PyFile_Check(file)) {
        return PyFile_AsFile(file);
    }
#endif

nevermind:

    /* For Python 2 PyFileObject, do nothing */
#if !defined(NPY_PY3K)
    if (PyFile_Check(file)) {
        return 0;
    }
#endif

@paulmueller
Copy link
Contributor

@tylerjereddy In fact, this test failed at least once when the above mentioned PR was still open (#11348 (comment)). I thought it was a temporary issue with Shippable. I also doubt that this has anything to do with pathlib on Python2.

@tylerjereddy
Copy link
Contributor Author

Yeah, I'm leaning towards np.empty() and using assert_array_equal -- I don't think that is reliably portable for the precision issue.

@tylerjereddy
Copy link
Contributor Author

The test will pass if I use assert_allclose for the first two fields, and the original assert_array_equal for the third field

@tylerjereddy
Copy link
Contributor Author

Perhaps ARMv8 is prone to slightly different floating pulls / precision issues with empty() random picks and / or the comparison of said picks

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

It might honestly be that an np.nan is showing up in the float field somehow. then:

In [3]: np.nan == np.nan                                                      
Out[3]: False

@tylerjereddy
Copy link
Contributor Author

In contrast to the standard usage in numpy, NaNs are compared like numbers, no assertion is raised if both objects have NaNs in the same positions.

From the assert_array_equal docs. So NaNs should be ok, but those big e-309 style np.empty() draws are almost certainly problematic

@tylerjereddy tylerjereddy force-pushed the shippable_remove_pathlib branch from 39394a4 to d4a11bf Compare November 5, 2018 23:08
@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

oooohhh shippable is running aarch64. Does it have a software FPU or hardware FPU?

@tylerjereddy tylerjereddy changed the title WIP, TST: pathlib / shippable CI TST: test_tofile_fromfile precision adjustment for shippable CI Nov 5, 2018
@tylerjereddy
Copy link
Contributor Author

I've updated this PR to use assert_allclose on the numeric fields in the "structured" arrays, but still require an exact match on the bytes-like things I see in the third field for the problematic (on ARMv8) test.

This fixes the comparison in my own local tests anyway--the full contents of the x and a arrays are visible in my debug fork shippable run here with print statements. I confirmed locally that those values are exactly the same by visual inspection at all elements, but assert_array_equal just seems too strict for the float comparisons in that case, as far as I can tell.

The super-small e-309 style draws from np.empty() are perhaps a good explanation for the stochastic 2.7 or 3.7 failures--sometimes the floats are ok for exact comparison, but other times not.

@hmaarrfk It is proper ARMv8 hardware instead of emulation at the software level, if that is what you are asking me. I don't know if that is a sufficient explanation for the different floating point draws / comparison susceptibilities, but I think using assert_allclose() on float comparisons is a best practice anyway. Perhaps wasn't chosen initially because you can't use assert_allclose directly on the structured array with non-numeric field 3 elements.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 5, 2018

I guess it kinda makes sense. What I don't understand is if assert_array_equal tests for bitwise equality, why it isn't respected.

Re: hard-fp soft-fp, yes that is what I was asking. I've run into peculiarities of soft-fp before not raising errors or only partiallialy implementing IEEE standards.

1E-309 is a denormal number. I wonder if AARM64 implements comparison for them correctly.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 6, 2018

It is totally the np.nan. The reason is that it is inside a structured array.

import numpy as np
from numpy.testing import assert_array_equal
a = np.zeros(1, dtype='f8,f8')
a[0] = (np.nan, 0)
assert_array_equal(a, a)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-96868a4fad08> in <module>
----> 1 assert_array_equal(a, a)

/home/mark/git/numpy/numpy/testing/_private/utils.py in assert_array_equal(x, y, err_msg, verbose)
    871     __tracebackhide__ = True  # Hide traceback for py.test
    872     assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
--> 873                          verbose=verbose, header='Arrays are not equal')
    874 
    875 

/home/mark/git/numpy/numpy/testing/_private/utils.py in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    795                                 verbose=verbose, header=header,
    796                                 names=('x', 'y'), precision=precision)
--> 797             raise AssertionError(msg)
    798     except ValueError:
    799         import traceback

AssertionError: 
Arrays are not equal

(mismatch 100.0%)
 x: array([(nan, 0.)], dtype=[('f0', '<f8'), ('f1', '<f8')])
 y: array([(nan, 0.)], dtype=[('f0', '<f8'), ('f1', '<f8')])
testing patch to add
diff --git a/numpy/testing/tests/test_utils.py b/numpy/testing/tests/test_utils.py
index e54fbc390..18643b537 100644
--- a/numpy/testing/tests/test_utils.py
+++ b/numpy/testing/tests/test_utils.py
@@ -118,6 +118,10 @@ class TestArrayEqual(_GenericTest):
         c = np.array([1, 2, 3])
         self._test_not_equal(c, b)
 
+    def test_nan_structured_array(self):
+        a = np.array([(np.nan, 0)], dtype='f8,f8')
+        assert_array_equal(a, a)
+
     def test_string_arrays(self):
         """Test two arrays with different shapes are found not equal."""
         a = np.array(['floupi', 'floupa'])

@charris
Copy link
Member

charris commented Nov 6, 2018

Using np.empty is just asking for trouble, or at least, edge cases :) Random or fixed numbers would be better.

@tylerjereddy
Copy link
Contributor Author

@hmaarrfk Right you are I guess. I think the docstring is perhaps a bit confusing since there's some implication of handling nans appropriately there.

Still, maybe not entirely inaccurate to categorize as a floating point issue of some sort.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Nov 6, 2018

I think the line assert_array_equal should be left unchaned in test_tofile_fromfile. Maybe we should follow Charles' suggestion to avoid future edge cases?

I think this should probably be considered a bug in structured arrays, or at least how assert_array_equal works for structured arrays. I can't speak for the authors of assert_array_equal but it seems they wrote the function to test for bitwise equality. Maybe a seperate issue/PR should be opened for that one.

assert_array_equal(x, a)
# check numeric fields within a reasonable
# floating point precision match
assert_allclose(x['f0'], a['f0'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use assert_array_equal here. seems to work in the failer I showed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as there's a good reason for that with floating point numbers -- we don't want to check within i.e., 1e-7 precision, we want an "exact" match?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Serializing to and from a file should yield exact results. We are dumping the bits right? At least I use tofile that way.

@eric-wieser
Copy link
Member

In the past, @seberg has made an effort to remove all uses of np.empty in tests, as they cause false positives in Valgrind - I think we should take the same approach here.

@eric-wieser
Copy link
Member

eric-wieser commented Nov 6, 2018

The reason is that it is inside a structured array.

Might be worth opening an issue about that - that seems inconsistent.

but it seems they wrote the function to test for bitwise equality.

Wait till you try using > on structured arrays - IIRC, that does byte-by-byte comparison too, and completely ignores field endianess.

@tylerjereddy tylerjereddy force-pushed the shippable_remove_pathlib branch from d4a11bf to f95d197 Compare November 6, 2018 22:07
@tylerjereddy tylerjereddy changed the title TST: test_tofile_fromfile precision adjustment for shippable CI TST: test_tofile_fromfile now uses initialized memory Nov 6, 2018
@tylerjereddy
Copy link
Contributor Author

Revised based on above comments

@@ -332,7 +332,8 @@ class TestPathUsage(object):
def test_tofile_fromfile(self):
with temppath(suffix='.bin') as path:
path = Path(path)
a = np.empty(10, dtype='f8,i4,a5')
np.random.seed(123)
a = np.random.rand(10).astype('f8,i4,a5')
a[5] = (0.5,10,'abcde')
a.newbyteorder('<')
Copy link
Member

@charris charris Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, newbyteorder doesn't change anything about a, and I'm not clear about the purpose of setting a[5].

Copy link
Member

@charris charris Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newbyteorder returns a new array. I think it can be omitted and the read should default to native order. Reversing the order of the original could also lead to problematic floats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've revised to remove the byte ordering stuff.

The purpose of setting a[5] might be to put some more "interesting" data into the array for the purpose of the roundtrip--if empty or random put a bunch of zeros in there I suppose one might argue it is a slightly less thorough check than i.e., heterogeneous data. Indeed, with my change, the integer field becomes all zero before that I think, because of the casting.

But that's just speculation -- I didn't write the original test of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write the original test of course.

Yeah, the original tests seems more a test of records than path :)

* replaced usage of np.empty() in the unit test
at TestPathUsage.test_tofile_fromfile with a pinned
random seed array

* the usage of np.empty() was causing stochastic
unit test failures on ARMv8 architecture in both
Python 2.7 and 3.7.
@charris
Copy link
Member

charris commented Nov 7, 2018

Thanks Tyler.

@tylerjereddy tylerjereddy deleted the shippable_remove_pathlib branch November 7, 2018 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants