float-cast-overflow: converting NaN to integers in numpy C code triggered by pandas #12303

Closed
@vonosmas

Converting the floating-point value NaN to any integer data type is undefined behavior in C. Nevertheless, this conversion happens inside the numpy extension module, probably because pandas uses numpy incorrectly. If numpy is built with Clang+UBSan (http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html), the sanitizer emits error reports that point to the issue.

Now, this is somewhat tricky to reproduce, because it requires building the NumPy C code with UBSan. The following instructions should work on Ubuntu 14.04:

  • Build NumPy from source with the float-cast-overflow check enabled
git clone git://github.com/numpy/numpy.git
cd numpy
CC=clang CXX=clang++ LDSHARED=clang CFLAGS="-fsanitize=float-cast-overflow" python setup.py install
  • Fetch the latest pandas
  • Preload the ASan runtime library, which provides the UBSan implementation, and set the runtime flags for the sanitizers:
export ASAN_OPTIONS=detect_leaks=0
export UBSAN_OPTIONS=print_stacktrace=1
export LD_PRELOAD=/lib/clang/3.9.0/lib/linux/libclang_rt.asan-x86_64.so
  • Build pandas
cd pandas
python setup.py build_ext --inplace
python setup.py install
  • Run tests from the test suite triggering the issue
nosetests pandas/tests/test_groupby.py:TestGroupBy.test_agg_nested_dicts
numpy/core/src/multiarray/lowlevel_strided_loops.c.src:865:17: runtime error: value nan is outside the range of representable values of type 'long'
    #0 0x7fc581bffe41 in _cast_double_to_long /usr/local/google/numpy/numpy/core/src/multiarray/lowlevel_strided_loops.c.src:865:17
    #1 0x7fc581b7faf7 in raw_array_assign_array /usr/local/google/numpy/numpy/core/src/multiarray/array_assign_array.c:96:9
    #2 0x7fc581b80355 in PyArray_AssignArray /usr/local/google/numpy/numpy/core/src/multiarray/array_assign_array.c:351:13
    #3 0x7fc581c19a80 in array_astype /usr/local/google/numpy/numpy/core/src/multiarray/methods.c:832:13
    #4 0x49968c in PyEval_EvalFrameEx (/usr/bin/python2.7+0x49968c)

nosetests pandas/tseries/tests/test_resample.py:TestResample.test_custom_grouper
numpy/core/src/multiarray/lowlevel_strided_loops.c.src:867:22: runtime error: value nan is outside the range of representable values of type 'long'
    #0 0x7f04ba194da1 in _aligned_cast_double_to_long /usr/local/google/numpy/numpy/core/src/multiarray/lowlevel_strided_loops.c.src:867:22
    #1 0x7f04ba13d5d2 in PyArray_CastRawArrays /usr/local/google/numpy/numpy/core/src/multiarray/dtype_transfer.c:3843:5
    #2 0x7f04ba11485f in PyArray_AssignRawScalar /usr/local/google/numpy/numpy/core/src/multiarray/array_assign_scalar.c:248:13
    #3 0x7f04ba11f28a in PyArray_FillWithScalar /usr/local/google/numpy/numpy/core/src/multiarray/convert.c:464:19
    #4 0x7f04ba1af2a0 in array_fill /usr/local/google/numpy/numpy/core/src/multiarray/methods.c:150:9
    #5 0x49968c in PyEval_EvalFrameEx (/usr/bin/python2.7+0x49968c)

It seems that pandas uses "np.nan" too aggressively: it calls astype(<integer_type>) on arrays that can contain NaNs, and it calls fill(np.nan) on np.empty() arrays of integral types.
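
To make the two patterns concrete, here is a minimal sketch of how a caller could avoid handing NaN to numpy's float-to-integer cast loops. The helper names checked_astype_int and empty_with_missing are hypothetical and are not part of pandas or numpy. The sketch only guards against NaN and infinity; finite values outside the range of the target integer type would need a separate range check.

```python
import numpy as np

# Patterns from the report that reach the C-level double -> long cast with NaN
# (left commented out on purpose, since the result is undefined in C):
#   np.array([1.0, np.nan]).astype(np.int64)
#   np.empty(3, dtype=np.int64).fill(np.nan)


def checked_astype_int(values, dtype=np.int64):
    """Hypothetical helper: cast floats to an integer dtype, refusing NaN/inf."""
    values = np.asarray(values, dtype=np.float64)
    if not np.isfinite(values).all():
        raise ValueError("cannot convert non-finite values (NaN/inf) to integer")
    return values.astype(dtype)


def empty_with_missing(shape, dtype):
    """Hypothetical helper: allocate a NaN-filled array.

    Integer dtypes cannot represent NaN, so they are promoted to float64
    before the fill instead of casting NaN to an integer.
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        dtype = np.dtype(np.float64)
    out = np.empty(shape, dtype=dtype)
    out.fill(np.nan)
    return out
```

With helpers like these, a call such as checked_astype_int([1.0, float("nan")]) raises ValueError in Python before any C cast runs, and empty_with_missing(3, np.int64) returns a float64 array of NaNs rather than an integer array with unspecified contents.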
