Skip to content

BUG: Segfault due to float_precision='round_trip' #15148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

BUG: Segfault due to float_precision='round_trip' #15148

wants to merge 1 commit into from

Conversation

Rufflewind
Copy link
Contributor

@Rufflewind Rufflewind commented Jan 17, 2017

round_trip calls back into Python, so the GIL must be held. It also fails to silence the Python exception, leading to spurious errors.

Closes #15140.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff (Most of them are false positives because flake8 doesn't understand Cython or it was code I didn't even touch)
  • whatsnew entry

@codecov-io
Copy link

codecov-io commented Jan 18, 2017

Current coverage is 85.54% (diff: 100%)

No coverage report found for master at d50eaa8.

Powered by Codecov. Last update d50eaa8...652c14c

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions IO CSV read_csv, to_csv labels Jan 18, 2017
@@ -384,3 +384,5 @@ Bug Fixes
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)

- Bug in ``Series.dt.round`` inconsistent behaviour on NAT's with different arguments (:issue:`14940`)

- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed on Python 2.7 or later
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number; remove Python 2.7 or later (we in fact don't support < 2.7)

@@ -1774,7 +1774,9 @@ double precise_xstrtod(const char *str, char **endptr, char decimal, char sci,
double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
int skip_trailing) {
#if PY_VERSION_HEX >= 0x02070000
return PyOS_string_to_double(p, q, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact the entire ifdef can be removed here, we only support >= 2.7

@@ -483,10 +484,13 @@ cdef class TextReader:
self.verbose = verbose
self.low_memory = low_memory
self.parser.converter = xstrtod
self.parser.converter_nogil = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we are at it, can you rename self.parser.converter -> self.parser.double_converter (and same with convert_nogil -> double_converter_nogil

@@ -1699,7 +1703,12 @@ cdef _try_double(parser_t *parser, int col, int line_start, int line_end,
result = np.empty(lines, dtype=np.float64)
data = <double *> result.data
na_fset = kset_float64_from_list(na_flist)
with nogil:
if parser.converter_nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a 1-line comment here, as the naming convention is confusing

@@ -1774,7 +1774,9 @@ double precise_xstrtod(const char *str, char **endptr, char decimal, char sci,
double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
int skip_trailing) {
#if PY_VERSION_HEX >= 0x02070000
return PyOS_string_to_double(p, q, 0);
double r = PyOS_string_to_double(p, q, 0);
PyErr_Clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the test cover the error condition here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. An exception would otherwise occur in test_float_precision_round_trip_with_text.

@Rufflewind
Copy link
Contributor Author

Updated.

@@ -482,11 +485,14 @@ cdef class TextReader:

self.verbose = verbose
self.low_memory = low_memory
self.parser.converter = xstrtod
self.parser.double_converter = xstrtod
self.parser.double_converter_nogil = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of an indicator variable, why don't we do this

self.parser.double_converter_nogil = xstrtod
self.parser.double_converter_withgil = None
if float_precision == ..: 
....
elif float_precsion == 'round_trip':
    self.parser.double_converter_nogil = None
    self.parser.double_converter_withgil = round_trip

Then the function pointer itself is descriptive. same effect, just easier to read (and you don't have to know what the boolean is).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

round_trip calls back into Python, so the GIL must be held.  It also
fails to silence the Python exception, leading to spurious errors.

Closes #15140.
@jreback
Copy link
Contributor

jreback commented Jan 20, 2017

lgtm
ping on green

@jreback jreback closed this in 681e6a9 Jan 20, 2017
@jreback
Copy link
Contributor

jreback commented Jan 20, 2017

thanks @Rufflewind

@jreback jreback added this to the 0.20.0 milestone Jan 20, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
`round_trip` calls back into Python, so the GIL must be held.  It also
fails to silence the Python exception, leading to spurious errors.
Closes pandas-dev#15140.

Author: Phil Ruffwind <rf@rufflewind.com>

Closes pandas-dev#15148 from Rufflewind/master and squashes the following commits:

c513d2e [Phil Ruffwind] BUG: Segfault due to float_precision='round_trip'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault with read_csv(io.StringIO("a\na"), float_precision="round_trip")
3 participants