Skip to content

Fix out-of-bounds heap access in internals.pyx. #47935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 8, 2022

Conversation

hawkinsp
Copy link
Contributor

@hawkinsp hawkinsp commented Aug 3, 2022

If blknos is empty, we unconditionally access blknos[start] where start
is 0. This is an out-of-bounds heap access that can be caught by
AddressSanitizer, but it's easy to avoid since
we are going to ignore the result anyway.

No new tests, because this does not change any observable behaviors.

If blknos is empty, we unconditionally access blknos[start] where start
is 0. This is an out-of-bounds heap access that can be caught by
AddressSanitizer, but it's easy to avoid since
we are going to ignore the result anyway.
Copy link
Member

@mzeitlin11 mzeitlin11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hawkinsp, nice catch!

If you caught this with ASAN, do you mind explaining how you built + ran with ASAN to find this? (For context, it would be really useful for there to be a snippet in the developer guide about building with ASAN - there some issues especially related to memory leaks where ASAN would be helpful (especially when developing OS's that don't easily support valgrind).

@mzeitlin11 mzeitlin11 added the Internals Related to non-user accessible pandas implementation label Aug 3, 2022
@mzeitlin11 mzeitlin11 added this to the 1.5 milestone Aug 3, 2022
@hawkinsp
Copy link
Contributor Author

hawkinsp commented Aug 3, 2022

If you caught this with ASAN, do you mind explaining how you built + ran with ASAN to find this? (For context, it would be really useful for there to be a snippet in the developer guide about building with ASAN - there some issues especially related to memory leaks where ASAN would be helpful (especially when developing OS's that don't easily support valgrind).

I caught this in a Google-internal build, and I'm actually not sure how to reproduce it outside our environment.

In essence I found this using a statically-linked build where a self-built Python interpreter, NumPy, pandas, etc were all instrumented with ASAN. I don't think you could even find this particular bug without instrumenting both NumPy and pandas because the dereference here is of a NumPy array. It may or may not be necessary to be using a statically-linked build, which is a nonstandard configuration of Python and its modules we use internally.

While it's probably possible to reproduce the asan report outside our environment with some persistence it doesn't seem easy to do. I'm sorry that's a dissatisfying answer!

@mzeitlin11
Copy link
Member

I caught this in a Google-internal build, and I'm actually not sure how to reproduce it outside our environment.

In essence I found this using a statically-linked build where a self-built Python interpreter, NumPy, pandas, etc were all instrumented with ASAN. I don't think you could even find this particular bug without instrumenting both NumPy and pandas because the dereference here is of a NumPy array. It may or may not be necessary to be using a statically-linked build, which is a nonstandard configuration of Python and its modules we use internally.

While it's probably possible to reproduce the asan report outside our environment with some persistence it doesn't seem easy to do. I'm sorry that's a dissatisfying answer!

Thanks for explaining - I worried this would be the case. I remember trying to build pandas with ASAN on macOS and running into the issue of needing to instrument numpy also and running into many build issues and warnings. It got very complicated very quickly :)

@mroeschke mroeschke merged commit 74e08d7 into pandas-dev:main Aug 8, 2022
@mroeschke
Copy link
Member

Thanks @hawkinsp. Really happy to have more of these types of fixes found by ASAN

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
If blknos is empty, we unconditionally access blknos[start] where start
is 0. This is an out-of-bounds heap access that can be caught by
AddressSanitizer, but it's easy to avoid since
we are going to ignore the result anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants