-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Switch arrow type for string array to large string #56220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 16 commits
752e368
d813e8d
ed07537
c0c42a8
3196f32
e807652
1341ffc
6dc3f20
848f7ed
7244889
a24ebac
3d90cc7
46d7f16
3fdf256
c2bd9d2
a01d4e5
a22e625
847b74c
47fda87
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -172,9 +172,15 @@ def _convert_arrays_to_dataframe( | |
) | ||
if dtype_backend == "pyarrow": | ||
pa = import_optional_dependency("pyarrow") | ||
arrays = [ | ||
ArrowExtensionArray(pa.array(arr, from_pandas=True)) for arr in arrays | ||
] | ||
import pyarrow.compute as pc | ||
|
||
result_arrays = [] | ||
for arr in arrays: | ||
pa_array = pa.array(arr, from_pandas=True) | ||
if arr.dtype == "string": | ||
pa_array = pc.cast(pa_array, pa.string()) | ||
result_arrays.append(ArrowExtensionArray(pa_array)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the reason for this cast? (and maybe add a comment about it) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. arrow is inferring this as regular strings, I think we had failing tests without this cast There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I'm still confused about this as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm the comment above was incorrect, its like this: We are now using large_string in our String Extension arrays, e.g. if you convert this to an ArrowExtensionArray it will also be large_string. This is inconsistent with the other I/O methods where ArrowExtensionArray is still pa.string, that's why I am casting it back here. I am happy to change this as well, but rather in a follow up There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah okay that makes sense. I'm OK with this then but would be good to have a |
||
arrays = result_arrays # type: ignore[assignment] | ||
if arrays: | ||
df = DataFrame(dict(zip(list(range(len(columns))), arrays))) | ||
df.columns = columns | ||
|
Uh oh!
There was an error while loading. Please reload this page.