REF: centralize pyarrow Table to pandas conversions and types_mapper handling #60324
Changes from 2 commits
```diff
@@ -48,10 +48,7 @@
     is_object_dtype,
     is_string_dtype,
 )
-from pandas.core.dtypes.dtypes import (
-    ArrowDtype,
-    DatetimeTZDtype,
-)
+from pandas.core.dtypes.dtypes import DatetimeTZDtype
 from pandas.core.dtypes.missing import isna
 
 from pandas import get_option
@@ -67,6 +64,8 @@
 from pandas.core.internals.construction import convert_object_array
 from pandas.core.tools.datetimes import to_datetime
 
+from pandas.io._util import arrow_table_to_pandas
+
 if TYPE_CHECKING:
     from collections.abc import (
         Callable,
```
```diff
@@ -2208,23 +2207,13 @@ def read_table(
         else:
             stmt = f"SELECT {select_list} FROM {table_name}"
 
-        mapping: type[ArrowDtype] | None | Callable
-        if dtype_backend == "pyarrow":
-            mapping = ArrowDtype
-        elif dtype_backend == "numpy_nullable":
-            from pandas.io._util import _arrow_dtype_mapping
-
-            mapping = _arrow_dtype_mapping().get
-        elif using_string_dtype():
-            from pandas.io._util import arrow_string_types_mapper
-
-            mapping = arrow_string_types_mapper()
-        else:
-            mapping = None
-
         with self.con.cursor() as cur:
             cur.execute(stmt)
-            df = cur.fetch_arrow_table().to_pandas(types_mapper=mapping)
+            pa_table = cur.fetch_arrow_table()
+            dtype_backend = (
+                lib.no_default if dtype_backend == "numpy" else dtype_backend
+            )
+            df = arrow_table_to_pandas(pa_table, dtype_backend=dtype_backend)
 
         return _wrap_result_adbc(
             df,
```

Review thread on the `dtype_backend` remapping:

> Is there any harm to forwarding along the argument of "numpy"? Seems a little strange to handle this one-off in some invocations.

> It's also strange that we use "numpy" as the default in the SQL code, and … Now, I can certainly forward it and handle it inside …

> Hmm, that seems like an issue in the SQL code. Looks like the read_sql API uses lib.no_default, so that must be getting messed up in the internals. Let's keep this PR the way it is and track in #60326.

> Thanks for opening that issue.

> Sounds good. Still something to be cleaned up there, though not urgent for now.
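The new helper, `arrow_table_to_pandas`, absorbs the `types_mapper` selection that was previously duplicated at each call site. As a rough illustration, a centralized helper reconstructed from the removed mapping logic could look like the sketch below. This is not necessarily the exact code merged into `pandas/io/_util.py`; it assumes a pandas development version where the private helpers `_arrow_dtype_mapping`, `arrow_string_types_mapper`, and `using_string_dtype` exist (they are all referenced in the removed lines above).

```python
# Sketch only: reconstructed from the mapping logic removed in this diff,
# not the authoritative implementation in pandas/io/_util.py.
from __future__ import annotations

from typing import Callable

import pyarrow as pa

import pandas as pd
from pandas._config import using_string_dtype
from pandas._libs import lib
from pandas.io._util import (
    _arrow_dtype_mapping,
    arrow_string_types_mapper,
)


def arrow_table_to_pandas_sketch(
    table: pa.Table, dtype_backend=lib.no_default
) -> pd.DataFrame:
    """Convert a pyarrow Table to a DataFrame, honouring ``dtype_backend``."""
    types_mapper: type[pd.ArrowDtype] | None | Callable
    if dtype_backend == "pyarrow":
        # Keep every column Arrow-backed by wrapping each type in ArrowDtype.
        types_mapper = pd.ArrowDtype
    elif dtype_backend == "numpy_nullable":
        # Map Arrow types onto the nullable masked dtypes (Int64, boolean, ...).
        types_mapper = _arrow_dtype_mapping().get
    elif using_string_dtype():
        # Only string columns get a pandas string dtype; the rest stay numpy.
        types_mapper = arrow_string_types_mapper()
    else:
        # lib.no_default: plain numpy-backed conversion. Per the diff above,
        # the ADBC call sites remap "numpy" to lib.no_default before calling.
        types_mapper = None
    return table.to_pandas(types_mapper=types_mapper)
```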
```diff
@@ -2292,23 +2281,13 @@ def read_query(
         if chunksize:
             raise NotImplementedError("'chunksize' is not implemented for ADBC drivers")
 
-        mapping: type[ArrowDtype] | None | Callable
-        if dtype_backend == "pyarrow":
-            mapping = ArrowDtype
-        elif dtype_backend == "numpy_nullable":
-            from pandas.io._util import _arrow_dtype_mapping
-
-            mapping = _arrow_dtype_mapping().get
-        elif using_string_dtype():
-            from pandas.io._util import arrow_string_types_mapper
-
-            mapping = arrow_string_types_mapper()
-        else:
-            mapping = None
-
         with self.con.cursor() as cur:
             cur.execute(sql)
-            df = cur.fetch_arrow_table().to_pandas(types_mapper=mapping)
+            pa_table = cur.fetch_arrow_table()
+            dtype_backend = (
+                lib.no_default if dtype_backend == "numpy" else dtype_backend
+            )
+            df = arrow_table_to_pandas(pa_table, dtype_backend=dtype_backend)
 
         return _wrap_result_adbc(
             df,
```
Review thread on keeping the CSV reader's existing behaviour:

> I realize this is for compatibility, but is this a feature or a bug that the CSV reader does this?

> No idea, I didn't look in detail at why this is happening in the CSV reader; for now I just wanted to keep the same behaviour. (It is certainly an ugly keyword, but that is the problem with centralizing the conversion as I am doing: it's not otherwise possible to change it on the CSV side.)
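For concreteness, here is how the sketch above would behave for the three supported backends; this is illustrative only, and the exact resulting dtypes depend on the pandas version and options in effect. The CSV-specific behaviour questioned in the thread above would presumably become one more parameter on the shared helper, which is the "ugly keyword" trade-off the author mentions.

```python
import pyarrow as pa

table = pa.table({"a": [1, 2, None], "b": ["x", None, "z"]})

# Default: plain numpy-backed result ("a" becomes float64 with NaN).
df_default = arrow_table_to_pandas_sketch(table)

# Nullable extension dtypes: "a" becomes Int64, "b" a pandas string dtype.
df_nullable = arrow_table_to_pandas_sketch(table, dtype_backend="numpy_nullable")

# Arrow-backed columns: every dtype is an ArrowDtype wrapper.
df_arrow = arrow_table_to_pandas_sketch(table, dtype_backend="pyarrow")

print(df_default.dtypes, df_nullable.dtypes, df_arrow.dtypes, sep="\n\n")
```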