Description
We have a failure in pandas/core/test_downstream.py in our CI, which I guess is caused by a change in Dask.
The reason is that when creating a Series from a dask dataframe, the object is treated as a dict, since it's dict-like, but then we check whether the object is empty by calling bool() on it, and dask raises. This is the behavior causing the issue:
>>> import dask.dataframe
>>> import numpy
>>> from pandas.core.dtypes.common import is_dict_like
>>> data = dask.dataframe.from_array(numpy.array([1, 2, 3]))
>>> is_dict_like(data)
True
>>> bool(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/.miniforge3/envs/pandas-dev/lib/python3.10/site-packages/dask_expr/_collection.py", line 435, in __bool__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.any() or a.all().
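For reference, the mechanism can probably be reproduced without dask at all. The sketch below is only an illustration under stated assumptions: AmbiguousFrame is a hypothetical stand-in class (not a dask or pandas type), looks_dict_like is a simplified approximation of the attribute check that is_dict_like is assumed to perform, and truthiness_check just mimics the `if data:` pattern.

```python
class AmbiguousFrame:
    """Hypothetical stand-in for a dataframe-like object (not a real dask/pandas class)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    # These three attributes are what make the object look dict-like.
    def keys(self):
        return self._columns.keys()

    def __getitem__(self, key):
        return self._columns[key]

    def __contains__(self, key):
        return key in self._columns

    def __len__(self):
        # Simplification: counts columns; a real dataframe counts rows.
        return len(self._columns)

    def __bool__(self):
        # Refuses truthiness, like the objects in the tracebacks.
        raise ValueError("The truth value of an AmbiguousFrame is ambiguous.")


def looks_dict_like(obj):
    """Simplified duck-typing check (assumed approximation of is_dict_like)."""
    attrs = ("__getitem__", "keys", "__contains__")
    return all(hasattr(obj, attr) for attr in attrs) and not isinstance(obj, type)


def truthiness_check(data):
    """Mimic `if data:`; return the error message if it raises, else None."""
    try:
        if data:
            pass
    except ValueError as exc:
        return str(exc)
    return None


frame = AmbiguousFrame({"col1": [1, 2], "col2": [3, 4]})
dict_like = looks_dict_like(frame)   # passes the dict-like check
error = truthiness_check(frame)      # the implicit bool() call raises
is_empty = len(frame) == 0           # an explicit emptiness check does not call bool()
```

So any object that passes the duck-typing check but refuses bool() would hit the same error, regardless of which library it comes from. Whether an explicit length check is an appropriate replacement for the truthiness check is a separate question that this sketch does not answer.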
I personally don't understand this test much. Creating a Series from a DataFrame seems a bit odd: what's the expected output, a Series where each element is another Series?
This also seems to be failing when the dataframe is a pandas dataframe instead of a dask one, but we don't seem to have a test to check that it works:
>>> import pandas
>>> df = pandas.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> pandas.Series(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/src/pandas/pandas/core/series.py", line 456, in __init__
    data, index = self._init_dict(data, index, dtype)
  File "/home/mgarcia/src/pandas/pandas/core/series.py", line 546, in _init_dict
    if data:
  File "/home/mgarcia/src/pandas/pandas/core/generic.py", line 1482, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any objection to just getting rid of the test, @mroeschke @phofl?