Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"col": [True, False, True]})
df.to_csv("example.csv", index=False)
df2 = pd.read_csv("example.csv", dtype={"col": "bool[pyarrow]"})
Issue Description
Calling read_csv with dtype bool[pyarrow] for a boolean column raises a TypeError. The traceback is:
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
899 kwds_defaults = _refine_defaults_read(
900 dialect,
901 delimiter,
(...)
908 dtype_backend=dtype_backend,
909 )
910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:583, in _read(filepath_or_buffer, kwds)
580 return parser
582 with parser:
--> 583 return parser.read(nrows)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1704, in TextFileReader.read(self, nrows)
1697 nrows = validate_integer("nrows", nrows)
1698 try:
1699 # error: "ParserBase" has no attribute "read"
1700 (
1701 index,
1702 columns,
1703 col_dict,
-> 1704 ) = self._engine.read( # type: ignore[attr-defined]
1705 nrows
1706 )
1707 except Exception:
1708 self.close()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:234, in CParserWrapper.read(self, nrows)
232 try:
233 if self.low_memory:
--> 234 chunks = self._reader.read_low_memory(nrows)
235 # destructive to chunks
236 data = _concatenate_chunks(chunks)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:812, in pandas._libs.parsers.TextReader.read_low_memory()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:889, in pandas._libs.parsers.TextReader._read_rows()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:1034, in pandas._libs.parsers.TextReader._convert_column_data()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:1073, in pandas._libs.parsers.TextReader._convert_tokens()
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:1173, in pandas._libs.parsers.TextReader._convert_with_dtype()
TypeError: _from_sequence_of_strings() got an unexpected keyword argument 'true_values'
Expected Behavior
In the above example, I would expect the CSV data to be loaded into a DataFrame with PyArrow-backed types. This works when the dtype is a string, int, or float type, but it raises an error when bool[pyarrow] is specified.
Installed Versions
INSTALLED VERSIONS
commit : 37ea63d
python : 3.9.13.final.0
python-bits : 64
OS : Darwin
OS-release : 22.4.0
Version : Darwin Kernel Version 22.4.0: Mon Mar 6 21:00:41 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.1
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.1.2
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None