Commit b2f6700

Merge remote-tracking branch 'upstream/main' into sql
# Conflicts:
# pandas/tests/io/test_sql.py
2 parents 9f8c24f + 8117a55 commit b2f6700

Some content is hidden: GitHub hides part of the diff of large commits by default, so not every changed file is shown below.

49 files changed, 832 lines added and 620 lines removed

asv_bench/benchmarks/array.py

Lines changed: 41 additions & 0 deletions
@@ -90,5 +90,46 @@ def time_setitem_list(self, multiple_chunks):
     def time_setitem_slice(self, multiple_chunks):
         self.array[::10] = "foo"
 
+    def time_setitem_null_slice(self, multiple_chunks):
+        self.array[:] = "foo"
+
     def time_tolist(self, multiple_chunks):
         self.array.tolist()
+
+
+class ArrowExtensionArray:
+
+    params = [
+        [
+            "boolean[pyarrow]",
+            "float64[pyarrow]",
+            "int64[pyarrow]",
+            "string[pyarrow]",
+            "timestamp[ns][pyarrow]",
+        ],
+        [False, True],
+    ]
+    param_names = ["dtype", "hasna"]
+
+    def setup(self, dtype, hasna):
+        N = 100_000
+        if dtype == "boolean[pyarrow]":
+            data = np.random.choice([True, False], N, replace=True)
+        elif dtype == "float64[pyarrow]":
+            data = np.random.randn(N)
+        elif dtype == "int64[pyarrow]":
+            data = np.arange(N)
+        elif dtype == "string[pyarrow]":
+            data = tm.rands_array(10, N)
+        elif dtype == "timestamp[ns][pyarrow]":
+            data = pd.date_range("2000-01-01", freq="s", periods=N)
+        else:
+            raise NotImplementedError
+
+        arr = pd.array(data, dtype=dtype)
+        if hasna:
+            arr[::2] = pd.NA
+        self.arr = arr
+
+    def time_to_numpy(self, dtype, hasna):
+        self.arr.to_numpy()
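The new `ArrowExtensionArray` benchmark is parameterized with `params` and `param_names`. asv runs such a benchmark once for every combination in the Cartesian product of `params`, and it passes the values positionally to `setup` and to each `time_*` method. The sketch below imitates that expansion with `itertools.product` and `time.perf_counter`. Both `ToyContainers` and `run_benchmark` are hypothetical stand-ins, not part of asv's API. Real asv also adds warm-up, repetition and statistics.

```python
import itertools
import time


class ToyContainers:
    # Hypothetical benchmark laid out like the asv classes above:
    # one list of candidate values per parameter, labels in param_names.
    params = [["list", "tuple"], [False, True]]
    param_names = ["container", "hasna"]

    def setup(self, container, hasna):
        data = list(range(10_000))
        if hasna:
            data[::2] = [None] * len(data[::2])  # mimic arr[::2] = pd.NA
        self.data = data if container == "list" else tuple(data)

    def time_to_list(self, container, hasna):
        list(self.data)


def run_benchmark(cls):
    """Time every time_* method once per parameter combination.

    Simplified stand-in for asv's runner: no warm-up, no repeats, no stats.
    """
    results = {}
    methods = sorted(name for name in dir(cls) if name.startswith("time_"))
    # Expand params as a Cartesian product and pass the values positionally.
    for combo in itertools.product(*cls.params):
        label = ", ".join(f"{n}={v!r}" for n, v in zip(cls.param_names, combo))
        for name in methods:
            bench = cls()
            bench.setup(*combo)  # setup runs outside the timed region
            start = time.perf_counter()
            getattr(bench, name)(*combo)
            results[(name, label)] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    for (name, label), seconds in run_benchmark(ToyContainers).items():
        print(f"{name}[{label}]: {seconds * 1e3:.3f} ms")
```

Two parameters with two values each produce four timed runs per `time_*` method. The new benchmark's five dtypes and two `hasna` values likewise expand to ten runs of `time_to_numpy`.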

asv_bench/benchmarks/reshape.py

Lines changed: 10 additions & 5 deletions
@@ -15,12 +15,17 @@


 class Melt:
-    def setup(self):
-        self.df = DataFrame(np.random.randn(10000, 3), columns=["A", "B", "C"])
-        self.df["id1"] = np.random.randint(0, 10, 10000)
-        self.df["id2"] = np.random.randint(100, 1000, 10000)
+    params = ["float64", "Float64"]
+    param_names = ["dtype"]
+
+    def setup(self, dtype):
+        self.df = DataFrame(
+            np.random.randn(100_000, 3), columns=["A", "B", "C"], dtype=dtype
+        )
+        self.df["id1"] = pd.Series(np.random.randint(0, 10, 10000))
+        self.df["id2"] = pd.Series(np.random.randint(100, 1000, 10000))

-    def time_melt_dataframe(self):
+    def time_melt_dataframe(self, dtype):
         melt(self.df, id_vars=["id1", "id2"])

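In the new `Melt.setup`, the frame has 100_000 rows, while `id1` and `id2` are built as `Series` of 10_000 random integers. Assigning a `Series` to a `DataFrame` column aligns on the index, not on position. Only the rows whose labels appear in the shorter `Series` receive values; the rest become `NaN`, and the column is upcast to `float64`. A scaled-down sketch of that behavior follows. The sizes are arbitrary, and it uses NumPy's `default_rng` rather than the legacy `np.random` functions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 1_000-row frame (RangeIndex 0..999), standing in for the 100_000-row frame.
df = pd.DataFrame(rng.standard_normal((1_000, 3)), columns=["A", "B", "C"])

# The shorter Series has RangeIndex 0..99, so assignment aligns on those labels.
df["id1"] = pd.Series(rng.integers(0, 10, 100))

print(df["id1"].notna().sum())  # 100 rows received a value
print(df["id1"].isna().sum())   # 900 rows were filled with NaN by alignment
print(df["id1"].dtype)          # float64, because NaN was introduced
```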
ci/deps/actions-38-downstream_compat.yaml

Lines changed: 0 additions & 1 deletion
@@ -56,7 +56,6 @@ dependencies:
   - zstandard

   # downstream packages
-  - aiobotocore
   - botocore
   - cftime
   - dask

doc/source/development/contributing_environment.rst

Lines changed: 51 additions & 87 deletions
@@ -15,24 +15,11 @@ locally before pushing your changes. It's recommended to also install the :ref:`
 .. contents:: Table of contents:
 :local:

+Step 1: install a C compiler
+----------------------------

-Option 1: creating an environment without Docker
-------------------------------------------------
-
-Installing a C compiler
-~~~~~~~~~~~~~~~~~~~~~~~
-
-pandas uses C extensions (mostly written using Cython) to speed up certain
-operations. To install pandas from source, you need to compile these C
-extensions, which means you need a C compiler. This process depends on which
-platform you're using.
-
-If you have setup your environment using :ref:`mamba <contributing.mamba>`, the packages ``c-compiler``
-and ``cxx-compiler`` will install a fitting compiler for your platform that is
-compatible with the remaining mamba packages. On Windows and macOS, you will
-also need to install the SDKs as they have to be distributed separately.
-These packages will automatically be installed by using the ``pandas``
-``environment.yml`` file.
+How to do this will depend on your platform. If you choose to use ``Docker``
+in the next step, then you can skip this step.

 **Windows**

@@ -48,6 +35,9 @@ You will need `Build Tools for Visual Studio 2022
 Alternatively, you can install the necessary components on the commandline using
 `vs_BuildTools.exe <https://learn.microsoft.com/en-us/visualstudio/install/use-command-line-parameters-to-install-visual-studio?source=recommendations&view=vs-2022>`_

+Alternatively, you could use the `WSL <https://learn.microsoft.com/en-us/windows/wsl/install>`_
+and consult the ``Linux`` instructions below.
+
 **macOS**

 To use the :ref:`mamba <contributing.mamba>`-based compilers, you will need to install the
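Step 1 asks you to confirm that a C compiler is available before building. The sketch below is a rough cross-platform check; it is a swapped-in technique, not the method the guide itself describes. It uses `shutil.which` to look for a few common compiler executables on `PATH`. The candidate names are assumptions. A hit only shows that an executable exists, not that it can build pandas. MSVC's `cl` is typically found only inside a Visual Studio developer prompt.

```python
import shutil

# Hypothetical shortlist of common C compiler driver names; adjust as needed.
CANDIDATES = ("cc", "gcc", "clang", "cl")


def find_c_compilers(candidates=CANDIDATES):
    """Map each candidate name to the path shutil.which finds, or None."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    found = {name: path for name, path in find_c_compilers().items() if path}
    if found:
        for name, path in found.items():
            print(f"{name}: {path}")
    else:
        print("No C compiler found on PATH; see Step 1 for your platform.")
```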
@@ -71,67 +61,40 @@ which compilers (and versions) are installed on your system::

 `GCC (GNU Compiler Collection) <https://gcc.gnu.org/>`_, is a widely used
 compiler, which supports C and a number of other languages. If GCC is listed
-as an installed compiler nothing more is required. If no C compiler is
-installed (or you wish to install a newer version) you can install a compiler
-(GCC in the example code below) with::
+as an installed compiler nothing more is required.

-# for recent Debian/Ubuntu:
-sudo apt install build-essential
-# for Red Had/RHEL/CentOS/Fedora
-yum groupinstall "Development Tools"
-
-For other Linux distributions, consult your favorite search engine for
-compiler installation instructions.
+If no C compiler is installed, or you wish to upgrade, or you're using a different
+Linux distribution, consult your favorite search engine for compiler installation/update
+instructions.

 Let us know if you have any difficulties by opening an issue or reaching out on our contributor
 community :ref:`Slack <community.slack>`.

-.. _contributing.mamba:
-
-Option 1a: using mamba (recommended)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Step 2: create an isolated environment
+----------------------------------------

-Now create an isolated pandas development environment:
+Before we begin, please:

-* Install `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_
-* Make sure your mamba is up to date (``mamba update mamba``)
 * Make sure that you have :any:`cloned the repository <contributing.forking>`
 * ``cd`` to the pandas source directory

-We'll now kick off a three-step process:
+.. _contributing.mamba:
+
+Option 1: using mamba (recommended)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-1. Install the build dependencies
-2. Build and install pandas
-3. Install the optional dependencies
+* Install `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_
+* Make sure your mamba is up to date (``mamba update mamba``)

 .. code-block:: none

 # Create and activate the build environment
 mamba env create --file environment.yml
 mamba activate pandas-dev

-# Build and install pandas
-python setup.py build_ext -j 4
-python -m pip install -e . --no-build-isolation --no-use-pep517
-
-At this point you should be able to import pandas from your locally built version::
-
-$ python
->>> import pandas
->>> print(pandas.__version__) # note: the exact output may differ
-1.5.0.dev0+1355.ge65a30e3eb.dirty
-
-This will create the new environment, and not touch any of your existing environments,
-nor any existing Python installation.
-
-To return to your root environment::
-
-mamba deactivate
-
-Option 1b: using pip
-~~~~~~~~~~~~~~~~~~~~
+Option 2: using pip
+~~~~~~~~~~~~~~~~~~~

-If you aren't using mamba for your development environment, follow these instructions.
 You'll need to have at least the :ref:`minimum Python version <install.version>` that pandas supports.
 You also need to have ``setuptools`` 51.0.0 or later to build pandas.

@@ -150,10 +113,6 @@ You also need to have ``setuptools`` 51.0.0 or later to build pandas.
 # Install the build dependencies
 python -m pip install -r requirements-dev.txt

-# Build and install pandas
-python setup.py build_ext -j 4
-python -m pip install -e . --no-build-isolation --no-use-pep517
-
 **Unix**/**macOS with pyenv**

 Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
@@ -162,7 +121,6 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.

 # Create a virtual environment
 # Use an ENV_DIR of your choice. We'll use ~/Users/<yourname>/.pyenv/versions/pandas-dev
-
 pyenv virtualenv <version> <name-to-give-it>

 # For instance:
@@ -174,19 +132,15 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
 # Now install the build dependencies in the cloned pandas repo
 python -m pip install -r requirements-dev.txt

-# Build and install pandas
-python setup.py build_ext -j 4
-python -m pip install -e . --no-build-isolation --no-use-pep517
-
 **Windows**

 Below is a brief overview on how to set-up a virtual environment with Powershell
 under Windows. For details please refer to the
 `official virtualenv user guide <https://virtualenv.pypa.io/en/latest/user_guide.html#activators>`__.

-Use an ENV_DIR of your choice. We'll use ~\\virtualenvs\\pandas-dev where
-'~' is the folder pointed to by either $env:USERPROFILE (Powershell) or
-%USERPROFILE% (cmd.exe) environment variable. Any parent directories
+Use an ENV_DIR of your choice. We'll use ``~\\virtualenvs\\pandas-dev`` where
+``~`` is the folder pointed to by either ``$env:USERPROFILE`` (Powershell) or
+``%USERPROFILE%`` (cmd.exe) environment variable. Any parent directories
 should already exist.

 .. code-block:: powershell
@@ -200,16 +154,10 @@ should already exist.
 # Install the build dependencies
 python -m pip install -r requirements-dev.txt

-# Build and install pandas
-python setup.py build_ext -j 4
-python -m pip install -e . --no-build-isolation --no-use-pep517
-
-Option 2: creating an environment using Docker
-----------------------------------------------
+Option 3: using Docker
+~~~~~~~~~~~~~~~~~~~~~~

-Instead of manually setting up a development environment, you can use `Docker
-<https://docs.docker.com/get-docker/>`_ to automatically create the environment with just several
-commands. pandas provides a ``DockerFile`` in the root directory to build a Docker image
+pandas provides a ``DockerFile`` in the root directory to build a Docker image
 with a full pandas development environment.

 **Docker Commands**
@@ -226,13 +174,6 @@ Run Container::
 # but if not alter ${PWD} to match your local repo path
 docker run -it --rm -v ${PWD}:/home/pandas pandas-dev

-When inside the running container you can build and install pandas the same way as the other methods
-
-.. code-block:: bash
-
-python setup.py build_ext -j 4
-python -m pip install -e . --no-build-isolation --no-use-pep517
-
 *Even easier, you can integrate Docker with the following IDEs:*

 **Visual Studio Code**
@@ -246,3 +187,26 @@ See https://code.visualstudio.com/docs/remote/containers for details.
 Enable Docker support and use the Services tool window to build and manage images as well as
 run and interact with containers.
 See https://www.jetbrains.com/help/pycharm/docker.html for details.
+
+Step 3: build and install pandas
+--------------------------------
+
+You can now run::
+
+# Build and install pandas
+python setup.py build_ext -j 4
+python -m pip install -e . --no-build-isolation --no-use-pep517
+
+At this point you should be able to import pandas from your locally built version::
+
+$ python
+>>> import pandas
+>>> print(pandas.__version__) # note: the exact output may differ
+2.0.0.dev0+880.g2b9e661fbb.dirty
+
+This will create the new environment, and not touch any of your existing environments,
+nor any existing Python installation.
+
+.. note::
+You will need to repeat this step each time the C extensions change, for example
+if you modified any file in ``pandas/_libs`` or if you did a fetch and merge from ``upstream/main``.
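The new note says to rerun Step 3 whenever the C extensions change. Examples are editing anything in `pandas/_libs` or fetching and merging `upstream/main`. One way to automate that check is to list the files changed between two commits and test whether any of them fall under `pandas/_libs`. The helper names below are hypothetical. The only external command they rely on is `git diff --name-only`.

```python
import subprocess

LIBS_PREFIX = "pandas/_libs/"


def needs_rebuild(changed_paths):
    """True if any changed path lies under pandas/_libs (the C/Cython extensions)."""
    return any(path.startswith(LIBS_PREFIX) for path in changed_paths)


def changed_files(old_ref, new_ref="HEAD", repo="."):
    """Paths changed between two git refs, as listed by `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{old_ref}..{new_ref}"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in out.stdout.splitlines() if line]
```

For example, right after merging `upstream/main` you could call `needs_rebuild(changed_files("ORIG_HEAD"))`. `git merge` records the pre-merge tip in `ORIG_HEAD`.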

doc/source/user_guide/io.rst

Lines changed: 3 additions & 1 deletion
@@ -471,7 +471,9 @@ Setting ``use_nullable_dtypes=True`` will result in nullable dtypes for every co
 3,4.5,False,b,6,7.5,True,a,12-31-2019,
 """

-pd.read_csv(StringIO(data), use_nullable_dtypes=True, parse_dates=["i"])
+df = pd.read_csv(StringIO(data), use_nullable_dtypes=True, parse_dates=["i"])
+df
+df.dtypes

 .. _io.categorical:

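The updated `io.rst` example stores the result and displays `df.dtypes`, so readers can see which nullable dtype each column received. Below is a smaller sketch of the same idea. It assumes a released pandas 2.x, where the request is spelled `dtype_backend="numpy_nullable"`. The `use_nullable_dtypes=True` spelling in the diff comes from the 2.0 development cycle and may not be accepted by your installed version.

```python
from io import StringIO

import pandas as pd

data = """a,b,c
1,2.5,True
3,,False
"""

# Request NumPy-backed nullable dtypes (Int64, Float64, boolean, ...) for every column.
df = pd.read_csv(StringIO(data), dtype_backend="numpy_nullable")
print(df)
print(df.dtypes)
```

The missing value in column `b` is held as `pd.NA` inside a `Float64` column rather than as `NaN` in `float64`. Column `a` stays an integer (`Int64`) column.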
doc/source/whatsnew/v1.5.3.rst

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Bug fixes

 Other
 ~~~~~
+- Reverted deprecation (:issue:`45324`) of behavior of :meth:`Series.__getitem__` and :meth:`Series.__setitem__` slicing with an integer :class:`Index`; this will remain positional (:issue:`49612`)
 -

 .. ---------------------------------------------------------------------------
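The 1.5.3 entry keeps `obj[i:j]` slicing positional even when the `Series` has an integer `Index`. Scalar lookups such as `ser[20]` on the same object remain label-based. Below is a short illustration with an arbitrary index. `.iloc` and `.loc` make the intent explicit either way.

```python
import pandas as pd

ser = pd.Series(["a", "b", "c", "d"], index=[10, 20, 30, 40])

print(ser[1:3].tolist())        # slice is positional: ['b', 'c']
print(ser.iloc[1:3].tolist())   # explicitly positional: ['b', 'c']
print(ser.loc[10:30].tolist())  # explicitly label-based, end inclusive: ['a', 'b', 'c']
print(ser[20])                  # scalar key is a label: b
```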

doc/source/whatsnew/v2.0.0.rst

Lines changed: 8 additions & 1 deletion
@@ -44,7 +44,7 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
 Additionally a new global configuration, ``mode.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
 to select the nullable dtypes implementation.

-* :func:`read_csv` (with ``engine="pyarrow"``)
+* :func:`read_csv` (with ``engine="pyarrow"`` or ``engine="python"``)
 * :func:`read_excel`
 * :func:`read_parquet`
 * :func:`read_orc`
@@ -738,6 +738,7 @@ Performance improvements
 - Performance improvement in :meth:`MultiIndex.isin` when ``level=None`` (:issue:`48622`, :issue:`49577`)
 - Performance improvement in :meth:`MultiIndex.putmask` (:issue:`49830`)
 - Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
+- Performance improvement in :meth:`Series.rank` for pyarrow-backed dtypes (:issue:`50264`)
 - Performance improvement in :meth:`Series.fillna` for extension array dtypes (:issue:`49722`, :issue:`50078`)
 - Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
@@ -750,6 +751,8 @@ Performance improvements
 - Reduce memory usage of :meth:`DataFrame.to_pickle`/:meth:`Series.to_pickle` when using BZ2 or LZMA (:issue:`49068`)
 - Performance improvement for :class:`~arrays.StringArray` constructor passing a numpy array with type ``np.str_`` (:issue:`49109`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.factorize` (:issue:`49177`)
+- Performance improvement in :meth:`~arrays.ArrowExtensionArray.__setitem__` when key is a null slice (:issue:`50248`)
+- Performance improvement in :meth:`~arrays.ArrowExtensionArray.to_numpy` (:issue:`49973`)
 - Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
 - Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)
 - Performance improvement in ``var`` for nullable dtypes (:issue:`48379`).
@@ -785,6 +788,7 @@ Datetimelike
 - Bug in ``pandas.tseries.holiday.Holiday`` where a half-open date interval causes inconsistent return types from :meth:`USFederalHolidayCalendar.holidays` (:issue:`49075`)
 - Bug in rendering :class:`DatetimeIndex` and :class:`Series` and :class:`DataFrame` with timezone-aware dtypes with ``dateutil`` or ``zoneinfo`` timezones near daylight-savings transitions (:issue:`49684`)
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing :class:`Timestamp`, ``datetime.datetime``, ``datetime.date``, or ``np.datetime64`` objects when non-ISO8601 ``format`` was passed (:issue:`49298`, :issue:`50036`)
+- Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how it is done for ISO8601 formats (:issue:`50251`)
 - Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
 -

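The new Datetimelike entry says that with a non-ISO8601 `format`, an empty string is parsed as `NaT` instead of raising, matching the ISO8601 path. Below is a minimal sketch, assuming pandas 2.0 or later. The format string and values are arbitrary.

```python
import pandas as pd

# Day-first, non-ISO8601 format; the empty string should come back as NaT.
result = pd.to_datetime(["31/12/2019", ""], format="%d/%m/%Y")
print(result)
```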
@@ -831,6 +835,7 @@ Interval

 Indexing
 ^^^^^^^^
+- Bug in :meth:`DataFrame.__setitem__` raising when indexer is a :class:`DataFrame` with ``boolean`` dtype (:issue:`47125`)
 - Bug in :meth:`DataFrame.reindex` filling with wrong values when indexing columns and index for ``uint`` dtypes (:issue:`48184`)
 - Bug in :meth:`DataFrame.loc` coercing dtypes when setting values with a list indexer (:issue:`49159`)
 - Bug in :meth:`DataFrame.loc` raising ``ValueError`` with ``bool`` indexer and :class:`MultiIndex` (:issue:`47687`)
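The Indexing fix concerns masks whose columns use the nullable ``boolean`` dtype. That is what comparisons on nullable integer (`Int64`) columns return. Below is a sketch of masked assignment with such a mask, assuming pandas 2.0 or later. The column names and values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 0, 1]}, dtype="Int64")

mask = df > 1  # comparing Int64 columns yields "boolean" (nullable) columns
print(mask.dtypes)

df[mask] = 0   # set every cell where the mask is True
print(df)
```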
@@ -870,6 +875,7 @@ I/O
 - Bug in :func:`read_sas` caused fragmentation of :class:`DataFrame` and raised :class:`.errors.PerformanceWarning` (:issue:`48595`)
 - Improved error message in :func:`read_excel` by including the offending sheet name when an exception is raised while reading a file (:issue:`48706`)
 - Bug when a pickling a subset PyArrow-backed data that would serialize the entire data instead of the subset (:issue:`42600`)
+- Bug in :func:`read_sql_query` ignoring ``dtype`` argument when ``chunksize`` is specified and result is empty (:issue:`50245`)
 - Bug in :func:`read_csv` for a single-line csv with fewer columns than ``names`` raised :class:`.errors.ParserError` with ``engine="c"`` (:issue:`47566`)
 - Bug in displaying ``string`` dtypes not showing storage option (:issue:`50099`)
 - Bug in :func:`DataFrame.to_string` with ``header=False`` that printed the index name on the same line as the first row of the data (:issue:`49230`)
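The new I/O entry concerns `read_sql_query` called with both `chunksize` and `dtype` when the query matches no rows. The sketch below reproduces that situation against an in-memory SQLite database, using the standard-library `sqlite3` driver as the connection. The table and column names are made up. It assumes a pandas version that contains the fix (2.0 or later).

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurements (a INTEGER, b REAL)")
con.execute("INSERT INTO measurements VALUES (1, 0.5)")

# The WHERE clause matches nothing, so the result set is empty.
chunks = list(
    pd.read_sql_query(
        "SELECT a, b FROM measurements WHERE a > 100",
        con,
        chunksize=10,
        dtype={"a": "Int64", "b": "Float64"},
    )
)

# With the fix, the single empty chunk still carries the requested dtypes.
print(len(chunks))
print(chunks[0].dtypes)
con.close()
```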
@@ -906,6 +912,7 @@ Reshaping
 ^^^^^^^^^
 - Bug in :meth:`DataFrame.pivot_table` raising ``TypeError`` for nullable dtype and ``margins=True`` (:issue:`48681`)
 - Bug in :meth:`DataFrame.unstack` and :meth:`Series.unstack` unstacking wrong level of :class:`MultiIndex` when :class:`MultiIndex` has mixed names (:issue:`48763`)
+- Bug in :meth:`DataFrame.melt` losing extension array dtype (:issue:`41570`)
 - Bug in :meth:`DataFrame.pivot` not respecting ``None`` as column name (:issue:`48293`)
 - Bug in :func:`join` when ``left_on`` or ``right_on`` is or includes a :class:`CategoricalIndex` incorrectly raising ``AttributeError`` (:issue:`48464`)
 - Bug in :meth:`DataFrame.pivot_table` raising ``ValueError`` with parameter ``margins=True`` when result is an empty :class:`DataFrame` (:issue:`49240`)
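The Reshaping entry for :meth:`DataFrame.melt` covers the case that the `Float64` parameter added to the `Melt` benchmark in this commit measures: melting columns with an extension dtype. The stacked `value` column should keep that dtype rather than fall back to a NumPy dtype. Below is a minimal check, assuming pandas 2.0 or later. The data is arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id1": [1, 2],
        "A": pd.array([0.5, None], dtype="Float64"),
        "B": pd.array([1.5, 2.5], dtype="Float64"),
    }
)

melted = df.melt(id_vars=["id1"])
print(melted)
print(melted.dtypes)  # "value" should stay Float64, keeping pd.NA for the missing entry
```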
