From 92e23ba65d2eec3fb0b0e030ea85f7d96770b9f2 Mon Sep 17 00:00:00 2001
From: Patrick Hoefler
Date: Wed, 9 Aug 2023 14:23:45 +0200
Subject: [PATCH 1/7] Add whatsnew for arrow

---
 doc/source/whatsnew/v2.1.0.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 5cafaa5759a5b..86e89bef3a019 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -14,6 +14,26 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~
 
+.. _whatsnew_210.enhancements.pyarrow_dependency:
+
+PyArrow will become a required dependency with pandas 3.0
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+[PyArrow](https://arrow.apache.org/docs/python/index.html) will become a required
+dependency of pandas starting with pandas 3.0. This decision was made based on
+[PDEP 12](https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html).
+
+This will enable more changes that are hugely beneficial to pandas users, including
+but not limited to:
+
+- inferring strings as PyArrow backed strings by default enabling a significant
+  reduction of the memory footprint and huge performance improvements.
+- inferring more complex dtypes with PyArrow by default, like ``Decimal``, ``lists``,
+  ``bytes``, ``structured data`` and more.
+- Better interoperability with other libraries that depend on Apache Arrow.
+
+We are collecting feedback on this decision [here](https://github.com/pandas-dev/pandas/issues/54466).
+
 .. _whatsnew_210.enhancements.reduction_extension_dtypes:
 
 DataFrame reductions preserve extension dtypes

From afa2733375558234581888165a245854e6a45a2b Mon Sep 17 00:00:00 2001
From: Patrick Hoefler
Date: Wed, 9 Aug 2023 22:14:16 +0200
Subject: [PATCH 2/7] Update

---
 doc/source/whatsnew/v2.1.0.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 86e89bef3a019..a918767427d2c 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -34,6 +34,26 @@ but not limited to:
 
 We are collecting feedback on this decision [here](https://github.com/pandas-dev/pandas/issues/54466).
 
+.. _whatsnew_210.enhancements.infer_strings:
+
+Avoid NumPy object dtype for strings by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previously, all strings were stored in columns with NumPy object dtype.
+This release introduces an option ``future.infer_string`` that infers all
+strings as PyArrow backed strings with dtype ``pd.ArrowDtype(pa.string())`` instead.
+This option only works if PyArrow is installed. PyArrow backed strings have a
+significantly reduced memory footprint and provide a big performance improvement
+compared to NumPy object.
+
+The option can be enabled with:
+
+```
+pd.options.future.infer_string = True
+```
+
+It is expected that this behavior will become the default with pandas 3.0.
+
 .. _whatsnew_210.enhancements.reduction_extension_dtypes:
 
 DataFrame reductions preserve extension dtypes

From 30c6d555ad15af9be6e70a0a57fa38d141d15d01 Mon Sep 17 00:00:00 2001
From: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Date: Wed, 9 Aug 2023 22:28:15 +0200
Subject: [PATCH 3/7] Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
---
 doc/source/whatsnew/v2.1.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index a918767427d2c..50508df392d12 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -19,7 +19,7 @@ Enhancements
 PyArrow will become a required dependency with pandas 3.0
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-[PyArrow](https://arrow.apache.org/docs/python/index.html) will become a required
+`PyArrow <https://arrow.apache.org/docs/python/index.html>`_ will become a required
 dependency of pandas starting with pandas 3.0. This decision was made based on
 [PDEP 12](https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html).
 

From da9347fc3258148ca0f621442fe5503deabe57ab Mon Sep 17 00:00:00 2001
From: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Date: Wed, 9 Aug 2023 22:31:41 +0200
Subject: [PATCH 4/7] Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
---
 doc/source/whatsnew/v2.1.0.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 50508df392d12..5202282f9d08d 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -48,9 +48,9 @@ compared to NumPy object.
 
 The option can be enabled with:
 
-```
-pd.options.future.infer_string = True
-```
+.. code-block:: python
+
+    pd.options.future.infer_string = True
 
 It is expected that this behavior will become the default with pandas 3.0.
 

From 830398ae4431c4d62540bc0858a5645cb4bb6e59 Mon Sep 17 00:00:00 2001
From: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Date: Wed, 9 Aug 2023 22:31:46 +0200
Subject: [PATCH 5/7] Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
---
 doc/source/whatsnew/v2.1.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 5202282f9d08d..97e8776be03d0 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -52,7 +52,7 @@ The option can be enabled with:
 
     pd.options.future.infer_string = True
 
-It is expected that this behavior will become the default with pandas 3.0.
+This behavior will become the default with pandas 3.0.
 
 .. _whatsnew_210.enhancements.reduction_extension_dtypes:
 

From fc54640232a46b439910b086a4b997e1c481aaa8 Mon Sep 17 00:00:00 2001
From: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Date: Wed, 9 Aug 2023 22:32:46 +0200
Subject: [PATCH 6/7] Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
---
 doc/source/whatsnew/v2.1.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 97e8776be03d0..0b3607412638c 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -21,7 +21,7 @@ PyArrow will become a required dependency with pandas 3.0
 
 `PyArrow <https://arrow.apache.org/docs/python/index.html>`_ will become a required
 dependency of pandas starting with pandas 3.0. This decision was made based on
-[PDEP 12](https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html).
+`PDEP 12 <https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html>`_.
 
 This will enable more changes that are hugely beneficial to pandas users, including
 but not limited to:

From b650d2c0b60ddf11af71aa6e444860963f2d74b1 Mon Sep 17 00:00:00 2001
From: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Date: Wed, 9 Aug 2023 22:33:17 +0200
Subject: [PATCH 7/7] Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
---
 doc/source/whatsnew/v2.1.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 0b3607412638c..50c04815ecfb2 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -32,7 +32,7 @@ but not limited to:
   ``bytes``, ``structured data`` and more.
 - Better interoperability with other libraries that depend on Apache Arrow.
 
-We are collecting feedback on this decision [here](https://github.com/pandas-dev/pandas/issues/54466).
+We are collecting feedback on this decision `here <https://github.com/pandas-dev/pandas/issues/54466>`_.
 
 .. _whatsnew_210.enhancements.infer_strings:
 