
Commit 3e9b539

Merge remote-tracking branch 'upstream/main' into deps/mpl/37

2 parents: d4443cc + 2070bb8


78 files changed: +2241, -708 lines

.pre-commit-config.yaml

Lines changed: 12 additions & 15 deletions
@@ -15,15 +15,22 @@ default_stages: [
 ci:
     autofix_prs: false
 repos:
+-   repo: local
+    hooks:
+        # NOTE: we make `black` a local hook because if it's installed from
+        # PyPI (rather than from source) then it'll run twice as fast thanks to mypyc
+    -   id: black
+        name: black
+        description: "Black: The uncompromising Python code formatter"
+        entry: black
+        language: python
+        require_serial: true
+        types_or: [python, pyi]
+        additional_dependencies: [black==23.1.0]
 -   repo: https://github.com/charliermarsh/ruff-pre-commit
     rev: v0.0.244
     hooks:
     -   id: ruff
--   repo: https://github.com/MarcoGorelli/absolufy-imports
-    rev: v0.3.1
-    hooks:
-    -   id: absolufy-imports
-        files: ^pandas/
 -   repo: https://github.com/jendrikseipp/vulture
     rev: 'v2.7'
     hooks:
@@ -116,16 +123,6 @@ repos:
     -   id: sphinx-lint
 -   repo: local
     hooks:
-        # NOTE: we make `black` a local hook because if it's installed from
-        # PyPI (rather than from source) then it'll run twice as fast thanks to mypyc
-    -   id: black
-        name: black
-        description: "Black: The uncompromising Python code formatter"
-        entry: black
-        language: python
-        require_serial: true
-        types_or: [python, pyi]
-        additional_dependencies: [black==23.1.0]
     -   id: pyright
         # note: assumes python env is setup and activated
         name: pyright

ci/code_checks.sh

Lines changed: 0 additions & 8 deletions
@@ -91,7 +91,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.Series.size \
         pandas.Series.T \
         pandas.Series.hasnans \
-        pandas.Series.to_timestamp \
         pandas.Series.to_list \
         pandas.Series.__iter__ \
         pandas.Series.keys \
@@ -218,7 +217,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.Period.year \
         pandas.Period.asfreq \
         pandas.Period.now \
-        pandas.Period.to_timestamp \
         pandas.arrays.PeriodArray \
         pandas.Interval.closed \
         pandas.Interval.left \
@@ -562,7 +560,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.DataFrame.swapaxes \
         pandas.DataFrame.first_valid_index \
         pandas.DataFrame.last_valid_index \
-        pandas.DataFrame.to_timestamp \
         pandas.DataFrame.attrs \
         pandas.DataFrame.plot \
         pandas.DataFrame.sparse.density \
@@ -576,7 +573,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
     $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX02 --ignore_functions \
         pandas.DataFrame.plot.line \
         pandas.Series.plot.line \
-        pandas.Timestamp.fromtimestamp \
         pandas.api.types.infer_dtype \
         pandas.api.types.is_datetime64_any_dtype \
         pandas.api.types.is_datetime64_ns_dtype \
@@ -590,15 +586,11 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.api.types.is_timedelta64_dtype \
         pandas.api.types.is_timedelta64_ns_dtype \
         pandas.api.types.is_unsigned_integer_dtype \
-        pandas.core.groupby.DataFrameGroupBy.take \
-        pandas.core.groupby.SeriesGroupBy.take \
         pandas.io.formats.style.Styler.concat \
         pandas.io.formats.style.Styler.export \
         pandas.io.formats.style.Styler.set_td_classes \
         pandas.io.formats.style.Styler.use \
         pandas.io.json.build_table_schema \
-        pandas.merge_ordered \
-        pandas.option_context \
         pandas.plotting.andrews_curves \
         pandas.plotting.autocorrelation_plot \
         pandas.plotting.lag_plot \

doc/source/development/contributing.rst

Lines changed: 20 additions & 0 deletions
@@ -331,6 +331,26 @@ To automatically fix formatting errors on each commit you make, you can
 set up pre-commit yourself. First, create a Python :ref:`environment
 <contributing_environment>` and then set up :ref:`pre-commit <contributing.pre-commit>`.
 
+.. _contributing.update-dev:
+
+Updating the development environment
+------------------------------------
+
+After updating your branch to merge in main from upstream, you may need to update
+your development environment to reflect any changes to the various packages that
+are used during development.
+
+If using :ref:`mamba <contributing.mamba>`, do::
+
+    mamba deactivate
+    mamba env update -f environment.yml
+    mamba activate pandas-dev
+
+If using :ref:`pip <contributing.pip>`, do::
+
+    # activate the virtual environment based on your platform
+    python -m pip install --upgrade -r requirements-dev.txt
+
 Tips for a successful pull request
 ==================================
 
doc/source/development/contributing_codebase.rst

Lines changed: 15 additions & 5 deletions
@@ -89,6 +89,12 @@ without needing to have done ``pre-commit install`` beforehand.
 you may run into issues if you're using conda. To solve this, you can downgrade
 ``virtualenv`` to version ``20.0.33``.
 
+.. note::
+
+    If you have recently merged in main from the upstream branch, some of the
+    dependencies used by ``pre-commit`` may have changed. Make sure to
+    :ref:`update your development environment <contributing.update-dev>`.
+
 Optional dependencies
 ---------------------
 
@@ -266,17 +272,21 @@ This module will ultimately house types for repeatedly used concepts like "path-
 Validating type hints
 ~~~~~~~~~~~~~~~~~~~~~
 
-pandas uses `mypy <http://mypy-lang.org>`_ and `pyright <https://github.com/microsoft/pyright>`_ to statically analyze the code base and type hints. After making any change you can ensure your type hints are correct by running
+pandas uses `mypy <http://mypy-lang.org>`_ and `pyright <https://github.com/microsoft/pyright>`_ to statically analyze the code base and type hints. After making any change you can ensure your type hints are consistent by running
 
 .. code-block:: shell
 
+    pre-commit run --hook-stage manual --all-files mypy
+    pre-commit run --hook-stage manual --all-files pyright
+    pre-commit run --hook-stage manual --all-files pyright_reportGeneralTypeIssues
     # the following might fail if the installed pandas version does not correspond to your local git version
-    pre-commit run --hook-stage manual --all-files
+    pre-commit run --hook-stage manual --all-files stubtest
 
-    # if the above fails due to stubtest
-    SKIP=stubtest pre-commit run --hook-stage manual --all-files
+in your python environment.
+
+.. warning::
 
-in your activated python environment. A recent version of ``numpy`` (>=1.22.0) is required for type validation.
+    * Please be aware that the above commands will use the current python environment. If your python packages are older/newer than those installed by the pandas CI, the above commands might fail. This is often the case when the ``mypy`` or ``numpy`` versions do not match. Please see :ref:`how to setup the python environment <contributing.mamba>` or select a `recently succeeded workflow <https://github.com/pandas-dev/pandas/actions/workflows/code-checks.yml?query=branch%3Amain+is%3Asuccess>`_, select the "Docstring validation, typing, and other manual pre-commit hooks" job, then click on "Set up Conda" and "Environment info" to see which versions the pandas CI installs.
 
 .. _contributing.ci:
 
doc/source/development/contributing_environment.rst

Lines changed: 2 additions & 0 deletions
@@ -95,6 +95,8 @@ Option 1: using mamba (recommended)
    mamba env create --file environment.yml
    mamba activate pandas-dev
 
+.. _contributing.pip:
+
 Option 2: using pip
 ~~~~~~~~~~~~~~~~~~~
 
doc/source/reference/arrays.rst

Lines changed: 2 additions & 0 deletions
@@ -113,6 +113,8 @@ values.
 
    ArrowDtype
 
+For more information, please see the :ref:`PyArrow user guide <pyarrow>`
+
 .. _api.arrays.datetime:
 
 Datetimes
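
The hunk above only adds a cross-reference to the new PyArrow user guide. For readers unfamiliar with ``ArrowDtype``, here is a minimal editor-added sketch (not part of this commit), assuming pandas >= 2.0 with ``pyarrow`` installed:

    import pandas as pd
    import pyarrow as pa

    # Build a pyarrow-backed Series by wrapping an Arrow type in pd.ArrowDtype.
    ser = pd.Series(["foo", "bar", None], dtype=pd.ArrowDtype(pa.string()))
    print(ser.dtype)  # e.g. string[pyarrow]

Any other ``pyarrow`` type (integers, timestamps, nested types) can be wrapped the same way.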

doc/source/user_guide/index.rst

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ Guides
    dsintro
    basics
    io
+   pyarrow
    indexing
    advanced
    merging

doc/source/user_guide/io.rst

Lines changed: 22 additions & 33 deletions
@@ -290,6 +290,16 @@ date_parser : function, default ``None``
     values from the columns defined by parse_dates into a single array and pass
     that; and 3) call date_parser once for each row using one or more strings
     (corresponding to the columns defined by parse_dates) as arguments.
+
+    .. deprecated:: 2.0.0
+       Use ``date_format`` instead, or read in as ``object`` and then apply
+       :func:`to_datetime` as-needed.
+date_format : str or dict of column -> format, default ``None``
+   If used in conjunction with ``parse_dates``, will parse dates according to this
+   format. For anything more complex,
+   please read in as ``object`` and then apply :func:`to_datetime` as-needed.
+
+   .. versionadded:: 2.0.0
 dayfirst : boolean, default ``False``
     DD/MM format dates, international and European format.
 cache_dates : boolean, default True
@@ -800,7 +810,7 @@ Specifying date columns
 +++++++++++++++++++++++
 
 To better facilitate working with datetime data, :func:`read_csv`
-uses the keyword arguments ``parse_dates`` and ``date_parser``
+uses the keyword arguments ``parse_dates`` and ``date_format``
 to allow users to specify a variety of columns and date/time formats to turn the
 input text data into ``datetime`` objects.
 
@@ -898,33 +908,15 @@ data columns:
 Date parsing functions
 ++++++++++++++++++++++
 
-Finally, the parser allows you to specify a custom ``date_parser`` function to
-take full advantage of the flexibility of the date parsing API:
-
-.. ipython:: python
-
-   df = pd.read_csv(
-       "tmp.csv", header=None, parse_dates=date_spec, date_parser=pd.to_datetime
-   )
-   df
-
-pandas will try to call the ``date_parser`` function in three different ways. If
-an exception is raised, the next one is tried:
-
-1. ``date_parser`` is first called with one or more arrays as arguments,
-   as defined using ``parse_dates`` (e.g., ``date_parser(['2013', '2013'], ['1', '2'])``).
-
-2. If #1 fails, ``date_parser`` is called with all the columns
-   concatenated row-wise into a single array (e.g., ``date_parser(['2013 1', '2013 2'])``).
+Finally, the parser allows you to specify a custom ``date_format``.
+Performance-wise, you should try these methods of parsing dates in order:
 
-Note that performance-wise, you should try these methods of parsing dates in order:
+1. If you know the format, use ``date_format``, e.g.:
+   ``date_format="%d/%m/%Y"`` or ``date_format={column_name: "%d/%m/%Y"}``.
 
-1. If you know the format, use ``pd.to_datetime()``:
-   ``date_parser=lambda x: pd.to_datetime(x, format=...)``.
-
-2. If you have a really non-standard format, use a custom ``date_parser`` function.
-   For optimal performance, this should be vectorized, i.e., it should accept arrays
-   as arguments.
+2. If you have different formats for different columns, or want to pass any extra options (such
+   as ``utc``) to ``to_datetime``, then you should read in your data as ``object`` dtype, and
+   then use ``to_datetime``.
 
 
 .. ipython:: python
@@ -952,16 +944,13 @@ an object-dtype column with strings, even with ``parse_dates``.
    df = pd.read_csv(StringIO(content), parse_dates=["a"])
   df["a"]
 
-To parse the mixed-timezone values as a datetime column, pass a partially-applied
-:func:`to_datetime` with ``utc=True`` as the ``date_parser``.
+To parse the mixed-timezone values as a datetime column, read in as ``object`` dtype and
+then call :func:`to_datetime` with ``utc=True``.
 
 .. ipython:: python
 
-   df = pd.read_csv(
-       StringIO(content),
-       parse_dates=["a"],
-       date_parser=lambda col: pd.to_datetime(col, utc=True),
-   )
+   df = pd.read_csv(StringIO(content))
+   df["a"] = pd.to_datetime(df["a"], utc=True)
   df["a"]
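
The rewritten docs above replace the deprecated ``date_parser`` keyword with ``date_format`` plus an explicit ``to_datetime`` step. As an editor-added illustration (not part of this commit), here is a minimal sketch of both recommended paths, assuming pandas >= 2.0 and made-up column names and formats:

    from io import StringIO
    import pandas as pd

    data = "date,value\n31/12/2022,1\n01/01/2023,2\n"

    # 1. Known, single format: parse while reading with date_format.
    df = pd.read_csv(StringIO(data), parse_dates=["date"], date_format="%d/%m/%Y")

    # 2. Mixed formats, mixed timezones, or extra options: read as object,
    #    then convert explicitly with to_datetime (here forcing UTC).
    df2 = pd.read_csv(StringIO(data))
    df2["date"] = pd.to_datetime(df2["date"], format="%d/%m/%Y", utc=True)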
