Skip to content

Commit 00a474d

Browse files
committed
Merge branch 'master' of https://github.com/pandas-dev/pandas into add-nrows-to-read-json
2 parents 3b139b3 + cb35d8a commit 00a474d

File tree

349 files changed

+12360
-8523
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

349 files changed

+12360
-8523
lines changed

.travis.yml

Lines changed: 21 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,8 @@ cache:
1414

1515
env:
1616
global:
17+
# Variable for test workers
18+
- PYTEST_WORKERS="auto"
1719
# create a github personal access token
1820
# cd pandas-dev/pandas
1921
# travis encrypt 'PANDAS_GH_TOKEN=personal_access_token' -r pandas-dev/pandas
@@ -27,12 +29,21 @@ matrix:
2729
fast_finish: true
2830

2931
include:
32+
# In allowed failures
33+
- dist: bionic
34+
python: 3.9-dev
35+
env:
36+
- JOB="3.9-dev" PATTERN="(not slow and not network and not clipboard)"
3037
- env:
3138
- JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network and not clipboard)"
3239

3340
- env:
3441
- JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network and not clipboard)"
3542

43+
- arch: arm64
44+
env:
45+
- JOB="3.7, arm64" PYTEST_WORKERS=8 ENV_FILE="ci/deps/travis-37-arm64.yaml" PATTERN="(not slow and not network and not clipboard)"
46+
3647
- env:
3748
- JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network and not clipboard) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
3849
services:
@@ -53,11 +64,18 @@ matrix:
5364
services:
5465
- mysql
5566
- postgresql
67+
allow_failures:
68+
- arch: arm64
69+
env:
70+
- JOB="3.7, arm64" PYTEST_WORKERS=8 ENV_FILE="ci/deps/travis-37-arm64.yaml" PATTERN="(not slow and not network and not clipboard)"
71+
- dist: bionic
72+
python: 3.9-dev
73+
env:
74+
- JOB="3.9-dev" PATTERN="(not slow and not network)"
5675

5776
before_install:
5877
- echo "before_install"
59-
# set non-blocking IO on travis
60-
# https://github.com/travis-ci/travis-ci/issues/8920#issuecomment-352661024
78+
# Use blocking IO on travis. Ref: https://github.com/travis-ci/travis-ci/issues/8920#issuecomment-352661024
6179
- python -c 'import os,sys,fcntl; flags = fcntl.fcntl(sys.stdout, fcntl.F_GETFL); fcntl.fcntl(sys.stdout, fcntl.F_SETFL, flags&~os.O_NONBLOCK);'
6280
- source ci/travis_process_gbq_encryption.sh
6381
- export PATH="$HOME/miniconda3/bin:$PATH"
@@ -83,7 +101,7 @@ install:
83101
script:
84102
- echo "script start"
85103
- echo "$JOB"
86-
- source activate pandas-dev
104+
- if [ "$JOB" != "3.9-dev" ]; then source activate pandas-dev; fi
87105
- ci/run_tests.sh
88106

89107
after_script:

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
[![Downloads](https://anaconda.org/conda-forge/pandas/badges/downloads.svg)](https://pandas.pydata.org)
1717
[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas)
1818
[![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)
19+
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
1920

2021
## What is it?
2122

asv_bench/benchmarks/algorithms.py

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,16 @@ class Factorize:
3434
params = [
3535
[True, False],
3636
[True, False],
37-
["int", "uint", "float", "string", "datetime64[ns]", "datetime64[ns, tz]"],
37+
[
38+
"int",
39+
"uint",
40+
"float",
41+
"string",
42+
"datetime64[ns]",
43+
"datetime64[ns, tz]",
44+
"Int64",
45+
"boolean",
46+
],
3847
]
3948
param_names = ["unique", "sort", "dtype"]
4049

@@ -49,13 +58,15 @@ def setup(self, unique, sort, dtype):
4958
"datetime64[ns, tz]": pd.date_range(
5059
"2011-01-01", freq="H", periods=N, tz="Asia/Tokyo"
5160
),
61+
"Int64": pd.array(np.arange(N), dtype="Int64"),
62+
"boolean": pd.array(np.random.randint(0, 2, N), dtype="boolean"),
5263
}[dtype]
5364
if not unique:
5465
data = data.repeat(5)
55-
self.idx = data
66+
self.data = data
5667

5768
def time_factorize(self, unique, sort, dtype):
58-
self.idx.factorize(sort=sort)
69+
pd.factorize(self.data, sort=sort)
5970

6071

6172
class Duplicated:

asv_bench/benchmarks/arithmetic.py

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -101,6 +101,59 @@ def time_frame_op_with_series_axis1(self, opname):
101101
getattr(operator, opname)(self.df, self.ser)
102102

103103

104+
class FrameWithFrameWide:
105+
# Many-columns, mixed dtypes
106+
107+
params = [
108+
[
109+
# GH#32779 has discussion of which operators are included here
110+
operator.add,
111+
operator.floordiv,
112+
operator.gt,
113+
]
114+
]
115+
param_names = ["op"]
116+
117+
def setup(self, op):
118+
# we choose dtypes so as to make the blocks
119+
# a) not perfectly match between right and left
120+
# b) appreciably bigger than single columns
121+
n_cols = 2000
122+
n_rows = 500
123+
124+
# construct dataframe with 2 blocks
125+
arr1 = np.random.randn(n_rows, int(n_cols / 2)).astype("f8")
126+
arr2 = np.random.randn(n_rows, int(n_cols / 2)).astype("f4")
127+
df = pd.concat(
128+
[pd.DataFrame(arr1), pd.DataFrame(arr2)], axis=1, ignore_index=True,
129+
)
130+
# should already be the case, but just to be sure
131+
df._consolidate_inplace()
132+
133+
# TODO: GH#33198 the setting here shoudlnt need two steps
134+
arr1 = np.random.randn(n_rows, int(n_cols / 4)).astype("f8")
135+
arr2 = np.random.randn(n_rows, int(n_cols / 2)).astype("i8")
136+
arr3 = np.random.randn(n_rows, int(n_cols / 4)).astype("f8")
137+
df2 = pd.concat(
138+
[pd.DataFrame(arr1), pd.DataFrame(arr2), pd.DataFrame(arr3)],
139+
axis=1,
140+
ignore_index=True,
141+
)
142+
# should already be the case, but just to be sure
143+
df2._consolidate_inplace()
144+
145+
self.left = df
146+
self.right = df2
147+
148+
def time_op_different_blocks(self, op):
149+
# blocks (and dtypes) are not aligned
150+
op(self.left, self.right)
151+
152+
def time_op_same_blocks(self, op):
153+
# blocks (and dtypes) are aligned
154+
op(self.left, self.left)
155+
156+
104157
class Ops:
105158

106159
params = [[True, False], ["default", 1]]

azure-pipelines.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,9 @@ trigger:
55
pr:
66
- master
77

8+
variables:
9+
PYTEST_WORKERS: auto
10+
811
jobs:
912
# Mac and Linux use the same template
1013
- template: ci/azure/posix.yml

ci/build39.sh

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
#!/bin/bash -e
2+
# Special build for python3.9 until numpy puts its own wheels up
3+
4+
sudo apt-get install build-essential gcc xvfb
5+
pip install --no-deps -U pip wheel setuptools
6+
pip install python-dateutil pytz pytest pytest-xdist hypothesis
7+
pip install cython --pre # https://github.com/cython/cython/issues/3395
8+
9+
git clone https://github.com/numpy/numpy
10+
cd numpy
11+
python setup.py build_ext --inplace
12+
python setup.py install
13+
cd ..
14+
rm -rf numpy
15+
16+
python setup.py build_ext -inplace
17+
python -m pip install --no-build-isolation -e .
18+
19+
python -c "import sys; print(sys.version_info)"
20+
python -c "import pandas as pd"
21+
python -c "import hypothesis"

ci/code_checks.sh

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -353,8 +353,8 @@ fi
353353
### DOCSTRINGS ###
354354
if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
355355

356-
MSG='Validate docstrings (GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS04, SS05, PR03, PR04, PR05, PR10, EX04, RT01, RT04, RT05, SA02, SA03, SA05)' ; echo $MSG
357-
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03,SA05
356+
MSG='Validate docstrings (GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS04, SS05, PR03, PR04, PR05, PR10, EX04, RT01, RT04, RT05, SA02, SA03)' ; echo $MSG
357+
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03
358358
RET=$(($RET + $?)) ; echo $MSG "DONE"
359359

360360
MSG='Validate correct capitalization among titles in documentation' ; echo $MSG

ci/deps/azure-37-numpydev.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,9 @@ dependencies:
1414
- pytz
1515
- pip
1616
- pip:
17-
- cython>=0.29.16
17+
- cython==0.29.16 # GH#34014
1818
- "git+git://github.com/dateutil/dateutil.git"
19-
- "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com"
19+
- "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
2020
- "--pre"
2121
- "numpy"
2222
- "scipy"

ci/deps/travis-37-arm64.yaml

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
name: pandas-dev
2+
channels:
3+
- defaults
4+
- conda-forge
5+
dependencies:
6+
- python=3.7.*
7+
8+
# tools
9+
- cython>=0.29.13
10+
- pytest>=5.0.1
11+
- pytest-xdist>=1.21
12+
- hypothesis>=3.58.0
13+
14+
# pandas dependencies
15+
- botocore>=1.11
16+
- numpy
17+
- python-dateutil
18+
- pytz
19+
- pip
20+
- pip:
21+
- moto

ci/run_tests.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ if [[ $(uname) == "Linux" && -z $DISPLAY ]]; then
2020
XVFB="xvfb-run "
2121
fi
2222

23-
PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n auto --dist=loadfile -s --strict --durations=10 --junitxml=test-data.xml $TEST_ARGS $COVERAGE pandas"
23+
PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n $PYTEST_WORKERS --dist=loadfile -s --strict --durations=30 --junitxml=test-data.xml $TEST_ARGS $COVERAGE pandas"
2424

2525
echo $PYTEST_CMD
2626
sh -c "$PYTEST_CMD"

ci/setup_env.sh

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,10 @@
11
#!/bin/bash -e
22

3+
if [ "$JOB" == "3.9-dev" ]; then
4+
/bin/bash ci/build39.sh
5+
exit 0
6+
fi
7+
38
# edit the locale file if needed
49
if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
510
echo "Adding locale to the first line of pandas/__init__.py"
@@ -36,9 +41,17 @@ else
3641
exit 1
3742
fi
3843

39-
wget -q "https://repo.continuum.io/miniconda/Miniconda3-latest-$CONDA_OS.sh" -O miniconda.sh
44+
if [ "${TRAVIS_CPU_ARCH}" == "arm64" ]; then
45+
sudo apt-get -y install xvfb
46+
CONDA_URL="https://github.com/conda-forge/miniforge/releases/download/4.8.2-1/Miniforge3-4.8.2-1-Linux-aarch64.sh"
47+
else
48+
CONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-$CONDA_OS.sh"
49+
fi
50+
wget -q $CONDA_URL -O miniconda.sh
4051
chmod +x miniconda.sh
41-
./miniconda.sh -b
52+
53+
# Installation path is required for ARM64 platform as miniforge script installs in path $HOME/miniforge3.
54+
./miniconda.sh -b -p $MINICONDA_DIR
4255

4356
export PATH=$MINICONDA_DIR/bin:$PATH
4457

doc/source/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -410,7 +410,7 @@
410410
intersphinx_mapping = {
411411
"dateutil": ("https://dateutil.readthedocs.io/en/latest/", None),
412412
"matplotlib": ("https://matplotlib.org/", None),
413-
"numpy": ("https://docs.scipy.org/doc/numpy/", None),
413+
"numpy": ("https://numpy.org/doc/stable/", None),
414414
"pandas-gbq": ("https://pandas-gbq.readthedocs.io/en/latest/", None),
415415
"py": ("https://pylib.readthedocs.io/en/latest/", None),
416416
"python": ("https://docs.python.org/3/", None),

doc/source/development/contributing.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -110,7 +110,7 @@ version control to allow many people to work together on the project.
110110
Some great resources for learning Git:
111111

112112
* the `GitHub help pages <https://help.github.com/>`_.
113-
* the `NumPy's documentation <https://docs.scipy.org/doc/numpy/dev/index.html>`_.
113+
* the `NumPy's documentation <https://numpy.org/doc/stable/dev/index.html>`_.
114114
* Matthew Brett's `Pydagogue <https://matthew-brett.github.com/pydagogue/>`_.
115115

116116
Getting started with Git
@@ -974,7 +974,7 @@ it is worth getting in the habit of writing tests ahead of time so this is never
974974
Like many packages, pandas uses `pytest
975975
<https://docs.pytest.org/en/latest/>`_ and the convenient
976976
extensions in `numpy.testing
977-
<https://docs.scipy.org/doc/numpy/reference/routines.testing.html>`_.
977+
<https://numpy.org/doc/stable/reference/routines.testing.html>`_.
978978

979979
.. note::
980980

doc/source/development/extending.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -219,7 +219,7 @@ and re-boxes it if necessary.
219219

220220
If applicable, we highly recommend that you implement ``__array_ufunc__`` in your
221221
extension array to avoid coercion to an ndarray. See
222-
`the numpy documentation <https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
222+
`the numpy documentation <https://numpy.org/doc/stable/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
223223
for an example.
224224

225225
As part of your implementation, we require that you defer to pandas when a pandas

doc/source/ecosystem.rst

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,7 @@ substantial projects that you feel should be on this list, please let us know.
3030
Data cleaning and validation
3131
----------------------------
3232

33-
`pyjanitor <https://github.com/ericmjl/pyjanitor/>`__
33+
`Pyjanitor <https://github.com/ericmjl/pyjanitor/>`__
3434
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3535

3636
Pyjanitor provides a clean API for cleaning data, using method chaining.
@@ -115,7 +115,7 @@ It is very similar to the matplotlib plotting backend, but provides interactive
115115
web-based charts and maps.
116116

117117

118-
`seaborn <https://seaborn.pydata.org>`__
118+
`Seaborn <https://seaborn.pydata.org>`__
119119
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
120120

121121
Seaborn is a Python visualization library based on
@@ -136,7 +136,7 @@ provides a powerful, declarative and extremely general way to generate bespoke p
136136
Various implementations to other languages are available.
137137
A good implementation for Python users is `has2k1/plotnine <https://github.com/has2k1/plotnine/>`__.
138138

139-
`IPython Vega <https://github.com/vega/ipyvega>`__
139+
`IPython vega <https://github.com/vega/ipyvega>`__
140140
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
141141

142142
`IPython Vega <https://github.com/vega/ipyvega>`__ leverages `Vega
@@ -147,7 +147,7 @@ A good implementation for Python users is `has2k1/plotnine <https://github.com/h
147147

148148
`Plotly’s <https://plot.ly/>`__ `Python API <https://plot.ly/python/>`__ enables interactive figures and web shareability. Maps, 2D, 3D, and live-streaming graphs are rendered with WebGL and `D3.js <https://d3js.org/>`__. The library supports plotting directly from a pandas DataFrame and cloud-based collaboration. Users of `matplotlib, ggplot for Python, and Seaborn <https://plot.ly/python/matplotlib-to-plotly-tutorial/>`__ can convert figures into interactive web-based plots. Plots can be drawn in `IPython Notebooks <https://plot.ly/ipython-notebooks/>`__ , edited with R or MATLAB, modified in a GUI, or embedded in apps and dashboards. Plotly is free for unlimited sharing, and has `cloud <https://plot.ly/product/plans/>`__, `offline <https://plot.ly/python/offline/>`__, or `on-premise <https://plot.ly/product/enterprise/>`__ accounts for private use.
149149

150-
`QtPandas <https://github.com/draperjames/qtpandas>`__
150+
`Qtpandas <https://github.com/draperjames/qtpandas>`__
151151
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
152152

153153
Spun off from the main pandas library, the `qtpandas <https://github.com/draperjames/qtpandas>`__
@@ -187,7 +187,7 @@ See :ref:`Options and Settings <options>` and
187187
:ref:`Available Options <options.available>`
188188
for pandas ``display.`` settings.
189189

190-
`quantopian/qgrid <https://github.com/quantopian/qgrid>`__
190+
`Quantopian/qgrid <https://github.com/quantopian/qgrid>`__
191191
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
192192

193193
qgrid is "an interactive grid for sorting and filtering
@@ -249,12 +249,12 @@ The following data feeds are available:
249249
* Stooq Index Data
250250
* MOEX Data
251251

252-
`quandl/Python <https://github.com/quandl/Python>`__
252+
`Quandl/Python <https://github.com/quandl/Python>`__
253253
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
254254
Quandl API for Python wraps the Quandl REST API to return
255255
Pandas DataFrames with timeseries indexes.
256256

257-
`pydatastream <https://github.com/vfilimonov/pydatastream>`__
257+
`Pydatastream <https://github.com/vfilimonov/pydatastream>`__
258258
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
259259
PyDatastream is a Python interface to the
260260
`Refinitiv Datastream (DWS) <https://www.refinitiv.com/en/products/datastream-macroeconomic-analysis>`__
@@ -384,7 +384,7 @@ Pandas provides an interface for defining
384384
system. The following libraries implement that interface to provide types not
385385
found in NumPy or pandas, which work well with pandas' data containers.
386386

387-
`cyberpandas`_
387+
`Cyberpandas`_
388388
~~~~~~~~~~~~~~
389389

390390
Cyberpandas provides an extension type for storing arrays of IP Addresses. These
@@ -411,4 +411,4 @@ Library Accessor Classes Description
411411
.. _pdvega: https://altair-viz.github.io/pdvega/
412412
.. _Altair: https://altair-viz.github.io/
413413
.. _pandas_path: https://github.com/drivendataorg/pandas-path/
414-
.. _pathlib.Path: https://docs.python.org/3/library/pathlib.html
414+
.. _pathlib.Path: https://docs.python.org/3/library/pathlib.html

0 commit comments

Comments
 (0)