From 049d7c3673814af23365871d1eee9be12b2a20f2 Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Wed, 26 Jun 2019 09:26:25 -0400 Subject: [PATCH 01/10] Updated contributing guide --- doc/source/development/contributing.rst | 103 ++++++++++++++++++++++++ 1 file changed, 103 insertions(+) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index c9c76f307d93f..f5873e2516cd2 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -696,6 +696,109 @@ You'll also need to See :ref:`contributing.warnings` for more. +Type Hints +---------- + +*pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code is strongly encouraged. + +Syntax Requirements +~~~~~~~~~~~~~~~~~~~ + +Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variables **must** be annotated with type comments. Specifically, this is a valid annotation within pandas: + +.. code-block:: python + + primes = [] # type: List[int] + +Whereas this is **NOT** allowed: + +.. code-block:: python + + primes: List[int] = [] + +Note that function signatures can always be annotated per :pep:`3107`: + +.. code-block:: python + + def sum_of_primes(primes: List[int] = []) -> int: # this is a valid annotation! + +Style Guidelines +~~~~~~~~~~~~~~~~ + +Types imports should follow the ``from typing import ...`` convention. So rather than either of + +.. code-block:: python + + import typing + + primes = [] # type: typing.List[int] + +You should write + +.. code-block:: python + + from typing import List + + primes = [] # type: List[int] + +``Optional`` should be used where applicable, so instead of + +.. code-block:: python + + from typing import Union + + maybe_primes = [] # type: Union[int, None] + +You should write + +.. code-block:: python + + from typing import Optional + + maybe_primes = [] # type: Optional[int] + +If a function accepts multiple arguments, every parameter should appear on a separate line. So rather than doing this: + +.. code-block:: python + + def some_func(a: str, b: float, c: Union[int, float]) -> float: + +The preferred style would be + +.. code-block:: python + + def some_func(a: str, + b: float, + c: Union[int, float] + ) -> float: + +When dealing with parameters with a default argument of ``None``, you should not use ``Optional`` as this will be inferred by the static type checker. So instead of: + +.. code-block:: python + + def maybe_upcase(value: Optional[str] = None) -> Optional[str]: + +You should write + +.. code-block:: python + + def maybe_upcase(value: str = None) -> Optional[str]: + +Pandas-specific Types +~~~~~~~~~~~~~~~~~~~~~ + +Commonly used types specific to *pandas* will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas. + +Validating Type Hints +~~~~~~~~~~~~~~~~~~~~~ + +*pandas* uses `mypy `_ to statically analyze the code base and type hints. After making any change you can ensure your type hints are correct by running + +.. code-block:: shell + + mypy pandas + +From the project root. .. _contributing.ci: From ea8e5606293bb23bb2b454d4fed71691b31dba7a Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Wed, 26 Jun 2019 09:36:29 -0400 Subject: [PATCH 02/10] Rewording and cleanups --- doc/source/development/contributing.rst | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index f5873e2516cd2..2f8d74db6498a 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -699,7 +699,7 @@ See :ref:`contributing.warnings` for more. Type Hints ---------- -*pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code is strongly encouraged. +*pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well! Syntax Requirements ~~~~~~~~~~~~~~~~~~~ @@ -714,18 +714,18 @@ Whereas this is **NOT** allowed: .. code-block:: python - primes: List[int] = [] + primes: List[int] = [] # not supported in Python 3.5! Note that function signatures can always be annotated per :pep:`3107`: .. code-block:: python - def sum_of_primes(primes: List[int] = []) -> int: # this is a valid annotation! + def sum_of_primes(primes: List[int] = []) -> int: Style Guidelines ~~~~~~~~~~~~~~~~ -Types imports should follow the ``from typing import ...`` convention. So rather than either of +Types imports should follow the ``from typing import ...`` convention. So rather than .. code-block:: python @@ -757,7 +757,7 @@ You should write maybe_primes = [] # type: Optional[int] -If a function accepts multiple arguments, every parameter should appear on a separate line. So rather than doing this: +If a function accepts multiple arguments, every parameter should appear on a separate line. So rather than .. code-block:: python @@ -770,7 +770,7 @@ The preferred style would be def some_func(a: str, b: float, c: Union[int, float] - ) -> float: + ) -> float: When dealing with parameters with a default argument of ``None``, you should not use ``Optional`` as this will be inferred by the static type checker. So instead of: @@ -798,8 +798,6 @@ Validating Type Hints mypy pandas -From the project root. - .. _contributing.ci: Testing With Continuous Integration From cc7cd4e5c90c2b76f47bb59cad48738bfb48a9fe Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Wed, 26 Jun 2019 22:02:26 -0500 Subject: [PATCH 03/10] Review comments --- doc/source/development/contributing.rst | 33 +++++++++++-------------- 1 file changed, 14 insertions(+), 19 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 2f8d74db6498a..716178cf006b0 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -745,32 +745,17 @@ You should write .. code-block:: python - from typing import Union + from typing import List, Union - maybe_primes = [] # type: Union[int, None] + maybe_primes = [] # type: List[Union[int, None]] You should write .. code-block:: python - from typing import Optional + from typing import List, Optional - maybe_primes = [] # type: Optional[int] - -If a function accepts multiple arguments, every parameter should appear on a separate line. So rather than - -.. code-block:: python - - def some_func(a: str, b: float, c: Union[int, float]) -> float: - -The preferred style would be - -.. code-block:: python - - def some_func(a: str, - b: float, - c: Union[int, float] - ) -> float: + maybe_primes = [] # type: List[Optional[int]] When dealing with parameters with a default argument of ``None``, you should not use ``Optional`` as this will be inferred by the static type checker. So instead of: @@ -789,6 +774,16 @@ Pandas-specific Types Commonly used types specific to *pandas* will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas. +For example, quite a few functions in *pandas* accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a numpy.dtype like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module + +.. code-block:: python + + from pandas._typing import Dtype + + def as_type(dtype: Dtype) -> ...: + +This module will ultimately house types for repeatedly used concepts like "path-like", "array-like", "numeric", etc... and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types. + Validating Type Hints ~~~~~~~~~~~~~~~~~~~~~ From 1cce3945a42e6f833a30cf75d546ae7d80f81688 Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Wed, 26 Jun 2019 23:04:15 -0500 Subject: [PATCH 04/10] lint fixup --- doc/source/development/contributing.rst | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 716178cf006b0..45fcaa2ac35b6 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -708,6 +708,8 @@ Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variab .. code-block:: python + from typing import List + primes = [] # type: List[int] Whereas this is **NOT** allowed: @@ -720,7 +722,8 @@ Note that function signatures can always be annotated per :pep:`3107`: .. code-block:: python - def sum_of_primes(primes: List[int] = []) -> int: + def sum_of_primes(primes: List[int] = []) -> int: + ... Style Guidelines ~~~~~~~~~~~~~~~~ @@ -761,13 +764,15 @@ When dealing with parameters with a default argument of ``None``, you should not .. code-block:: python - def maybe_upcase(value: Optional[str] = None) -> Optional[str]: + def maybe_upcase_wrong(value: Optional[str] = None) -> Optional[str]: + ... You should write .. code-block:: python - def maybe_upcase(value: str = None) -> Optional[str]: + def maybe_upcase_right(value: str = None) -> Optional[str]: + ... Pandas-specific Types ~~~~~~~~~~~~~~~~~~~~~ @@ -781,6 +786,7 @@ For example, quite a few functions in *pandas* accept a ``dtype`` argument. This from pandas._typing import Dtype def as_type(dtype: Dtype) -> ...: + ... This module will ultimately house types for repeatedly used concepts like "path-like", "array-like", "numeric", etc... and can also hold aliases for commonly appearing parameters like `axis`. Development of this module is active so be sure to refer to the source for the most up to date list of available types. From 0618be6640dfb5b82ea552e0c20d8a513ca4149c Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Fri, 28 Jun 2019 10:56:34 -0500 Subject: [PATCH 05/10] Comments addressed --- doc/source/development/contributing.rst | 68 ++++++++++++++----------- 1 file changed, 37 insertions(+), 31 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 9c254a1ed2835..7ae63f0143a50 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -696,35 +696,13 @@ You'll also need to See :ref:`contributing.warnings` for more. +.. _contributing.type_hints: + Type Hints ---------- *pandas* strongly encourages the use of :pep:`484` style type hints. New development should contain type hints and pull requests to annotate existing code are accepted as well! -Syntax Requirements -~~~~~~~~~~~~~~~~~~~ - -Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variables **must** be annotated with type comments. Specifically, this is a valid annotation within pandas: - -.. code-block:: python - - from typing import List - - primes = [] # type: List[int] - -Whereas this is **NOT** allowed: - -.. code-block:: python - - primes: List[int] = [] # not supported in Python 3.5! - -Note that function signatures can always be annotated per :pep:`3107`: - -.. code-block:: python - - def sum_of_primes(primes: List[int] = []) -> int: - ... - Style Guidelines ~~~~~~~~~~~~~~~~ @@ -760,26 +738,54 @@ You should write maybe_primes = [] # type: List[Optional[int]] -When dealing with parameters with a default argument of ``None``, you should not use ``Optional`` as this will be inferred by the static type checker. So instead of: +In some cases in the code base classes may defined class variables that shadow builtins. This causes an issue as described in `Mypy 1775 `_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like .. code-block:: python - def maybe_upcase_wrong(value: Optional[str] = None) -> Optional[str]: - ... + class SomeClass: + str = None -You should write +The appropriate way to annotate this would be as follows .. code-block:: python - def maybe_upcase_right(value: str = None) -> Optional[str]: + str_type = str + + class SomeClass: + str = None # type: str_type + + +Syntax Requirements +~~~~~~~~~~~~~~~~~~~ + +Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variables **must** be annotated with type comments. Specifically, this is a valid annotation within pandas: + +.. code-block:: python + + from typing import List + + primes = [] # type: List[int] + +Whereas this is **NOT** allowed: + +.. code-block:: python + + primes: List[int] = [] # not supported in Python 3.5! + +Note that function signatures can always be annotated per :pep:`3107`: + +.. code-block:: python + + def sum_of_primes(primes: List[int] = []) -> int: ... + Pandas-specific Types ~~~~~~~~~~~~~~~~~~~~~ -Commonly used types specific to *pandas* will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas. +Commonly used types specific to *pandas* will appear in `pandas._typing `_ and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas. -For example, quite a few functions in *pandas* accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a numpy.dtype like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module +For example, quite a few functions in *pandas* accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a ``numpy.dtype`` like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module .. code-block:: python From 1c2a8cac9b5545d512beaee024cdc7d9058b314a Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Fri, 28 Jun 2019 11:52:47 -0500 Subject: [PATCH 06/10] lint --- doc/source/development/contributing.rst | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 7ae63f0143a50..4452c8da39b28 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -718,7 +718,7 @@ You should write .. code-block:: python - from typing import List + from typing import List, Optional, Union primes = [] # type: List[int] @@ -726,23 +726,19 @@ You should write .. code-block:: python - from typing import List, Union - maybe_primes = [] # type: List[Union[int, None]] You should write .. code-block:: python - from typing import List, Optional - maybe_primes = [] # type: List[Optional[int]] In some cases in the code base classes may defined class variables that shadow builtins. This causes an issue as described in `Mypy 1775 `_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like .. code-block:: python - class SomeClass: + class SomeClass1: str = None The appropriate way to annotate this would be as follows @@ -751,7 +747,7 @@ The appropriate way to annotate this would be as follows str_type = str - class SomeClass: + class SomeClass2: str = None # type: str_type From 372020529d563a992649c863f744374d57d39e8d Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Fri, 28 Jun 2019 12:45:16 -0500 Subject: [PATCH 07/10] failure fixup --- doc/source/development/contributing.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 4452c8da39b28..7ffbb454fd9e1 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -758,8 +758,6 @@ Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variab .. code-block:: python - from typing import List - primes = [] # type: List[int] Whereas this is **NOT** allowed: From abb22c42586aa120f43424825e6d35480bc56477 Mon Sep 17 00:00:00 2001 From: William Ayd Date: Sun, 30 Jun 2019 11:11:21 -0500 Subject: [PATCH 08/10] Update doc/source/development/contributing.rst Co-Authored-By: Joris Van den Bossche --- doc/source/development/contributing.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 7ffbb454fd9e1..ba15ed63fd121 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -734,7 +734,7 @@ You should write maybe_primes = [] # type: List[Optional[int]] -In some cases in the code base classes may defined class variables that shadow builtins. This causes an issue as described in `Mypy 1775 `_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like +In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in `Mypy 1775 `_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like .. code-block:: python From 1ad0cb117f5e7c3add17ca098d7c42f8e7c51f1e Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Wed, 10 Jul 2019 14:53:51 -0700 Subject: [PATCH 09/10] Added section on cast --- doc/source/development/contributing.rst | 28 +++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index 82a6b733dc63c..f5165ff597ed5 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -764,6 +764,34 @@ The appropriate way to annotate this would be as follows class SomeClass2: str = None # type: str_type +In some cases you may be tempted to use ``cast`` from the typing module when you know better than the analyzer. This occurs particularly when using custom inference functions. For example + +.. code-block:: + + from typing import cast + + from pandas.core.dtypes.common import is_number + + def cannot_infer_bad(obj: Union[str, int, float]): + + if is_number(obj): + ... + else: # Reasonably only str objects would reach this but... + obj = cast(str, obj) # Mypy complains without this! + return obj.upper() + +The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 `_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable + +.. code-block:: + + def cannot_infer_good(obj: Union[str, int, float]): + + if isinstance(obj, str): + return obj.upper() + else: + ... + +With custom types and inference this is not always possible so exceptions are made, but every effort should be exhausted to avoid ``cast`` before going down such paths. Syntax Requirements ~~~~~~~~~~~~~~~~~~~ From 01fa88b041e863e60146be1ec9e00eb0a3ff2465 Mon Sep 17 00:00:00 2001 From: Will Ayd Date: Thu, 11 Jul 2019 07:39:11 -0700 Subject: [PATCH 10/10] fixed code-block warnings --- doc/source/development/contributing.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/development/contributing.rst b/doc/source/development/contributing.rst index ebb285cd732f6..548760039ab28 100644 --- a/doc/source/development/contributing.rst +++ b/doc/source/development/contributing.rst @@ -766,7 +766,7 @@ The appropriate way to annotate this would be as follows In some cases you may be tempted to use ``cast`` from the typing module when you know better than the analyzer. This occurs particularly when using custom inference functions. For example -.. code-block:: +.. code-block:: python from typing import cast @@ -782,7 +782,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 `_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable -.. code-block:: +.. code-block:: python def cannot_infer_good(obj: Union[str, int, float]):