Skip to content

Make the spec implementation in this repo type checkable #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 39 commits into from
Jul 10, 2023

Conversation

MarcoGorelli
Copy link
Contributor

@MarcoGorelli MarcoGorelli commented Jun 24, 2023

Currently, the repo can't be type-checked. In particular, it can't be used to verify whether implementations conform to it

Let's change that! 🚀

Summary of changes:

  • adding a generic type for DataFrame, Series, and Scalar, so we can use that in type annotations. If you see a Column method like def foo(self) -> Column[T], it means it returns a column of the same dtype

This helped find quite a few missing bits in https://github.com/MarcoGorelli/impl-dataframe-api

@MarcoGorelli MarcoGorelli marked this pull request as draft June 24, 2023 08:15
@rgommers
Copy link
Member

renaming the files .py -> .pyi (so their bodies don't need filling in)

I'm not sure this can work with the doc build. Unless Sphinx gained some capability to read docstrings from .pyi files? I'm not aware of docstrings in .pyi files being a thing at all.

@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 24, 2023 13:53
@MarcoGorelli
Copy link
Contributor Author

yup, that's right, thanks - I've gone for using mypy.ini to ignore the empty-body code instead

@MarcoGorelli MarcoGorelli requested a review from rgommers June 27, 2023 12:11
@MarcoGorelli
Copy link
Contributor Author

I've updated to make DataFrame no longer generic, this simplifies things a bit

Sometimes I've used Column[Any] when perhaps something more precise would have been used, but for now I'd suggest we start with this so we at least have something type-checkable which we can use to test implementations

Any objections to getting this in?

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MarcoGorelli. The Column[DtypeT] and DataFrame changes look quite good - nice to see that things are still readable and they now type-check.

The one thing that raises questions for me is class Scalar(Generic[DTypeT]): and using things like Scalar[Bool]. That does not seem right, or at a minimum confusing, because there are no scalar instances in the spec - where we used Scalar till now we really mean instances of Python bool, int or float, with the caveat that they may be ducktyped. The way you have it now seems to hint at "dataframe scalars" a la numpy's "array scalars".

@rgommers
Copy link
Member

we really mean instances of Python bool, int or float, with the caveat that they may be ducktyped.

See https://data-apis.org/dataframe-api/draft/design_topics/python_builtin_types.html

@MarcoGorelli MarcoGorelli marked this pull request as draft June 29, 2023 07:42
@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 29, 2023 08:31
@MarcoGorelli
Copy link
Contributor Author

Thanks, have updated

I've marked a few types as Any (hopefully temporarily) until #188 and #189 are resolved, but this at least makes it so that implementations can use it

@rgommers
Copy link
Member

rgommers commented Jul 4, 2023

Did you try making Scalar a type alias of Any? E.g., Mypy seems perfectly happy with this:

diff --git a/spec/API_specification/dataframe_api/column_object.py b/spec/API_specification/dataframe_api/column_object.py
index 3880406..145ac8a 100644
--- a/spec/API_specification/dataframe_api/column_object.py
+++ b/spec/API_specification/dataframe_api/column_object.py
@@ -11,6 +11,11 @@ if TYPE_CHECKING:
 __all__ = ['Column']
 
 
+# Type alias: Mypy needs Any, but for readability we need to make clear this
+# is a Python scalar (i.e., an instance of `bool`, `int`, `float`, `str`, etc.)
+Scalar = Any
+
+
 class Column(Generic[DType]):
     """
     Column object
@@ -138,7 +143,7 @@ class Column(Generic[DType]):
         """
         ...
 
-    def __eq__(self: Column[DType], other: Column[DType] | Any) -> Column[Bool]:  # type: ignore[override]
+    def __eq__(self: Column[DType], other: Column[DType] | Scalar) -> Column[Bool]:  # type: ignore[override]
         """
         Compare for equality.

Typing-wise it's fully equivalent, and it avoids throwing away the intent. It'd also make the diff of this PR quite a bit smaller.

@rgommers
Copy link
Member

rgommers commented Jul 4, 2023

Besides that issue, this looks pretty good to me now.

@@ -59,7 +60,7 @@ def concat(dataframes: Sequence[DataFrame]) -> DataFrame:
"""
...

def column_from_sequence(sequence: Sequence[object], *, dtype: DType) -> Column:
def column_from_sequence(sequence: Sequence[Any], *, dtype: Any) -> Column[Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype: DType seems more specific that it actually needs to be a dtype instance, and not eg a string repr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that too, but I think it can't be addressed with a type alias like I suggested for Scalar, because DType is already a TypeVar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's address #189 first, then we can figure out how to type this

for now I've marked as Any to not block the benefits which this PR brings

@rgommers rgommers added the static typing type annotations, use of type checkers directly from the spec label Jul 5, 2023
@rgommers rgommers changed the title Make type checkable Make the spec implementation in this repo type checkable Jul 5, 2023
@MarcoGorelli MarcoGorelli requested a review from rgommers July 7, 2023 16:50
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite good now, CI is green and it's helpful for prototyping in https://github.com/data-apis/dataframe-api-compat/ - so let's get this in. Thanks @MarcoGorelli!

@rgommers rgommers merged commit d10e22a into data-apis:main Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
static typing type annotations, use of type checkers directly from the spec
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants