-
Notifications
You must be signed in to change notification settings - Fork 21
Make the spec implementation in this repo type checkable #187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm not sure this can work with the doc build. Unless Sphinx gained some capability to read docstrings from |
yup, that's right, thanks - I've gone for using |
I've updated to make Sometimes I've used Any objections to getting this in? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MarcoGorelli. The Column[DtypeT]
and DataFrame
changes look quite good - nice to see that things are still readable and they now type-check.
The one thing that raises questions for me is class Scalar(Generic[DTypeT]):
and using things like Scalar[Bool]
. That does not seem right, or at a minimum confusing, because there are no scalar instances in the spec - where we used Scalar
till now we really mean instances of Python bool
, int
or float
, with the caveat that they may be ducktyped. The way you have it now seems to hint at "dataframe scalars" a la numpy's "array scalars".
See https://data-apis.org/dataframe-api/draft/design_topics/python_builtin_types.html |
Did you try making diff --git a/spec/API_specification/dataframe_api/column_object.py b/spec/API_specification/dataframe_api/column_object.py
index 3880406..145ac8a 100644
--- a/spec/API_specification/dataframe_api/column_object.py
+++ b/spec/API_specification/dataframe_api/column_object.py
@@ -11,6 +11,11 @@ if TYPE_CHECKING:
__all__ = ['Column']
+# Type alias: Mypy needs Any, but for readability we need to make clear this
+# is a Python scalar (i.e., an instance of `bool`, `int`, `float`, `str`, etc.)
+Scalar = Any
+
+
class Column(Generic[DType]):
"""
Column object
@@ -138,7 +143,7 @@ class Column(Generic[DType]):
"""
...
- def __eq__(self: Column[DType], other: Column[DType] | Any) -> Column[Bool]: # type: ignore[override]
+ def __eq__(self: Column[DType], other: Column[DType] | Scalar) -> Column[Bool]: # type: ignore[override]
"""
Compare for equality. Typing-wise it's fully equivalent, and it avoids throwing away the intent. It'd also make the diff of this PR quite a bit smaller. |
Besides that issue, this looks pretty good to me now. |
@@ -59,7 +60,7 @@ def concat(dataframes: Sequence[DataFrame]) -> DataFrame: | |||
""" | |||
... | |||
|
|||
def column_from_sequence(sequence: Sequence[object], *, dtype: DType) -> Column: | |||
def column_from_sequence(sequence: Sequence[Any], *, dtype: Any) -> Column[Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype: DType
seems more specific that it actually needs to be a dtype instance, and not eg a string repr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that too, but I think it can't be addressed with a type alias like I suggested for Scalar
, because DType
is already a TypeVar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's address #189 first, then we can figure out how to type this
for now I've marked as Any
to not block the benefits which this PR brings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite good now, CI is green and it's helpful for prototyping in https://github.com/data-apis/dataframe-api-compat/ - so let's get this in. Thanks @MarcoGorelli!
Currently, the repo can't be type-checked. In particular, it can't be used to verify whether implementations conform to it
Let's change that! 🚀
Summary of changes:
DataFrame
,Series
, andScalar
, so we can use that in type annotations. If you see aColumn
method likedef foo(self) -> Column[T]
, it means it returns a column of the same dtypeThis helped find quite a few missing bits in https://github.com/MarcoGorelli/impl-dataframe-api