Add an is_dtype function #199

Merged
merged 9 commits into from
Aug 2, 2023
33 changes: 33 additions & 0 deletions spec/API_specification/dataframe_api/__init__.py
@@ -32,6 +32,7 @@
"Float64",
"Float32",
"Bool",
"is_dtype",
]


@@ -239,3 +240,35 @@ class Float32:

class Bool:
"""Boolean type with 8 bits of precision."""


def is_dtype(dtype: Any, kind: str | tuple[str, ...]) -> bool:
"""
Returns a boolean indicating whether a provided dtype is of a specified data type “kind”.

Parameters
----------
dtype: Any
The input dtype.
kind: str
Contributor Author

@MarcoGorelli, Jul 12, 2023

The array API also allows a dtype for kind, but I don't think we need that here. If you want to check whether a dtype is a specific dtype, you can just use isinstance, e.g.

isinstance(column.dtype, namespace.Float64)

I think this function is more useful for checking conceptual hierarchies, like 'integral'

So, for now, I'm only allowing str | tuple[str, ...]

Contributor

Part of the reason for allowing dtype instances is to allow arbitrary composition of dtype sets beyond the supported kinds and allow an escape hatch beyond the kinds officially supported by the array API standard.

Contributor Author

can't you do that with

is_dtype(column.dtype, 'integral') or isinstance(column.dtype, MyFancyDType)

?

Contributor

Yes, but that becomes a bit unwieldy quickly if you want to arbitrarily combine multiple "kinds" and/or individual dtypes.

Contributor Author

I'd suggest keeping this simple to begin with; we can always increase complexity later if necessary.

Member

I'd have made the same comment as @kgryte, I think it'll be needed and is better than using isinstance. However, it's a fully backwards-compatible extension, so I'm fine with leaving this comment unresolved here and a follow-up PR to extend the functionality to:

kind: str | tuple[str, ...] | DType | tuple[DType, ...]
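
To make the extension proposed in this thread concrete, here is a minimal sketch of what accepting dtypes alongside kind strings could look like. It is not part of this PR. The name is_dtype_ext, the tiny _KINDS table, and the placeholder dtype classes (including MyFancyDType, borrowed from the comment above) are all hypothetical, and it assumes dtypes are checked with isinstance as suggested earlier in the thread.

```python
from __future__ import annotations

from typing import Any


# Hypothetical stand-ins for a few dtype classes; a real namespace would provide these.
class Int64: pass
class UInt8: pass
class Float64: pass
class MyFancyDType: pass


# Assumed, deliberately incomplete kind table, for illustration only.
_KINDS = {
    "signed integer": (Int64,),
    "unsigned integer": (UInt8,),
    "integral": (Int64, UInt8),
    "floating": (Float64,),
}


def is_dtype_ext(dtype: Any, kind: str | type | tuple[str | type, ...]) -> bool:
    """Return True if dtype matches any listed kind string or dtype class."""
    items = kind if isinstance(kind, tuple) else (kind,)
    for item in items:
        if isinstance(item, str):
            # A string names a kind: match against every dtype class in it.
            if isinstance(dtype, _KINDS[item]):
                return True
        elif isinstance(dtype, item):
            # Anything else is treated as a dtype class to match directly.
            return True
    return False
```

With this shape, the combined check from the thread collapses into a single call such as is_dtype_ext(column.dtype, ('integral', MyFancyDType)), which is the composition argument made above.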

data type kind.
The function must return a boolean indicating whether
the input dtype is of a specified data type kind.
The following dtype kinds must be supported:

- 'bool': boolean data type (Bool).
- 'signed integer': signed integer data types (Int8, Int16, Int32, Int64).
- 'unsigned integer': unsigned integer data types (UInt8, UInt16, UInt32, UInt64).
- 'floating': floating-point data types (Float32, Float64).
- 'integral': integer data types. Shorthand for ('signed integer', 'unsigned integer').
- 'numeric': numeric data types. Shorthand for ('integral', 'floating').

If kind is a tuple, the tuple specifies a union of dtypes and/or kinds,
and the function must return a boolean indicating whether the input dtype
is either equal to a specified dtype or belongs to at least one specified
data type kind.

Returns
-------
bool
"""
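
The kind list in the docstring above can be read as a lookup table from kind name to a set of dtype classes. The following is a minimal sketch under that reading, not the implementation of any particular library: the empty dtype classes are placeholders for the ones defined in the specification module, the _KINDS table is an assumed internal detail, and raising ValueError for an unknown kind is a choice made here that the docstring does not mandate.

```python
from __future__ import annotations

from typing import Any


# Placeholder dtype classes standing in for the ones in the specification module.
class Bool: pass
class Int8: pass
class Int16: pass
class Int32: pass
class Int64: pass
class UInt8: pass
class UInt16: pass
class UInt32: pass
class UInt64: pass
class Float32: pass
class Float64: pass


_SIGNED = (Int8, Int16, Int32, Int64)
_UNSIGNED = (UInt8, UInt16, UInt32, UInt64)
_FLOATING = (Float32, Float64)

# Assumed lookup table mirroring the kinds listed in the docstring.
_KINDS = {
    "bool": (Bool,),
    "signed integer": _SIGNED,
    "unsigned integer": _UNSIGNED,
    "floating": _FLOATING,
    "integral": _SIGNED + _UNSIGNED,
    "numeric": _SIGNED + _UNSIGNED + _FLOATING,
}


def is_dtype(dtype: Any, kind: str | tuple[str, ...]) -> bool:
    """Return True if dtype belongs to at least one of the given kinds."""
    kinds = (kind,) if isinstance(kind, str) else kind
    for k in kinds:
        if k not in _KINDS:
            # Assumption: reject unknown kinds rather than silently return False.
            raise ValueError(f"unknown dtype kind: {k!r}")
    return any(isinstance(dtype, _KINDS[k]) for k in kinds)
```

Note that under the listed kinds Bool is not 'numeric', so is_dtype(Bool(), 'numeric') is False in this sketch, while a tuple such as ('bool', 'floating') matches both Bool and the float dtypes.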
1 change: 1 addition & 0 deletions spec/API_specification/index.rst
@@ -27,6 +27,7 @@ of objects and functions in the top-level namespace. The latter are:
Float64
Float32
Bool
is_dtype
column_from_sequence
column_from_1d_array
dataframe_from_dict