-
Notifications
You must be signed in to change notification settings - Fork 21
Add a fill_null
method to dataframe and column
#173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -774,3 +774,35 @@ def fill_nan(self, value: float | 'null', /) -> DataFrame: | |
|
||
""" | ||
... | ||
|
||
def fill_null( | ||
self, value: Scalar, /, *, column_names : list[str] | None = None | ||
) -> DataFrame: | ||
""" | ||
Fill null values with the given fill value. | ||
|
||
This method can only be used if all columns that are to be filled are | ||
of the same dtype (e.g., all of ``Float64`` or all of string dtype). | ||
If that is not the case, it is not possible to use a single Python | ||
scalar type that matches the dtype of all columns to which | ||
``fill_null`` is being applied, and hence an exception will be raised. | ||
|
||
Parameters | ||
---------- | ||
value : Scalar | ||
Value used to replace any ``null`` values in the dataframe with. | ||
Must be of the Python scalar type matching the dtype(s) of the dataframe. | ||
column_names : list[str] | None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Potential alternative name could be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We currently don't use either of Anyone else have a preference? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this seems fine |
||
A list of column names for which to replace nulls with the given | ||
scalar value. If ``None``, nulls will be replaced in all columns. | ||
|
||
Raises | ||
------ | ||
TypeError | ||
If the columns of the dataframe are not all of the same kind. | ||
KeyError | ||
If ``column_names`` contains a column name that is not present in | ||
the dataframe. | ||
|
||
""" | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the PR to discuss it but it would be really nice if for typing purposes we could type this as:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gh-187 makes a big step in this direction. That said, the generics don't quite work here I think, since the
T
inScalar[T]
is a Python scalar while inColumn[T]
it's a dtype. We happened to have a good discussion about gh-187 yesterday, and striking a balance between making the spec fully type-checkable and keeping it readable. I suspect this kind of case requires overloads, which may be useful at some point but can't be rendered as html - so it divorces static typing a bit from spec writing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a way to make
Scalar[T]
work, and every library can define what their scalar type is corresponding to each of the standard's types. But that's for another issue, let's pick it up again after #187 is in (I'll update it today)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comment: they are, or at least include, Python builtin types, which one has no control over. So it'd be
int) -> Column[Int64]
or some such thing.Sounds good, thanks.