Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I stumbled over the problem of finding the first non-null value in a Series: https://stackoverflow.com/questions/42137529/pandas-find-first-non-null-value-in-column
It seems that Series.first_valid_index is the intended function for this; however, it doesn't work well when the index contains duplicate labels. I have some real-world data that looks like this:
import pandas as pd
values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)
time
2020-12-09 09:48:38 <NA>
2020-12-09 09:48:38 <NA>
2020-12-09 09:48:38 0.6
2020-12-09 09:48:38 1.7
2020-12-09 09:48:38 <NA>
2020-12-09 09:48:38 <NA>
Name: value, dtype: Float32
first_valid_index is of no help here: every row shares the same label, so s.loc[s.first_valid_index()] returns the Series as-is.
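To make the failure concrete, here is a small check that rebuilds the Series from above and looks at what the label-based lookup returns. The variable names `label` and `result` are only for illustration.

```python
import pandas as pd

# Same data as above: six rows that all share one timestamp label.
values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)

# first_valid_index returns a label, not a position.
label = s.first_valid_index()

# Because the label is duplicated, .loc selects every row that carries it.
result = s.loc[label]
print(len(result), int(result.isna().sum()))
```

The lookup selects all six rows, including the four nulls, so the label alone cannot identify the first non-null entry.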
One can solve the problem and obtain the first non-null value by doing one of the following:
s.iloc[s.reset_index(drop=True).first_valid_index()]
s.loc[s.first_valid_index()].dropna().iloc[0]
s.dropna().loc[s.first_valid_index()].iloc[0]
However, all of these are quite verbose, or come with other disadvantages (e.g. computational overhead).
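One possible way to avoid both the index reset and the label lookup is to work purely by position. The following is a minimal sketch; `first_valid_value` and its `default` parameter are hypothetical names, not part of pandas.

```python
import pandas as pd

def first_valid_value(series, default=None):
    """Hypothetical helper: return the first non-null value by position."""
    # Boolean NumPy array marking the non-null entries.
    mask = series.notna().to_numpy()
    if not mask.any():
        # Empty Series or Series consisting only of nulls.
        return default
    # argmax on a boolean array gives the position of the first True.
    return series.iloc[int(mask.argmax())]

values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)

first = first_valid_value(s)
all_null = first_valid_value(pd.Series([pd.NA, pd.NA], dtype="Float32"))
```

This never consults the index, so duplicate labels do not matter, but it still has to compute the full null mask.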
Feature Description
Two ideas:
1. Add an option skipna: bool = False to Series.items, such that the returned iterator skips over null values. Then the first non-null value can be obtained via idx, val = next(iter(s.items(skipna=True)))
2. Add a function that gives the first non-null value (or None for empty Series / Series consisting only of nulls).
(2) can be implemented trivially using (1); however, I think (1) could be quite useful in other contexts as well.
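The two ideas could be prototyped outside pandas roughly as follows. This is only a sketch of the intended behavior: `items_skipna` stands in for the proposed `s.items(skipna=True)`, which does not exist in pandas today, and `first_non_null` is a hypothetical name for the second idea.

```python
import pandas as pd

def items_skipna(series):
    """Hypothetical stand-in for the proposed Series.items(skipna=True)."""
    # Pair each (label, value) with its non-null flag and keep only valid entries.
    for (label, value), valid in zip(series.items(), series.notna()):
        if valid:
            yield label, value

def first_non_null(series):
    """Idea (2) built on idea (1): first non-null value, or None if there is none."""
    return next((value for _, value in items_skipna(series)), None)

values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)

idx, val = next(iter(items_skipna(s)))
```

Because the generator is lazy, the lookup stops at the first valid entry, although this prototype still computes the null mask for the whole Series up front.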
Alternative Solutions
Update the documentation of Series.first_valid_index to explain how to robustly find the first non-null value.
Additional Context
No response