
ENH: Add option skipna: bool to Series.items() / introduce first_valid_value #54165

Open
@randolf-scholz


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I stumbled over the problem of finding the first non-null value in a Series: https://stackoverflow.com/questions/42137529/pandas-find-first-non-null-value-in-column

It seems that Series.first_valid_index is the intended function for this; however, it does not work well when the index contains duplicate labels. I have some real-world data that looks like this:

import pandas as pd

values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)
print(s)

time
2020-12-09 09:48:38    <NA>
2020-12-09 09:48:38    <NA>
2020-12-09 09:48:38     0.6
2020-12-09 09:48:38     1.7
2020-12-09 09:48:38    <NA>
2020-12-09 09:48:38    <NA>
Name: value, dtype: Float32

first_valid_index does not help here: it returns a label, and since every row shares that label, s.loc[s.first_valid_index()] returns the whole Series, nulls included.

One can solve the problem and obtain the first non-null value by doing one of the following:

  1. s.iloc[s.reset_index(drop=True).first_valid_index()]
  2. s.loc[s.first_valid_index()].dropna().iloc[0]
  3. s.dropna().loc[s.first_valid_index()].iloc[0]

However, all of these are quite verbose, or come with other disadvantages (e.g. computational overhead).
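For reference, a small script that reproduces the problem and runs the three workarounds on the example Series from above. The variable names (label, same_length, v1, v2, v3) are only for illustration.

```python
import pandas as pd

# Example data from above: every row shares the same timestamp label.
values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)

# first_valid_index returns a label, which here matches all six rows.
label = s.first_valid_index()
same_length = len(s.loc[label]) == len(s)

# Workaround 1: switch to a positional index, then look up by position.
v1 = s.iloc[s.reset_index(drop=True).first_valid_index()]

# Workaround 2: select all rows with the label, then drop the nulls.
v2 = s.loc[s.first_valid_index()].dropna().iloc[0]

# Workaround 3: drop the nulls first, then select by label.
v3 = s.dropna().loc[s.first_valid_index()].iloc[0]

print(same_length, float(v1), float(v2), float(v3))
```

Note that these are not equivalent in general. As far as I can tell, (1) fails when the Series has no valid value at all (first_valid_index returns None), and (2) and (3) rely on .loc returning a Series, which it only does when the label still matches more than one row.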

Feature Description

Two ideas:

  1. Add an option skipna: bool = False to Series.items, such that the returned generator skips over null values. Then the first non-null value can be obtained via idx, val = next(iter(s.items(skipna=True)))
  2. Add a function that gives the first non-null value (or None for empty Series / Series consisting only of nulls).

(2) can be implemented trivially using (1); however, I think (1) could be quite useful in other contexts as well.
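A rough sketch of how the two ideas could behave, emulated in user code. Both items_skipna and first_valid_value are hypothetical names; Series.items has no skipna parameter today, so this only approximates the proposal and assumes scalar values (pd.isna on a list-like element would return an array).

```python
import pandas as pd


def items_skipna(s: pd.Series):
    """Hypothetical stand-in for the proposed s.items(skipna=True)."""
    for idx, val in s.items():
        # pd.isna covers pd.NA, NaN, None and NaT for scalar values.
        if not pd.isna(val):
            yield idx, val


def first_valid_value(s: pd.Series, default=None):
    """Hypothetical helper for idea (2), built on top of idea (1)."""
    return next((val for _, val in items_skipna(s)), default)


values = [pd.NA, pd.NA, 0.6, 1.7, pd.NA, pd.NA]
index = pd.Index(["2020-12-09 09:48:38"] * len(values), name="time", dtype="datetime64[ns]")
s = pd.Series(values, dtype="Float32", name="value", index=index)

idx, val = next(items_skipna(s))        # first (label, value) pair that is not null
first = first_valid_value(s)            # first non-null value
nothing = first_valid_value(s.iloc[:2])  # only nulls, so the default (None)
```

This loops in Python, so it is not meant as an efficient implementation, only as an illustration of the intended semantics with duplicate labels.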

Alternative Solutions

Update documentation on Series.first_valid_index on how to robustly find the first non-null value.
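As one possible recipe such a documentation note could show, here is a positional lookup that does not depend on labels being unique. first_valid_position is a hypothetical helper name, not an existing pandas function, and the small Series is made up for the example.

```python
import pandas as pd


def first_valid_position(s: pd.Series):
    """Hypothetical helper: integer position of the first non-null value, or None."""
    mask = s.notna().to_numpy()
    if not mask.any():
        return None
    # argmax on a boolean array gives the position of the first True.
    return int(mask.argmax())


s = pd.Series([pd.NA, 0.6, 1.7], dtype="Float32", index=["a", "a", "a"])
pos = first_valid_position(s)
first = None if pos is None else s.iloc[pos]
all_null = first_valid_position(pd.Series([pd.NA, pd.NA], dtype="Float32"))
```

Since the lookup goes through iloc, duplicate labels do not matter, and the all-null and empty cases are handled explicitly instead of raising.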

Additional Context

No response


Labels: Enhancement, Needs Discussion (requires discussion from core team before further action)
