DISCUSS: disambiguation of NA and "NA" in reprs #30415

Closed
@anisotropi4

Change to str dtype behaviour for missing elements

Following the discussion in #28778 about how to handle the missing-value scalar NA, I was asked to raise my question as a separate issue.

My rather prosaic question is: if missing str elements are given the value NA, how would I distinguish between a missing str value and the two-character string 'NA'?

I ask because NA is a common abbreviation for 'Not Applicable', 'North America' and so on, in a way that, in my experience, 'NaN' or 'Not a Number' isn't.

That is, if 'NA' were displayed as the default missing value for the str dtype, especially if introduced as a change in behaviour rather than as an opt-in, it risks becoming a usability problem for developers, as I (for one) would no longer know whether 'NA' is a valid value or a missing one.
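To make the concern concrete, here is a minimal sketch of how the two cases can be told apart programmatically even when the printed output looks similar. It assumes pandas 1.0 or later, where `pd.NA` and the `"string"` extension dtype are available; the variable names are mine and purely illustrative.

```python
import pandas as pd

# Assumption: pandas >= 1.0, which provides the "string" extension dtype.
# The first and third elements are real strings; only the second is missing.
s = pd.Series(["NA", None, "North America"], dtype="string")

# isna() flags only the genuinely missing element, not the literal "NA".
missing_mask = s.isna().tolist()

# Comparing against the literal string flags only the real "NA" value.
# The comparison may yield a missing result for the missing element,
# so it is filled with False before converting to a plain list.
literal_na_mask = s.eq("NA").fillna(False).tolist()

print(missing_mask)
print(literal_na_mask)
```

This only shows that the values remain distinguishable in code; it does not address how they look in the repr, which is the point of this issue.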

For what it's worth, the current behaviour is that a missing value is filled in with None:

   >>> import pandas as pd
   >>> array = [['No-one', 'Nadie'], ['Expects']]
   >>> df = pd.DataFrame(array, columns=['En', 'Es'])
   >>> df
           En     Es
   0   No-one  Nadie
   1  Expects   None

The element types in the 'Es' column are:

   >>> [type(i) for i in df['Es']]
   [<class 'str'>, <class 'NoneType'>]

Given this, my thought is that NA is not a suitable default replacement for missing str dtype elements; None (of type NoneType) would be the better choice.
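As a rough illustration of why None is unambiguous here, the sketch below puts a literal 'NA' string and a missing entry in the same column and checks each one. The third row is a hypothetical addition of mine, included only so that both cases appear side by side.

```python
import pandas as pd

# Hypothetical data: the last row holds the literal two-character string "NA",
# while the second row has a genuinely missing value in the 'Es' column.
rows = [["No-one", "Nadie"], ["Expects", None], ["NA", "NA"]]
df = pd.DataFrame(rows, columns=["En", "Es"])

# isna() is True only for the missing entry.
is_missing = df["Es"].isna().tolist()

# An element-wise comparison is True only for the literal string "NA".
is_literal_na = [value == "NA" for value in df["Es"]]

print(is_missing)
print(is_literal_na)
```

The two masks never overlap, so the missing entry and the string 'NA' cannot be confused when inspected this way.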

Labels: ExtensionArray (Extending pandas with custom dtypes or arrays), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Output-Formatting (__repr__ of pandas objects, to_string)
