Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
Documentation problem
Adding a new column to an existing DataFrame using pd.NA is tricky.
>>> df = pd.DataFrame({'A': [1, 2]})
>>> df.dtypes
A int64
dtype: object
>>> df
A
0 1
1 2
Adding a new column of type datetime64[ns] works as expected using pd.NaT:
>>> df['T'] = pd.NaT
>>> df.dtypes
A int64
T datetime64[ns]
dtype: object
>>> df
A T
0 1 NaT
1 2 NaT
However, adding a column using pd.NA produces a column of type object:
>>> df['N'] = pd.NA
>>> df.dtypes
A int64
T datetime64[ns]
N object
dtype: object
>>> df
A T N
0 1 NaT <NA>
1 2 NaT <NA>
That is exactly what was asked for, but not what I naively expected (a column of type Int64).
This is logical, as pd.NA is also used for the string and boolean types, and maybe others.
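To illustrate the point above, here is a minimal sketch (not taken from the pandas docs) that checks whether the same pd.NA scalar comes back from arrays of several nullable dtypes. The variable name `shared_marker` is made up for this example.

```python
import pandas as pd

# Build a one-element array holding a missing value for each nullable dtype
# and record whether the element we read back is the pd.NA singleton.
shared_marker = {}
for dtype in ("Int64", "string", "boolean"):
    arr = pd.array([pd.NA], dtype=dtype)
    shared_marker[dtype] = arr[0] is pd.NA

print(shared_marker)
```

If every entry is True, a bare pd.NA carries no hint about which nullable dtype the new column should get, which would explain the fallback to object.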
The best way I found to create a new, empty column of a desired nullable type is the following:
>>> df['N'] = pd.Series(dtype='Int64')
>>> df['S'] = pd.Series(dtype='string')
>>> df['B'] = pd.Series(dtype='boolean')
>>> df.dtypes
A int64
T datetime64[ns]
N Int64
S string
B boolean
dtype: object
>>> df
A T N S B
0 1 NaT <NA> <NA> <NA>
1 2 NaT <NA> <NA> <NA>
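An alternative that does not rely on index alignment of an empty Series is to build a typed array of the right length with pd.array. This is only a sketch of one possible approach; the helper name `add_na_column` is hypothetical and not part of pandas.

```python
import pandas as pd


def add_na_column(df, name, dtype):
    """Add an all-missing column with an explicit nullable dtype (hypothetical helper)."""
    # pd.array builds an extension array of the requested dtype, so the
    # column keeps that dtype instead of falling back to object.
    df[name] = pd.array([pd.NA] * len(df), dtype=dtype)
    return df


df = pd.DataFrame({"A": [1, 2]})
add_na_column(df, "N", "Int64")
add_na_column(df, "S", "string")
add_na_column(df, "B", "boolean")
print(df.dtypes)
```

Because the array already has the same length as the frame, no reindexing is involved, so this should behave the same for frames with a non-default index.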
Suggested fix for documentation
I think it would be enough to mention this point as an additional entry on the page about nullable data types, to show that creating a column initialised with pd.NA is no different from creating a column with any other non-numeric Python object.
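A short example along these lines could accompany such an entry. It is a sketch only, assuming the behaviour reported above (a scalar pd.NA yields an object column); the sentinel object is an arbitrary stand-in for "any other Python object".

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]})

# A scalar pd.NA is broadcast like any other Python object ...
df["N"] = pd.NA
# ... for instance an arbitrary sentinel object (illustrative only).
df["O"] = object()

# Both columns are expected to end up with the generic object dtype.
print(df.dtypes)
```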