Description
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame(
{
"x": [0, 1],
"date": pd.to_datetime(["2020-01-05T12:00", "2020-01-05T13:00"]),
}
)
df2 = pd.DataFrame({"x": [0, 2], "y": [0, 1]})
merged = pd.merge(df, df2, on="x", how="outer")
merged.info()
print(merged)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   x       3 non-null      int64
 1   date    3 non-null      object
 2   y       2 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 96.0+ bytes
   x                 date    y
0  0  1578225600000000000  0.0
1  1  1578229200000000000  NaN
2  2 -9223372036854775808  1.0
merged["date"].map(type)
0    <class 'int'>
1    <class 'int'>
2    <class 'int'>
Name: date, dtype: object
print(pd.merge(df, df2, on="x", how="inner"))
   x                date  y
0  0 2020-01-05 12:00:00  0
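The inner merge only keeps x=0, so no date has to be filled in. To check which join types are affected, a small diagnostic sketch (not part of the original report) can loop over all four `how` values and record the resulting dtype of `date` together with the number of missing dates. Presumably only the joins that introduce a missing date ("right" and "outer" for these frames) would show the object dtype on an affected version; this is an assumption that was not verified against 1.0.0rc0.

```python
import pandas as pd

# Same frames as in the code sample above.
df = pd.DataFrame(
    {
        "x": [0, 1],
        "date": pd.to_datetime(["2020-01-05T12:00", "2020-01-05T13:00"]),
    }
)
df2 = pd.DataFrame({"x": [0, 2], "y": [0, 1]})

# For every join type, record the dtype of "date" and how many dates are missing.
results = {}
for how in ("inner", "left", "right", "outer"):
    merged = pd.merge(df, df2, on="x", how=how)
    results[how] = (str(merged["date"].dtype), int(merged["date"].isna().sum()))

for how, (dtype, n_missing) in results.items():
    print(f"{how:>5}: dtype={dtype}, missing dates={n_missing}")
```

On a version with the bug, the missing-date count for the affected joins would read 0, because the placeholder integer is not recognised as missing.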
Problem description
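Merging two DataFrames in a way that introduces missing datetime values silently converts the datetime column to integers (nanoseconds since the epoch, with -9223372036854775808 in place of the missing value) stored with object dtype, instead of keeping the datetime64[ns] dtype and filling the gaps with NaT. Keeping datetime64[ns] was the behaviour in pandas 0.25.3, and it is the behaviour I would expect.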
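Until a fix is available, one possible workaround is to restore the lost dtype after merging. The sketch below uses a hypothetical helper, `merge_keep_datetimes`, which is not a pandas API. It assumes the corrupted column holds nanoseconds since the epoch with the int64 minimum standing in for NaT, as in the output above, and relies on `pd.to_datetime` reading that sentinel back as NaT. It was not tested against 1.0.0rc0, and it ignores columns renamed by merge suffixes (e.g. `date_x`).

```python
import pandas as pd


def merge_keep_datetimes(left, right, **merge_kwargs):
    """Hypothetical helper: pd.merge, then re-cast datetime columns that lost their dtype.

    Assumption: a corrupted column contains integer nanoseconds since the epoch,
    with the int64 minimum used for missing values (as shown in this report).
    """
    merged = pd.merge(left, right, **merge_kwargs)
    for frame in (left, right):
        for col in frame.columns:
            was_datetime = pd.api.types.is_datetime64_any_dtype(frame[col])
            if (
                was_datetime
                and col in merged.columns
                and not pd.api.types.is_datetime64_any_dtype(merged[col])
            ):
                # Convert the integers back; the int64-minimum sentinel becomes NaT.
                merged[col] = pd.to_datetime(merged[col])
    return merged


df = pd.DataFrame(
    {
        "x": [0, 1],
        "date": pd.to_datetime(["2020-01-05T12:00", "2020-01-05T13:00"]),
    }
)
df2 = pd.DataFrame({"x": [0, 2], "y": [0, 1]})
merged = merge_keep_datetimes(df, df2, on="x", how="outer")
print(merged.dtypes)
```

On a pandas version without the bug, the helper returns the plain `pd.merge` result unchanged, so it should be safe to remove once the fix lands.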
Expected Output
merged = pd.merge(df, df2, on="x", how="outer")
merged.info()
print(merged)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 3 columns):
x 3 non-null int64
date 2 non-null datetime64[ns]
y 2 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 96.0 bytes
   x                date    y
0  0 2020-01-05 12:00:00  0.0
1  1 2020-01-05 13:00:00  NaN
2  2                 NaT  1.0
merged["date"].map(type)
0    <class 'pandas._libs.tslibs.timestamps.Timesta...
1    <class 'pandas._libs.tslibs.timestamps.Timesta...
2        <class 'pandas._libs.tslibs.nattype.NaTType'>
Name: date, dtype: object
This is the output from pandas 0.25.3.
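The expected output can be encoded as a regression test. The sketch below is written in pytest style but is plain Python; the test name is illustrative and not taken from the pandas test suite. It checks that the outer merge keeps the original datetime dtype and that the row present only in `df2` gets NaT rather than an integer.

```python
import pandas as pd


def test_outer_merge_keeps_datetime_dtype():
    # Same frames as in the code sample of this report.
    df = pd.DataFrame(
        {
            "x": [0, 1],
            "date": pd.to_datetime(["2020-01-05T12:00", "2020-01-05T13:00"]),
        }
    )
    df2 = pd.DataFrame({"x": [0, 2], "y": [0, 1]})

    merged = pd.merge(df, df2, on="x", how="outer")

    # The datetime dtype should survive the merge unchanged.
    assert merged["date"].dtype == df["date"].dtype
    # The row that exists only in df2 should get NaT, not an integer.
    assert merged.loc[merged["x"] == 2, "date"].isna().all()
    # The two matched dates should still be present.
    assert merged["date"].notna().sum() == 2


if __name__ == "__main__":
    test_outer_merge_keeps_datetime_dtype()
    print("ok")
```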
Whether the y-column should be converted to float or to Int64 is open for debate, but it is not the issue at hand here.
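For reference, the float conversion of `y` can be avoided on pandas 1.0 and later by opting into the nullable `Int64` dtype before merging, in which case the missing entry is filled with `<NA>` instead of NaN. This is a side note sketch, separate from the datetime problem reported here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [0, 1],
        "date": pd.to_datetime(["2020-01-05T12:00", "2020-01-05T13:00"]),
    }
)
# Store y as a nullable integer column instead of plain int64.
df2 = pd.DataFrame({"x": [0, 2], "y": pd.array([0, 1], dtype="Int64")})

merged = pd.merge(df, df2, on="x", how="outer")
print(merged.dtypes)
print(merged)
```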
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 8.1
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.0rc0
numpy : 1.17.5
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 45.1.0.post20200119
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None