Description
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

# Works in 0.25.x and 1.0.x: list-valued column, no date-like column
json_line = pd.DataFrame([([1, 2], "hector")], columns=['accounts', 'name']).to_json(lines=True, orient='records')
data = pd.read_json(json_line, lines=True)

# Works in 0.25.x and 1.0.x: list of dates, with convert_dates=False (non-default behavior)
json_line = pd.DataFrame([([1, 2], ['2020-03-05', '2020-04-08T09:58:49+00:00'], "hector")], columns=['accounts', 'date', 'name']).to_json(lines=True, orient='records')
print(json_line)
data = pd.read_json(json_line, lines=True, convert_dates=False)

# Does not error in 0.25.x but errors in 1.0.x: TypeError, list is unhashable
json_line = pd.DataFrame([([1, 2], ['2020-03-05', '2020-04-08T09:58:49+00:00'], "hector")], columns=['accounts', 'date', 'name']).to_json(lines=True, orient='records')
print(json_line)
data = pd.read_json(json_line, lines=True)
Problem description
In pandas 0.25.x and earlier, pd.read_json(path, lines=True) could read newline-delimited JSON files whose records contain lists of any type. In pandas 1.0.x, the same data raises an unhashable-type error. The error appears to come from lists of date-like values in a column that is converted by default with convert_dates=True.
pandas 0.25.x handles the lists of dates but does not convert their items; they are kept as strings.
Expected Output
A pandas DataFrame containing the lists, read without error.
df.to_json()
returns
'{"accounts":{"0":[1,2]},"event_time":{"0":["2020-04-08T09:50:49+00:00","2020-04-08T09:58:49+00:00"]},"name":{"0":"hector"}}'
Output of 1.0.3
Using pd.read_json(json_line, lines=True)
The error varies slightly depending on context:
- TypeError: unhashable type: 'list'
- TypeError: <class 'list'> is not convertible to datetime (in the small example above)
Using pd.read_json(json_line, lines=True, convert_dates=False) returns the expected output, consistent with pandas 0.25.x.
Output of 0.25.3
'{"accounts":{"0":[1,2]},"event_time":{"0":["2020-04-08T09:50:49+00:00","2020-04-08T09:58:49+00:00"]},"name":{"0":"hector"}}'
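Possible workaround, as a minimal sketch rather than a fix: since convert_dates=False gives the expected output as noted above, the list-valued column can be read as-is and its elements parsed afterwards. The JSON line is written out by hand here, StringIO is used so the input is treated as a file-like object, and parse_date_list is a hypothetical helper that is not part of pandas.

```python
from io import StringIO

import pandas as pd

# One newline-delimited JSON record whose "date" field is a list of strings.
json_line = (
    '{"accounts":[1,2],'
    '"date":["2020-03-05","2020-04-08T09:58:49+00:00"],'
    '"name":"hector"}'
)

# Skip automatic date conversion so the list-valued "date" column is left alone.
df = pd.read_json(StringIO(json_line), lines=True, convert_dates=False)


def parse_date_list(values):
    """Hypothetical helper: parse each string in a list into a pd.Timestamp."""
    return [pd.Timestamp(value) for value in values]


# Optional: convert the list elements by hand, one cell at a time.
df["date_parsed"] = df["date"].apply(parse_date_list)
```

Parsing element by element keeps the original strings in the "date" column untouched, so the mixed date-only and timezone-aware values do not need to share a single format.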