ValueError when reading JSON lines file  #30716

Closed
@danijar

Overview

Using pandas==0.25.1 with Python 3.7.1 on Debian, loading the JSON lines file linked below fails with pandas.read_json() but succeeds when the file is parsed manually.

After looking into this a bit, I think it might be related to NaN values in the JSON file, which are not supported by the JSON spec but are accepted by json.loads(). If that turns out to be the case, it would be good to have an option to ignore those entries, or at least to get a more detailed error message.
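To illustrate the difference mentioned above, here is a minimal sketch using only the standard library. It shows that json.loads() accepts a bare NaN by default, and that its parse_constant hook can be used to reject such tokens. The sample line and the helper names reject_constant and is_strict_json are made up for illustration; they are not taken from the data file or from pandas.

```python
import json
import math

# Hypothetical sample line; NaN is not valid according to the JSON spec.
line = '{"step": 1000, "fps": NaN}'

# By default, Python's json module accepts NaN and returns float("nan").
record = json.loads(line)
print(math.isnan(record["fps"]))  # True


def reject_constant(name):
    """Called by json.loads for the tokens NaN, Infinity and -Infinity."""
    raise ValueError(f"non-standard JSON constant: {name}")


def is_strict_json(text):
    """Return True if text parses without any NaN/Infinity tokens."""
    try:
        json.loads(text, parse_constant=reject_constant)
    except ValueError:  # also covers json.JSONDecodeError
        return False
    return True


print(is_strict_json('{"step": 1000, "fps": 19.5}'))  # True
print(is_strict_json(line))  # False
```

This only demonstrates the behavior of the standard library parser; it does not confirm that NaN is what makes pandas.read_json() fail.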

Data file: https://gist.github.com/danijar/37ba75a6991d61de9e77755329bb5ef4

Manual

Parsing each line with json.loads() and passing the resulting list to pd.DataFrame works fine:

import json
import pandas as pd

# `filename` is the path to the data file linked above.
with open(filename) as f:
    df = pd.DataFrame([json.loads(line) for line in f])
print(df)  # Shows data frame as expected
Terminal output
       step  train/return  train/length  episodes  ...  value_loss  action_loss  action_ent        fps
0      1000           1.0         500.0       1.0  ...         NaN          NaN         NaN        NaN
1      2000           0.0         500.0       2.0  ...         NaN          NaN         NaN        NaN
2      3000         163.0         500.0       3.0  ...         NaN          NaN         NaN        NaN
3      4000           0.0         500.0       4.0  ...         NaN          NaN         NaN        NaN
4      5000           0.0         500.0       5.0  ...         NaN          NaN         NaN        NaN
..      ...           ...           ...       ...  ...         ...          ...         ...        ...
798  383000           0.0         500.0     383.0  ...         NaN          NaN         NaN        NaN
799  383000           NaN           NaN       NaN  ...         NaN          NaN         NaN  19.500059
800  384000           0.0         500.0     384.0  ...         NaN          NaN         NaN        NaN
801  384000           NaN           NaN       NaN  ...         NaN          NaN         NaN  19.608651
802  385000        1000.0         500.0     385.0  ...         NaN          NaN         NaN        NaN

[803 rows x 19 columns]
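Since the manual approach parses every line anyway, the same loop could also report which lines contain NaN or Infinity tokens, which is roughly the kind of detailed message asked for above. A rough sketch, assuming one JSON object per line; the function name find_nonstandard_constants and the sample data are hypothetical:

```python
import io
import json


def find_nonstandard_constants(lines):
    """Return (line_number, token) pairs for NaN/Infinity/-Infinity tokens."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        found = []
        # parse_constant receives the token text; the parsed value becomes None.
        json.loads(line, parse_constant=lambda name: found.append(name))
        hits.extend((number, name) for name in found)
    return hits


# Hypothetical two-line sample standing in for the real file.
sample = io.StringIO('{"step": 1000, "fps": NaN}\n{"step": 2000, "fps": 19.5}\n')
print(find_nonstandard_constants(sample))  # [(1, 'NaN')]
```

With a real file, an open file object can be passed in place of the StringIO.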

Pandas

But reading the same file with pandas.read_json() fails with an internal pandas error:

import pandas as pd
df = pd.read_json(filename, lines=True)  # ValueError: Expected object or value
Terminal output
<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    590         return json_reader
    591
--> 592     result = json_reader.read()
    593     if should_close:
    594         try:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in read(self)
    713         elif self.lines:
    714             data = ensure_str(self.data)
--> 715             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    716         else:
    717             obj = self._get_object_parser(self.data)

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    737         obj = None
    738         if typ == "frame":
--> 739             obj = FrameParser(json, **kwargs).parse()
    740
    741         if typ == "series" or obj is None:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in parse(self)
    847
    848         else:
--> 849             self._parse_no_numpy()
    850
    851         if self.obj is None:

<path-to-python3.7>/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1091         if orient == "columns":
   1092             self.obj = DataFrame(
-> 1093                 loads(json, precise_float=self.precise_float), dtype=None
   1094             )
   1095         elif orient == "split":

ValueError: Expected object or value
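If NaN really is the cause (which is only a guess at this point), one possible workaround would be to rewrite the file as strict JSON before handing it to pandas. The sketch below replaces NaN/Infinity tokens with null using only the standard library. The helper sanitize_json_lines and the sample data are hypothetical, not part of pandas.

```python
import io
import json


def sanitize_json_lines(lines):
    """Yield each non-empty line as strict JSON, with NaN/Infinity set to null."""
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line, parse_constant=lambda _name: None)
        # allow_nan=False makes dumps raise if a non-finite float slipped through.
        yield json.dumps(obj, allow_nan=False)


# Hypothetical two-line sample standing in for the real file.
raw = io.StringIO('{"step": 1000, "fps": NaN}\n{"step": 2000, "fps": 19.5}\n')
clean = "\n".join(sanitize_json_lines(raw))
print(clean)
# {"step": 1000, "fps": null}
# {"step": 2000, "fps": 19.5}
```

The cleaned text could then be tried with pd.read_json(io.StringIO(clean), lines=True). I have not verified that this avoids the ValueError on the linked data file.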
