BUG: since pandas==1.1.0 pd.read_json() fails for strings that look similar to fsspec_url #36271

Closed
@tbachlechner

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

input_json = '[{"0":"this is a string ://"}]'

print('Input json string: {}'.format(input_json))
print('URL? {}'.format(str(pd.io.common.is_url(input_json))))
print('fsspec? {}'.format(str(pd.io.common.is_fsspec_url(input_json))))

print(pd.read_json(input_json))

output:

Input json string: [{"0":"this is a string ://"}]
URL? False
fsspec? True
---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-1-f3bc94ba133f> in <module>
      7 print('fsspec? {}'.format(str(pd.io.common.is_fsspec_url(input_json))))
      8 
----> 9 pd.read_json(input_json)

~/miniconda3/envs/eltest/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    197                 else:
    198                     kwargs[new_arg_name] = new_arg_value
--> 199             return func(*args, **kwargs)
    200 
    201         return cast(F, wrapper)

~/miniconda3/envs/eltest/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

~/miniconda3/envs/eltest/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression, nrows)
    592     compression = infer_compression(path_or_buf, compression)
    593     filepath_or_buffer, _, compression, should_close = get_filepath_or_buffer(
--> 594         path_or_buf, encoding=encoding, compression=compression
    595     )
    596 

~/miniconda3/envs/eltest/lib/python3.7/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
    201         if filepath_or_buffer.startswith("s3n://"):
    202             filepath_or_buffer = filepath_or_buffer.replace("s3n://", "s3://")
--> 203         fsspec = import_optional_dependency("fsspec")
    204 
    205         # If botocore is installed we fallback to reading with anon=True

~/miniconda3/envs/eltest/lib/python3.7/site-packages/pandas/compat/_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
    108     except ImportError:
    109         if raise_on_missing:
--> 110             raise ImportError(msg) from None
    111         else:
    112             return None

ImportError: Missing optional dependency 'fsspec'.  Use pip or conda to install fsspec.

Problem description

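The function pd.read_json() is widely used and accepts either a path or a JSON string. Since pandas==1.1.0, a JSON string such as the one above, which contains "://", is misidentified as an fsspec URL. pd.read_json() then tries to import the optional dependency fsspec and raises an ImportError when it is not installed.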
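
To illustrate the kind of check that would produce the "fsspec? True" result above, here is a minimal sketch that uses only the standard library. Both helper names and the regular expression are hypothetical and are not taken from the pandas source. The first helper assumes the detection is a plain substring test for "://", and the second shows one possible stricter variant that only accepts "://" when it directly follows a leading URL scheme.

```python
import re

# Hypothetical pattern: a URL scheme (letters, digits, "+", ".", "-")
# must open the string and be followed immediately by "://".
_SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def looks_like_fsspec_naive(value):
    """Substring check: flags any non-HTTP string that contains '://'."""
    return (
        isinstance(value, str)
        and "://" in value
        and not value.startswith(("http://", "https://"))
    )


def looks_like_fsspec_strict(value):
    """Prefix check: '://' only counts when it follows a leading scheme."""
    return (
        isinstance(value, str)
        and bool(_SCHEME_PREFIX.match(value))
        and not value.startswith(("http://", "https://"))
    )


input_json = '[{"0":"this is a string ://"}]'
print(looks_like_fsspec_naive(input_json))    # True: "://" appears inside the payload
print(looks_like_fsspec_strict(input_json))   # False: the string starts with "[", not a scheme
print(looks_like_fsspec_strict("s3://bucket/key.json"))  # True: genuine scheme prefix
```

A prefix-anchored check like the second one would still accept paths such as "s3://bucket/key.json" while leaving JSON payloads alone, since a JSON document starts with "[" or "{" rather than a scheme.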

Expected Output

Input json string: [{"0":"this is a string ://"}]
URL? False
fsspec? False
                       0
0  this is a string ://
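
Until the detection is fixed, a possible workaround is to pass the JSON through a file-like object rather than a bare string. This is a sketch under the assumption that buffers bypass the string-based URL checks, which is consistent with the traceback above, where the failing branch operates on a str.

```python
import io

import pandas as pd

input_json = '[{"0":"this is a string ://"}]'

# Wrapping the payload in StringIO hands read_json a buffer instead of a str,
# so the content is parsed directly rather than inspected as a path or URL.
df = pd.read_json(io.StringIO(input_json))
print(df)
```

This should not require fsspec to be installed, because no path or URL resolution is involved.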

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 2a7d3326dee660824a8433ffd01065f8ac37f7d6
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-112-generic
Version          : #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.2
numpy            : 1.19.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.2.2
setuptools       : 49.6.0.post20200814
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.8.5 (dt dec pq3 ext lo64)
jinja2           : 3.0.0a1
IPython          : 7.18.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.2
sqlalchemy       : 1.3.13
tables           : None
tabulate         : 0.8.7
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Labels: Bug; IO JSON (read_json, to_json, json_normalize)
