TRACKER: Simple analysis of IO JSON open issues #55046

Open
@loco-philippe

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

IO JSON open issues

Question about pandas

Below is a quick analysis of the open IO JSON issues. In the table:

  • the third column is a (personal!) classification, mainly intended to identify 'type' or 'dtype' problems
  • the fourth column is a sub-category, used only for the 'type' category
  • the fifth column marks the issues that proposal PDEP0012 solves or for which it provides an alternative solution

My first summary is as follows:

  • the first twelve issues would be affected by PDEP0012
  • five issues involve numeric column names -> maybe they can be grouped together
  • three issues look ready to be closed (can anyone check?)
  • eight issues concern None, NA, NaN or NaT values
  • ten issues concern the json_normalize function
| Issue | Title | Category | Sub-category | PDEP0012 |
|---|---|---|---|---|
| 16492 | No way with to_json to write only date out of datetime | type | date | ok |
| 49585 | BUG: Series read_json tries to convert all column values to dates even when using keep_default_dates=True - if one column has an na value | type | datetime - NA | ok |
| 12997 | to_json converts to UTC when encoding ISO formatted datetimes | type | datetime - tz | ok |
| 53252 | ENH: simple - compact and reversible JSON interface | type | extend type | ok |
| 14358 | read_json Raises AttributeError with Valid JSON as Input | type | null - NA | ok |
| 35464 | BUG: Type mismatch in read_json | type | null - NA | ok |
| 51375 | BUG: to_json/read_json with orient="table" does not preserve types with pd.NA | type | null - NA | ok |
| 36211 | BUG: to_json for DataFrame containing Path objects crash with infinite recursion | type | Path | ok |
| 50782 | BUG: Complex Numbers Not Imported Correctly Under JSON Read | type | table - complex | ok |
| 35420 | to_json/read_json can't handle interval index | type | table - interval | ok |
| 39537 | Error when converting df to json table (utc timezone date time object causes the error) | type | table - tz | ok |
| 52595 | BUG: json that could be read by pandas 1.5.3 cannot be read by 2.0.0 | type | table - tz | ok |
| 16848 | UnicodeDecodeError with html.table_schema = True | type | binary | |
| 25336 | [BUG?] pd.read_json does not convert date before 1971-01-01 | type | datetime - conversion | |
| 22317 | Request to add more date formats in to_json method | type | datetime - format | |
| 47930 | ENH: Add new date_format option to_json matching datetime.isoformat exactly | type | datetime - tz | |
| 21454 | pd.read_json converts large floats to inf | type | float - conversion | |
| 23328 | inconsistent float rounding in to_json | type | float - conversion | |
| 44684 | BUG: the precision of big integer in read_json | type | int - conversion | |
| 28609 | OverflowError on using to_json to serialize NaN value with type Decimal | type | null - NA | |
| 31801 | to_json index with Null Value Broken in 1.0 | type | null - NA | |
| 44693 | BUG: dtypes cast when reading JSON | type | null - NA | |
| 46627 | BUG: Pandas's ujson module incorrectly returns None when it reads NaN | type | null - NA | |
| 20608 | read_json reads large integers as strings incorrectly if dtype not explicitly mentioned | type | str - conversion | |
| 42471 | BUG: read_json converts Numeric Strings to Numbers | type | str - conversion | |
| 29025 | Incorrect json round-trip with orient='table' when dataframe contains duplicate index values | type | table - index | |
| 19129 | Raise ValueError for read_json and orient='table' With Numeric Column Names | type | table - int col | |
| 38256 | BUG: pandas to_json with orient "table" returns wrong schema & data string | type | table - int col | |
| 40674 | BUG: pd.read_json sets wrong value for numeric column names | type | table - int col | |
| 46392 | BUG: Integer column index breaks json roundtrip with orient=table | type | table - int col | |
| 32037 (44705) | JSON table orient not roundtripping extension types | type | table - int col | |
| 26692 | If tuples used as index pd.read_json( orient='split') does not read file saved by df.to_json(orient='split) | type | tuple index | |
| 21140 | Add Timedelta Support to JSON Reader with orient="table" | to close | | |
| 23584 | Series to_json Docstring Updates | to close | | |
| 31917 | to_json of Series with period dtype results in AttributeError | to close | | |
| 37100 | BUG: Series.to_json produces incorrect json format | to be completed | | |
| 45959 | QST: Why to_json defaults to force_ascii=True | question | | |
| 27241 | ENH: Ignore flattening certain keys in json_normalize | normalize | | |
| 33414 | ENH: Optionally pass dtypes as a dict into json_normalize | normalize | | |
| 34028 | What is the best way to normalize_json before read_json for the file with gigabytes size? | normalize | | |
| 34465 | BUG: unexpected behavior of json_normalize meta arg | normalize | | |
| 36245 | BUG: pd.json_normalize on a column loses rows that have an empty list for that column | normalize | | |
| 42311 | ENH: json_normalize flatten lists as well | normalize | | |
| 44329 | ENH: errors='ignore' should work for record_path for pandas.json_normalize function | normalize | | |
| 51452 | pd.json_normalize doesn't return data with index from series | normalize | | |
| 53126 | BUG: json_normalize does not parse nested lists consistently | normalize | | |
| 54121 | DOC: description of record_prefix param for json_normalize is wrong | normalize | | |
| 29928 | Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work | multiindex | | |
| 50456 | BUG: JSON serialization with orient split fails roundtrip with MultiIndex | multiindex | | |
| 42582 | ENH: col descriptions that'd save in df schemas - helping users avoid creating separate documentation? | metadata | | |
| 51012 | ENH: Include df.attrs in to_json output | metadata | | |
| 19261 | Standardize pandas metadata for table schema and parquet | internal | | |
| 20599 | OverflowError: Python int too large to convert to C long | internal | | |
| 28180 | to_iso methods for DatetimeLikeArray | internal | | |
| 32326 | Unexpected behaviour of df.to_json(compression="gzip") | internal | | |
| 33014 | to_json should make separators configurable (similar to json.dump) | internal | | |
| 33877 | BUG: weird interaction between pyslurm - ujson that changes function signature of ujson.dumps | internal | | |
| 35279 | pandas/tests/io/json/test_pandas.py::TestPandasContainer::test_read_json_large_numbers failing for 32-bit system | internal | | |
| 39135 | ENH: Add support for date_unit to be specified per column in to_json | internal | | |
| 41521 | ENH: Add support to read_json to encode character escape hex codes to utf-8 characters | internal | | |
| 44881 | ENH: change pd.read_json kwarg to rtype or return_type? | internal | | |
| 49604 | BUG/CLN: Vendored ujson Module | internal | | |
| 54865 | BUG: LSAN Detected Memory Leaks | internal | | |
| 17220 | Enhancement: to_json and read_json for DataFrame should have option to output/parse values by column | format | | |
| 39913 | ENH: new orient setting for read_json to support common API format | format | | |
| 46571 | ENH: Allow usage of custom library to serialize with to_json method | format | | |
| 12286 | Feature suggestion: flexible hierarchical data (json) importer (will implement if interest exists) | extension | | |
| 22853 | Add chunksize support to to_json | chunk | | |
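
The counts in the summary can be cross-checked against the table with a small tally. The sketch below is only an illustration: the `ROWS` list and the `summarize` helper are hypothetical names, and the data is transcribed by hand from the table (category, sub-category, whether the row is flagged "ok" for PDEP0012, issue numbers), so it needs updating whenever the classification changes.

```python
from collections import Counter

# Hand-transcribed from the table above (an assumption of this sketch):
# (category, sub-category, flagged "ok" for PDEP0012, issue numbers)
ROWS = [
    ("type", "date", True, [16492]),
    ("type", "datetime - NA", True, [49585]),
    ("type", "datetime - tz", True, [12997]),
    ("type", "extend type", True, [53252]),
    ("type", "null - NA", True, [14358, 35464, 51375]),
    ("type", "Path", True, [36211]),
    ("type", "table - complex", True, [50782]),
    ("type", "table - interval", True, [35420]),
    ("type", "table - tz", True, [39537, 52595]),
    ("type", "binary", False, [16848]),
    ("type", "datetime - conversion", False, [25336]),
    ("type", "datetime - format", False, [22317]),
    ("type", "datetime - tz", False, [47930]),
    ("type", "float - conversion", False, [21454, 23328]),
    ("type", "int - conversion", False, [44684]),
    ("type", "null - NA", False, [28609, 31801, 44693, 46627]),
    ("type", "str - conversion", False, [20608, 42471]),
    ("type", "table - index", False, [29025]),
    ("type", "table - int col", False, [19129, 38256, 40674, 46392, 32037]),
    ("type", "tuple index", False, [26692]),
    ("to close", "", False, [21140, 23584, 31917]),
    ("to be completed", "", False, [37100]),
    ("question", "", False, [45959]),
    ("normalize", "", False, [27241, 33414, 34028, 34465, 36245,
                              42311, 44329, 51452, 53126, 54121]),
    ("multiindex", "", False, [29928, 50456]),
    ("metadata", "", False, [42582, 51012]),
    ("internal", "", False, [19261, 20599, 28180, 32326, 33014, 33877,
                             35279, 39135, 41521, 44881, 49604, 54865]),
    ("format", "", False, [17220, 39913, 46571]),
    ("extension", "", False, [12286]),
    ("chunk", "", False, [22853]),
]


def summarize(rows):
    """Return (issues per category, counts behind the summary bullets)."""
    per_category = Counter()
    summary = Counter()
    for category, sub, pdep_ok, issues in rows:
        n = len(issues)
        per_category[category] += n
        summary["total"] += n
        if pdep_ok:
            summary["PDEP0012"] += n
        if sub.endswith("int col"):
            summary["numeric column names"] += n
        if sub.endswith("NA"):
            summary["None/NA/NaN/NaT"] += n
    # The last two summary bullets map directly onto whole categories.
    summary["to close"] = per_category["to close"]
    summary["json_normalize"] = per_category["normalize"]
    return per_category, summary


per_category, summary = summarize(ROWS)
for name, count in summary.items():
    print(f"{name}: {count}")
```

With the rows as transcribed, the tallies agree with the summary: 12 issues flagged for PDEP0012, 5 with numeric column names, 3 to close, 8 concerning None, NA, NaN or NaT, and 10 for json_normalize, out of 68 issues in total.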

Labels: IO JSON (read_json, to_json, json_normalize); Master Tracker (High level tracker for similar issues)
