-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: improve pd.io.json_normalize #57811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@slavanorm in case you haven't seen it, your changes are making the tests fail: https://github.com/pandas-dev/pandas/actions/runs/8239141116/job/22531710877?pr=57811#step:8:53 |
I'm not sure about this change - the point of the |
yes it should but it creates rows only for the dictionaries with record path. |
This docs check is failing, and its not fixable. Wonder what should we do now |
pandas/io/json/_normalize.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd
please see lines 449-452. this code is highly related to errors=ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea but why should it not work for errors="raise"
? The errors
argument is specifically for testing the presence of a key, which exists here. My major point is that we are mixing up two different types of errors if we don't separate the concepts of missing keys versus empty values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it is not very different for current code when we traverse and make early stop returning n/a.
I guess you propose raising errors only for missing keys? And missing values should be always safe?
I guess if we make such a feature, it should provide choice to user, which case should raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think the error should just be for missing keys. There should be no such thing as a missing value in well-formed JSON
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wanted to point that key or value is arbitrary in our case.
value is just the end of json walking we defined in meta or record_path
I fixed the code, wish someone reviewed it |
sorry i closed it by mistake, just reopened it |
@WillAyd could you please review the code again, it's been 2 weeks i think |
@@ -663,6 +676,22 @@ def test_missing_nested_meta(self): | |||
errors="raise", | |||
) | |||
|
|||
def test_missing_nested_meta_traverse_empty_list_errors_ignore(self): | |||
# If errors="ignore" and nested metadata is nullable, return nan | |||
data = {"meta": "foo", "nested_meta": [], "value": [{"rec": 1}, {"rec": 2}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data = {"meta": "foo", "nested_meta": [], "value": [{"rec": 1}, {"rec": 2}]} | |
data = {"meta": "foo", "nested_meta": {}, "value": [{"rec": 1}, {"rec": 2}]} |
I have lost track a bit on what we are trying to accomplish but this doesn't seem right to ignore an error when the meta argument starts peering into in an array instead of an object. Is this a TypeError on main or a KeyError? I feel like we are mixing the two up when we should not be
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
thank you too. on second thought, the code does not actually require merging to pandas library. best regards |
doc/source/whatsnew/v2.2.2.rst
file if fixing a bug or adding a new feature.