Add new optional "separator" argument to json_normalize #14891
@@ -744,6 +744,9 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
+separator : string, default '.'
Can you add a `.. versionadded::` directive here?
Also, it might be better to make `separator` the last keyword argument. That way it won't break people using all positional arguments.
Call it `sep`.
Add a `versionadded` tag.
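The ordering concern in the comments above can be checked without pandas. This is a minimal sketch: `old_sig`, `new_sig`, `bad_sig` and `bind_positional` are hypothetical stand-ins, the parameter list (including `errors`) is assumed from the hunk header and the whatsnew entry rather than copied from pandas, and `x.y.z` in the `versionadded` line is a placeholder. It binds one fully positional call against each signature.

```python
import inspect


def old_sig(data, record_path=None, meta=None, meta_prefix=None,
            record_prefix=None, errors='raise'):
    """Hypothetical signature before the change."""


def new_sig(data, record_path=None, meta=None, meta_prefix=None,
            record_prefix=None, errors='raise', sep='.'):
    """Hypothetical signature with ``sep`` appended last.

    sep : string, default '.'
        Separator used when joining nested key names.

        .. versionadded:: x.y.z
    """


def bad_sig(data, record_path=None, meta=None, sep='.', meta_prefix=None,
            record_prefix=None, errors='raise'):
    """Hypothetical signature with ``sep`` inserted mid-list."""


def bind_positional(func, *args):
    """Bind positional args to func's parameters, filling in defaults."""
    bound = inspect.signature(func).bind(*args)
    bound.apply_defaults()
    return dict(bound.arguments)


# A caller that passes every pre-existing parameter positionally.
positional_call = ([{'a': 1}], None, ['m'], 'meta_', 'rec_', 'ignore')

old = bind_positional(old_sig, *positional_call)
new = bind_positional(new_sig, *positional_call)
bad = bind_positional(bad_sig, *positional_call)

# Appending sep last: old parameters bind identically, sep keeps its default.
assert all(new[name] == value for name, value in old.items())
assert new['sep'] == '.'
# Inserting sep mid-list: the old meta_prefix value silently lands in sep.
assert bad['sep'] == 'meta_'
```

Appending the new keyword last is what keeps existing positional callers working; any earlier position shifts every later argument by one slot without raising an error.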
@@ -133,6 +133,36 @@ def test_shallow_nested(self):
expected = DataFrame(ex_data, columns=result.columns)
tm.assert_frame_equal(result, expected)

+def test_shallow_nested_with_separator(self):
I think this test can be simplified a lot. Could you do something like:
result = json_normalize({"A": {"A": 1, "B": 2}}, separator='_')
expected = pd.DataFrame([[1, 2]], columns=["A_A", "A_B"])
assert_frame_equal(result, expected)
That way you're directly testing your change.
Also add a test with the default separator, and ensure that the columns are `A.A`, `A.B`.
@@ -84,6 +84,7 @@ Other enhancements
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
+- ``pandas.io.json.json_normalize()`` gained the option ``separator=string``; the default is ``separator='.'`` which is backward compatible. (:issue:`14883`)
How about: ...gained `separator` option which accepts `str`, default is `"."`
@@ -828,7 +831,7 @@ def _pull_field(js, spec):
lengths = []

meta_vals = defaultdict(list)
-meta_keys = ['.'.join(val) for val in meta]
+meta_keys = [separator.join(val) for val in meta]
Validate whether `separator` is `compat.string_types`.
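The changed `meta_keys` line plus the requested type check might look like the following sketch. `build_meta_keys` is a hypothetical helper, raising `TypeError` is our choice rather than something the reviewer specified, and wrapping bare strings into one-element paths is an assumption about how `meta` is pre-processed. On Python 3, `str` stands in for `compat.string_types`.

```python
def build_meta_keys(meta, sep='.'):
    """Hypothetical helper: join each meta path into a column name with sep."""
    # Reject non-string separators early with a clear message
    # (Python 2 code would check against compat.string_types instead of str).
    if not isinstance(sep, str):
        raise TypeError("sep must be a string, got {}".format(type(sep).__name__))
    # Assumption: a bare field name is treated as a one-element path.
    paths = [m if isinstance(m, list) else [m] for m in meta]
    # Same idea as the changed line: sep.join(val) for each path.
    return [sep.join(path) for path in paths]


meta = ['state', 'shortname', ['info', 'governor']]
build_meta_keys(meta)           # ['state', 'shortname', 'info.governor']
build_meta_keys(meta, sep='_')  # ['state', 'shortname', 'info_governor']
```

Only nested paths such as `['info', 'governor']` are affected by the separator; single-field meta entries come out unchanged.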
['state', 'shortname',
['info', 'governor']],
separator='_')
ex_data = {'name': ['Dade', 'Broward', 'Palm Beach', 'Summit',
Can you also add unicode tests?
I appreciate all of these comments, and implemented them. And then I found my code doesn't even work. I am clearly doing something wrong here: I thought the fix was simply to substitute `meta_keys = [sep.join(val) for val in meta]` for `meta_keys = ['.'.join(val) for val in meta]`. It's not; that doesn't work. Happy to take any suggestions.
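One likely explanation, assuming the pandas code of that era: when `record_path` is None (as in the reviewer's `{"A": {"A": 1, "B": 2}}` test), `json_normalize` returns before `meta_keys` is ever built, and nested dicts are flattened by a separate recursive helper, `nested_to_record`, which joins keys with a hard-coded `'.'`. The separator therefore has to be threaded through that recursion as well. The sketch below is a hypothetical stand-in (`flatten_record` is not a pandas function); its three calls mirror the requested tests: default, custom, and unicode separator.

```python
def flatten_record(record, sep='.', prefix=None):
    """Hypothetical sketch: flatten nested dicts, joining key paths with sep.

    Every recursive call receives sep; a hard-coded '.' at this step would
    ignore the new argument no matter how meta_keys is built.
    """
    flat = {}
    for key, value in record.items():
        # Top-level keys keep their name; nested keys get "prefix<sep>key".
        new_key = str(key) if prefix is None else prefix + sep + str(key)
        if isinstance(value, dict):
            # Thread sep (and the path built so far) into the nested level.
            flat.update(flatten_record(value, sep=sep, prefix=new_key))
        else:
            flat[new_key] = value
    return flat


nested = {"A": {"A": 1, "B": 2}}
flatten_record(nested)               # {'A.A': 1, 'A.B': 2}
flatten_record(nested, sep='_')      # {'A_A': 1, 'A_B': 2}
flatten_record(nested, sep='\u03c3') # {'AσA': 1, 'AσB': 2}
```

Using `None` rather than `''` as the "no prefix yet" sentinel keeps an empty-string key from being mistaken for the top level.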
I am not sure that is the reason, but in your last version, it is still |
… parameters (#13936) closes #13936 Author: Christopher C. Aycock <christopher.aycock@twosigma.com> Closes #14783 from chrisaycock/GH13936 and squashes the following commits: ffcf0c2 [Christopher C. Aycock] Added test to reject float16; fixed typos 1f208a8 [Christopher C. Aycock] Use tuple representation instead of strings 77eb47b [Christopher C. Aycock] Merge master branch into GH13936 89256f0 [Christopher C. Aycock] Test 8-bit integers and raise error on 16-bit floats; add comments 0ad1687 [Christopher C. Aycock] Fixed whatsnew 2bce3cc [Christopher C. Aycock] Revert dict back to PyObjectHashTable in response to code review fafbb02 [Christopher C. Aycock] Updated benchmarks to reflect new ASV setup 5eeb7d9 [Christopher C. Aycock] Merge master into GH13936 c33c4cb [Christopher C. Aycock] Merge branch 'master' into GH13936 46cc309 [Christopher C. Aycock] Update documentation f01142c [Christopher C. Aycock] Merge master branch 75157fc [Christopher C. Aycock] merge_asof() has type specializations and can take multiple 'by' parameters (#13936)
closes #12766 closes #12798 This is a follow on to #12798. Author: Nate Yoder <nate@whistle.com> Closes #14506 from nateyoder/index_map_index and squashes the following commits: 95e4440 [Nate Yoder] fix typo and add ref tag in whatsnew b36e83c [Nate Yoder] update whatsnew, fix documentation 4635e6a [Nate Yoder] compare as index a17ddab [Nate Yoder] Fix unused import and docstrings per pep8radius docformatter; change other uses of assert_index_equal to testing instead of self ab168e7 [Nate Yoder] Update whatsnew and add git PR to tests to denote changes 504c2a2 [Nate Yoder] Fix tests that weren't run by PyCharm 23c133d [Nate Yoder] Update tests to match dtype int64 07b772a [Nate Yoder] use the numpy results if we can to avoid repeating the computation just to create the object a110be9 [Nate Yoder] make map on time tseries indices return index if dtype of output is not a tseries; sphinx changes; fix docstring a596744 [Nate Yoder] introspect results from map so that if the output array has tuples we create a multiindex instead of an index 5fc66c3 [Nate Yoder] make map return an index if it operates on an index, multi index, or categorical index; map on a categorical will either return a categorical or an index (rather than a numpy array)
Patches the following behaviour when `na_values` is passed in as a dictionary: 1. Prevent aliasing in case `na_values` was defined in a broader scope. 2. Respect column indices as keys when doing NA conversions. Closes #14203. Author: gfyoung <gfyoung17@gmail.com> Closes #14751 from gfyoung/csv-na-values-patching and squashes the following commits: cac422c [gfyoung] BUG: Respect column indices for dict-like na_values 1439c27 [gfyoung] BUG: Prevent aliasing of dict na_values
closes #14813 Author: Ajay Saxena <aileronajay@gmail.com> Closes #14817 from aileronajay/groupby_test_restructure and squashes the following commits: 860574d [Ajay Saxena] removed duplicate file f6e1cda [Ajay Saxena] further split the tests 2cc0734 [Ajay Saxena] branched out the filter tests into a new file
Follow on to #14432 to catch the newly introduced `FutureWarning` in the `test_groupby_multi_categorical_as_index` test case. Author: Jon M. Mease <jon.mease@jhuapl.edu> Closes #14902 from jmmease/GH14432_follow_on and squashes the following commits: c30fa2b [Jon M. Mease] Trap warning introduced by GH14432 in test_groupby_multi_categorical_as_index
…th C engine (GH14874) Follow up on #14576, which refactored compression code to expand URL support. Fixes up some small remaining issues and adds a what's new entry. - [x] Closes #14874 Author: Daniel Himmelstein <daniel.himmelstein@gmail.com> Closes #14880 from dhimmel/whats-new and squashes the following commits: e1b5d42 [Daniel Himmelstein] Address what's new review comments 8568aed [Daniel Himmelstein] TST: Read bz2 files from S3 in PY2 09dcbff [Daniel Himmelstein] DOC: Improve what's new c4ea3d3 [Daniel Himmelstein] STY: PEP8 fixes f8a7900 [Daniel Himmelstein] TST: check bz2 compression in PY2 c engine 0e0fa0a [Daniel Himmelstein] DOC: Reword get_filepath_or_buffer docstring 210fb20 [Daniel Himmelstein] DOC: What's New for refactored compression code cb91007 [Daniel Himmelstein] TST: Read compressed URLs with c engine 85630ea [Daniel Himmelstein] ENH: Support bz2 compression in PY2 for c engine a7960f6 [Daniel Himmelstein] DOC: Improve _infer_compression docstring
closes #14894 Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement Author: Nate Yoder <nate@whistle.com> Closes #14895 from nateyoder/series_dict_index and squashes the following commits: 56be091 [Nate Yoder] Update whatsnew and fix pep8 issue 5f05fdc [Nate Yoder] Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement
closes #14872 Author: Rodolfo Fernandez <opensourceworkAR@users.noreply.github.com> Closes #14905 from RodolfoRFR/pandas-14872-e and squashes the following commits: 18802b4 [Rodolfo Fernandez] added 'self' to test_dtype_utc function in pandas/tests/series/test_missing e0c6c7c [Rodolfo Fernandez] added line to whatsnew v0.19.2 and test to test_missing.py in series folder e4ba7e0 [Rodolfo Fernandez] removed all references to _DATELIKE_DTYPES from /pandas/core/missing.py 5d37ce8 [Rodolfo Fernandez] added is_datetime64tz_dtype and changed evaluation from 'values' to dtype 19eecb2 [Rodolfo Fernandez] fixed style errors using flake8 59b91a1 [Rodolfo Fernandez] test modified 5a59eac [Rodolfo Fernandez] test modified bc68bf7 [Rodolfo Fernandez] test modified ba83fc8 [Rodolfo Fernandez] test b7358de [Rodolfo Fernandez] bug fixed
…like accessor this was some older code Author: Jeff Reback <jeff@reback.net> Closes #14909 from jreback/clean_datelike and squashes the following commits: 58ff2c4 [Jeff Reback] CLN: remove simple _DATELIKE_DTYPES test and replace with is_datetimelike accessor
…datetimes with timezones (#14910)
* MAINT: Only output errors in C style check * Move cpplint install before checks
Explicit conversion to list for `percentiles`. Fixes the case where `percentiles` is passed as a numpy array with no median (0.5) present. Closes #14908. Author: pbreach <pbreach@uwo.ca> Closes #14914 from pbreach/df-describe-percentile-ndarray-no-median and squashes the following commits: 5c8199b [pbreach] Minor test fix b5d09a6 [pbreach] Added test for median insertion with ndarray 72fe0cb [pbreach] Added what's new entry f954392 [pbreach] Moved conversion to if percentiles is not None d192ac7 [pbreach] Fixed whitespace issue a06794d [pbreach] BUG: Fixed bug in DataFrame.describe when percentiles are passed as array with no median
IndexingError and KeyError now bubble up appropriately. closes #14554 Author: Chris Ham <chris@christopher-ham.com> Closes #14912 from clham/gh14554-b and squashes the following commits: 458c0cc [Chris Ham] CLN: Resubmit of GH14700. Fixes GH14554. Errors other than IndexingError and KeyError now bubble up appropriately.
closes #14827 Author: Roger Thomas <roger.thomas@cremeglobal.com> Author: Roger Thomas <roger.thomas87@gmail.com> Closes #14842 from RogerThomas/fix_to_numeric_on_decimal_fields and squashes the following commits: 91d989b [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields d7972d7 [Roger Thomas] Move isdecimal to internal api 1f1c62c [Roger Thomas] Add Test And Refactor is_decimal f1b69da [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields 2d2488c [Roger Thomas] Fix To Numeric on Decimal Fields
Introduces a `UInt64HashTable` class to hash `uint64` elements and prevent overflow in functions like `Series.unique`. Closes #14721. Author: gfyoung <gfyoung17@gmail.com> Closes #14915 from gfyoung/uint64-hashtable-patch and squashes the following commits: 380c580 [gfyoung] BUG: Prevent uint64 overflow in Series.unique
Adds handling for `uint64` objects during conversion. When negative numbers and `uint64` are detected, we then convert the result to `object`. Picks up where #4845 left off. Closes #4471. Author: gfyoung <gfyoung17@gmail.com> Closes #14916 from gfyoung/convert-objects-uint64 and squashes the following commits: ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects
…lution Closes #14826 Fix inconsistency in Partial String Index with 'second' resolution. See #14826. Now if the timestamp and the index both have resolution `second`, timestamp is considered as an exact match and not a slice. Therefore, for `Series`, scalar will be returned, for `DataFrame` `KeyError` raised. Author: Ilya V. Schurov <ilya@schurov.com> Closes #14856 from ischurov/datetimeindex-slices and squashes the following commits: 2881a53 [Ilya V. Schurov] Merge branch 'datetimeindex-slices' of https://github.com/ischurov/pandas into datetimeindex-slices ac8758e [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt 0e87874 [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt 0814e5b [Ilya V. Schurov] - Addressing code review: added reference to new docs section in whatsnew. d215905 [Ilya V. Schurov] - Addressing code review: documentation clarification. c287845 [Ilya V. Schurov] conflict PR #14856 resolved 40eddc3 [Ilya V. Schurov] - Documentation fixes e17d210 [Ilya V. Schurov] - Whatsnew section added - Documentation section added 67e6bab [Ilya V. Schurov] Addressing code review: more comments added c901588 [Ilya V. Schurov] Addressing code review: testing different combinations with the loop instead of copy-pasting of the code 9b55117 [Ilya V. Schurov] Addressing code review b30039d [Ilya V. Schurov] Make flake8 happy. cc86bdd [Ilya V. Schurov] Fix inconsistency in Partial String Index with 'second' resolution ea51437 [Ilya V. Schurov] Made this code clearer.
Patches bug in read_msgpack in which Series categoricals were accidentally being constructed with a non-categorical dtype, resulting in an error. Closes gh-14901.
…user-specified, user-specified-with-unicode)
…nto json_normalize-separator
Thanks for the comments, which I've incorporated into a new pull request #14950 (edit). However, despite y'all pointing out some issues with my code, I am baffled why the simple replace- (Feel free to close this pull request whenever it's appropriate to do so; it's supplanted by the new one.)
(test_shallow_nested_with_separator)
git diff upstream/master | flake8 --diff