Add new optional "separator" argument to json_normalize #14891

Closed. jowens wants to merge 52 commits.

jowens commented Dec 16, 2016

@@ -744,6 +744,9 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
separator : string, default '.'
Contributor

Can you add a .. versionadded:: directive here?

Also, might be better to make separator the last keyword argument. That way it won't break people using all positional arguments.

Contributor

call it sep

add a version added tag
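
The point above about making the new keyword the last argument can be illustrated with a minimal sketch. The three functions below are hypothetical stand-ins (their names and return values are assumptions made for illustration), not the real `json_normalize` signature; they only show how a caller that passes everything positionally is affected by where a new argument is placed.

```python
# Hypothetical stand-ins for a public function before and after a new
# keyword argument is added; none of these is the real json_normalize.

def normalize_v1(data, record_path=None, meta=None, meta_prefix=None):
    return {"meta": meta, "meta_prefix": meta_prefix}


def normalize_mid(data, record_path=None, meta=None, sep=".", meta_prefix=None):
    # New argument inserted in the middle: every later positional slot shifts.
    return {"meta": meta, "meta_prefix": meta_prefix, "sep": sep}


def normalize_last(data, record_path=None, meta=None, meta_prefix=None, sep="."):
    # New argument appended at the end: existing positional calls still bind
    # the same way.
    return {"meta": meta, "meta_prefix": meta_prefix, "sep": sep}


# An existing caller that passes all arguments positionally.
args = ([{"a": 1}], "records", ["id"], "meta_")

before = normalize_v1(*args)
broken = normalize_mid(*args)   # "meta_" is silently bound to sep
safe = normalize_last(*args)    # "meta_" still reaches meta_prefix
```

With the argument appended last, the old call keeps its meaning and `sep` falls back to its default.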

@@ -133,6 +133,36 @@ def test_shallow_nested(self):
expected = DataFrame(ex_data, columns=result.columns)
tm.assert_frame_equal(result, expected)

def test_shallow_nested_with_separator(self):
Contributor

I think this test can be simplified a lot. Could you do something like

    result = json_normalize({"A": {"A": 1, "B": 2}}, separator='_')
    expected = pd.DataFrame([[1, 2]], columns=["A_A", "A_B"])
    assert_frame_equal(result, expected)

That way you're directly testing your change.

Contributor

Also add a test with the default separator, and ensure that the columns are A.A and A.B.
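
The two checks requested above (a custom separator and the default one) can be sketched without pandas. The `flatten` helper below is a hypothetical, simplified stand-in for the flattening step and is not the pandas implementation; it only illustrates the column names each test would expect.

```python
def flatten(record, sep=".", prefix=""):
    """Flatten nested dicts, joining nested keys with ``sep``.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the joined name down as the new prefix.
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


record = {"A": {"A": 1, "B": 2}}

default_cols = sorted(flatten(record))          # ['A.A', 'A.B']
custom_cols = sorted(flatten(record, sep="_"))  # ['A_A', 'A_B']
```

Testing both cases guards the backward-compatible default as well as the new option.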

@@ -84,6 +84,7 @@ Other enhancements
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``pandas.io.json.json_normalize()`` gained the option ``separator=string``; the default is ``separator='.'`` which is backward compatible. (:issue:`14883`)
Member

How about, ...gained separator option which accepts str, default is "."

@@ -828,7 +831,7 @@ def _pull_field(js, spec):
lengths = []

meta_vals = defaultdict(list)
-    meta_keys = ['.'.join(val) for val in meta]
+    meta_keys = [separator.join(val) for val in meta]
Member

Validate that separator is an instance of compat.string_types.
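
A minimal sketch of the validation requested above, assuming Python 3, where a plain `str` check plays the role that `compat.string_types` played in code supporting both Python 2 and 3. The names `check_sep` and `join_path` are hypothetical and do not appear in the pull request.

```python
def check_sep(sep):
    """Return ``sep`` unchanged, or raise TypeError if it is not a string.

    Hypothetical helper for illustration only.
    """
    if not isinstance(sep, str):
        raise TypeError(f"sep must be a string, got {type(sep).__name__}")
    return sep


def join_path(path, sep="."):
    # Validate first so a bad separator fails with a clear message instead
    # of an AttributeError from the missing .join method.
    return check_sep(sep).join(path)


try:
    join_path(["info", "governor"], sep=1)
except TypeError as exc:
    message = str(exc)
```

Failing early keeps the error close to the caller's mistake rather than deep inside the normalization loop.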

['state', 'shortname',
['info', 'governor']],
separator='_')
ex_data = {'name': ['Dade', 'Broward', 'Palm Beach', 'Summit',
Member

Can you also add unicode tests?

sinhrks added the API Design and IO JSON (read_json, to_json, json_normalize) labels on Dec 16, 2016
Author

jowens commented Dec 16, 2016

I appreciate all of these comments and have implemented them. Then I found that my code doesn't even work. I am clearly doing something wrong here: I thought the fix was simply to substitute

    meta_keys = [sep.join(val) for val in meta]

for

    meta_keys = ['.'.join(val) for val in meta]

It's not. That doesn't work. Happy to take any suggestions.
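
For reference, here is a small sketch of what the substituted line computes in isolation. It does not diagnose the failure reported above; it only shows the join step, under the assumption that each `meta` entry is first wrapped into a list of path components. The helper name `build_meta_keys` is hypothetical, and the `meta` value mirrors the one used in the test in this pull request.

```python
def build_meta_keys(meta, sep="."):
    # Wrap bare strings so every entry is a list of path components.
    paths = [m if isinstance(m, list) else [m] for m in meta]
    return [sep.join(path) for path in paths]


meta = ["state", "shortname", ["info", "governor"]]

keys = build_meta_keys(meta, sep="_")
# ['state', 'shortname', 'info_governor']

# Joining a bare string iterates over its characters, which is why the
# wrapping step above matters.
pitfall = "_".join("state")
# 's_t_a_t_e'
```

If the join behaves as expected on its own, any remaining mismatch would have to come from another place where the separator is used.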

@jorisvandenbossche
Member

> And then I found my code doesn't even work.

I am not sure that is the reason, but in your last version, it is still separator.join instead of sep.join.

gfyoung and others added 18 commits December 16, 2016 06:11
… parameters (#13936)

closes #13936

Author: Christopher C. Aycock <christopher.aycock@twosigma.com>

Closes #14783 from chrisaycock/GH13936 and squashes the following commits:

ffcf0c2 [Christopher C. Aycock] Added test to reject float16; fixed typos
1f208a8 [Christopher C. Aycock] Use tuple representation instead of strings
77eb47b [Christopher C. Aycock] Merge master branch into GH13936
89256f0 [Christopher C. Aycock] Test 8-bit integers and raise error on 16-bit floats; add comments
0ad1687 [Christopher C. Aycock] Fixed whatsnew
2bce3cc [Christopher C. Aycock] Revert dict back to PyObjectHashTable in response to code review
fafbb02 [Christopher C. Aycock] Updated benchmarks to reflect new ASV setup
5eeb7d9 [Christopher C. Aycock] Merge master into GH13936
c33c4cb [Christopher C. Aycock] Merge branch 'master' into GH13936
46cc309 [Christopher C. Aycock] Update documentation
f01142c [Christopher C. Aycock] Merge master branch
75157fc [Christopher C. Aycock] merge_asof() has type specializations and can take multiple 'by' parameters (#13936)
closes #7445

Author: Matt Roeschke <emailformattr@gmail.com>

Closes #14893 from mroeschke/test_7445 and squashes the following commits:

740cafe [Matt Roeschke] TST: to_json keeps column info with empty dataframe (#7445)
closes #12766
closes #12798

This is a follow on to #12798.

Author: Nate Yoder <nate@whistle.com>

Closes #14506 from nateyoder/index_map_index and squashes the following commits:

95e4440 [Nate Yoder] fix typo and add ref tag in whatsnew
b36e83c [Nate Yoder] update whatsnew, fix documentation
4635e6a [Nate Yoder] compare as index
a17ddab [Nate Yoder] Fix unused import and docstrings per pep8radius docformatter; change other uses of assert_index_equal to testing instead os self
ab168e7 [Nate Yoder] Update whatsnew and add git PR to tests to denote changes
504c2a2 [Nate Yoder] Fix tests that weren't run by PyCharm
23c133d [Nate Yoder] Update tests to match dtype int64
07b772a [Nate Yoder] use the numpy results if we can to avoid repeating the computation just to create the object
a110be9 [Nate Yoder] make map on time tseries indices return index if dtype of output is not a tseries; sphinx changes; fix docstring
a596744 [Nate Yoder] introspect results from map so that if the output array has tuples we create a multiindex instead of an index
5fc66c3 [Nate Yoder] make map return an index if it operates on an index, multi index, or categorical index; map on a categorical will either return a categorical or an index (rather than a numpy array)
Patches the following behaviour when `na_values` is passed in as a
dictionary:    1. Prevent aliasing in case `na_values` was defined in
a broader scope.  2. Respect column indices as keys when doing NA
conversions.    Closes #14203.

Author: gfyoung <gfyoung17@gmail.com>

Closes #14751 from gfyoung/csv-na-values-patching and squashes the following commits:

cac422c [gfyoung] BUG: Respect column indices for dict-like na_values
1439c27 [gfyoung] BUG: Prevent aliasing of dict na_values
closes #14813

Author: Ajay Saxena <aileronajay@gmail.com>

Closes #14817 from aileronajay/groupby_test_restructure and squashes the following commits:

860574d [Ajay Saxena] removed duplicate file
f6e1cda [Ajay Saxena] further split the tests
2cc0734 [Ajay Saxena] branched out the filter tests into a new file
Follow on to #14432 to catch the newly introduced `FutureWarning` in
the `test_groupby_multi_categorical_as_index` test case.

Author: Jon M. Mease <jon.mease@jhuapl.edu>

Closes #14902 from jmmease/GH14432_follow_on and squashes the following commits:

c30fa2b [Jon M. Mease] Trap warning introduced by GH14432 in test_groupby_multi_categorical_as_index
…th C engine (GH14874)

Follow up on #14576, which
refactored compression code to expand URL support.    Fixes up some
small remaining issues and adds a what's new entry.    - [x] Closes
#14874

Author: Daniel Himmelstein <daniel.himmelstein@gmail.com>

Closes #14880 from dhimmel/whats-new and squashes the following commits:

e1b5d42 [Daniel Himmelstein] Address what's new review comments
8568aed [Daniel Himmelstein] TST: Read bz2 files from S3 in PY2
09dcbff [Daniel Himmelstein] DOC: Improve what's new
c4ea3d3 [Daniel Himmelstein] STY: PEP8 fixes
f8a7900 [Daniel Himmelstein] TST: check bz2 compression in PY2 c engine
0e0fa0a [Daniel Himmelstein] DOC: Reword get_filepath_or_buffer docstring
210fb20 [Daniel Himmelstein] DOC: What's New for refactored compression code
cb91007 [Daniel Himmelstein] TST: Read compressed URLs with c engine
85630ea [Daniel Himmelstein] ENH: Support bz2 compression in PY2 for c engine
a7960f6 [Daniel Himmelstein] DOC: Improve _infer_compression docstring
Expands checked-add array addition introduced in
gh-14237 to include all other addition cases (i.e.
TimedeltaIndex and Timedelta). Follow-up to gh-14453.

In addition, move checked add function to core/algorithms.
closes #14894
Fix usage of fast_multiget with index which was always throwing an
exception that was then caught; add ASV that show slight improvement

Author: Nate Yoder <nate@whistle.com>

Closes #14895 from nateyoder/series_dict_index and squashes the following commits:

56be091 [Nate Yoder] Update whatsnew and fix pep8 issue
5f05fdc [Nate Yoder] Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement
closes #14872

Author: Rodolfo Fernandez <opensourceworkAR@users.noreply.github.com>

Closes #14905 from RodolfoRFR/pandas-14872-e and squashes the following commits:

18802b4 [Rodolfo Fernandez] added 'self' to test_dtype_utc function in pandas/tests/series/test_missing
e0c6c7c [Rodolfo Fernandez] added line to whatsnew v0.19.2 and test to test_missing.py in series folder
e4ba7e0 [Rodolfo Fernandez] removed all references to _DATELIKE_DTYPES from /pandas/core/missing.py
5d37ce8 [Rodolfo Fernandez] added is_datetime64tz_dtype and changed evaluation from 'values' to dtype
19eecb2 [Rodolfo Fernandez] fixed style errors using flake8
59b91a1 [Rodolfo Fernandez] test modified
5a59eac [Rodolfo Fernandez] test modified
bc68bf7 [Rodolfo Fernandez] test modified
ba83fc8 [Rodolfo Fernandez] test
b7358de [Rodolfo Fernandez] bug fixed
…like accessor

this was some older code

Author: Jeff Reback <jeff@reback.net>

Closes #14909 from jreback/clean_datelike and squashes the following commits:

58ff2c4 [Jeff Reback] CLN: remove simple _DATELIKE_DTYPES test and replace with is_datetimelike accessor
gfyoung and others added 26 commits December 19, 2016 11:59
The hack used to resolve gh-2355 is no longer needed.
Removes the hack and patches several tests that relied
on this hacky (and buggy) behavior.

Closes gh-14881.
* MAINT: Only output errors in C style check

* Move cpplint install before checks
Explicit conversion to list for `percentiles`. Fixes the case where
`percentiles` is passed as a numpy with no median (0.5) present.
Closes #14908.

Author: pbreach <pbreach@uwo.ca>

Closes #14914 from pbreach/df-describe-percentile-ndarray-no-median and squashes the following commits:

5c8199b [pbreach] Minor test fix
b5d09a6 [pbreach] Added test for median insertion with ndarray
72fe0cb [pbreach] Added what's new entry
f954392 [pbreach] Moved conversion to if percentiles is not None
d192ac7 [pbreach] Fixed whitespace issue
a06794d [pbreach] BUG: Fixed bug in DataFrame.describe when percentiles are passed as array with no median
IdnexError and KeyError now bubble up appropriately.

closes #14554

Author: Chris Ham <chris@christopher-ham.com>

Closes #14912 from clham/gh14554-b and squashes the following commits:

458c0cc [Chris Ham] CLN: Resubmit of GH14700.  Fixes GH14554.  Errors other than IndexingError and KeyError now bubble up appropriately.
closes #14827

Author: Roger Thomas <roger.thomas@cremeglobal.com>
Author: Roger Thomas <roger.thomas87@gmail.com>

Closes #14842 from RogerThomas/fix_to_numeric_on_decimal_fields and squashes the following commits:

91d989b [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields
d7972d7 [Roger Thomas] Move isdecimal to internal api
1f1c62c [Roger Thomas] Add Test And Refactor is_decimal
f1b69da [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields
2d2488c [Roger Thomas] Fix To Numeric on Decimal Fields
Introduces a `UInt64HashTable` class to hash `uint64` elements and
prevent overflow in functions like `Series.unique`.    Closes #14721.

Author: gfyoung <gfyoung17@gmail.com>

Closes #14915 from gfyoung/uint64-hashtable-patch and squashes the following commits:

380c580 [gfyoung] BUG: Prevent uint64 overflow in Series.unique
Adds handling for `uint64` objects during conversion.  When negative
numbers and `uint64` are detected, we then convert the result to
`object`.    Picks up where #4845 left off. Closes #4471.

Author: gfyoung <gfyoung17@gmail.com>

Closes #14916 from gfyoung/convert-objects-uint64 and squashes the following commits:

ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects
Author: Jeff Reback <jeff@reback.net>

Closes #14925 from jreback/inference and squashes the following commits:

ff8ecd1 [Jeff Reback] PERF: make all inference routines cpdef bint
Fixed flake8 issues

Added blank csv file

Removing unneeded test
…lution

Closes #14826

Fix inconsistency in Partial String Index with 'second' resolution.
See #14826. Now if the timestamp and the index both have resolution
`second`, timestamp is considered as an exact match try and not a
slice. Therefore, for `Series`, scalar will be returned, for
`DataFrame` `KeyError` raised.

Author: Ilya V. Schurov <ilya@schurov.com>

Closes #14856 from ischurov/datetimeindex-slices and squashes the following commits:

2881a53 [Ilya V. Schurov] Merge branch 'datetimeindex-slices' of https://github.com/ischurov/pandas into datetimeindex-slices
ac8758e [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt
0e87874 [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt
0814e5b [Ilya V. Schurov] - Addressing code review: added reference to new docs section in whatsnew.
d215905 [Ilya V. Schurov] - Addressing code review: documentation clarification.
c287845 [Ilya V. Schurov] conflict PR #14856 resolved
40eddc3 [Ilya V. Schurov] - Documentation fixes
e17d210 [Ilya V. Schurov] - Whatsnew section added - Documentation section added
67e6bab [Ilya V. Schurov] Addressing code review: more comments added
c901588 [Ilya V. Schurov] Addressing code review: testing different combinations with the loop instead of copy-pasting of the code
9b55117 [Ilya V. Schurov] Addressing code review
b30039d [Ilya V. Schurov] Make flake8 happy.
cc86bdd [Ilya V. Schurov] Fix inconsistency in Partial String Index with 'second' resolution
ea51437 [Ilya V. Schurov] Made this code clearer.
xref #14918

Author: Jeff Reback <jeff@reback.net>

Closes #14928 from jreback/timez_construction and squashes the following commits:

3dd8e99 [Jeff Reback] BUG: bug in Series construction from UTC
closes #14930

Author: Jeff Reback <jeff@reback.net>

Closes #14933 from jreback/perf and squashes the following commits:

dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue
Patches bug in read_msgpack in which
Series categoricals were accidentally
being constructed with a non-categorical
dtype, resulting in an error.

Closes gh-14901.
…user-specified, user-specified-with-unicode)
Author

jowens commented Dec 21, 2016

Thanks for the comments, which I've incorporated into a new pull request, #14950. However, despite y'all pointing out some issues with my code, I am baffled as to why the simple replace-.-with-sep substitution doesn't work. Happy to take suggestions.

(Feel free to close this pull request whenever it's appropriate to do so; it's supplanted by the new one.)

Labels
API Design, IO JSON (read_json, to_json, json_normalize)
Successfully merging this pull request may close these issues.

ENH: json_normalize should allow a different separator than .