Add new optional "separator" argument to json_normalize #14891

jowens · 2016-12-16T00:23:59Z

- closes #14883 (ENH: json_normalize should allow a different separator than '.')
- tests added / passed (added test test_shallow_nested_with_separator)
- passes git diff upstream/master | flake8 --diff
- whatsnew entry (v0.20.0)

TomAugspurger · 2016-12-16T01:26:01Z

pandas/io/json.py

@@ -744,6 +744,9 @@ def json_normalize(data, record_path=None, meta=None,
        If True, prefix records with dotted (?) path, e.g. foo.bar.field if
        path to records is ['foo', 'bar']
    meta_prefix : string, default None
+    separator : string, default '.'


Can you add a .. versionadded:: directive here.

Also, it might be better to make separator the last keyword argument. That way it won't break code that passes all arguments positionally.

call it sep

add a version added tag
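Both requests above (a versionadded note, and putting the new keyword last) can be illustrated with two hypothetical stubs that only echo their arguments. They are not pandas code. The parameter list is an assumption pieced together from the hunk header, the docstring, and the errors= whatsnew entry. A pre-existing call that passes everything positionally keeps working when sep is appended, but gets its values bound to the wrong parameters when sep is inserted mid-list.

```python
def normalize_sep_last(data, record_path=None, meta=None,
                       meta_prefix=None, record_prefix=None,
                       errors="raise", sep="."):
    """Hypothetical stub: the new keyword is appended after the old ones.

    Parameters
    ----------
    sep : string, default '.'
        Separator used when joining nested key names.

        .. versionadded:: 0.20.0
    """
    return {"meta_prefix": meta_prefix, "record_prefix": record_prefix,
            "errors": errors, "sep": sep}


def normalize_sep_middle(data, record_path=None, meta=None, sep=".",
                         meta_prefix=None, record_prefix=None,
                         errors="raise"):
    """Hypothetical stub: the new keyword is inserted in the middle."""
    return {"meta_prefix": meta_prefix, "record_prefix": record_prefix,
            "errors": errors, "sep": sep}


# A pre-existing call that passes every argument positionally.
old_call = ([], None, None, "m_", "r_", "ignore")

# sep appended last: the old values keep their meaning, sep stays '.'.
print(normalize_sep_last(*old_call))

# sep inserted mid-list: 'm_' is now bound to sep, and every later value
# is bound to the parameter before the one the caller intended.
print(normalize_sep_middle(*old_call))
```

Appending is the conservative choice because existing callers never see the new parameter unless they pass it by name.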

TomAugspurger · 2016-12-16T01:33:39Z

pandas/io/tests/json/test_json_norm.py

@@ -133,6 +133,36 @@ def test_shallow_nested(self):
        expected = DataFrame(ex_data, columns=result.columns)
        tm.assert_frame_equal(result, expected)

+    def test_shallow_nested_with_separator(self):


I think this test can be simplified a lot. Could you do something like

    result = json_normalize({"A": {"A": 1, "B": 2}}, separator='_')
    expected = pd.DataFrame([[1, 2]], columns=["A_A", "A_B"])
    assert_frame_equal(result, expected)

That way you're directly testing your change.

Also add a test with the default separator, and make sure the columns are A.A and A.B.
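The behaviour these tests pin down (default '.' versus a custom separator) can be sketched with the standard library alone. flatten below is a hypothetical helper, not pandas' implementation: it recursively joins nested dict keys with sep, and the asserts mirror the '_' and default-separator cases suggested above.

```python
def flatten(record, sep=".", prefix=None):
    """Flatten nested dicts into a single level.

    Hypothetical helper for illustration (not pandas' implementation):
    a nested key path such as ("A", "B") becomes "A" + sep + "B".
    """
    flat = {}
    for key, value in record.items():
        key = str(key)  # stringify non-str keys before joining
        new_key = key if prefix is None else prefix + sep + key
        if isinstance(value, dict):
            # Recurse, passing sep down so every level uses the same one.
            flat.update(flatten(value, sep=sep, prefix=new_key))
        else:
            flat[new_key] = value
    return flat


data = {"A": {"A": 1, "B": 2}}

# Default separator: the existing dotted column names are unchanged.
assert flatten(data) == {"A.A": 1, "A.B": 2}

# Custom separator: only the joining character changes.
assert flatten(data, sep="_") == {"A_A": 1, "A_B": 2}
```

Note that sep has to be passed down on the recursive call; a helper that forgets it silently falls back to '.' for every nested key, even when the caller asked for '_'.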

sinhrks · 2016-12-16T03:46:47Z

doc/source/whatsnew/v0.20.0.txt

@@ -84,6 +84,7 @@ Other enhancements
 - ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`)
 - ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
 - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
+- ``pandas.io.json.json_normalize()`` gained the option ``separator=string``; the default is ``separator='.'`` which is backward compatible. (:issue:`14883`)


How about: "...gained a separator option, which accepts a str; the default is '.'"

sinhrks · 2016-12-16T03:47:57Z

pandas/io/json.py

@@ -828,7 +831,7 @@ def _pull_field(js, spec):
    lengths = []

    meta_vals = defaultdict(list)
-    meta_keys = ['.'.join(val) for val in meta]
+    meta_keys = [separator.join(val) for val in meta]


Validate that separator is an instance of compat.string_types.

sinhrks · 2016-12-16T03:48:23Z

pandas/io/tests/json/test_json_norm.py

+                                ['state', 'shortname',
+                                 ['info', 'governor']],
+                                separator='_')
+        ex_data = {'name': ['Dade', 'Broward', 'Palm Beach', 'Summit',


Can you also add unicode tests?

jowens · 2016-12-16T05:32:36Z

I appreciate all of these comments and have implemented them. And then I found my code doesn't even work. I am clearly doing something wrong here: I thought it was simply a matter of substituting

    meta_keys = [sep.join(val) for val in meta]

for

    meta_keys = ['.'.join(val) for val in meta]

It's not. That doesn't work. Happy to take any suggestions.

jorisvandenbossche · 2016-12-16T10:05:55Z

> And then I found my code doesn't even work.

I am not sure that is the reason, but in your last version it is still separator.join instead of sep.join.
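The name mismatch can be reproduced in isolation. The sketch below uses hypothetical function names (not the pandas code): it normalizes meta the way sinhrks' quoted test passes it (['state', 'shortname', ['info', 'governor']]) and then joins each path. If the parameter is renamed to sep but the body still says separator, Python looks separator up as a global when the function runs and raises NameError; pyflakes/flake8 report this statically as F821 (undefined name).

```python
def normalize_meta(meta):
    """Wrap bare field names so every meta entry is a list of keys."""
    return [m if isinstance(m, list) else [m] for m in meta]


def meta_keys_buggy(meta, sep="."):
    # The body still uses the old parameter name. `separator` is not a
    # local, so it is looked up as a global at call time -> NameError.
    return [separator.join(path) for path in normalize_meta(meta)]  # noqa: F821


def meta_keys_fixed(meta, sep="."):
    # The body uses the renamed parameter.
    return [sep.join(path) for path in normalize_meta(meta)]


meta = ["state", "shortname", ["info", "governor"]]

print(meta_keys_fixed(meta))           # ['state', 'shortname', 'info.governor']
print(meta_keys_fixed(meta, sep="_"))  # ['state', 'shortname', 'info_governor']

try:
    meta_keys_buggy(meta, sep="_")
except NameError as exc:
    print("NameError:", exc)
```

Separately, a test without a record_path, such as the {"A": {"A": 1, "B": 2}} example suggested in review, may never reach the meta_keys line at all, since nested dicts in that case appear to be flattened by a different code path; the separator would likely have to be passed there as well.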

… parameters (#13936) closes #13936 Author: Christopher C. Aycock <christopher.aycock@twosigma.com> Closes #14783 from chrisaycock/GH13936 and squashes the following commits: ffcf0c2 [Christopher C. Aycock] Added test to reject float16; fixed typos 1f208a8 [Christopher C. Aycock] Use tuple representation instead of strings 77eb47b [Christopher C. Aycock] Merge master branch into GH13936 89256f0 [Christopher C. Aycock] Test 8-bit integers and raise error on 16-bit floats; add comments 0ad1687 [Christopher C. Aycock] Fixed whatsnew 2bce3cc [Christopher C. Aycock] Revert dict back to PyObjectHashTable in response to code review fafbb02 [Christopher C. Aycock] Updated benchmarks to reflect new ASV setup 5eeb7d9 [Christopher C. Aycock] Merge master into GH13936 c33c4cb [Christopher C. Aycock] Merge branch 'master' into GH13936 46cc309 [Christopher C. Aycock] Update documentation f01142c [Christopher C. Aycock] Merge master branch 75157fc [Christopher C. Aycock] merge_asof() has type specializations and can take multiple 'by' parameters (#13936)

closes #12766 closes #12798 This is a follow on to #12798. Author: Nate Yoder <nate@whistle.com> Closes #14506 from nateyoder/index_map_index and squashes the following commits: 95e4440 [Nate Yoder] fix typo and add ref tag in whatsnew b36e83c [Nate Yoder] update whatsnew, fix documentation 4635e6a [Nate Yoder] compare as index a17ddab [Nate Yoder] Fix unused import and docstrings per pep8radius docformatter; change other uses of assert_index_equal to testing instead os self ab168e7 [Nate Yoder] Update whatsnew and add git PR to tests to denote changes 504c2a2 [Nate Yoder] Fix tests that weren't run by PyCharm 23c133d [Nate Yoder] Update tests to match dtype int64 07b772a [Nate Yoder] use the numpy results if we can to avoid repeating the computation just to create the object a110be9 [Nate Yoder] make map on time tseries indices return index if dtype of output is not a tseries; sphinx changes; fix docstring a596744 [Nate Yoder] introspect results from map so that if the output array has tuples we create a multiindex instead of an index 5fc66c3 [Nate Yoder] make map return an index if it operates on an index, multi index, or categorical index; map on a categorical will either return a categorical or an index (rather than a numpy array)

Patches the following behaviour when `na_values` is passed in as a dictionary: 1. Prevent aliasing in case `na_values` was defined in a broader scope. 2. Respect column indices as keys when doing NA conversions. Closes #14203. Author: gfyoung <gfyoung17@gmail.com> Closes #14751 from gfyoung/csv-na-values-patching and squashes the following commits: cac422c [gfyoung] BUG: Respect column indices for dict-like na_values 1439c27 [gfyoung] BUG: Prevent aliasing of dict na_values

closes #14813 Author: Ajay Saxena <aileronajay@gmail.com> Closes #14817 from aileronajay/groupby_test_restructure and squashes the following commits: 860574d [Ajay Saxena] removed duplicate file f6e1cda [Ajay Saxena] further split the tests 2cc0734 [Ajay Saxena] branched out the filter tests into a new file

Follow on to #14432 to catch the newly introduced `FutureWarning` in the `test_groupby_multi_categorical_as_index` test case. Author: Jon M. Mease <jon.mease@jhuapl.edu> Closes #14902 from jmmease/GH14432_follow_on and squashes the following commits: c30fa2b [Jon M. Mease] Trap warning introduced by GH14432 in test_groupby_multi_categorical_as_index

…th C engine (GH14874) Follow up on #14576, which refactored compression code to expand URL support. Fixes up some small remaining issues and adds a what's new entry. - [x] Closes #14874 Author: Daniel Himmelstein <daniel.himmelstein@gmail.com> Closes #14880 from dhimmel/whats-new and squashes the following commits: e1b5d42 [Daniel Himmelstein] Address what's new review comments 8568aed [Daniel Himmelstein] TST: Read bz2 files from S3 in PY2 09dcbff [Daniel Himmelstein] DOC: Improve what's new c4ea3d3 [Daniel Himmelstein] STY: PEP8 fixes f8a7900 [Daniel Himmelstein] TST: check bz2 compression in PY2 c engine 0e0fa0a [Daniel Himmelstein] DOC: Reword get_filepath_or_buffer docstring 210fb20 [Daniel Himmelstein] DOC: What's New for refactored compression code cb91007 [Daniel Himmelstein] TST: Read compressed URLs with c engine 85630ea [Daniel Himmelstein] ENH: Support bz2 compression in PY2 for c engine a7960f6 [Daniel Himmelstein] DOC: Improve _infer_compression docstring

closes #14894 Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement Author: Nate Yoder <nate@whistle.com> Closes #14895 from nateyoder/series_dict_index and squashes the following commits: 56be091 [Nate Yoder] Update whatsnew and fix pep8 issue 5f05fdc [Nate Yoder] Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement

closes #14872 Author: Rodolfo Fernandez <opensourceworkAR@users.noreply.github.com> Closes #14905 from RodolfoRFR/pandas-14872-e and squashes the following commits: 18802b4 [Rodolfo Fernandez] added 'self' to test_dtype_utc function in pandas/tests/series/test_missing e0c6c7c [Rodolfo Fernandez] added line to whatsnew v0.19.2 and test to test_missing.py in series folder e4ba7e0 [Rodolfo Fernandez] removed all references to _DATELIKE_DTYPES from /pandas/core/missing.py 5d37ce8 [Rodolfo Fernandez] added is_datetime64tz_dtype and changed evaluation from 'values' to dtype 19eecb2 [Rodolfo Fernandez] fixed style errors using flake8 59b91a1 [Rodolfo Fernandez] test modified 5a59eac [Rodolfo Fernandez] test modified bc68bf7 [Rodolfo Fernandez] test modified ba83fc8 [Rodolfo Fernandez] test b7358de [Rodolfo Fernandez] bug fixed

…like accessor this was some older code Author: Jeff Reback <jeff@reback.net> Closes #14909 from jreback/clean_datelike and squashes the following commits: 58ff2c4 [Jeff Reback] CLN: remove simple _DATELIKE_DTYPES test and replace with is_datetimelike accessor

Explicit conversion to list for `percentiles`. Fixes the case where `percentiles` is passed as a numpy with no median (0.5) present. Closes #14908. Author: pbreach <pbreach@uwo.ca> Closes #14914 from pbreach/df-describe-percentile-ndarray-no-median and squashes the following commits: 5c8199b [pbreach] Minor test fix b5d09a6 [pbreach] Added test for median insertion with ndarray 72fe0cb [pbreach] Added what's new entry f954392 [pbreach] Moved conversion to if percentiles is not None d192ac7 [pbreach] Fixed whitespace issue a06794d [pbreach] BUG: Fixed bug in DataFrame.describe when percentiles are passed as array with no median

IdnexError and KeyError now bubble up appropriately. closes #14554 Author: Chris Ham <chris@christopher-ham.com> Closes #14912 from clham/gh14554-b and squashes the following commits: 458c0cc [Chris Ham] CLN: Resubmit of GH14700. Fixes GH14554. Errors other than IndexingError and KeyError now bubble up appropriately.

closes #14827 Author: Roger Thomas <roger.thomas@cremeglobal.com> Author: Roger Thomas <roger.thomas87@gmail.com> Closes #14842 from RogerThomas/fix_to_numeric_on_decimal_fields and squashes the following commits: 91d989b [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields d7972d7 [Roger Thomas] Move isdecimal to internal api 1f1c62c [Roger Thomas] Add Test And Refactor is_decimal f1b69da [Roger Thomas] Merge branch 'master' of github.com:pandas-dev/pandas into fix_to_numeric_on_decimal_fields 2d2488c [Roger Thomas] Fix To Numeric on Decimal Fields

Introduces a `UInt64HashTable` class to hash `uint64` elements and prevent overflow in functions like `Series.unique`. Closes #14721. Author: gfyoung <gfyoung17@gmail.com> Closes #14915 from gfyoung/uint64-hashtable-patch and squashes the following commits: 380c580 [gfyoung] BUG: Prevent uint64 overflow in Series.unique

Adds handling for `uint64` objects during conversion. When negative numbers and `uint64` are detected, we then convert the result to `object`. Picks up where #4845 left off. Closes #4471. Author: gfyoung <gfyoung17@gmail.com> Closes #14916 from gfyoung/convert-objects-uint64 and squashes the following commits: ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects

…lution Closes #14826 Fix inconsistency in Partial String Index with 'second' resolution. See #14826. Now if the timestamp and the index both have resolution `second`, timestamp is considered as an exact match try and not a slice. Therefore, for `Series`, scalar will be returned, for `DataFrame` `KeyError` raised. Author: Ilya V. Schurov <ilya@schurov.com> Closes #14856 from ischurov/datetimeindex-slices and squashes the following commits: 2881a53 [Ilya V. Schurov] Merge branch 'datetimeindex-slices' of https://github.com/ischurov/pandas into datetimeindex-slices ac8758e [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt 0e87874 [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt 0814e5b [Ilya V. Schurov] - Addressing code review: added reference to new docs section in whatsnew. d215905 [Ilya V. Schurov] - Addressing code review: documentation clarification. c287845 [Ilya V. Schurov] conflict PR #14856 resolved 40eddc3 [Ilya V. Schurov] - Documentation fixes e17d210 [Ilya V. Schurov] - Whatsnew section added - Documentation section added 67e6bab [Ilya V. Schurov] Addressing code review: more comments added c901588 [Ilya V. Schurov] Addressing code review: testing different combinations with the loop instead of copy-pasting of the code 9b55117 [Ilya V. Schurov] Addressing code review b30039d [Ilya V. Schurov] Make flake8 happy. cc86bdd [Ilya V. Schurov] Fix inconsistency in Partial String Index with 'second' resolution ea51437 [Ilya V. Schurov] Made this code clearer.

jowens · 2016-12-21T23:41:04Z

Thanks for the comments, which I've incorporated into a new pull request, #14950 (edit). However, despite y'all pointing out some issues with my code, I am still baffled why simply replacing '.' with sep doesn't work. Happy to take suggestions.

(Feel free to close this pull request whenever it's appropriate to do so; it's supplanted by the new one.)

jowens added 3 commits December 15, 2016 15:53

added 'separator' argument to json_normalize

457019b

test for json_normalize argument 'separator'

c345d6d

added new enhancement: json_normalize now takes 'separator' as an optional argument

def361d

TomAugspurger requested changes Dec 16, 2016

rename json_normalize arg separator to sep, simpler test, add version tag

fac9ac1

sinhrks suggested changes Dec 16, 2016

sinhrks added the API Design and IO JSON (read_json, to_json, json_normalize) labels Dec 16, 2016

smsaladi and others added 2 commits December 16, 2016 10:18

DOC: fixed typo (#14892)

5f777f4

BUG: regression in DataFrame.combine_first with integer columns (GH14687) (#14886)

992dfbc

gfyoung and others added 18 commits December 16, 2016 06:11

DOC: Add documentation about cpplint (#14890)

2083f0d

BLD: swap 3.6-dev and 3.4 builds, reorg build order (#14899)

d1b1720

TST: to_json keeps column info with empty dataframe (#7445)

2566223

closes #7445 Author: Matt Roeschke <emailformattr@gmail.com> Closes #14893 from mroeschke/test_7445 and squashes the following commits: 740cafe [Matt Roeschke] TST: to_json keeps column info with empty dataframe (#7445)

TST: Test datetime array assignment with different units (#7492) (#14884)

906b51a

BUG: Prevent addition overflow with TimedeltaIndex (#14816)

bdbebc4

Expands checked-add array addition introduced in gh-14237 to include all other addition cases (i.e. TimedeltaIndex and Timedelta). Follow-up to gh-14453. In addition, move checked add function to core/algorithms.

TST: Test timedelta arithmetic (#9396) (#14906)

37b22c7

TST: Groupby/transform with grouped NaN (#9941) (#14907)

a718962

ENH: select_dtypes now allows 'datetimetz' for generically selecting datetimes with timezones (#14910)

8b98104

TST:Test to_sparse with nan dataframe (#10079) (#14913)

8c798c0

gfyoung and others added 26 commits December 19, 2016 11:59

BUG: Don't convert uint64 to object in DataFrame init (#14917)

0ac3d98

The hack used to resolve gh-2355 is no longer needed. Removes the hack and patches several tests that relied on this hacky (and buggy) behavior. Closes gh-14881.

MAINT: Only output errors in C style check (#14924)

f11501a

* MAINT: Only output errors in C style check * Move cpplint install before checks

PERF: make all inference routines cpdef bint

3ab0e55

Author: Jeff Reback <jeff@reback.net> Closes #14925 from jreback/inference and squashes the following commits: ff8ecd1 [Jeff Reback] PERF: make all inference routines cpdef bint

TST: Test empty input for read_csv (#14867) (#14920)

02906ce

Fixed flake8 issues Added blank csv file Removing unneeded test

BUG: bug in Series construction from UTC

24fb26d

xref #14918 Author: Jeff Reback <jeff@reback.net> Closes #14928 from jreback/timez_construction and squashes the following commits: 3dd8e99 [Jeff Reback] BUG: bug in Series construction from UTC

DOC: cleanup of timeseries.rst

708792a

TST: Groupby.filter dropna=False with empty group (#10780) (#14926)

3ab369c

Add test for dropna = True

DOC: small edits in timeseries.rst

1678f14

cache and remove boxing (#14931)

4c3d4d4

DOC: whatsnew 0.20 and timeseries doc fixes

0a7cd97

PERF: fix getitem unique_check / initialization issue

07c83ee

closes #14930 Author: Jeff Reback <jeff@reback.net> Closes #14933 from jreback/perf and squashes the following commits: dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue

BUG: Properly read Categorical msgpacks (#14918)

73e2829

Patches bug in read_msgpack in which Series categoricals were accidentally being constructed with a non-categorical dtype, resulting in an error. Closes gh-14901.

DOC: Pandas Cheat Sheet

f79bc7a

closes #13202 closes #14943

added 'separator' argument to json_normalize

a06e32a

test for json_normalize argument 'separator'

dcc4632

added new enhancement: json_normalize now takes 'separator' as an optional argument

2363314

rename json_normalize arg separator to sep, simpler test, add version tag

8e0faa8

json_normalize's separator is now sep, also does a check for string_types

521720d

simpler and better tests for json_normalize with separator (default, user-specified, user-specified-with-unicode)

74c4285

Merge branch 'json_normalize-separator' of github.com:jowens/pandas into json_normalize-separator

8b72b12

jowens mentioned this pull request Dec 21, 2016

added 'separator' argument to json_normalize #14949

Closed

4 tasks

jowens mentioned this pull request Dec 22, 2016

ENH: GH14883: json_normalize now takes a user-specified separator #14950

Closed

4 tasks

jowens closed this Dec 22, 2016

jowens commented Dec 16, 2016

TomAugspurger Dec 16, 2016

jreback Dec 16, 2016

TomAugspurger Dec 16, 2016

TomAugspurger Dec 16, 2016

sinhrks Dec 16, 2016

sinhrks Dec 16, 2016

sinhrks Dec 16, 2016

jowens commented Dec 16, 2016

jorisvandenbossche commented Dec 16, 2016

jowens commented Dec 21, 2016 (edited)
