
Add new optional "separator" argument to json_normalize #14891


Closed
wants to merge 52 commits into from
Changes from all commits

Commits (52)
457019b
added 'separator' argument to json_normalize
jowens Dec 15, 2016
c345d6d
test for json_normalize argument 'separator'
jowens Dec 16, 2016
def361d
added new enhancement: json_normalize now takes 'separator' as an opt…
jowens Dec 16, 2016
fac9ac1
rename json_normalize arg separator to sep, simpler test, add version…
jowens Dec 16, 2016
5f777f4
DOC: fixed typo (#14892)
smsaladi Dec 16, 2016
992dfbc
BUG: regression in DataFrame.combine_first with integer columns (GH14…
jorisvandenbossche Dec 16, 2016
2083f0d
DOC: Add documentation about cpplint (#14890)
gfyoung Dec 16, 2016
d1b1720
BLD: swap 3.6-dev and 3.4 builds, reorg build order (#14899)
jreback Dec 16, 2016
e7df751
ENH: merge_asof() has type specializations and can take multiple 'by'…
Dec 16, 2016
2566223
TST: to_json keeps column info with empty dataframe (#7445)
mroeschke Dec 16, 2016
6f4e36a
API: map() on Index returns an Index, not array
nateyoder Dec 16, 2016
dd8cba2
BUG: Patch read_csv NA values behaviour
gfyoung Dec 16, 2016
73bc6cf
Groupby tests restructure
aileronajay Dec 17, 2016
f5c8d54
Catch warning introduced by GH14432 in test case
Dec 17, 2016
e80a2b9
DOC for refactored compression (GH14576) + BUG: bz2-compressed URL wi…
dhimmel Dec 17, 2016
906b51a
TST: Test datetime array assignment with different units (#7492) (#14…
mroeschke Dec 17, 2016
bdbebc4
BUG: Prevent addition overflow with TimedeltaIndex (#14816)
gfyoung Dec 17, 2016
e503d40
Clean up construction of Series with dictionary and datetime index
nateyoder Dec 17, 2016
f3c5a42
BUG: .fillna() for datetime64 with tz is passing thru floats
opensourceworkAR Dec 18, 2016
37b22c7
TST: Test timedelta arithmetic (#9396) (#14906)
mroeschke Dec 18, 2016
a718962
TST: Groupby/transform with grouped NaN (#9941) (#14907)
mroeschke Dec 18, 2016
f1cfe5b
CLN: remove simple _DATELIKE_DTYPES test and replace with is_datetime…
jreback Dec 18, 2016
8b98104
ENH: select_dtypes now allows 'datetimetz' for generically selecting …
jreback Dec 19, 2016
8c798c0
TST:Test to_sparse with nan dataframe (#10079) (#14913)
mroeschke Dec 19, 2016
dc4b070
COMPAT/REF: Use s3fs for s3 IO
TomAugspurger Dec 19, 2016
39efbbc
CLN: move unique1d to algorithms from nanops (#14919)
jreback Dec 19, 2016
0ac3d98
BUG: Don't convert uint64 to object in DataFrame init (#14917)
gfyoung Dec 19, 2016
f11501a
MAINT: Only output errors in C style check (#14924)
gfyoung Dec 19, 2016
8e630b6
BUG: Fixed DataFrame.describe percentiles are ndarray w/ no median
pbreach Dec 19, 2016
3ccb501
CLN: Resubmit of GH14700. Fixes GH14554. Errors other than Indexing…
clham Dec 19, 2016
5faf32a
BUG: Fix to numeric on decimal fields
Dec 20, 2016
b35c689
BUG: Prevent uint64 overflow in Series.unique
gfyoung Dec 20, 2016
0c52813
BUG: Convert uint64 in maybe_convert_objects
gfyoung Dec 20, 2016
3ab0e55
PERF: make all inference routines cpdef bint
jreback Dec 20, 2016
02906ce
TST: Test empty input for read_csv (#14867) (#14920)
jeffcarey Dec 20, 2016
50930a9
API/BUG: Fix inconsistency in Partial String Index with 'second' reso…
ischurov Dec 20, 2016
24fb26d
BUG: bug in Series construction from UTC
jreback Dec 20, 2016
708792a
DOC: cleanup of timeseries.rst
jreback Dec 20, 2016
3ab369c
TST: Groupby.filter dropna=False with empty group (#10780) (#14926)
mroeschke Dec 20, 2016
1678f14
DOC: small edits in timeseries.rst
jreback Dec 21, 2016
4c3d4d4
cache and remove boxing (#14931)
MaximilianR Dec 21, 2016
0a7cd97
DOC: whatsnew 0.20 and timeseries doc fixes
jreback Dec 21, 2016
07c83ee
PERF: fix getitem unique_check / initialization issue
jreback Dec 21, 2016
73e2829
BUG: Properly read Categorical msgpacks (#14918)
gfyoung Dec 21, 2016
f79bc7a
DOC: Pandas Cheat Sheet
Dr-Irv Dec 21, 2016
a06e32a
added 'separator' argument to json_normalize
jowens Dec 15, 2016
dcc4632
test for json_normalize argument 'separator'
jowens Dec 16, 2016
2363314
added new enhancement: json_normalize now takes 'separator' as an opt…
jowens Dec 16, 2016
8e0faa8
rename json_normalize arg separator to sep, simpler test, add version…
jowens Dec 16, 2016
521720d
json_normalize's separator is now sep, also does a check for string_t…
jowens Dec 21, 2016
74c4285
simpler and better tests for json_normalize with separator (default, …
jowens Dec 21, 2016
8b72b12
Merge branch 'json_normalize-separator' of github.com:jowens/pandas i…
jowens Dec 21, 2016
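The feature this PR adds can be sketched as follows. This assumes the final keyword name `sep` (per the rename commits above) and uses the top-level `pd.json_normalize`; at the time of this PR the function lived at `pandas.io.json.json_normalize`, and the example data is hypothetical, not from the PR itself.

```python
import pandas as pd

# Nested records: json_normalize flattens them into columns whose names
# join each level of nesting with the separator (default ".").
data = [
    {"id": 1, "location": {"city": "Davis", "geo": {"lat": 38.5, "lon": -121.7}}},
]

# Passing sep="_" changes the join character in the flattened column names.
flat = pd.json_normalize(data, sep="_")
print(sorted(flat.columns))
# ['id', 'location_city', 'location_geo_lat', 'location_geo_lon']
```

Without `sep`, the same call would produce `location.city`, `location.geo.lat`, and so on.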
115 changes: 58 additions & 57 deletions .travis.yml
@@ -66,19 +66,6 @@ matrix:
apt:
packages:
- python-gtk2
- python: 3.4
env:
- PYTHON_VERSION=3.4
- JOB_NAME: "34_nslow"
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- CLIPBOARD=xsel
- CACHE_NAME="34_nslow"
- USE_CACHE=true
addons:
apt:
packages:
- xsel
- python: 3.5
env:
- PYTHON_VERSION=3.5
@@ -93,6 +80,33 @@ matrix:
apt:
packages:
- xsel
- python: 3.6-dev
env:
- PYTHON_VERSION=3.6
- JOB_NAME: "36_dev"
- JOB_TAG=_DEV
- NOSE_ARGS="not slow and not network and not disabled"
- PANDAS_TESTING_MODE="deprecate"
addons:
apt:
packages:
- libatlas-base-dev
- gfortran
# In allow_failures
- python: 2.7
env:
- PYTHON_VERSION=2.7
- JOB_NAME: "27_nslow_nnet_COMPAT"
- NOSE_ARGS="not slow and not network and not disabled"
- LOCALE_OVERRIDE="it_IT.UTF-8"
- INSTALL_TEST=true
- JOB_TAG=_COMPAT
- CACHE_NAME="27_nslow_nnet_COMPAT"
- USE_CACHE=true
addons:
apt:
packages:
- language-pack-it
# In allow_failures
- python: 2.7
env:
@@ -103,45 +117,46 @@
- FULL_DEPS=true
- CACHE_NAME="27_slow"
- USE_CACHE=true
# In allow_failures
- python: 2.7
env:
- PYTHON_VERSION=2.7
- JOB_NAME: "27_build_test_conda"
- JOB_TAG=_BUILD_TEST
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- BUILD_TEST=true
- CACHE_NAME="27_build_test_conda"
- USE_CACHE=true
# In allow_failures
- python: 3.4
env:
- PYTHON_VERSION=3.4
- JOB_NAME: "34_slow"
- JOB_TAG=_SLOW
- NOSE_ARGS="slow and not network and not disabled"
- JOB_NAME: "34_nslow"
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- CLIPBOARD=xsel
- CACHE_NAME="34_slow"
- CACHE_NAME="34_nslow"
- USE_CACHE=true
addons:
apt:
packages:
- xsel
# In allow_failures
- python: 2.7
- python: 3.4
env:
- PYTHON_VERSION=2.7
- JOB_NAME: "27_build_test_conda"
- JOB_TAG=_BUILD_TEST
- NOSE_ARGS="not slow and not disabled"
- PYTHON_VERSION=3.4
- JOB_NAME: "34_slow"
- JOB_TAG=_SLOW
- NOSE_ARGS="slow and not network and not disabled"
- FULL_DEPS=true
- BUILD_TEST=true
- CACHE_NAME="27_build_test_conda"
- CLIPBOARD=xsel
- CACHE_NAME="34_slow"
- USE_CACHE=true
# In allow_failures
- python: 3.6-dev
env:
- PYTHON_VERSION=3.6
- JOB_NAME: "36_dev"
- JOB_TAG=_DEV
- NOSE_ARGS="not slow and not network and not disabled"
- PANDAS_TESTING_MODE="deprecate"
addons:
apt:
packages:
- libatlas-base-dev
- gfortran
- xsel
# In allow_failures
- python: 3.5
env:
@@ -157,21 +172,6 @@
packages:
- libatlas-base-dev
- gfortran
# In allow_failures
- python: 2.7
env:
- PYTHON_VERSION=2.7
- JOB_NAME: "27_nslow_nnet_COMPAT"
- NOSE_ARGS="not slow and not network and not disabled"
- LOCALE_OVERRIDE="it_IT.UTF-8"
- INSTALL_TEST=true
- JOB_TAG=_COMPAT
- CACHE_NAME="27_nslow_nnet_COMPAT"
- USE_CACHE=true
addons:
apt:
packages:
- language-pack-it
# In allow_failures
- python: 3.5
env:
@@ -226,18 +226,19 @@ matrix:
- BUILD_TEST=true
- CACHE_NAME="27_build_test_conda"
- USE_CACHE=true
- python: 3.6-dev
- python: 3.4
env:
- PYTHON_VERSION=3.6
- JOB_NAME: "36_dev"
- JOB_TAG=_DEV
- NOSE_ARGS="not slow and not network and not disabled"
- PANDAS_TESTING_MODE="deprecate"
- PYTHON_VERSION=3.4
- JOB_NAME: "34_nslow"
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- CLIPBOARD=xsel
- CACHE_NAME="34_nslow"
- USE_CACHE=true
addons:
apt:
packages:
- libatlas-base-dev
- gfortran
- xsel
- python: 3.5
env:
- PYTHON_VERSION=3.5
15 changes: 14 additions & 1 deletion asv_bench/benchmarks/algorithms.py
@@ -18,14 +18,17 @@ def setup(self):
self.float = pd.Float64Index(np.random.randn(N).repeat(5))

# Convenience naming.
self.checked_add = pd.core.nanops._checked_add_with_arr
self.checked_add = pd.core.algorithms.checked_add_with_arr

self.arr = np.arange(1000000)
self.arrpos = np.arange(1000000)
self.arrneg = np.arange(-1000000, 0)
self.arrmixed = np.array([1, -1]).repeat(500000)
self.strings = tm.makeStringIndex(100000)

self.arr_nan = np.random.choice([True, False], size=1000000)
self.arrmixed_nan = np.random.choice([True, False], size=1000000)

# match
self.uniques = tm.makeStringIndex(1000).values
self.all = self.uniques.repeat(10)
@@ -69,6 +72,16 @@ def time_add_overflow_neg_arr(self):
def time_add_overflow_mixed_arr(self):
self.checked_add(self.arr, self.arrmixed)

def time_add_overflow_first_arg_nan(self):
self.checked_add(self.arr, self.arrmixed, arr_mask=self.arr_nan)

def time_add_overflow_second_arg_nan(self):
self.checked_add(self.arr, self.arrmixed, b_mask=self.arrmixed_nan)

def time_add_overflow_both_arg_nan(self):
self.checked_add(self.arr, self.arrmixed, arr_mask=self.arr_nan,
b_mask=self.arrmixed_nan)


class Hashing(object):
goal_time = 0.2
7 changes: 7 additions & 0 deletions asv_bench/benchmarks/frame_methods.py
@@ -68,6 +68,8 @@ class Iteration(object):
def setup(self):
self.df = DataFrame(randn(10000, 1000))
self.df2 = DataFrame(np.random.randn(50000, 10))
self.df3 = pd.DataFrame(np.random.randn(1000,5000),
columns=['C'+str(c) for c in range(5000)])

def f(self):
if hasattr(self.df, '_item_cache'):
@@ -85,6 +87,11 @@ def time_iteritems(self):
def time_iteritems_cached(self):
self.g()

def time_iteritems_indexing(self):
df = self.df3
for col in df:
df[col]

def time_itertuples(self):
for row in self.df2.itertuples():
pass
2 changes: 1 addition & 1 deletion asv_bench/benchmarks/io_bench.py
@@ -153,7 +153,7 @@ def setup(self, compression, engine):
# The Python 2 C parser can't read bz2 from open files.
raise NotImplementedError
try:
import boto
import s3fs
except ImportError:
# Skip these benchmarks if `s3fs` is not installed.
raise NotImplementedError
13 changes: 13 additions & 0 deletions asv_bench/benchmarks/join_merge.py
@@ -302,12 +302,19 @@ def setup(self):
self.df1 = self.df1.sort_values('time')
self.df2 = self.df2.sort_values('time')

self.df1['time32'] = np.int32(self.df1.time)
self.df2['time32'] = np.int32(self.df2.time)

self.df1a = self.df1[['time', 'value1']]
self.df2a = self.df2[['time', 'value2']]
self.df1b = self.df1[['time', 'key', 'value1']]
self.df2b = self.df2[['time', 'key', 'value2']]
self.df1c = self.df1[['time', 'key2', 'value1']]
self.df2c = self.df2[['time', 'key2', 'value2']]
self.df1d = self.df1[['time32', 'value1']]
self.df2d = self.df2[['time32', 'value2']]
self.df1e = self.df1[['time', 'key', 'key2', 'value1']]
self.df2e = self.df2[['time', 'key', 'key2', 'value2']]

def time_noby(self):
merge_asof(self.df1a, self.df2a, on='time')
@@ -318,6 +325,12 @@ def time_by_object(self):
def time_by_int(self):
merge_asof(self.df1c, self.df2c, on='time', by='key2')

def time_on_int32(self):
merge_asof(self.df1d, self.df2d, on='time32')

def time_multiby(self):
merge_asof(self.df1e, self.df2e, on='time', by=['key', 'key2'])


#----------------------------------------------------------------------
# data alignment
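The new `time_multiby` benchmark above times `merge_asof` with a list of `by` keys, the capability added in the merge_asof commit in this PR. A minimal sketch of that call on hypothetical toy data (not from the PR):

```python
import pandas as pd

# Both frames must be sorted by the "on" key; matching is exact on the
# "by" keys and as-of (backward, i.e. most recent earlier row) on "time".
left = pd.DataFrame({
    "time": [1, 5, 10],
    "key": ["a", "a", "b"],
    "key2": ["x", "x", "y"],
    "value1": [1, 2, 3],
})
right = pd.DataFrame({
    "time": [0, 4, 9],
    "key": ["a", "a", "b"],
    "key2": ["x", "x", "y"],
    "value2": [10, 20, 30],
})

res = pd.merge_asof(left, right, on="time", by=["key", "key2"])
print(res["value2"].tolist())  # [10, 20, 30]
```

Each left row picks up the latest right row at or before its `time` within the same (`key`, `key2`) group.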
25 changes: 25 changes &amp; 0 deletions asv_bench/benchmarks/period.py
@@ -49,3 +49,28 @@ def time_value_counts_pindex(self):
self.i.value_counts()


class period_standard_indexing(object):
goal_time = 0.2

def setup(self):
self.index = PeriodIndex(start='1985', periods=1000, freq='D')
self.series = Series(range(1000), index=self.index)
self.period = self.index[500]

def time_get_loc(self):
self.index.get_loc(self.period)

def time_shape(self):
self.index.shape

def time_shallow_copy(self):
self.index._shallow_copy()

def time_series_loc(self):
self.series.loc[self.period]

def time_align(self):
pd.DataFrame({'a': self.series, 'b': self.series[:500]})

def time_intersection(self):
self.index[:750].intersection(self.index[250:])
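The indexing paths timed in the benchmark above can be exercised directly. A sketch using `period_range` (the `PeriodIndex(start=..., periods=...)` form in the benchmark was deprecated in later pandas versions):

```python
import pandas as pd

index = pd.period_range(start="1985-01-01", periods=1000, freq="D")
series = pd.Series(range(1000), index=index)
period = index[500]

# get_loc resolves a Period label to its integer position.
assert index.get_loc(period) == 500
# .loc with a Period label goes through the same lookup machinery.
assert series.loc[period] == 500
print(index.shape)  # (1000,)
```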
19 changes: 17 additions & 2 deletions asv_bench/benchmarks/series_methods.py
@@ -8,13 +8,28 @@ def setup(self):
self.dr = pd.date_range(
start=datetime(2015,10,26),
end=datetime(2016,1,1),
freq='10s'
) # ~500k long
freq='50s'
) # ~100k long

def time_series_constructor_no_data_datetime_index(self):
Series(data=None, index=self.dr)


class series_constructor_dict_data_datetime_index(object):
goal_time = 0.2

def setup(self):
self.dr = pd.date_range(
start=datetime(2015, 10, 26),
end=datetime(2016, 1, 1),
freq='50s'
) # ~100k long
self.data = {d: v for d, v in zip(self.dr, range(len(self.dr)))}

def time_series_constructor_no_data_datetime_index(self):
Series(data=self.data, index=self.dr)
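The new benchmark class above constructs a Series from a dict keyed by the same timestamps as the index, the code path cleaned up in the "Clean up construction of Series with dictionary and datetime index" commit. A small sketch of that constructor:

```python
import pandas as pd

dr = pd.date_range(start="2016-01-01", periods=5, freq="D")
# Dict keyed by the timestamps in the index, as in the benchmark's setup.
data = {d: v for d, v in zip(dr, range(len(dr)))}

# Values are looked up from the dict by index label, preserving index order.
s = pd.Series(data=data, index=dr)
print(s.tolist())  # [0, 1, 2, 3, 4]
```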


class series_isin_int64(object):
goal_time = 0.2

6 changes: 3 additions &amp; 3 deletions ci/lint.sh
@@ -7,6 +7,8 @@ source activate pandas
RET=0

if [ "$LINT" ]; then
pip install cpplint

# pandas/rpy is deprecated and will be removed.
# pandas/src is C code, so no need to search there.
echo "Linting *.py"
@@ -43,13 +45,11 @@ if [ "$LINT" ]; then
# from Cython files nor do we want to lint C files that we didn't modify for
# this particular codebase (e.g. src/headers, src/klib, src/msgpack). However,
# we can lint all header files since they aren't "generated" like C files are.
pip install cpplint

echo "Linting *.c and *.h"
for path in '*.h' 'period_helper.c' 'datetime' 'parser' 'ujson'
do
echo "linting -> pandas/src/$path"
cpplint --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/src/$path
cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/src/$path
if [ $? -ne "0" ]; then
RET=1
fi
2 changes: 1 addition &amp; 1 deletion ci/requirements-2.7-64.run
@@ -11,7 +11,7 @@ sqlalchemy
lxml=3.2.1
scipy
xlsxwriter
boto
s3fs
bottleneck
html5lib
beautiful-soup
2 changes: 1 addition &amp; 1 deletion ci/requirements-2.7.run
@@ -11,7 +11,7 @@ sqlalchemy=0.9.6
lxml=3.2.1
scipy
xlsxwriter=0.4.6
boto=2.36.0
s3fs
bottleneck
psycopg2=2.5.2
patsy
2 changes: 1 addition &amp; 1 deletion ci/requirements-2.7_SLOW.run
@@ -13,7 +13,7 @@ numexpr
pytables
sqlalchemy
lxml
boto
s3fs
bottleneck
psycopg2
pymysql
2 changes: 1 addition &amp; 1 deletion ci/requirements-3.5.run
@@ -17,7 +17,7 @@ sqlalchemy
pymysql
psycopg2
xarray
boto
s3fs

# incompat with conda ATM
# beautiful-soup
2 changes: 1 addition & 1 deletion ci/requirements-3.5_OSX.run
@@ -12,7 +12,7 @@ matplotlib
jinja2
bottleneck
xarray
boto
s3fs

# incompat with conda ATM
# beautiful-soup
Binary file added doc/cheatsheet/Pandas_Cheat_Sheet.pdf
Binary file not shown.
Binary file added doc/cheatsheet/Pandas_Cheat_Sheet.pptx
Binary file not shown.