Description
Code Sample, a copy-pastable example if possible
import pandas as pd
#construct a toy example
data = {0:{'id':1, 'name': 'Foo', 'elements': {'a':1}},1:{'id':2, 'name': 'Bar', 'elements': {'b':2}},2:{'id':3, 'name': 'Baz', 'elements': {'c':3}}}
passing_series = pd.Series(data)
#testing the passing case
pd.io.json.json_normalize(passing_series)
Output:
Out[5]:
   elements.a  elements.b  elements.c  id name
0         1.0         NaN         NaN   1  Foo
1         NaN         2.0         NaN   2  Bar
2         NaN         NaN         3.0   3  Baz
#construct the failing case
failing_series = passing_series.copy()
failing_series.index = [1,2,3]
#testing the failing case
pd.io.json.json_normalize(failing_series)
Output:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-08797bc7d807> in <module>()
----> 1 pd.io.json.json_normalize(failing_series)
~/miniconda3/envs/uptodatepandas/lib/python3.5/site-packages/pandas/io/json/normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
190
191 if record_path is None:
--> 192 if any([isinstance(x, dict) for x in compat.itervalues(data[0])]):
193 # naive normalization, this is idempotent for flat records
194 # and potentially will inflate the data considerably for
~/miniconda3/envs/uptodatepandas/lib/python3.5/site-packages/pandas/core/series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
~/miniconda3/envs/uptodatepandas/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
Problem description
This is a bit of a stretch for this function, which calls for a dict or a list of dicts, but json_normalize generally works fine when passed a pandas Series. The exception is a Series whose index has no record with the label 0. The cause is straightforward: line 192 of pandas/io/json/normalize.py fetches the first item of the data parameter with [] indexing (i.e. if any([isinstance(x, dict) for x in compat.itervalues(data[0])]):). On a Series with an integer index, data[0] is a label lookup rather than a positional one, so it raises KeyError: 0 when no record has that label.
Expected Output
Either raise a clear error for an invalid input type (not ideal), or work correctly when passed a Series by checking the first record in a way that is compatible with both a Series and the standard list input. This seems like an improvement that would be fairly trivial to implement.
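For illustration, here is a minimal sketch of one way the first-record check could avoid label-based indexing. The helper names first_record and has_nested_dict are hypothetical and not part of pandas. The sketch assumes the input is non-empty and that every record is a dict. It relies on the fact that iterating a list yields its elements and iterating a Series yields its values, so no lookup by the label 0 is needed.

```python
def first_record(data):
    """Return the first record of `data` by position, not by label.

    Hypothetical helper, not part of pandas. Works for any iterable of
    records; assumes `data` is non-empty.
    """
    return next(iter(data))


def has_nested_dict(data):
    """Same question as the check on line 192, without using data[0]."""
    return any(isinstance(value, dict) for value in first_record(data).values())


# Plain-Python stand-ins for the records in the toy example above.
records = [
    {'id': 1, 'name': 'Foo', 'elements': {'a': 1}},
    {'id': 2, 'name': 'Bar', 'elements': {'b': 2}},
]
nested = has_nested_dict(records)                    # first record holds a dict
flat = has_nested_dict([{'id': 1, 'name': 'Foo'}])   # first record is flat
```

Because the helper only iterates, it should behave the same for a Series regardless of its index labels. Whether the rest of json_normalize handles a Series correctly after this check is a separate question that this sketch does not address.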
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.5
pymysql: 0.7.9.None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: 0.1.2
fastparquet: 0.1.0
pandas_gbq: None
pandas_datareader: None