Error Creating DataFrame with Single MultiIndexed Column #12457

Closed
@woztheproblem

When I try to create a DataFrame with a single MultiIndexed column, I get "IndexError: list index out of range".

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame(data=zip(range(100)), columns=[['a','b']])

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-5b88e876b2de> in <module>()
----> 1 df = pd.DataFrame(data=zip(range(100)), columns=[['a','b']])

/Users/ewozniak/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    273 
    274                     mgr = _arrays_to_mgr(arrays, columns, index, columns,
--> 275                                          dtype=dtype)
    276                 else:
    277                     mgr = self._init_ndarray(data, index, columns, dtype=dtype,

/Users/ewozniak/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5236     axes = [_ensure_index(columns), _ensure_index(index)]
   5237 
-> 5238     return create_block_manager_from_arrays(arrays, arr_names, axes)
   5239 
   5240 

/Users/ewozniak/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
   3894 
   3895     try:
-> 3896         blocks = form_blocks(arrays, names, axes)
   3897         mgr = BlockManager(blocks, axes)
   3898         mgr._consolidate_inplace()

/Users/ewozniak/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in form_blocks(arrays, names, axes)
   3929 
   3930         k = names[name_idx]
-> 3931         v = arrays[name_idx]
   3932 
   3933         if is_sparse(v):

IndexError: list index out of range

Note: I use zip() in the example above so that the data has the same shape it would have when rebuilding one DataFrame from the output of another DataFrame's df.to_json(orient='split'), which is what I'm trying to do. Without zip(), I get:

df = pd.DataFrame(data=range(100), columns=[['a','b']])
ValueError: Shape of passed values is (1, 100), indices imply (2, 100)

This works fine with two (or more) columns:

df = pd.DataFrame(data=zip(range(100),range(100)), columns=[['a','b'],['c','d']])
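One possible reading of the ValueError above ("indices imply (2, 100)") is that a nested list passed as columns is treated as one array per level, not one tuple per column, so [['a','b']] would describe two columns on a single level. Under that assumption, the following is a minimal sketch of a workaround that builds the column label explicitly with pd.MultiIndex.from_tuples. The variable names are only illustrative, and the list comprehension stands in for zip(range(100)).

```python
import pandas as pd

# One row per value, one field per row: the same shape as list(zip(range(100))).
data = [(i,) for i in range(100)]

# A single column whose label is the two-level tuple ('a', 'b').
columns = pd.MultiIndex.from_tuples([("a", "b")])

df = pd.DataFrame(data=data, columns=columns)

print(df.shape)            # (100, 1)
print(df.columns.nlevels)  # 2
```

If this reading is right, it would also explain why the two-column example succeeds: [['a','b'],['c','d']] happens to have as many level arrays as it has columns.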

output of pd.show_versions()

I have tried with both pandas 0.17.1 and 0.18.0rc1.

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0rc1
nose: 1.3.7
pip: 8.0.3
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
