better support for duplicate labels (on the same axis) #1126

Open
@gdementen

We cannot load arrays with more than 2 dimensions when one of their axes has duplicate labels:

from larray import ndtest, read_hdf, read_csv, read_excel

arr = ndtest("a=a0,a1;b=x,x;c=c0,c1")

arr.to_hdf('test.h5', 'arr')
arr = read_hdf('test.h5', 'arr')
ValueError: cannot reshape array of size 8 into shape (2,1,2)

arr.to_csv('test.csv')
arr = read_csv('test.csv')
ValueError: cannot handle a non-unique multi-index!

arr.to_excel('test.xlsx')
arr = read_excel('test.xlsx')
ValueError: cannot handle a non-unique multi-index!

For HDF, this is clearly a limitation in larray's code. In pandas.py/index_to_labels, I used the following code:

return [unique_list(idx.get_level_values(label)) for label in range(idx.nlevels)]

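where unique_list returns the unique labels for that index "level". That obviously breaks in the presence of duplicate labels: with b=x,x, the b level collapses to a single label, so the inferred shape is (2, 1, 2) while the data holds 8 values.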
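
To make the HDF failure concrete, here is a minimal sketch in plain Python (no pandas or larray needed) that mimics the per-level uniqueness logic on the example axes. The `unique_list` below is a hypothetical stand-in written for this illustration, not larray's actual helper, and the list of tuples only approximates what a MultiIndex would contain.

```python
from itertools import product
from math import prod


def unique_list(values):
    """Order-preserving unique (hypothetical stand-in for larray's helper)."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Same axes as ndtest("a=a0,a1;b=x,x;c=c0,c1"): axis b has a duplicate label.
axes = [["a0", "a1"], ["x", "x"], ["c0", "c1"]]

# One tuple per stored value, roughly what the index holds after a round trip.
rows = list(product(*axes))
nlevels = len(axes)

# Same idea as index_to_labels: unique labels per level.
labels = [unique_list(row[level] for row in rows) for level in range(nlevels)]

inferred_shape = tuple(len(level_labels) for level_labels in labels)
n_values = len(rows)
n_cells = prod(inferred_shape)

print("inferred shape:", inferred_shape)
print("values stored:", n_values, "cells in inferred shape:", n_cells)
```

The number of stored values no longer matches the product of the inferred axis lengths, which is consistent with the reshape error above.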

For CSV and Excel, this is not so clear-cut. It seems to be a limitation of Pandas' reindex, and I am not sure we can do anything about it (other than not going through Pandas to load the data).
