Description
The pandas metadata described in https://github.com/pandas-dev/pandas/blob/master/doc/source/development/developer.rst has no affordance for storing a RangeIndex without serializing it to a column of integers. This wastes both memory and time.
I propose an evolution of the metadata that permits "non-serialized" indexes like RangeIndex to be stored without a conversion step of some kind.
New readers must still be able to read old files. The reverse, allowing new files to be read by old readers, is not a goal (see below). I would suggest changing the index_columns field to include dictionaries like
{
  'kind': 'range',
  'start': 0,
  'stop': 10,
  'step': 1
}
versus
{
  'kind': 'serialized',
  'field_name': '__index_level_0__'
}
So if a string is encountered in an index_columns entry (instead of a dict), we know it is "old" metadata. Writing dicts will break old readers, which expect only strings, but I think that is OK.
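A rough sketch of how a reader could handle both forms of an index_columns entry is below. It is only an illustration of the string-versus-dict dispatch, not the actual pandas or Arrow implementation. The function names normalize_index_descriptor and build_index are hypothetical, the built-in range stands in for RangeIndex, and a plain dict of lists stands in for the table's columns.

```python
def normalize_index_descriptor(entry):
    """Turn one index_columns entry into a dict descriptor (hypothetical helper)."""
    if isinstance(entry, str):
        # "Old" metadata: a bare string names the serialized index column.
        return {"kind": "serialized", "field_name": entry}
    if isinstance(entry, dict):
        kind = entry.get("kind")
        if kind == "range":
            return {
                "kind": "range",
                "start": entry["start"],
                "stop": entry["stop"],
                "step": entry["step"],
            }
        if kind == "serialized":
            return {"kind": "serialized", "field_name": entry["field_name"]}
        raise ValueError(f"unknown index kind: {kind!r}")
    raise TypeError(f"unsupported index_columns entry: {entry!r}")


def build_index(entry, columns):
    """Rebuild an index from a descriptor.

    `columns` is assumed to map column names to lists of values.
    """
    desc = normalize_index_descriptor(entry)
    if desc["kind"] == "range":
        # Nothing is read from the file: the index is rebuilt from three integers.
        return range(desc["start"], desc["stop"], desc["step"])
    # Serialized index: the values live in an ordinary column.
    return columns[desc["field_name"]]


columns = {"__index_level_0__": [3, 1, 2]}
new_style = build_index({"kind": "range", "start": 0, "stop": 10, "step": 1}, columns)
old_style = build_index("__index_level_0__", columns)
```

Under these assumptions, old metadata (a bare string) and the new 'serialized' dict resolve to the same column lookup, while the 'range' form never touches the stored columns.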
Cross ref with https://issues.apache.org/jira/browse/ARROW-1639