I propose that `itertuples()` should return `collections.namedtuple` objects: a drop-in replacement for the standard tuple, but with the benefit of named fields. I have tested the following with Python 3.4 (only slight changes compared to the current implementation):
```python
import collections

def itertuples(self, index=True):
    """Iterate over DataFrame rows as namedtuples (proposed replacement)."""
    arrays = []
    if index:
        arrays.append(self.index)
        fields = ["Index"] + list(self.columns)
    else:
        fields = list(self.columns)
    # rename=True replaces invalid or duplicate field names with _0, _1, ...
    itertuple = collections.namedtuple("Itertuple", fields, rename=True)
    # use integer indexing because of possible duplicate column names
    arrays.extend(self.iloc[:, k] for k in range(len(self.columns)))
    return (itertuple(*row) for row in zip(*arrays))
```
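For readers without pandas at hand, the same technique can be sketched over plain Python lists. `iter_namedtuples` below is a hypothetical helper, not pandas API: it assumes column-oriented input (a list of labels plus a list of per-column value lists) and an optional index, and mirrors the proposal by collecting columns by position and zipping them into namedtuple rows.

```python
import collections
from typing import Any, Iterable, Iterator, Optional, Sequence


def iter_namedtuples(
    column_names: Sequence[Any],
    columns: Sequence[Iterable[Any]],
    index: Optional[Iterable[Any]] = None,
) -> Iterator[tuple]:
    """Yield one namedtuple per row from column-oriented data.

    Hypothetical helper (not part of pandas) that mirrors the proposed
    itertuples(): column values are gathered by position, then zipped
    into rows.
    """
    arrays = list(columns)
    fields = [str(name) for name in column_names]
    if index is not None:
        # Put the index first, like the "Index" field in the proposal.
        arrays.insert(0, index)
        fields.insert(0, "Index")
    # rename=True turns invalid or duplicate labels into _0, _1, ...
    Row = collections.namedtuple("Itertuple", fields, rename=True)
    # zip(*arrays) transposes the columns into row tuples.
    return (Row(*values) for values in zip(*arrays))


rows = list(iter_namedtuples(["col1", "col2"], [[1, 2], [0.1, 0.2]], index=["a", "b"]))
print(rows[0])  # Itertuple(Index='a', col1=1, col2=0.1)

dup = next(iter_namedtuples(["x", "x"], [[1], [2]]))
print(dup._fields)  # ('x', '_1')
```

Because columns are addressed by position rather than by label, duplicate labels such as `["x", "x"]` still yield two distinct fields (`x` and `_1`), which is the same reason the proposal uses `iloc` instead of label lookup.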
Example
```
In [3]: df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

In [4]: for row in df.itertuples():
   ...:     print(row)
   ...:
Itertuple(Index='a', col1=1, col2=0.10000000000000001)
Itertuple(Index='b', col1=2, col2=0.20000000000000001)

In [5]: row.Index, row.col1, row.col2
Out[5]: ('b', 2, 0.20000000000000001)
```
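To illustrate the "drop-in replacement" point without pandas, here is a minimal sketch that builds a row by hand. The `Itertuple` type below is a hypothetical stand-in for the row type in the session above; it only shows that a namedtuple still behaves like a plain tuple for equality, indexing and unpacking.

```python
import collections

# Hypothetical stand-in for the row type produced in the session above.
Itertuple = collections.namedtuple("Itertuple", ["Index", "col1", "col2"])
row = Itertuple("b", 2, 0.2)

# A namedtuple is a tuple subclass, so tuple-based code keeps working.
assert isinstance(row, tuple)
assert row == ("b", 2, 0.2)       # compares equal to the plain tuple
assert row[1] == row.col1 == 2    # positional and named access agree
index, col1, col2 = row           # unpacking is unchanged
print(index, col1, col2)          # b 2 0.2
```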
In my tests there is no noticeable performance overhead. I'm not sure about compatibility with older versions of Python, though. The `rename` parameter is needed to replace disallowed field names and duplicate identifiers with position-based names (`_0`, `_1`, ...), and it was only added in Python 2.7/3.1.
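A small sketch of what `rename=True` does with awkward column labels. The label list is an assumed example (a label with a space, a duplicate, a keyword, a leading underscore and an integer), not taken from any real DataFrame; it shows which labels get replaced by positional names and that the same labels are rejected without `rename`.

```python
import collections

# Hypothetical labels a DataFrame might carry, with "Index" prepended
# as in the proposed itertuples().
labels = ["Index", "col 1", "col1", "col1", "class", "_private", 0]

Row = collections.namedtuple("Itertuple", labels, rename=True)
print(Row._fields)  # ('Index', '_1', 'col1', '_3', '_4', '_5', '_6')

# Without rename=True the invalid name "col 1" raises ValueError.
try:
    collections.namedtuple("Itertuple", labels)
    strict_ok = True
except ValueError:
    strict_ok = False
print(strict_ok)  # False
```

Each offending label is replaced by an underscore plus its position, so field order (and therefore tuple order) is preserved even when names are not.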