
Commit 94c08c4

PERF: Performance improvement in to_csv with unused levels in multiindex (#44943)
1 parent adfc78b commit 94c08c4

3 files changed: 23 additions, 0 deletions

asv_bench/benchmarks/io/csv.py

Lines changed: 20 additions & 0 deletions
@@ -55,6 +55,26 @@ def time_frame(self, kind):
         self.df.to_csv(self.fname)
 
 
+class ToCSVMultiIndexUnusedLevels(BaseIO):
+
+    fname = "__test__.csv"
+
+    def setup(self):
+        df = DataFrame({"a": np.random.randn(100_000), "b": 1, "c": 1})
+        self.df = df.set_index(["a", "b"])
+        self.df_unused_levels = self.df.iloc[:10_000]
+        self.df_single_index = df.set_index(["a"]).iloc[:10_000]
+
+    def time_full_frame(self):
+        self.df.to_csv(self.fname)
+
+    def time_sliced_frame(self):
+        self.df_unused_levels.to_csv(self.fname)
+
+    def time_single_index_frame(self):
+        self.df_single_index.to_csv(self.fname)
+
+
 class ToCSVDatetime(BaseIO):
 
     fname = "__test__.csv"
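
Note on the benchmark setup: `df_unused_levels` is a 10,000-row slice of a 100,000-row frame, and slicing a `MultiIndex` keeps the original level values. The sketch below illustrates that situation at a smaller scale. It is not part of the commit; the sizes and the use of `np.arange` instead of random data are assumptions chosen to keep the result deterministic.

```python
import numpy as np
import pandas as pd

# Assumption: a smaller, deterministic stand-in for the benchmark frame.
df = pd.DataFrame({"a": np.arange(1_000, dtype=float), "b": 1, "c": 1})
full = df.set_index(["a", "b"])

# Same pattern as df_unused_levels in the benchmark: keep only the first rows.
sliced = full.iloc[:100]

# The slice has 100 rows, but level "a" still stores every original value.
n_rows = len(sliced)
n_level_values = len(sliced.index.levels[0])

# Dropping the unused values shrinks the level to what the rows reference.
n_trimmed_values = len(sliced.index.remove_unused_levels().levels[0])

print(n_rows, n_level_values, n_trimmed_values)
```

The gap between the row count and the number of stored level values is what `time_sliced_frame` is meant to exercise.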

doc/source/whatsnew/v1.4.0.rst

Lines changed: 1 addition & 0 deletions
@@ -592,6 +592,7 @@ Performance improvements
 - Performance improvement in :meth:`Series.mad` (:issue:`43010`)
 - Performance improvement in :func:`merge` (:issue:`43332`)
 - Performance improvement in :func:`to_csv` when index column is a datetime and is formatted (:issue:`39413`)
+- Performance improvement in :func:`to_csv` when :class:`MultiIndex` contains a lot of unused levels (:issue:`37484`)
 - Performance improvement in :func:`read_csv` when ``index_col`` was set with a numeric column (:issue:`44158`)
 - Performance improvement in :func:`concat` (:issue:`43354`)
 -

pandas/io/formats/csvs.py

Lines changed: 2 additions & 0 deletions
@@ -186,6 +186,8 @@ def data_index(self) -> Index:
             data_index = Index(
                 [x.strftime(self.date_format) if notna(x) else "" for x in data_index]
             )
+        elif isinstance(data_index, ABCMultiIndex):
+            data_index = data_index.remove_unused_levels()
         return data_index
 
     @property
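
Note on the change above: the new branch relies on `MultiIndex.remove_unused_levels()` returning an index that compares equal to the original, so the written CSV should not change. The following is a minimal sketch of that property using only public pandas API. The frame and the variable names are illustrative assumptions and do not come from the commit.

```python
import pandas as pd

# Assumption: a tiny illustrative frame, not taken from the commit.
mi = pd.MultiIndex.from_product([range(5), ["x", "y"]], names=["a", "b"])
df = pd.DataFrame({"c": range(10)}, index=mi)

# The first four rows only use a = 0 and a = 1, yet level "a" keeps all five values.
sliced = df.iloc[:4]
trimmed_index = sliced.index.remove_unused_levels()

levels_before = len(sliced.index.levels[0])
levels_after = len(trimmed_index.levels[0])

# The index values are unchanged; only the stored level values shrink.
values_equal = sliced.index.equals(trimmed_index)

# Writing either frame yields the same CSV text.
trimmed = sliced.copy()
trimmed.index = trimmed_index
same_csv = sliced.to_csv() == trimmed.to_csv()

print(levels_before, levels_after, values_equal, same_csv)
```

Only the stored level values differ between the two frames, which fits the commit's framing as a performance change rather than a behavior change.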
