BUG: Error while saving DataFrame with TimedeltaIndex to .csv #10833 #10845

Merged
merged 4 commits into from
Aug 21, 2015
Changes from 2 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.17.0.txt
@@ -678,3 +678,4 @@ Bug Fixes
- Bug in ``iloc`` allowing memory outside bounds of a Series to be accessed with negative integers (:issue:`10779`)
- Bug in ``read_msgpack`` where encoding is not respected (:issue:`10580`)
- Bug preventing access to the first index when using ``iloc`` with a list containing the appropriate negative integer (:issue:`10547`, :issue:`10779`)
- Bug in ``TimedeltaIndex`` formatter causing error while trying to save ``DataFrame`` with ``TimedeltaIndex`` using ``to_csv`` (:issue:`10833`)
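
The changelog entry above describes a failure when writing a ``DataFrame`` with a ``TimedeltaIndex`` via ``to_csv``. A minimal round-trip sketch of that scenario, assuming a current pandas release and an in-memory buffer in place of a file (the variable names are illustrative and not taken from the PR):

```python
import io

import pandas as pd

# Frame indexed by 0..4 seconds; the index name becomes the CSV header.
index = pd.to_timedelta([0, 1, 2, 3, 4], unit="s").rename("timestamp")
df = pd.DataFrame({"data": list(range(5))}, index=index)

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()

# Read back: the index arrives as strings and is converted explicitly.
result = pd.read_csv(io.StringIO(csv_text), index_col="timestamp")
result.index = pd.to_timedelta(result.index)
```

On a pandas build affected by the bug, the ``to_csv`` call is where the error would be expected to surface.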
2 changes: 1 addition & 1 deletion pandas/core/format.py
@@ -2174,7 +2174,7 @@ def __init__(self, values, nat_rep='NaT', box=False, **kwargs):
def _format_strings(self):
formatter = self.formatter or _get_format_timedelta64(self.values, nat_rep=self.nat_rep,
box=self.box)
- fmt_values = [formatter(x) for x in self.values]
+ fmt_values = np.array([formatter(x) for x in self.values])
return fmt_values
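
The change wraps the formatted strings in ``np.array`` instead of returning a plain list. The diff does not say why, so the following is only a plausible reading: code that consumes the result may rely on array operations that a list does not offer. A minimal sketch of that difference, using a hypothetical stand-in formatter ``format_seconds`` rather than any pandas internals:

```python
import numpy as np


def format_seconds(value):
    """Hypothetical stand-in for the real timedelta formatter."""
    return "{} s".format(value)


values = [0, 1, 2]

as_list = [format_seconds(v) for v in values]
as_array = np.array([format_seconds(v) for v in values])

# A plain list has no ndarray interface...
list_has_astype = hasattr(as_list, "astype")

# ...whereas the array supports dtype conversion and boolean-mask assignment.
as_object = as_array.astype(object)
mask = np.array([True, False, False])
as_object[mask] = "NaT"
```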


15 changes: 14 additions & 1 deletion pandas/tests/test_frame.py
@@ -6308,7 +6308,6 @@ def test_to_csv_from_csv(self):
header=['AA', 'X'])

with ensure_clean(pname) as path:
- import pandas as pd
df1 = DataFrame(np.random.randn(3, 1))
df2 = DataFrame(np.random.randn(3, 1))

@@ -6320,6 +6319,20 @@ def test_to_csv_from_csv(self):
xp.columns = lmap(int,xp.columns)
assert_frame_equal(xp,rs)

with ensure_clean() as path:
# GH 10833 (TimedeltaIndex formatting)
dt = pd.Timedelta(seconds=1)
Contributor:
add a Timedelta column as well.

df_orig = pd.DataFrame({'data': list(range(10))},
Contributor:
call this df

index=[i*dt for i in range(10)])
Contributor:
do index=Index([.........],name='timestamp')
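
The suggestion above is to name the index at construction time instead of renaming it afterwards. A minimal sketch of that pattern, assuming a current pandas release (the reviewer's elided list is left as written above; the values here only mirror the test under review):

```python
import pandas as pd

step = pd.Timedelta(seconds=1)

# Passing name= to the Index constructor avoids a separate rename step.
index = pd.Index([i * step for i in range(10)], name="timestamp")
df = pd.DataFrame({"data": list(range(10))}, index=index)
```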

df_orig.index.rename('timestamp', inplace=True)
Contributor:
In general, we do not use inplace operations in testing code (nor do we recommend them for production code).

Contributor Author:
It seems like there is no way to preserve index name using to_timedelta.

df_test.index = pd.to_timedelta(df_test.index)

Should I use this variant instead?

df_test.index = pd.TimedeltaIndex(df_test.index, name='timestamp')

Contributor:
This works, no?

In [18]: s = Series(range(3),index=Index(timedelta_range('1 day',periods=3),name='foo'))

In [19]: s
Out[19]: 
foo
1 days    0
2 days    1
3 days    2
dtype: int64

In [20]: pd.to_timedelta(s.index)
Out[20]: TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', name=u'foo', freq=None)

Contributor Author:
I have the same result as you on your piece of code.

Take a look at this example:

result = pd.read_csv(path, index_col='dt_index')
print(result.index)
>>> Index(['00:00:00', '00:00:01', '00:00:02'], dtype='object', name='dt_index')
result.index = pd.to_timedelta(result.index)
print(result.index)
>>> TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02'], dtype='timedelta64[ns]', freq=None)

It seems the index name disappears during the Index -> TimedeltaIndex conversion.

Contributor:
Oh, I see. If you want to trace where the name is not passed through, that would be great. (If it's easy, do it as part of this issue; if not, please open a new issue.)

Contributor Author:
ok, I'll finalize both issues within this week.

Contributor:
Great, thanks. Add a separate whatsnew note (with the same issue number) for the name lost in conversion.

Contributor Author:
I'd say that calling to_datetime on an index doesn't keep the name either. Let's open a new issue.
This PR can't be merged before I fix the new one.
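
The thread above turns on whether the index name survives ``pd.to_timedelta``. Whatever a given pandas version does, a test can carry the name over explicitly and without in-place operations. A minimal sketch, where the helper name ``as_timedelta_index`` is hypothetical and not part of pandas or this PR:

```python
import pandas as pd


def as_timedelta_index(index):
    """Convert an Index of timedelta strings, carrying the name over explicitly.

    Hypothetical helper: it does not rely on to_timedelta propagating the name.
    """
    converted = pd.to_timedelta(index)
    # rename() returns a new index; nothing is modified in place.
    return converted.rename(index.name)


raw = pd.Index(["00:00:00", "00:00:01", "00:00:02"], name="dt_index")
result = as_timedelta_index(raw)
```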

df_orig.to_csv(path)

df_test = pd.read_csv(path, index_col='timestamp')
df_test.index = pd.to_timedelta(df_test.index)
Contributor:
call this result

df_test.index.rename('timestamp', inplace=True)

Contributor:
set the index name for now so this issue is independent

self.assertTrue(df_test.equal(df_orig))
Contributor:
You could do this, but it is not nearly as informative as assert_frame_equal. Please change.
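
To illustrate the reviewer's point, here is a small sketch contrasting the two checks. It assumes a current pandas release, where ``assert_frame_equal`` is importable from ``pandas.testing`` (the import path in the codebase at the time of this PR may differ), and the frames are made up for the example:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame({"data": [0, 1, 2]})
result = pd.DataFrame({"data": [0, 1, 3]})

# equals() only reports whether the frames match, as a single boolean.
same = result.equals(expected)

# assert_frame_equal raises AssertionError with a message describing the mismatch.
try:
    assert_frame_equal(result, expected)
    message = ""
except AssertionError as exc:
    message = str(exc)
```

Wrapping the boolean in ``assertTrue`` would fail with little more than "False is not true", while the ``AssertionError`` message points at what differs.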


def test_to_csv_cols_reordering(self):
# GH3454
import pandas as pd