PERF/DOC: Option to .info() and .memory_usage() to provide for deep introspection of memory consumption #11595 #11596

Merged: 1 commit, Nov 13, 2015

Conversation

jreback
Contributor

@jreback jreback commented Nov 13, 2015

closes #11595

@jreback jreback added the "Output-Formatting" (__repr__ of pandas objects, to_string) and "API Design" labels Nov 13, 2015
@jreback jreback added this to the 0.17.1 milestone Nov 13, 2015
@jreback
Contributor Author

jreback commented Nov 13, 2015

cc @mrocklin

@mrocklin
Contributor

Does this descend into categories and the index?

@jreback
Contributor Author

jreback commented Nov 13, 2015

Was wondering if you were going to ask that... it DOES do the index. Not the categories, but I can fix this.

@jreback
Contributor Author

jreback commented Nov 13, 2015

Now includes embedded usage for Index & Categorical

In [5]: df = DataFrame({'A' : ['foo']*1000})

In [6]: df['B'] = df['A'].astype('category')

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 2 columns):
A    1000 non-null object
B    1000 non-null category
dtypes: category(1), object(1)
memory usage: 16.6+ KB

In [9]:    df.info(deep=True)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 2 columns):
A    1000 non-null object
B    1000 non-null category
dtypes: category(1), object(1)
memory usage: 55.7 KB

In [11]: df.memory_usage()
Out[11]: 
A    8000
B    1008
dtype: int64

In [12]: df.memory_usage(deep=True)
Out[12]: 
A    48000
B     1048
dtype: int64
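
For anyone who wants to reproduce the comparison above on a current install, here is a minimal sketch. It assumes a recent pandas where `memory_usage` accepts both `index` and `deep`. The column is forced to object dtype (an assumption, since newer versions may infer a dedicated string dtype instead), and exact byte counts will likely differ from the session above.

```python
import pandas as pd

# Force object dtype so the column holds Python string objects,
# as in the session above (assumption: newer pandas may otherwise
# pick a dedicated string dtype).
df = pd.DataFrame({"A": pd.Series(["foo"] * 1000, dtype=object)})
df["B"] = df["A"].astype("category")

# Shallow: only the array buffers (pointers for object columns).
shallow = df.memory_usage(index=False)

# Deep: additionally asks each Python object for its own size.
deep = df.memory_usage(index=False, deep=True)

print(shallow)
print(deep)
```

The object column should grow noticeably under `deep=True`, while the categorical column only grows by the size of its few category values.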

@jreback
Contributor Author

jreback commented Nov 13, 2015

And providing on Series as well

In [6]: df['A'].memory_usage()
Out[6]: 8000

In [7]: df['A'].memory_usage(index=True)
Out[7]: 16000

In [8]: df['A'].memory_usage(index=True,deep=True)
Out[8]: 56000
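
A rough equivalent of the Series calls above, as a sketch for a recent pandas. The explicit integer index is an assumption made so that the index has a visible footprint; a default index on current versions is typically a much lighter range-based index, so the numbers will not match the session above.

```python
import pandas as pd

# Explicit int64 index so the index occupies roughly 8 bytes per row.
idx = pd.Index(list(range(1000)), dtype="int64")
s = pd.Series(["foo"] * 1000, index=idx, dtype=object)

values_only = s.memory_usage(index=False)              # data buffer only
with_index = s.memory_usage(index=True)                # data + index
with_index_deep = s.memory_usage(index=True, deep=True)  # plus object sizes

print(values_only, with_index, with_index_deep)
```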

@mrocklin
Contributor

BTW, I'm glad that memory_usage_of_objects is usable on numpy arrays as well. I may end up using that outside of pandas.
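
To illustrate the idea outside of pandas, here is a pure-Python approximation. This is not the pandas implementation; `approx_object_memory` is a hypothetical name, and the sketch simply sums `sys.getsizeof` over the elements of any iterable (a list here, but an object-dtype numpy array can be iterated the same way).

```python
import sys


def approx_object_memory(values):
    """Sum sys.getsizeof over each element (hypothetical helper).

    Counts an object once per reference, so the same string repeated
    1000 times is counted 1000 times.
    """
    return sum(sys.getsizeof(v) for v in values)


strings = ["foo"] * 1000
total = approx_object_memory(strings)
print(total)
```

Counting per reference overstates the footprint when many elements point at the same object, which is worth keeping in mind when reading deep numbers.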

@jreback
Contributor Author

jreback commented Nov 13, 2015

Right, dask.array could certainly introspect here as well.

@max-sixty
Contributor

I'll wait until this is merged before adding __sizeof__.
Is there a reason index is False by default? I'd have thought that would be 'part of the package'.
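
For context, `sys.getsizeof` defers to an object's `__sizeof__` method, which is the hook a container can use to report the memory it owns. A small sketch of that mechanism, using a hypothetical `Payload` class that has nothing to do with pandas internals:

```python
import sys


class Payload:
    """Hypothetical container that reports the size of the bytes it wraps."""

    def __init__(self, data):
        self.data = data

    def __sizeof__(self):
        # Own instance size plus the size of the wrapped bytes object.
        return object.__sizeof__(self) + sys.getsizeof(self.data)


p = Payload(b"x" * 1000)

# sys.getsizeof calls __sizeof__ and may add garbage-collector overhead.
reported = sys.getsizeof(p)
print(reported)
```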

@jreback
Contributor Author

jreback commented Nov 13, 2015

@MaximilianR I don't recall the discussion, but I think we should change the default. Note that this is just for a direct call to memory_usage and not for .info where it is included.

Why don't you post an issue and we'll change it in 0.18 (as it's a small API change).
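
Since the default for `index` is under discussion here and may differ between versions, passing it explicitly keeps results comparable. A minimal sketch, assuming a recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"A": pd.Series(["foo"] * 1000, dtype=object)})

# Pass index explicitly rather than relying on the version's default.
without_index = df.memory_usage(index=False)
with_index = df.memory_usage(index=True)

print(without_index)
print(with_index)  # includes an extra "Index" entry
```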

jreback added a commit that referenced this pull request Nov 13, 2015
PERF/DOC:  Option to .info() and .memory_usage() to provide for deep introspection of memory consumption #11595
@jreback jreback merged commit ddd0372 into pandas-dev:master Nov 13, 2015
Labels
API Design, Output-Formatting (__repr__ of pandas objects, to_string)
Successfully merging this pull request may close these issues.

Optionally use sys.getsizeof in DataFrame.memory_usage
3 participants