Add default repr for EAs #23601
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,6 +49,13 @@ class ExtensionArray(object): | |
|
||
* _formatting_values | ||
|
||
A default repr displaying the type, (truncated) data, length, | ||
and dtype is provided. It can be customized or replaced by | ||
overriding: | |
|
||
* _formatter | ||
* __repr__ | ||
|
||
|
||
Some methods require casting the ExtensionArray to an ndarray of Python | ||
objects with ``self.astype(object)``, which may be expensive. When | ||
performance is a concern, we highly recommend overriding the following | ||
|
@@ -653,15 +660,46 @@ def copy(self, deep=False): | |
raise AbstractMethodError(self) | ||
|
||
# ------------------------------------------------------------------------ | ||
# Block-related methods | ||
# Printing | ||
# ------------------------------------------------------------------------ | ||
def __repr__(self): | ||
|
||
from pandas.io.formats.printing import format_object_summary | ||
|
||
|
||
template = ( | ||
'<{class_name}>\n' | ||
'{data}\n' | ||
'Length: {length}, dtype: {dtype}' | ||
Review comment: do we need to define the “unicode”
Review comment: you are writing new code here but this should be consistent as well (it’s ok to change that too)
Review comment: Fixed... I left the implementation in repr, and then encoded / decoded as needed in |
||
) | ||
# the short repr has no trailing newline, while the truncated | ||
# repr does. So we include a newline in our template, and strip | ||
# any trailing newlines from format_object_summary | ||
data = format_object_summary(self, self._formatter, name=False, | ||
trailing_comma=False).rstrip('\n') | ||
name = self.__class__.__name__ | ||
return template.format(class_name=name, data=data, | ||
length=len(self), | ||
dtype=self.dtype) | ||
|
||
@property | ||
def _formatter(self): | ||
|
||
# type: () -> Callable[[Any], str] | |
"""Formatting function for scalar values. | ||
|
||
This is used in the default '__repr__'. The formatting function | ||
|
||
receives instances of your scalar type. | ||
""" | ||
return str | ||
|
||
def _formatting_values(self): | ||
# type: () -> np.ndarray | ||
# At the moment, this has to be an array since we use result.dtype | ||
"""An array of values to be printed in, e.g. the Series repr""" | ||
return np.array(self) | ||
|
||
# ------------------------------------------------------------------------ | ||
# Reshaping | ||
# ------------------------------------------------------------------------ | ||
|
||
@classmethod | ||
def _concat_same_type(cls, to_concat): | ||
# type: (Sequence[ExtensionArray]) -> ExtensionArray | ||
|
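As a rough illustration of the default repr and the `_formatter` hook added in the diff above, here is a minimal pure-Python sketch. `ToyArray` and `QuotedArray` are hypothetical stand-ins, not pandas classes, and the data line is built with a plain join rather than `format_object_summary`, so truncation and line wrapping are not modeled.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (not pandas code)."""

    def __init__(self, values, dtype):
        self._values = list(values)
        self.dtype = dtype

    def __len__(self):
        return len(self._values)

    @property
    def _formatter(self):
        # Default scalar formatter, mirroring the diff: plain str().
        return str

    def __repr__(self):
        # Same three-line template as in the diff.
        template = (
            '<{class_name}>\n'
            '{data}\n'
            'Length: {length}, dtype: {dtype}'
        )
        # Assumption: a simple join replaces format_object_summary here.
        data = '[' + ', '.join(self._formatter(v) for v in self._values) + ']'
        return template.format(class_name=type(self).__name__, data=data,
                               length=len(self), dtype=self.dtype)


class QuotedArray(ToyArray):
    """Customizes only the scalar formatting; the repr layout is inherited."""

    @property
    def _formatter(self):
        return repr


print(repr(ToyArray([1, 2], 'int')))
print(repr(QuotedArray(['a', 'b'], 'str')))
```

Overriding `_formatter` changes how each scalar is rendered while keeping the shared layout; overriding `__repr__` would replace the layout entirely.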
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -271,7 +271,9 @@ class TableSchemaFormatter(BaseFormatter): | |
max_seq_items=max_seq_items) | ||
|
||
|
||
def format_object_summary(obj, formatter, is_justify=True, name=None): | ||
def format_object_summary(obj, formatter, is_justify=True, name=None, | ||
trailing_comma=True, | ||
truncated_trailing_newline=True): | ||
""" | ||
Return the formatted obj as a unicode string | ||
|
||
|
@@ -283,9 +285,14 @@ def format_object_summary(obj, formatter, is_justify=True, name=None): | |
string formatter for an element | ||
is_justify : boolean | ||
should justify the display | ||
name : name, optiona | ||
name : name, optional | ||
defaults to the class name of the obj | ||
|
||
Pass ``False`` to indicate that subsequent lines should | ||
Review comment: what calls this with False? IOW what has a name that u don’t want to print
Review comment: This is needed to re-use for both Index and EA-style formatters. It disables indentation on subsequent lines.
print(pd.io.formats.printing.format_object_summary(arr, str))
[2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01,
2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01,
2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01],
vs.
In [6]: print(pd.io.formats.printing.format_object_summary(arr, str, name=False))
[2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
2000-01-01, 2001-01-01],
Review comment: can this be another parameter then? it seems like it is used for 2 purposes
Review comment: Done. Calling it |
||
not be indented to align with the name. | ||
trailing_comma : bool, default True | ||
|
||
Whether to include a comma after the closing ']' | ||
|
||
Returns | ||
------- | ||
summary string | ||
|
@@ -300,8 +307,13 @@ def format_object_summary(obj, formatter, is_justify=True, name=None): | |
if name is None: | ||
name = obj.__class__.__name__ | ||
|
||
space1 = "\n%s" % (' ' * (len(name) + 1)) | ||
space2 = "\n%s" % (' ' * (len(name) + 2)) | ||
if name is False: | ||
space1 = "\n" | ||
space2 = "\n " # space for the opening '[' | ||
else: | ||
name_len = len(name) | ||
space1 = "\n%s" % (' ' * (name_len + 1)) | ||
space2 = "\n%s" % (' ' * (name_len + 2)) | ||
|
||
n = len(obj) | ||
sep = ',' | ||
|
@@ -328,15 +340,20 @@ def best_len(values): | |
else: | ||
return 0 | ||
|
||
if trailing_comma: | ||
close = ', ' | ||
else: | ||
close = '' | ||
|
||
if n == 0: | ||
summary = '[], ' | ||
summary = '[]{}'.format(close) | ||
elif n == 1: | ||
first = formatter(obj[0]) | ||
summary = '[%s], ' % first | ||
summary = '[{}]{}'.format(first, close) | ||
elif n == 2: | ||
first = formatter(obj[0]) | ||
last = formatter(obj[-1]) | ||
summary = '[%s, %s], ' % (first, last) | ||
summary = '[{}, {}]{}'.format(first, last, close) | ||
else: | ||
|
||
if n > max_seq_items: | ||
|
@@ -381,7 +398,11 @@ def best_len(values): | |
summary, line = _extend_line(summary, line, tail[-1], | ||
display_width - 2, space2) | ||
summary += line | ||
summary += '],' | ||
|
||
# right now close is either '' or ', ' | ||
# Now we want to include the ']', but not the maybe space. | ||
Review comment: why is this needed?
Review comment: Without it, you'd get something like
In [2]: pd.core.arrays.period_array(['2000', '2001'], freq='D')
Out[2]:
<PeriodArray>
['2000-01-01', '2001-01-01'],
Length: 2, dtype: period[D]
(notice the trailing comma after the ending ']'). That's what we want for index classes:
In [3]: pd.Index([1, 2, 3])
Out[3]: Int64Index([1, 2, 3], dtype='int64')
but not for EAs since we don't know if they're valid code:
Review comment: i think this is a very strange repr. this is reinventing the wheel again compared to what we do for index, and somewhat arbitrary. repr is very important for consistency and this is off. i would avoid the special casing and have it look a whole lot more like what we have now
Review comment: also none of this looks tested (meaning the special casing and so on)
Review comment: the valid code argument does not work here either. we already have not valid code in repr, eg MultiIndex and IntervalIndex. it’s very hard to guarantee this, but using parens and no angle brackets and commas between items would be a major improvement
Review comment: What special casing? I have 100% coverage for the diff when running it on the base repr tests.
Review comment: the very fact that you need a special option is the strange part.
Review comment: Why's that? Don't we want to reuse the common formatting code?
Review comment: so another difference this is highlighting is that EA have the attributes on another line, while the Index does not (as they are args). |
||
close = ']' + close.rstrip(' ') | ||
summary += close | ||
|
||
if len(summary) > (display_width): | ||
summary += space1 | ||
|
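The two behaviours this diff adds to `format_object_summary` (no continuation indent when `name=False`, and an optional trailing comma) can be sketched in isolation. The helper names `continuation_indents` and `closing` are hypothetical and exist only for this illustration; the bodies copy the logic shown in the hunks above, with the rest of the function (line wrapping, truncation) left out.

```python
def continuation_indents(name):
    """Return (space1, space2) the way the diff computes them."""
    if name is False:
        space1 = "\n"
        space2 = "\n "  # space for the opening '['
    else:
        name_len = len(name)
        space1 = "\n%s" % (' ' * (name_len + 1))
        space2 = "\n%s" % (' ' * (name_len + 2))
    return space1, space2


def closing(trailing_comma, wrapped):
    """Return the text that closes the summary.

    wrapped=False corresponds to the short n == 0, 1, 2 branches,
    wrapped=True to the multi-line branch, which drops the space.
    """
    close = ', ' if trailing_comma else ''
    if wrapped:
        return ']' + close.rstrip(' ')
    return ']' + close


print(repr(continuation_indents(False)))
print(repr(continuation_indents('Index')))
print(repr(closing(True, wrapped=False)), repr(closing(True, wrapped=True)))
print(repr(closing(False, wrapped=False)), repr(closing(False, wrapped=True)))
```

With `trailing_comma=False` both branches end in a bare `]`, which is what the default ExtensionArray repr relies on since its length and dtype go on a separate line.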