Add default repr for EAs #23601
Changes from 15 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,6 +49,13 @@ class ExtensionArray(object): | |
|
||
* _formatting_values | ||
|
||
|
||
A default repr displaying the type, (truncated) data, length, | ||
and dtype is provided. It can be customized or replaced by | ||
overriding: | ||
|
||
* _formatter | ||
* __repr__ | ||
|
||
|
||
Some methods require casting the ExtensionArray to an ndarray of Python | ||
objects with ``self.astype(object)``, which may be expensive. When | ||
performance is a concern, we highly recommend overriding the following | ||
|
@@ -653,15 +660,60 @@ def copy(self, deep=False): | |
raise AbstractMethodError(self) | ||
|
||
# ------------------------------------------------------------------------ | ||
# Block-related methods | ||
# Printing | ||
# ------------------------------------------------------------------------ | ||
|
||
def __repr__(self): | ||
|
||
from pandas.io.formats.printing import format_object_summary | ||
|
||
|
||
template = ( | ||
'<{class_name}>\n' | ||
'{data}\n' | ||
'Length: {length}, dtype: {dtype}' | ||
Review comment: do we need to define the “unicode”
Review comment: you are writing new code here but this should be consistent as well (it’s ok to change that too)
Review comment: Fixed... I left the implementation in repr, and then encoded / decoded as needed in
||
) | ||
# the short repr has no trailing newline, while the truncated | ||
# repr does. So we include a newline in our template, and strip | ||
# any trailing newlines from format_object_summary | ||
data = format_object_summary(self, self._formatter(), name=False, | ||
trailing_comma=False).rstrip('\n') | ||
name = self.__class__.__name__ | ||
return template.format(class_name=name, data=data, | ||
length=len(self), | ||
dtype=self.dtype) | ||
|
||
def _formatter(self, formatter=None): | ||
# type: (Optional[ExtensionArrayFormatter]) -> Callable[[Any], str] | ||
"""Formatting function for scalar values. | ||
|
||
This is used in the default '__repr__'. The formatting function | ||
|
||
receives instances of your scalar type. | ||
|
||
Parameters | ||
---------- | ||
formatter : GenericArrayFormatter, optional | ||
The formatter this array is being rendered with. The formatter | ||
may have a `.formatter` method already defined. By default, this | ||
will be used if a `formatter` is passed, otherwise the formatter | ||
is ``str``. | ||
|
||
|
||
Returns | ||
------- | ||
Callable[[Any], str] | ||
A callable that gets instances of the scalar type and | ||
returns a string. | ||
""" | ||
return getattr(formatter, 'formatter', None) or str | ||
|
||
def _formatting_values(self): | ||
# type: () -> np.ndarray | ||
# At the moment, this has to be an array since we use result.dtype | ||
"""An array of values to be printed in, e.g. the Series repr""" | ||
return np.array(self) | ||
|
||
# ------------------------------------------------------------------------ | ||
# Reshaping | ||
# ------------------------------------------------------------------------ | ||
|
||
@classmethod | ||
def _concat_same_type(cls, to_concat): | ||
# type: (Sequence[ExtensionArray]) -> ExtensionArray | ||
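The default repr added in this hunk can be illustrated without pandas. The following is a minimal sketch, not the pandas implementation: `ToyArray` and `MoneyArray` are hypothetical classes, and the bracketed join is only a stand-in for `format_object_summary`, whose output is not reproduced here. It shows the template, the stripping of a trailing newline, and a subclass customizing the output by overriding `_formatter` alone.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray; not a pandas class."""

    dtype = "toy"

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def _formatter(self, formatter=None):
        # Same fallback as in the diff: use the caller's `.formatter`
        # attribute when it is set, otherwise fall back to str.
        return getattr(formatter, "formatter", None) or str

    def __repr__(self):
        fmt = self._formatter()
        # Stand-in for format_object_summary (an assumption for this sketch).
        # The trailing newline mimics the truncated summary case.
        data = "[" + ", ".join(fmt(v) for v in self._values) + "]\n"
        template = "<{class_name}>\n{data}\nLength: {length}, dtype: {dtype}"
        return template.format(
            class_name=type(self).__name__,
            data=data.rstrip("\n"),  # the template supplies the newline
            length=len(self),
            dtype=self.dtype,
        )


class MoneyArray(ToyArray):
    """Customizes only the scalar formatting; __repr__ is inherited."""

    def _formatter(self, formatter=None):
        return lambda x: "${:.2f}".format(x)


print(repr(MoneyArray([1, 2.5])))
```

Because the template itself supplies the newline between the data and the `Length:` line, stripping trailing newlines from the summary keeps the short and truncated cases consistent.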
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -16,11 +16,12 @@ | |||||
from pandas.compat import StringIO, lzip, map, u, zip | ||||||
|
||||||
from pandas.core.dtypes.common import ( | ||||||
is_categorical_dtype, is_datetime64_dtype, is_datetimetz, is_float, | ||||||
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype, | ||||||
is_list_like, is_numeric_dtype, is_period_arraylike, is_scalar, | ||||||
is_categorical_dtype, is_datetime64_dtype, is_datetimetz, | ||||||
is_extension_array_dtype, is_float, is_float_dtype, is_integer, | ||||||
is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar, | ||||||
is_timedelta64_dtype) | ||||||
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSparseArray | ||||||
from pandas.core.dtypes.generic import ( | ||||||
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray) | ||||||
from pandas.core.dtypes.missing import isna, notna | ||||||
|
||||||
from pandas import compat | ||||||
|
@@ -29,7 +30,6 @@ | |||||
from pandas.core.config import get_option, set_option | ||||||
from pandas.core.index import Index, ensure_index | ||||||
from pandas.core.indexes.datetimes import DatetimeIndex | ||||||
from pandas.core.indexes.period import PeriodIndex | ||||||
|
||||||
from pandas.io.common import _expand_user, _stringify_path | ||||||
from pandas.io.formats.printing import adjoin, justify, pprint_thing | ||||||
|
@@ -849,22 +849,18 @@ def _get_column_name_list(self): | |||||
def format_array(values, formatter, float_format=None, na_rep='NaN', | ||||||
digits=None, space=None, justify='right', decimal='.'): | ||||||
|
||||||
if is_categorical_dtype(values): | ||||||
fmt_klass = CategoricalArrayFormatter | ||||||
elif is_interval_dtype(values): | ||||||
fmt_klass = IntervalArrayFormatter | ||||||
if is_datetime64_dtype(values.dtype): | ||||||
fmt_klass = Datetime64Formatter | ||||||
elif is_timedelta64_dtype(values.dtype): | ||||||
fmt_klass = Timedelta64Formatter | ||||||
elif is_extension_array_dtype(values.dtype): | ||||||
fmt_klass = ExtensionArrayFormatter | ||||||
elif is_float_dtype(values.dtype): | ||||||
fmt_klass = FloatArrayFormatter | ||||||
elif is_period_arraylike(values): | ||||||
fmt_klass = PeriodArrayFormatter | ||||||
elif is_integer_dtype(values.dtype): | ||||||
fmt_klass = IntArrayFormatter | ||||||
elif is_datetimetz(values): | ||||||
fmt_klass = Datetime64TZFormatter | ||||||
elif is_datetime64_dtype(values.dtype): | ||||||
fmt_klass = Datetime64Formatter | ||||||
elif is_timedelta64_dtype(values.dtype): | ||||||
fmt_klass = Timedelta64Formatter | ||||||
else: | ||||||
fmt_klass = GenericArrayFormatter | ||||||
|
||||||
|
@@ -1126,39 +1122,22 @@ def _format_strings(self): | |||||
return fmt_values.tolist() | ||||||
|
||||||
|
||||||
class IntervalArrayFormatter(GenericArrayFormatter): | ||||||
|
||||||
def __init__(self, values, *args, **kwargs): | ||||||
GenericArrayFormatter.__init__(self, values, *args, **kwargs) | ||||||
|
||||||
def _format_strings(self): | ||||||
formatter = self.formatter or str | ||||||
fmt_values = np.array([formatter(x) for x in self.values]) | ||||||
return fmt_values | ||||||
|
||||||
|
||||||
class PeriodArrayFormatter(IntArrayFormatter): | ||||||
|
||||||
class ExtensionArrayFormatter(GenericArrayFormatter): | ||||||
def _format_strings(self): | ||||||
from pandas.core.indexes.period import IncompatibleFrequency | ||||||
try: | ||||||
values = PeriodIndex(self.values).to_native_types() | ||||||
except IncompatibleFrequency: | ||||||
# periods may contains different freq | ||||||
values = Index(self.values, dtype='object').to_native_types() | ||||||
|
||||||
formatter = self.formatter or (lambda x: '{x}'.format(x=x)) | ||||||
|
||||||
fmt_values = [formatter(x) for x in values] | ||||||
return fmt_values | ||||||
|
||||||
values = self.values | ||||||
if isinstance(values, (ABCIndexClass, ABCSeries)): | ||||||
values = values._values | ||||||
|
||||||
class CategoricalArrayFormatter(GenericArrayFormatter): | ||||||
formatter = values._formatter(self) | ||||||
|
||||||
def __init__(self, values, *args, **kwargs): | ||||||
GenericArrayFormatter.__init__(self, values, *args, **kwargs) | ||||||
if is_categorical_dtype(values.dtype): | ||||||
# Categorical is special for now, so that we can preserve tzinfo | ||||||
Review comment: do we need a TODO here? this is until DatetimeArray is fully pushed?
Review comment: That depends on whether we're willing to change
Review comment: #23569 (comment) for that.
||||||
array = values.get_values() | ||||||
else: | ||||||
array = np.asarray(values) | ||||||
|
||||||
def _format_strings(self): | ||||||
fmt_values = format_array(self.values.get_values(), self.formatter, | ||||||
fmt_values = format_array(array, | ||||||
Review comment: @TomAugspurger : i'm struggling to resolve some formatting issues. what is the reason for calling
Review comment: i guess, to be more succinct, why is
Review comment: I am not that familiar with this code, but from a quick look: calling
Review comment: Although most of those custom Formatter classes don't do much special if formatter is specified. E.g. pandas/pandas/io/formats/format.py, lines 1174 to 1175 in 181f972
Review comment: so if
||||||
formatter, | ||||||
float_format=self.float_format, | ||||||
na_rep=self.na_rep, digits=self.digits, | ||||||
space=self.space, justify=self.justify) | ||||||
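The delegation in `ExtensionArrayFormatter._format_strings` can be sketched in isolation: the formatter object passes itself to the array's `_formatter`, and the array either returns the user-supplied callable stored on that object or falls back to `str`. This is a simplified illustration under stated assumptions; `ToyFormatter` and `ToyValues` are hypothetical names, and the Categorical special case and the call to `format_array` are left out.

```python
class ToyValues(list):
    """Hypothetical array-like exposing the `_formatter` hook from the diff."""

    def _formatter(self, formatter=None):
        # Prefer a callable already set on the formatter object, else str.
        return getattr(formatter, "formatter", None) or str


class ToyFormatter:
    """Hypothetical stand-in for ExtensionArrayFormatter."""

    def __init__(self, values, formatter=None):
        self.values = values
        self.formatter = formatter  # optional user-supplied callable

    def format_strings(self):
        # Ask the array how its scalars should be rendered, passing self
        # so the array can honor a user-supplied formatter.
        fmt = self.values._formatter(self)
        return [fmt(x) for x in self.values]


plain = ToyFormatter(ToyValues([1, 2])).format_strings()
custom = ToyFormatter(
    ToyValues([1, 2]), formatter=lambda x: "<{}>".format(x)
).format_strings()
print(plain, custom)
```

In this sketch a formatter supplied by the caller takes priority over the array's own default, which matches the `getattr(formatter, 'formatter', None) or str` fallback shown in the first file.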
|