API / internals: exact semantics of _ndarray_values

We need to better describe the exact semantics of `_ndarray_values`: what it is expected to return and how it is used.

Currently it is defined on the ExtensionArray, but it is documented as not being part of the "official" interface:

https://github.com/pandas-dev/pandas/blob/712fa945c878eaed18f79d4cf99ed91e464d51b1/pandas/core/arrays/base.py#L687-L697

On Series/Index, the property will either give you what `EA._ndarray_values` gives, or the underlying ndarray:

https://github.com/pandas-dev/pandas/blob/712fa945c878eaed18f79d4cf99ed91e464d51b1/pandas/core/base.py#L768-L780

---

What it currently is for the EAs:

* Categorical: integer codes
* IntegerArray: the integer `_data`, thus losing any information about missing values
* PeriodArray: the integer ordinals
* IntervalIndex: object array of Interval objects
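
To make the "lossy" part concrete, here is a minimal sketch using toy classes rather than the real pandas ones. `ToyCategorical` and `ToyMaskedInteger` are hypothetical names invented for this illustration; they only mimic the storage layout described in the list above (codes for Categorical, `_data` plus a mask for IntegerArray).

```python
import numpy as np


class ToyCategorical:
    """Hypothetical stand-in for Categorical: values stored as integer codes."""

    def __init__(self, values):
        # np.unique returns the sorted categories and, via return_inverse,
        # the position of each value in that category array (the codes).
        self.categories, codes = np.unique(values, return_inverse=True)
        self.codes = codes.astype("int8")

    @property
    def _ndarray_values(self):
        # Cheap: returns the existing codes; the category labels are dropped.
        return self.codes


class ToyMaskedInteger:
    """Hypothetical stand-in for IntegerArray: integer data plus a boolean mask."""

    def __init__(self, data, mask):
        self._data = np.asarray(data, dtype="int64")
        self._mask = np.asarray(mask, dtype=bool)

    @property
    def _ndarray_values(self):
        # Cheap, but the mask is ignored, so a missing value cannot be
        # told apart from a real value that equals the placeholder.
        return self._data


cat = ToyCategorical(["b", "a", "b"])
ints = ToyMaskedInteger([1, 0, 3], mask=[False, True, False])

cat_values = cat._ndarray_values   # integer codes only
int_values = ints._ndarray_values  # placeholder 0 at the masked position
```

In both cases the result is a plain ndarray that cannot be converted back to the original array without extra information (the categories, or the mask).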

---

What it is currently used for (this needs a closer look; copied for now from https://github.com/pandas-dev/pandas/issues/19954#issuecomment-436374598, quoting Tom here):

- Index.itemsize (deprecated)
- Index.strides (deprecated)
- Index._engine
- Index set ops
- Index.insert
- DatetimeIndex.unique
- MultiIndex.equals
- pytables._convert_index (shared across integer and period)

There are a few other uses (mostly datetime / timedelta / period) that could maybe use `asi8` instead. I'm not familiar enough with indexing to know whether that can operate on something other than ndarrays. In theory, EAs can implement the buffer protocol, which would get the data to cython. But I don't know what ops would be required when we're down there.
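
As a small illustration of the `asi8` suggestion, the sketch below reads the integer ordinals of a `PeriodIndex` through `asi8`. It only shows what `asi8` returns for periods; whether it can replace `_ndarray_values` in the call sites listed above is an open question here, not something this example establishes.

```python
import pandas as pd

# Three monthly periods starting at 2000-01.
periods = pd.period_range("2000-01", periods=3, freq="M")

# asi8 gives the underlying int64 ordinals as a NumPy array
# (for monthly frequency: months elapsed since 1970-01).
ordinals = periods.asi8
```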



	# pandas/core/arrays/base.py (ExtensionArray)
	@property
	def _ndarray_values(self):
	    # type: () -> np.ndarray
	    """Internal pandas method for lossy conversion to a NumPy ndarray.

	    This method is not part of the pandas interface.

	    The expectation is that this is cheap to compute, and is primarily
	    used for interacting with our indexers.
	    """
	    return np.array(self)

	# pandas/core/base.py (Series/Index)
	@property
	def _ndarray_values(self):
	    # type: () -> np.ndarray
	    """The data as an ndarray, possibly losing information.

	    The expectation is that this is cheap to compute, and is primarily
	    used for interacting with our indexers.

	    - categorical -> codes
	    """
	    if is_extension_array_dtype(self):
	        return self.values._ndarray_values
	    return self.values

API / internals: exact semantics of _ndarray_values #23565

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

API / internals: exact semantics of _ndarray_values #23565

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions