
add __len__ and __getitem__ to Column #140

Merged
merged 11 commits into from
May 4, 2023
36 changes: 35 additions & 1 deletion spec/API_specification/dataframe_api/column_object.py
@@ -1,6 +1,6 @@
from __future__ import annotations

-from typing import Sequence
+from typing import NoReturn, Sequence

from ._types import dtype

@@ -35,3 +35,37 @@ def from_sequence(cls, sequence: Sequence[object], dtype: dtype) -> Column:
        -------
        Column
        """
        ...

    def __len__(self) -> int:
        """
        Return the number of rows.
        """

    def __getitem__(self, row: int) -> object:
        """
        Get the element at row index `row`.
        """
Collaborator

It still feels a bit funky to me to only support the scalar case via `__getitem__`; I think it would go against folks' natural intuition.

Contributor Author

There's already `get_rows` for that. Perhaps we could start like this, and then overload `__getitem__` later if feedback shows it's necessary?

Member

Another use case for `__getitem__` (or a separate method) is slicing (#89 indicates adding a separate method for that as well).


    def __iter__(self) -> NoReturn:
        """
        Iterate over elements.

        This is intentionally "poisoned" to discourage inefficient code patterns.

        Raises
        ------
        NotImplementedError
        """
        raise NotImplementedError("'__iter__' is intentionally not implemented.")

    def get_rows(self, indices: Column[int]) -> Column:
        """
        Select a subset of rows, similar to `ndarray.take`.

        Parameters
        ----------
        indices : Column[int]
            Positions of rows to select.
        """
        ...
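
To make the behaviour in this diff concrete, here is a minimal sketch of how a conforming implementation might look. `ToyColumn` and its list-backed storage are hypothetical and not part of the specification; the sketch only assumes the four signatures shown above. It also shows why `__iter__` has to be poisoned explicitly: once `__len__` and `__getitem__` exist, Python would otherwise fall back to the legacy sequence protocol and iterate by calling `__getitem__` with 0, 1, 2, ...

```python
from __future__ import annotations

from typing import NoReturn, Sequence


class ToyColumn:
    """Hypothetical list-backed column; not part of the specification."""

    def __init__(self, values: Sequence[object]) -> None:
        self._values = list(values)

    def __len__(self) -> int:
        # Number of rows.
        return len(self._values)

    def __getitem__(self, row: int) -> object:
        # Scalar access only; slices and index columns are rejected here.
        if not isinstance(row, int):
            raise TypeError("only integer row indices are supported")
        return self._values[row]

    def __iter__(self) -> NoReturn:
        # Without this method, iter() would fall back to __getitem__.
        raise NotImplementedError("'__iter__' is intentionally not implemented.")

    def get_rows(self, indices: ToyColumn) -> ToyColumn:
        # Positional selection, similar to ndarray.take.
        return ToyColumn([self._values[indices[i]] for i in range(len(indices))])


col = ToyColumn([10, 20, 30, 40])
n_rows = len(col)
first = col[0]
picked = col.get_rows(ToyColumn([3, 0]))

try:
    iter(col)
    iteration_blocked = False
except NotImplementedError:
    iteration_blocked = True
```

In this sketch, multi-row selection goes through `get_rows`, which matches the suggestion in the review thread to keep `__getitem__` scalar-only for now and overload it later if needed.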