Get and set column names #21

Closed
@datapythonista

Regarding column names, the following proposal, similar to what pandas currently does, uses a columns property to get and set column names.

In #7, the preference is to restrict column names to strings and to disallow duplicates.

The proposed API with an example is:

>>> df = dataframe({'col1': [1, 2], 'col2': [3, 4]})
>>> df.columns = 'foo', 'bar'
>>> df.columns = ['foo', 'bar']
>>> df.columns = map(str.upper, df.columns)
>>> df.columns
['FOO', 'BAR']

The following cases would fail:

>>> df.columns = 1
TypeError: Columns must be an iterable, not int
>>> df.columns = 'foo'
TypeError: Columns must be an iterable, not str
>>> df.columns = 'foo', 1
TypeError: Column names must be str, int found
>>> df.columns = 'foo', 'bar', 'foobar'
ValueError: Expected 2 column names, found 3
>>> df.columns = 'foo', 'foo'
ValueError: Column names cannot be duplicated. Found duplicates: foo

Some things that people may want to discuss:

  • Using a different name for the property (e.g. column_names)
  • Being able to set a single column df.columns[0] = 'foo' (the proposal doesn't allow it)
  • The return type of the columns (the proposal returns a Python list, pandas returns an Index)
  • Setting the column name of a single-column dataframe with df.columns = 'foo' (the proposal requires an iterable, so df.columns = ['foo'] or equivalent is needed).
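
On the second and third points: in the implementation below, the getter hands back the internal list, so df.columns[0] = 'foo' would change the name in place without going through the setter's checks. One possible way to rule that out, shown here only as a sketch and not as part of the proposal, is to store and return a tuple. The tuple return type and the shortened duplicate message are assumptions made for this illustration.

```python
import collections.abc
import typing


class dataframe:
    """Minimal stand-in for the proposed class; only column names are modelled."""

    def __init__(self, data: typing.Mapping[str, list]):
        # Assumption for this sketch: names are kept in a tuple, which is immutable
        self._columns: typing.Tuple[str, ...] = tuple(data)

    @property
    def columns(self) -> typing.Tuple[str, ...]:
        return self._columns

    @columns.setter
    def columns(self, names: typing.Iterable[str]):
        if isinstance(names, str) or not isinstance(names, collections.abc.Iterable):
            raise TypeError(f'Columns must be an iterable, not {type(names).__name__}')
        names = tuple(names)
        for name in names:
            if not isinstance(name, str):
                raise TypeError(f'Column names must be str, {type(name).__name__} found')
        if len(names) != len(self._columns):
            raise ValueError(f'Expected {len(self._columns)} column names, found {len(names)}')
        if len(set(names)) != len(names):
            raise ValueError('Column names cannot be duplicated')
        self._columns = names


df = dataframe({'col1': [1, 2], 'col2': [3, 4]})
df.columns = ['foo', 'bar']   # whole-sequence assignment goes through the setter
try:
    df.columns[0] = 'baz'     # tuples do not support item assignment
except TypeError as exc:
    print(f'rejected: {exc}')
print(df.columns)             # still ('foo', 'bar')
```

The trade-off is that callers who want a list have to call list(df.columns) themselves, and the failure comes from Python's tuple type rather than from a message written for dataframe users.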

In case it's useful, this is the implementation of the examples:

import collections.abc
import typing


class dataframe:
    def __init__(self, data):
        # Iterating over a dict yields its keys, i.e. the column names
        self._columns = list(data)

    @property
    def columns(self) -> typing.List[str]:
        return self._columns

    @columns.setter
    def columns(self, names: typing.Iterable[str]):
        # A str is itself iterable, so it has to be rejected explicitly
        if not isinstance(names, collections.abc.Iterable) or isinstance(names, str):
            raise TypeError(f'Columns must be an iterable, not {type(names).__name__}')

        # Materialize once, so that iterators such as map() can be checked and stored
        names = list(names)

        for name in names:
            if not isinstance(name, str):
                raise TypeError(f'Column names must be str, {type(name).__name__} found')

        if len(names) != len(self._columns):
            raise ValueError(f'Expected {len(self._columns)} column names, found {len(names)}')

        if len(set(names)) != len(names):
            # Sorted, so that the error message is deterministic
            duplicates = sorted({name for name in names if names.count(name) > 1})
            raise ValueError(f'Column names cannot be duplicated. Found duplicates: {", ".join(duplicates)}')

        self._columns = names
