Regarding column names, the following proposal, similar to what pandas currently does, uses a columns
property to set and get column names.
In #7, the preference is to restrict column names to strings and to disallow duplicates.
The proposed API, illustrated with an example:
>>> df = dataframe({'col1': [1, 2], 'col2': [3, 4]})
>>> df.columns = 'foo', 'bar'
>>> df.columns = ['foo', 'bar']
>>> df.columns = map(str.upper, df.columns)
>>> df.columns
['FOO', 'BAR']
The following cases would fail:
>>> df.columns = 1
TypeError: Columns must be an iterable, not int
>>> df.columns = 'foo'
TypeError: Columns must be an iterable, not str
>>> df.columns = 'foo', 1
TypeError: Column names must be str, int found
>>> df.columns = 'foo', 'bar', 'foobar'
ValueError: Expected 2 column names, found 3
>>> df.columns = 'foo', 'foo'
ValueError: Column names cannot be duplicated. Found duplicates: foo
Some things that people may want to discuss:
- Using a different name for the property (e.g. column_names)
- Being able to set a single column with df.columns[0] = 'foo' (the proposal doesn't allow it)
- The return type of columns (the proposal returns a Python list; pandas returns an Index)
- Setting the column name of a single-column dataframe with df.columns = 'foo' (the proposal requires an iterable, so df.columns = ['foo'] or equivalent is needed)
In case it's useful, here is an implementation of the examples:
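import collections
import collections.abc  # `import collections` alone does not guarantee collections.abc is loaded
import typing


class dataframe:
    def __init__(self, data):
        # Iterating over a dict yields its keys, which become the column names.
        self._columns = list(data)

    @property
    def columns(self) -> typing.List[str]:
        # Return a copy so item assignment like df.columns[0] = 'foo'
        # cannot modify the internal list and bypass the setter's checks.
        return list(self._columns)

    @columns.setter
    def columns(self, names: typing.Iterable[str]):
        # A bare str is iterable, but it is rejected so 'foo' is not split into 'f', 'o', 'o'.
        if not isinstance(names, collections.abc.Iterable) or isinstance(names, str):
            raise TypeError(f'Columns must be an iterable, not {type(names).__name__}')
        names = list(names)
        for name in names:
            if not isinstance(name, str):
                raise TypeError(f'Column names must be str, {type(name).__name__} found')
        if len(names) != len(self._columns):
            raise ValueError(f'Expected {len(self._columns)} column names, found {len(names)}')
        # Counter finds duplicates in one pass; sorting makes the message deterministic.
        duplicates = sorted(name for name, count in collections.Counter(names).items() if count > 1)
        if duplicates:
            raise ValueError(f'Column names cannot be duplicated. Found duplicates: {", ".join(duplicates)}')
        self._columns = names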
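To make two of the discussion points concrete, here is a minimal sketch of a hypothetical variant (the class name alt_dataframe is invented for illustration and is not part of the proposal). It assumes two alternative design choices: the getter returns an immutable tuple, so df.columns[0] = 'foo' raises TypeError instead of silently having no effect, and the setter accepts a bare str only when the dataframe has exactly one column. All other checks mirror the proposal.

```python
import collections
import collections.abc
import typing


class alt_dataframe:
    """Hypothetical variant of the proposed dataframe class, for discussion only."""

    def __init__(self, data):
        # Iterating over a dict yields its keys, which become the column names.
        self._columns = list(data)

    @property
    def columns(self) -> typing.Tuple[str, ...]:
        # Assumed design choice: a tuple is immutable, so item assignment
        # such as df.columns[0] = 'foo' raises TypeError.
        return tuple(self._columns)

    @columns.setter
    def columns(self, names: typing.Union[str, typing.Iterable[str]]):
        # Assumed design choice: a bare str is accepted only for a one-column frame.
        if isinstance(names, str):
            if len(self._columns) != 1:
                raise TypeError('A single str is only accepted for a dataframe with one column')
            names = [names]
        elif not isinstance(names, collections.abc.Iterable):
            raise TypeError(f'Columns must be an iterable, not {type(names).__name__}')
        names = list(names)
        for name in names:
            if not isinstance(name, str):
                raise TypeError(f'Column names must be str, {type(name).__name__} found')
        if len(names) != len(self._columns):
            raise ValueError(f'Expected {len(self._columns)} column names, found {len(names)}')
        duplicates = sorted(n for n, c in collections.Counter(names).items() if c > 1)
        if duplicates:
            raise ValueError(f'Column names cannot be duplicated. Found duplicates: {", ".join(duplicates)}')
        self._columns = names


# Usage: a one-column frame can be renamed with a bare str.
df = alt_dataframe({'col1': [1, 2]})
df.columns = 'foo'
print(df.columns)  # ('foo',)

# Item assignment on the returned tuple fails loudly.
try:
    df.columns[0] = 'bar'
except TypeError as exc:
    print(f'TypeError: {exc}')
```

The trade-off: returning a tuple departs from the proposal's "returns a Python list", but it turns a silent no-op into an explicit error. Accepting a bare str only for one-column frames keeps the common case short while preserving the proposal's rejection of 'foo' elsewhere.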