
warning in bar plot with multiple columns #18764

Closed
@AlbertDeFusco

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> a = np.random.randint(1, 100, size=10)
>>> b = 100 - a
>>> i = np.arange(100, 110)
>>> 
>>> df = pd.DataFrame(dict(a=a, b=b, i=i))
>>> df.plot.bar(x='i', y=['b','a'], stacked=True)
/Users/adefusco/Applications/miniconda3/envs/projects-data-analysis/lib/python3.6/site-packages/pandas/plotting/_core.py:1714: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
  series.name = label
<matplotlib.axes._subplots.AxesSubplot object at 0x10dec46d8>

Problem description

The warning about series.name = label appears because the plotting code effectively does the following. Note that I am not using the label keyword argument.

y = ['a','b'] # <=== from inputs to function
label = kwds['label'] if 'label' in kwds else y
series = data[y].copy()  # Don't modify
series.name = label

Since y is a list, series is actually a DataFrame, so pandas now thinks that a new column is being created with the values ['a','b'].
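The mechanism described above can be reproduced without any plotting. This is a minimal sketch, assuming pandas 0.21 or later still emits the UserWarning quoted in this report; the small frame and the variable names (selected, messages, created_column) are only illustrative.

```python
import warnings

import pandas as pd

# Small stand-in frame; the values are arbitrary.
df = pd.DataFrame({"a": [1, 2, 3], "b": [99, 98, 97]})

# Selecting with a list returns a DataFrame, not a Series.
selected = df[["b", "a"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Assigning a list-like to an attribute that is not an existing column
    # is what pandas flags as an attempt to create a column by attribute.
    selected.name = ["b", "a"]

messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
created_column = "name" in selected.columns

print(messages)
print("column created:", created_column)
```

The assignment only sets a plain Python attribute on the object; no column called name is added, which is why the warning is misleading in the plotting case.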

Expected Output

The warning message does not occur if the Index is used as the x-axis:

df[['b','a']].plot.bar(stacked=True)
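To keep the values of i on the x-axis while using this form, one possible workaround is to move i into the index first. The sketch below only prepares the data. The variable name indexed is illustrative, and the plot call is left as a comment because it needs matplotlib.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report.
a = np.random.randint(1, 100, size=10)
df = pd.DataFrame(dict(a=a, b=100 - a, i=np.arange(100, 110)))

# Use column i as the index, then pick the columns in stacking order.
indexed = df.set_index("i")[["b", "a"]]

# With matplotlib installed, this call would draw the stacked bars without
# passing x or y to the plotting function:
# indexed.plot.bar(stacked=True)
```

Because neither x nor y is passed, the code path that assigns series.name is presumably not reached.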

Proposed solution

Would the following change in pandas/plotting/_core.py be reasonable?

if y is not None:
    if is_scalar(y):
        if is_integer(y) and not data.columns.holds_integer():
            y = data.columns[y]
        
        label = kwds['label'] if 'label' in kwds else y
        series = data[y].copy()  # Don't modify
        series.name = label
        
        data = series
        
    elif is_dict_like(y):
        # y maps new label -> existing column name
        data = data[list(y.values())].copy()
        data = data.rename(columns={col: label for label, col in y.items()})
    
    elif is_list_like(y):
        data = data[y].copy()
        
    elif not isinstance(data[y], ABCSeries):
        raise ValueError("y must be a label or position")

... continue with plot

This provides the following options, using the DataFrame defined above.

df.plot.bar(x='i', y=1)                  # <-- plot the second column
df.plot.bar(x='i', y=['b','a'])          # <-- plot multiple columns
df.plot.bar(x='i', y=dict(y='a', z='b')) # <-- plot multiple columns with custom labels
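The selection logic proposed above can be tried outside of pandas itself. The following is a rough sketch, not the pandas implementation: select_y is a hypothetical helper, the dict form is assumed to map new label to existing column, and columns.inferred_type stands in for the holds_integer() check using only public attributes.

```python
import pandas as pd
from pandas.api.types import is_dict_like, is_integer, is_list_like, is_scalar


def select_y(data, y, label=None):
    """Hypothetical helper: pick the y data for plotting from a DataFrame."""
    if y is None:
        return data
    if is_scalar(y):
        # An integer is treated as a position unless the columns are integers.
        int_columns = data.columns.inferred_type in ("integer", "mixed-integer")
        if is_integer(y) and not int_columns:
            y = data.columns[y]
        series = data[y].copy()  # copy so the caller's frame is not modified
        series.name = label if label is not None else y
        return series
    if is_dict_like(y):
        # Assumed direction: {new label: existing column name}.
        selected = data[list(y.values())].copy()
        return selected.rename(columns={col: lab for lab, col in y.items()})
    if is_list_like(y):
        return data[list(y)].copy()
    raise ValueError("y must be a label, position, list of labels, or mapping")


df = pd.DataFrame({"a": [10, 20], "b": [90, 80], "i": [100, 101]})

by_position = select_y(df, 1)                     # second column
by_list = select_y(df, ["b", "a"])                # multiple columns
by_mapping = select_y(df, {"y": "a", "z": "b"})   # custom labels
```

The dict branch is checked before the list branch because a dict also counts as list-like. Only the scalar branch sets name, so a list or dict never reaches the attribute assignment that triggers the warning.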

After I teach myself how to build Pandas I'll test this change.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0
