Description
Code Sample, a copy-pastable example if possible
from os import path
import pandas as pd
import numpy as np
input_file = path.join(r'C:\DUMP', 'Process Log 2 Week_2.txt')
tdf = pd.read_csv(input_file, low_memory=False)
# ValueError is raised by this statement:
tdf_gsdf = tdf.groupby(tdf.columns.tolist()).size()
Problem description
The code above raises 'ValueError: Length of passed values is 65, index implies 0'.
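One possible cause, assumed here and not confirmed against the attached file: grouping drops any row whose key contains NaN. If even one column is completely empty, no groups are left and `.size()` ends up with an empty index, which would fit "index implies 0". The sketch below checks for that on a hypothetical toy frame (`df` with columns `a`, `b`, `c`); on the real data, `df` would be `tdf`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column "c" is completely empty, as a blank
# column in a log export might be.
df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": ["x", "x", "y"],
    "c": [np.nan, np.nan, np.nan],
})

# Columns that contain no non-null values at all.
empty_cols = [col for col in df.columns if df[col].isna().all()]

# Rows with no NaN in any column; with the default NaN handling, only
# these rows can form groups when grouping by every column.
usable_rows = len(df.dropna())

print("all-NaN columns:", empty_cols)
print("rows usable as group keys:", usable_rows)
```

If `usable_rows` is 0 on the real file, grouping by all columns has nothing left to count.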
I'm trying to identify unique/duplicate rows by grouping by all of the columns in the DataFrame.
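If the aim is only to flag unique and duplicate rows, `DataFrame.duplicated` and `DataFrame.drop_duplicates` may be enough: they compare entire rows and treat NaN values as equal, so rows with empty fields are not discarded. A minimal sketch, where `df` and its columns are hypothetical stand-ins for `tdf`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for tdf.
df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": ["x", "x", "y"],
    "c": [np.nan, np.nan, np.nan],
})

# keep=False marks every copy of a repeated row, not only the later ones.
is_dup = df.duplicated(keep=False)

duplicate_rows = df[is_dup]      # all rows that occur more than once
unique_rows = df.drop_duplicates()  # one copy of each distinct row

print(duplicate_rows)
print(unique_rows)
```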
The input file is attached here: Process Log 2 Week_2.txt
I'm new to Python, pandas, and this community, and I'm just trying to automate a few tasks in my project.
I think this might be related to issue #21624, but I'm not sure how to link it.
Expected Output
The output should list each distinct row of the DataFrame together with the number of times it occurs.
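A sketch of that output which keeps rows containing NaN, assuming a newer pandas: the `dropna=False` option of `groupby` was added in pandas 1.1, so it is not available in the 0.23.4 listed below. On 0.23.4, one workaround would be to fill NaN with a placeholder value before grouping. The toy frame `df` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column "c" is entirely NaN.
df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": ["x", "x", "y"],
    "c": [np.nan, np.nan, np.nan],
})

# dropna=False (pandas >= 1.1) keeps NaN as its own group key instead of
# discarding every row that contains a NaN.
counts = (
    df.groupby(df.columns.tolist(), dropna=False)
    .size()
    .reset_index(name="count")  # one row per distinct combination + count
)
print(counts)
```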
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.4.3
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None