
BUG: hash_pandas_object returns the same hashes for different hash_key values #41404

Closed
@Sandy4321

Description

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

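from pandas.util import hash_pandas_object

# `test` is the DataFrame printed further down; columns_names[i] is 'A' and columns_names[j] is 'B'
hash_pandas_object(test[columns_names[i]], index=True, encoding='utf8', hash_key='012', categorize=True)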
0      3713087409444908179
1      7478705303072568462
2     12024724921319894105
3     12785939622558835299
4      9788992550609991128
5      1239052552041868816
6      9610202078597672705
7     12287384021013641209
8     10264240190786022141
9     10535148974563425818
10    10238940258630658604
11    15446383648481672096
12    14265484681526586699
13     8862960024351814462
dtype: uint64

hash_pandas_object(test[columns_names[i]], index=True, encoding='utf8', hash_key='01298768755', categorize=True)
0      3713087409444908179
1      7478705303072568462
2     12024724921319894105
3     12785939622558835299
4      9788992550609991128
5      1239052552041868816
6      9610202078597672705
7     12287384021013641209
8     10264240190786022141
9     10535148974563425818
10    10238940258630658604
11    15446383648481672096
12    14265484681526586699
13     8862960024351814462
dtype: uint64


hash_pandas_object(test[[columns_names[i], columns_names[j]]], index=True, encoding='utf8', hash_key='01', categorize=True)
0     11107058607426530111
1     15666232225746534312
2      1136675766145783381
3     14892489092684772659
4      8519430825150424018
5       550646855301521146
6      3846031041217881485
7      2936614219041217571
8     16182698869780262111
9      2895548739675332954
10      677258434224654732
11     6105852029672525672
12    15095703462911844621
13     6081994522921680694
dtype: uint64

hash_pandas_object(test[[columns_names[i], columns_names[j]]], index=True, encoding='utf8', hash_key='0198076674534', categorize=True)
0     11107058607426530111
1     15666232225746534312
2      1136675766145783381
3     14892489092684772659
4      8519430825150424018
5       550646855301521146
6      3846031041217881485
7      2936614219041217571
8     16182698869780262111
9      2895548739675332954
10      677258434224654732
11     6105852029672525672
12    15095703462911844621
13     6081994522921680694
dtype: uint64


test[[columns_names[i], columns_names[j]]]
    A  B
0   0 -1
1   0 -1
2   0  0
3   0  0
4   1  0
5   1  0
6   2  0
7   2  2
8   2  2
9   2  2
10  2  2
11  2  2
12 -1  2
13 -1  2


The hashes are identical for different hash_key values.
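
The calls above depend on `test` and `columns_names`, which are not defined in the snippet. Here is a minimal self-contained sketch of the same comparison, with the frame rebuilt from the printed values. The helper `hashes_for` and the string series are hypothetical additions for illustration only. The string case rests on an assumption on my part: that `hash_key` only takes effect for object-dtype data, and that it needs a 16-character key there.

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Values copied from the printed frame above; the column names "A" and "B"
# are taken from that output.
test = pd.DataFrame(
    {
        "A": [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, -1, -1],
        "B": [-1, -1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2],
    }
)


def hashes_for(obj, hash_key):
    """Hash obj row by row with the same options as the calls above."""
    return hash_pandas_object(
        obj, index=True, encoding="utf8", hash_key=hash_key, categorize=True
    )


# Single integer column hashed with two different keys.
ints_same = hashes_for(test["A"], "012").equals(
    hashes_for(test["A"], "01298768755")
)

# Two-column integer frame hashed with two different keys.
frame_same = hashes_for(test[["A", "B"]], "01").equals(
    hashes_for(test[["A", "B"]], "0198076674534")
)

# Hypothetical contrast case (not part of the report): object-dtype strings
# hashed with two different 16-character keys.
strings = pd.Series(["a", "b", "c"], dtype=object)
strings_same = hashes_for(strings, "0123456789123456").equals(
    hashes_for(strings, "6543219876543210")
)

print(ints_same, frame_same, strings_same)
```

If the behavior reported here holds, `ints_same` and `frame_same` are both True, which is the problem this issue describes for integer data.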


Problem description

Different hash_key values should produce different hash values.

Expected Output

Output of pd.show_versions()


print("sklearn.__version__ = ", sklearn.__version__)
sklearn.__version__ = 0.24.2

pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.2.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.3.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
