differences in Series.map with defaultdict with different dtypes #49011

dannyi96 · 2022-10-08T23:14:09Z

closes BUG: differences in Series.map with defaultdict with different dtypes #48813
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

pandas/core/base.py

pandas/tests/apply/test_series_apply.py

mroeschke

Does this fix the defaultdict mutation issue? #47527

dannyi96 · 2022-10-10T17:29:22Z

Does this fix the defaultdict mutation issue? #47527

Just tested this (and followed the older threads on this issue).
Yes, these changes do fix the defaultdict mutation issue.

For example, before the changes:

from collections import defaultdict
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])
default_map = defaultdict(lambda: "missing", {1: "a", 2: "b", np.nan: "c"})

s.map(default_map)
# 0          a
# 1          b
# 2    missing
# dtype: object

# defaultdict is mutated: a second nan key has been inserted
default_map
# defaultdict(<function <lambda> at 0x7f3e792833a0>, {1: 'a', 2: 'b', nan: 'c', nan: 'missing'})

After the changes:

s = pd.Series([1, 2, np.nan])
default_map = defaultdict(lambda: "missing", {1: "a", 2: "b", np.nan: "c"})

s.map(default_map) 
# 0    a
# 1    b
# 2    c
# dtype: object

# defaultdict is not mutated
default_map 
# defaultdict(<function <lambda> at 0x7fe002b773a0>, {1: 'a', 2: 'b', nan: 'c'})

dannyi96 · 2022-10-08T23:15:29Z

pandas/tests/apply/test_series_apply.py

@@ -613,7 +630,7 @@ def test_map_defaultdict_ignore_na():
    mapping = defaultdict(int, {1: 10, np.nan: 42})
    ser = Series([1, np.nan, 2])
    result = ser.map(mapping)
-    expected = Series([10, 0, 0])
+    expected = Series([10, 42, 0])


This older test case seemed incorrect and needed to be corrected with this change.
However, I am unsure whether the older behavior was intentional.

According to the discussion we had in #47585, the old behavior is correct.

Ahh, I see.
Do we keep it as it is now, or is this enhancement fine, given the dtype-dependent discrepancies in map behavior highlighted in the bug description, #48813?
TIA

cc @rhshadrach can you weigh in?

+1 on changing the behavior in this test. The 0 in question here is because np.nan does not equal itself, and NumPy often returns views so that ids are not equal either; e.g.

mapping = defaultdict(int, {1: 10, np.nan: 42})
arr = np.array([1, np.nan, 2])
print(mapping[arr[1]])
# 0
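The same effect can be reproduced without NumPy, which may make the cause easier to see. This is only a minimal sketch using plain Python floats; the names `nan_key` and `other_nan` are illustrative and do not come from the PR.

```python
from collections import defaultdict

nan_key = float("nan")
mapping = defaultdict(int, {1: 10, nan_key: 42})

# Looking up the very same NaN object succeeds, because dict lookup
# checks object identity before falling back to equality.
same_object = mapping[nan_key]

# A different NaN object fails both checks (different identity, and
# NaN != NaN), so default_factory runs and a new key is inserted.
other_nan = float("nan")
different_object = mapping[other_nan]

print(same_object, different_object, len(mapping))  # 42 0 3
```

The mapping ends up with three keys, two of them NaN, which is the same kind of mutation shown earlier in the thread.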

I think introducing a better lookup for NaN values makes sense, and brings this in line with the Series case:

mapping = pd.Series({1: 10, np.nan: 42})
ser = Series([1, np.nan, 2])
print(ser.map(mapping))
# 0    10.0
# 1    42.0
# 2     NaN
# dtype: float64
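One way to picture a "better lookup for NaN values" is to redirect every NaN input to the NaN key the mapping already holds. The sketch below is an assumption-laden illustration in plain Python, not the code in pandas/core/base.py; `is_nan` and `nan_aware_lookup` are hypothetical names.

```python
from collections import defaultdict
import math


def is_nan(value):
    """Return True only for float NaN; other types are never treated as NaN."""
    return isinstance(value, float) and math.isnan(value)


def nan_aware_lookup(mapping, values):
    """Hypothetical helper, not the pandas implementation.

    Every NaN input is redirected to the NaN key already stored in the
    mapping (if there is one), so the lookup hits that entry instead of
    falling through to the default factory.
    """
    stored_nan = next((key for key in mapping if is_nan(key)), None)
    results = []
    for value in values:
        if stored_nan is not None and is_nan(value):
            value = stored_nan
        results.append(mapping[value])
    return results


mapping = defaultdict(int, {1: 10, float("nan"): 42})
result = nan_aware_lookup(mapping, [1.0, float("nan"), 2.0])
print(result)  # [10, 42, 0]
```

The result matches the updated expectation in the test above. Note that looking up a key that is genuinely absent (2.0 here) still goes through `default_factory` and inserts that key, since that is how `defaultdict` item access works.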

pandas/tests/apply/test_series_apply.py

mroeschke · 2022-10-11T17:16:16Z

Yes, these changes do fix the defaultdict mutation issue.

Could you add a test to check that the defaultdict isn't mutated?

mroeschke · 2022-10-11T17:16:29Z

cc @phofl since you made a fix here recently

dannyi96 · 2022-10-11T19:03:06Z

Yes, these changes do fix the defaultdict mutation issue.

Could you add a test to check that the defaultdict isn't mutated?

Updated the PR with a testcase to check defaultdict mutation.
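For readers who want the general shape of such a check, here is a rough sketch: snapshot the mapping, run the lookup, and compare. It is not the test added in this PR; `leaves_mapping_unmutated` and `make_mapping` are hypothetical names, and plain Python lookups stand in for `Series.map` so the block runs on its own.

```python
from collections import defaultdict


def leaves_mapping_unmutated(apply_fn, mapping):
    """Return True if calling apply_fn(mapping) leaves the mapping unchanged."""
    snapshot = mapping.copy()  # shallow copy that shares the same key objects
    apply_fn(mapping)
    return mapping == snapshot


def make_mapping():
    """Build a fresh mapping and return it together with its NaN key."""
    nan_key = float("nan")
    return nan_key, defaultdict(lambda: "missing", {1: "a", 2: "b", nan_key: "c"})


# Naive lookup with a fresh NaN object: the default factory fires and a
# second NaN key is inserted, so the mapping no longer equals its snapshot.
_, naive_mapping = make_mapping()
naive_unmutated = leaves_mapping_unmutated(
    lambda m: [m[k] for k in (1, 2, float("nan"))], naive_mapping
)

# Lookup that reuses the mapping's own NaN key: nothing is inserted.
nan_key, careful_mapping = make_mapping()
careful_unmutated = leaves_mapping_unmutated(
    lambda m: [m[k] for k in (1, 2, nan_key)], careful_mapping
)

print(naive_unmutated, careful_unmutated)  # False True
```

Against pandas, `apply_fn` could presumably be something like `lambda m: ser.map(m)`, with the outcome depending on whether the installed version includes this change.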

pandas/tests/apply/test_series_apply.py

doc/source/whatsnew/v1.6.0.rst

pandas/tests/apply/test_series_apply.py

mroeschke

LGTM. @phofl or @rhshadrach merge when ready

phofl · 2022-10-14T17:00:04Z

Thx @dannyi96

dannyi96 added 2 commits October 9, 2022 04:43

differences in Series.map with defaultdict with different dtypes

138c298

Merge branch 'main' into daniel_bug_fix

a42c9dd

mroeschke reviewed Oct 10, 2022

pandas/core/base.py Outdated

mroeschke reviewed Oct 10, 2022

pandas/tests/apply/test_series_apply.py Outdated

mroeschke reviewed Oct 10, 2022

mroeschke added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Series (Series data structure) labels Oct 10, 2022

dannyi96 added 3 commits October 10, 2022 23:23

Merge branch 'main' into daniel_bug_fix

e15a9eb

code review: updated testcases

35a30f9

code review: updated testcases

3c2dddb

dannyi96 commented Oct 10, 2022

code review: added testcase for defaultdict mutation

6a43c79

dannyi96 added 2 commits October 12, 2022 00:44

code review: added testcase for defaultdict mutation

b618938

Merge branch 'main' into daniel_bug_fix

7fc199c

mroeschke reviewed Oct 12, 2022

pandas/tests/apply/test_series_apply.py Outdated

code review: updated testcase for defaultdict mutation

31b69e9

mroeschke reviewed Oct 12, 2022

doc/source/whatsnew/v1.6.0.rst Outdated

dannyi96 added 2 commits October 13, 2022 20:15

code review: updated whatsnew

6a8efb8

Merge branch 'main' of https://github.com/dannyi96/pandas into daniel…

349d54f

…_bug_fix

dannyi96 force-pushed the daniel_bug_fix branch from e8f11ee to 349d54f Compare October 13, 2022 16:35

phofl reviewed Oct 13, 2022

pandas/tests/apply/test_series_apply.py

code review: updated testcase comments

ee788c2

dannyi96 force-pushed the daniel_bug_fix branch from 490dac3 to ee788c2 Compare October 14, 2022 14:33

mroeschke added this to the 2.0 milestone Oct 14, 2022

mroeschke approved these changes Oct 14, 2022

phofl approved these changes Oct 14, 2022

phofl merged commit 42a43e3 into pandas-dev:main Oct 14, 2022

dannyi96 deleted the daniel_bug_fix branch October 14, 2022 17:22

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

differences in Series.map with defaultdict with different dtypes (pan…

ee1890b

…das-dev#49011)

phofl Oct 11, 2022

dannyi96 Oct 11, 2022

phofl Oct 11, 2022

rhshadrach Oct 12, 2022
