Skip to content

Recompute hashcodes on deserialization #1891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fredericDelaporte
Copy link
Member

@fredericDelaporte fredericDelaporte commented Oct 27, 2018

Computed hashcode values may vary from one process to another. For being consistent inside a process, they must be re-computed after deserialization.

Fixes #1888, and also fixes #1089.

Possible breaking change: many classes, which were not serializing the session factory, do now serialize it. In case of cross-process serialization/deserialization, these session factories will need to be properly named, by setting session_factory_name setting in the configuration used to build them.

@fredericDelaporte

This comment has been minimized.

hazzik
hazzik previously approved these changes Oct 27, 2018
@fredericDelaporte
Copy link
Member Author

We have two troubles here:

  1. Hashcodes must not be serialized, as they are not guaranteed to be generated identical by different processes.
  2. Hashcodes depending on iterating over dictionaries or sets cannot be reliably computed during deserialization, because dictionaries or sets may not be populated till the very end of the deserialization.

So a thing like QueryKey which holds a set of FilterKey triggers the FilterKey hashcode computation on deserializing. But we have not guarantees that the filter key inner dictionaries, on which its own hashcode computation depends, are ready.

We need to eliminate dictionaries and sets from the serialized state when they are used for hashcode computation, in order to ensure we can compute hashcodes of object without risking iterating over an not yet populated set or dictionary.

@fredericDelaporte fredericDelaporte changed the title Recompute hashcodes on deserialization WIP - Recompute hashcodes on deserialization Oct 27, 2018
@fredericDelaporte

This comment has been minimized.

@gokhanabatay

This comment has been minimized.

@fredericDelaporte

This comment has been minimized.

@fredericDelaporte fredericDelaporte changed the title WIP - Recompute hashcodes on deserialization Recompute hashcodes on deserialization Oct 28, 2018
Copy link
Member Author

@fredericDelaporte fredericDelaporte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the issue should be reliably fixed.

// computed. Compute the hashcode on the fly. So long as this happens only during deserialization, there
// will be no thread safety issues. For the hashcode to be always defined after deserialization, the
// deserialization callback is used.
return _hashCode ?? GenerateHashCode();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have written it _hashCode ?? (_hashCode = GenerateHashCode()).Value, for lazily initializing it. But that is slightly more complicated code for a corner case, which should rarely happen: deserialization in which the object is put in a set or dictionary as part of the deserialization process. The constructor and the deserialization callback are the normal paths of initialization of this member.
They are mandatory, because solely relying on a lazy initialization could expose the code of GenerateHashCode to multi-threaded usage. It should work in most cases, but better reduce as much as possible the code having to support multi-threading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks much more readable than before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments at QueryKey and FilterKey, but i test it and it works.
The cache problem will be fixed with this pr.

{
var hash = query.GetHashCode();
hash = 29 * hash + (shallow ? 1 : 0);
hash = 29 * hash + CollectionHelper.GetHashCode(filterNames, filterNames.Comparer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterNames.Comparer is the default one here, the hashset is always constructed by this class, without a custom comparer. The new code does no more use it explicitly, as the hashset may not yet be available, if the hashcode computation has to be done during deserialization.

int hash = query.GetHashCode();
hash = 29 * hash + collectionRole.GetHashCode();
hash = 29 * hash + (shallow ? 1 : 0);
hash = 29 * hash + CollectionHelper.GetHashCode(filterNames, filterNames.Comparer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment.

// computed. Compute the hashcode on the fly. So long as this happens only during deserialization, there
// will be no thread safety issues. For the hashcode to be always defined after deserialization, the
// deserialization callback is used.
return _hashCode ?? GenerateHashCode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments at QueryKey and FilterKey, but i test it and it works.
The cache problem will be fixed with this pr.

@fredericDelaporte
Copy link
Member Author

So at least in the use case of @gokhanabatay, this PR solves the trouble of hashcode mismatch seen with his distributed cache usage.

That is still a change a bit heavier than what I would have expected. But it seems to me needed.

@gokhanabatay
Copy link
Contributor

So at least in the use case of @gokhanabatay, this PR solves the trouble of hashcode mismatch seen with his distributed cache usage.

That is still a change a bit heavier than what I would have expected. But it seems to me needed.

I agree thank you @fredericDelaporte I will wait to merge at master branch.

gokhanabatay
gokhanabatay previously approved these changes Oct 28, 2018
@gokhanabatay
Copy link
Contributor

gokhanabatay commented Oct 29, 2018

@hazzik
@fredericDelaporte
I post the issue to dotnet team please look and watch further action;

Deserialize binary[] to ISet wrong Internal HashCode has been stored #2027
dotnet/core#2027

And finally I got suspicious about https://github.com/nhibernate/NHibernate-Caches is it really works as distributed, because hash code will not return same result across process. If it is than best option is do not check hash codes just check object equality.

Do I need to ignore hash code with distributed cache usage like below? KeySanitizer

CacheConstraints constraints = new CacheConstraints()
{
                MaxKeySize = 250,
                KeySanitizer = x => x.IndexOf('@') >= 0 ? x.Substring(0, x.IndexOf('@')) : x
};

@fredericDelaporte
Copy link
Member Author

fredericDelaporte commented Oct 30, 2018

The PR here precisely dodges the issue you have reported in dotnet core repository. That is the reason for having serialized dictionaries and sets content with arrays in this PR.

The fact that hashcodes are not computed with same values across processes is also the reason for recomputing them on deserialization instead of using the previously stored value in serialized data.

About the use of hashcode in keys, the issue is already known and handled, see nhibernate/NHibernate-Caches#53. This is not yet released (v5.5).

But a static configuration property is already available since v5.4: CoreDistributedCacheProvider.AppendHashcodeToKey. Set it to false. (It can also be set by specifying the attribute append-hashcode="false" on the <coredistributedcache> configuration node.)

Copy link
Member

@hazzik hazzik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider following:

@fredericDelaporte

This comment has been minimized.

@fredericDelaporte

This comment has been minimized.

@hazzik hazzik force-pushed the SerializedHashcode branch from 0a80c68 to 5dc7087 Compare November 4, 2018 23:20
@hazzik
Copy link
Member

hazzik commented Nov 4, 2018

Rebased with autosquash

Computed hashcode values may vary from one process to another. For being
consistent inside a process, they must be re-computed after
deserialization.

Fixes nhibernate#1888
@fredericDelaporte fredericDelaporte force-pushed the SerializedHashcode branch 2 times, most recently from e203384 to 1d9b81d Compare November 5, 2018 12:59
Re-implement Printer.ToString
@hazzik
Copy link
Member

hazzik commented Nov 5, 2018

Also fixes #1089.

@fredericDelaporte

This comment has been minimized.

@hazzik

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Second level cache key mismatch NH-2755 - LockMode hash differs in x86 and 64bit OS
3 participants