Skip to content

Improve dirty check performance of set #2393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 66 additions & 4 deletions src/NHibernate/Collection/Generic/SetHelpers/SetSnapShot.cs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,22 @@ namespace NHibernate.Collection.Generic.SetHelpers
internal class SetSnapShot<T> : ISetSnapshot<T>
{
private readonly List<T> _elements;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this into a LinkedHashMap<T, T> (from NHibernate.Util namespace) could do most of the job, and that would be way simpler, also avoiding having two collections instead of one.
But it does not support a null key. It could be modified to have an option for supporting it, maybe. This option would cause it to no more respect IDictionary<TKey, TValue> contract which explicitly states null keys are not supported, so it is debatable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we do not need the ordering preserving capability either for the SetSnapshot. Set are not order sensitive, included in dirty check logics.

So we could use a regular Dictionary<T, T> instead of the List, with your additional field for handling the null case. The enumerator would have to be overloaded for also yielding the null if there is some, the count too for taking the null element into account, ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My implementation focuses primarily on preventing any possible regressions as I am not so familiar with NH to anticipate what can and cannot appear in the snapshot. Other than being significantly more complex, LinkedHashMap would also not support multiple entries which would claim to be Equal. I assumed there was a reason for the List based implementation of the SetSnapShot, it just failed terribly due to the way TryGetValue is used in a loop against a similarly sized collection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snapshot helper class ends up being consumed only in PersistentGenericSet. It may by passed around in NHibernate code but only for being used inside some PersistentGenericSet instance.
So you can check what features it actually needs.
And as it is an internal class, it can be freely changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no strong guarantee of input. Also, an instance of SetSnapShot gets passed to and returned from GetOrphans, used in SequenceEquals, etc. Scenarios where the lookup by key does not even come into play, that's why I am creating the lookup dictionary very lazily.
I would need extremely deep knowledge of NH to feel confident changing the (even undocumented) contract of a class that's been there unchanged since 2.0.


/// <summary>
/// Lazy cache for TryGetValue calls
/// </summary>
private Dictionary<T, T> _lookupCache;

/// <summary>
/// Index of the first item that is not yet in the lookup cache
/// </summary>
private int _lookupCacheUnprocessed;

/// <summary>
/// <see langword="true" /> if the snapshot contains a <see langword="null" /> element
/// </summary>
private bool _containsNull;

public SetSnapShot()
{
_elements = new List<T>();
Expand Down Expand Up @@ -36,6 +52,16 @@ IEnumerator IEnumerable.GetEnumerator()
public void Add(T item)
{
_elements.Add(item);
// we don't need to invalidate the cache if it was already created,
// as items are always added at the end and will be caught by
// _lookupCacheUnprocessed

if (item == null)
{
// keep track if the collection contains a null item
// these are processed separately in TryGetValue
_containsNull = true;
}
}

public void Clear()
Expand All @@ -45,7 +71,9 @@ public void Clear()

public bool Contains(T item)
{
return _elements.Contains(item);
// use the caching implementation in TryGetValue
T discard;
return TryGetValue(item, out discard);
}

public void CopyTo(T[] array, int arrayIndex)
Expand Down Expand Up @@ -95,13 +123,47 @@ public bool IsReadOnly

public bool TryGetValue(T element, out T value)
{
var idx = _elements.IndexOf(element);
if (idx >= 0)
if (element == null)
{
// null-s cannot be handled by the cache,
// but we keep track of them separately
value = default(T);
return _containsNull;
}

if (_lookupCache == null)
{
_lookupCache = new Dictionary<T, T>();
}

if (_lookupCache.TryGetValue(element, out value))
{
value = _elements[idx];
return true;
}

// look at elements not yet in the cache
while (_lookupCacheUnprocessed < _elements.Count)
{
T snapshotElement = _elements[_lookupCacheUnprocessed++];
if (snapshotElement != null)
{
// be careful not to replace cache entries to preserve
// original semantics and return the first matching object
if (!_lookupCache.ContainsKey(snapshotElement))
{
_lookupCache.Add(snapshotElement, snapshotElement);
}
Comment on lines +150 to +155
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accidental semantic. Having the snapshot containing two times the same element makes no sense, and would anyway cause bugs if this was actually happening. So it is unneeded.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not willing to risk that, see above comment.


if (snapshotElement.Equals(element))
{
// found a match, no need to continue filling the cache
// at this point
value = snapshotElement;
return true;
}
}
}

value = default(T);
return false;
}
Expand Down