Skip to content

Refactored DefaultEntityAliases to avoid unnecessary calculations #1482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 15, 2017

Conversation

bahusoid
Copy link
Member

DefaultEntityAliases is actively used for generating column aliases for entities in T-SQL SELECT statements.

My refactorings:

  1. Avoided default alias calculation if user provided alias is present;

  2. Added early exits if no user provided aliases exist (to avoid lookup for all columns on empty dic);

  3. Made RowIdAlias lazy (other fields are used one way or another but this one is dialect specific);

  4. Removed string.IsIntern calls for each generated alias. I didn't find any places where aliases can be interned inside NHibernate project (and can prepare a pull request that shows that string.IsIntern for aliases always returns null for all tests). According to b1500ea string.IsIntern was introduced instead of string.Intern to avoid excessive memory usage. But I don't understand why NHibernate should expect interned aliases - string.IsIntern doesn't look right to me here.

@fredericDelaporte
Copy link
Member

About intern string, they seem to be there as a result of the Java port, see NH-902 comments. (That is the ticket bound to the commit you have linked.)

About interns - do you have a special reason to use them in C#/CLR? Or is it a Java legacy?

Yes, interns are there because they were there in Java.

I implemented sort of a compromise - string.IsInterned will be used to find whether there is an equivalent interned string, and that one will be returned if found. If not, the original string will be used.

So initially they were always interned, not just switched to their intern version if one was there. But apparently reading from NH-902, this was causing high memory consumption due to interning to many various strings, which otherwise would have been garbage collected.

The applied solution to NH-902 does not look to me good. From a situation where too many various strings were interned we are now in a situation where maybe commonly used strings are not interned. That is not a domain where a systematic approach works. Interning should be handled on a case by case basis.

In the case of aliases, can we expect aliases to use only a small set of string values? If yes, they should be interned (reverting NH-902 for them). If no, if instead they have a large set of potential values, no interning should be handled, as your PR do.

If the situation is mixed, meaning if there is a set of common values frequently used along with a large set of various values, then NH-902 solution could have been beneficial if something had been added for interning the set of common values. But that is not the case according to your findings.

I think the safer is to remove interning handling for them. There is a suffix logic for avoiding aliases conflict which causes them to be numerous, so better not revert NH-902 for them.

Ideally it seems to me we should check all the other interning to check if they make sens.

@fredericDelaporte fredericDelaporte self-requested a review December 11, 2017 11:03
@bahusoid
Copy link
Member Author

bahusoid commented Dec 11, 2017

@fredericDelaporte I agree - the whole b1500ea looks suspicious. Interning might be useful for long living objects such as various string metadata in session factory (so commonly used property/column names and other string metadata between multiple factories are shared).

But IMHO it should not be used in frequently called methods - DefaultEntityAliases instances are created for each query executions. And it's not long living objects (it's scoped to Loader life) - we already generated alias (memory is occupied) and obtaining interned alias here is just a waste of CPU/memory.

@fredericDelaporte
Copy link
Member

Yes, I think removing interning for aliases is right. For other cases, we may consider opening other issues or PR.

suffixedKeyColumns = keyColumnsCandidates;
}
Intern(suffixedKeyColumns);
suffixedKeyColumns = GetSuffixedKeyAliases(persister, suffix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new GetSuffixedKeyAliases has also been introduced on Hibernate side, under the name determineKeyAlias. Better use Hibernate name and try to order things similarly, for easing comparing the state of the two. (But adjust casing to .Net conventions of course, so DetermineKeyAlias.) See https://github.com/hibernate/hibernate-orm/blob/master/hibernate-core/src/main/java/org/hibernate/loader/DefaultEntityAliases.java

Now on their side they have not optimized the generated indexes and they have introduced other changes which should likely not be taken here (like the unquote thing), so your PR code will of course have differences.

If renaming, the other GetSuffixed methods already there should be renamed too in order to better match Hibernate code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fredericDelaporte Done. Let me know if I miss something.

hazzik
hazzik previously requested changes Dec 12, 2017
int size = persister.PropertyNames.Length;
string[][] suffixedPropertyAliases = new string[size][];
for (int j = 0; j < size; j++)
{
suffixedPropertyAliases[j] = GetUserProvidedAliases(persister.PropertyNames[j], GetPropertyAliases(persister, j));
Intern(suffixedPropertyAliases[j]);
suffixedPropertyAliases[j] = GetUserProvidedAliases(persister.PropertyNames[j], () => GetPropertyAliases(persister, j));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The () => GetPropertyAliases(persister, j) part is incorrect. When the lambda is resolved j would always be equal to size. See here: https://dotnetfiddle.net/wv8xC1

Copy link
Member Author

@bahusoid bahusoid Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hazzik I don't think that your example is equal to mine. This lambda is resolved right inside cycle with closured variable and so it's correctly resolved. And in you example you execute action after cycle with modified closured variable. So equivalent would be:
https://dotnetfiddle.net/VrjF80

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems that you're right.

@hazzik
Copy link
Member

hazzik commented Dec 13, 2017

@bahusoid I still would like this to be refactored without using lambdas.

@bahusoid
Copy link
Member Author

@hazzik What's wrong with lambdas? I can get rid of the "access to the modified closure" and still use lambdas. I don't understand your concerns - could you please be more specific. How do you prefer to see "deferred value calculation" in some method.

@hazzik
Copy link
Member

hazzik commented Dec 13, 2017

There is no need to have “deferred” calculations. It’s all nice and clean with just simple ??. Just need to inline some of the private methods and reassemble them into other methods.

I actually did the refactoring, but wanted to give you a credit for the idea and the work.

@bahusoid
Copy link
Member Author

Ah.. I see :) Ok. I can do it

@bahusoid
Copy link
Member Author

@hazzik Done

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Dec 13, 2017

There is no need to have “deferred” calculations. It’s all nice and clean with just simple ??. Just need to inline some of the private methods and reassemble them into other methods.

Indeed, I should have thought about it.

@hazzik
Copy link
Member

hazzik commented Dec 13, 2017

I've allowed myself to cleanup the code. Please review. @bahusoid @fredericDelaporte

Copy link
Member

@fredericDelaporte fredericDelaporte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some shortcuts are removed but it is more clean, and those shortcuts were only quite small savings.

}
else
string[] columns = GetUserProvidedAlias(AbstractEntityPersister.EntityClass);
if (columns != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last refactoring removed by the way a discrepancy in previous refactoring causing a null value as first columns value to be replaced. So the last refactoring fix an undue change. (That said, this likely concerns an error case.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it.

Copy link
Member

@fredericDelaporte fredericDelaporte Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous state of the PR was somewhat equivalent to

return columns?[0] ?? GetDiscriminatorAlias(persister, _suffix);

While the master code is more like current state.

So on master, if columns[0] contains null, null is yielded as an alias (which is likely an error case). With your last commit it will also be the case. But without your last commit, it would have been replaced by GetDiscriminatorAlias(persister, _suffix).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is a correct behaviour: if we got null in columns[0] we should not swallow this case, but fail somewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed.

@bahusoid
Copy link
Member Author

bahusoid commented Dec 13, 2017

@hazzik My IMHOs:

  1. Now it's really hard to understand what's been changed in this PR. I personally think that fields renaming and method orderings should be commited as separate changeset without others changes.

  2. @fredericDelaporte

and those shortcuts were only quite small savings

Maybe. But the thing is - I believe that user provided aliases are hardly ever used by most users. So those shortcuts are the main route of code for most users - why not to save a little? See example of test with custom provided aliases - NativeSqlCollectionLoaderFixture.LoadCompositeElementsWithWithCustomAliases

@hazzik
Copy link
Member

hazzik commented Dec 13, 2017

@bahusoid do you want me to redo this in 2 commits? I don't mind.

@bahusoid
Copy link
Member Author

@hazzik Yeah it would be better. And I want my shortcuts back :)

@hazzik
Copy link
Member

hazzik commented Dec 14, 2017

071b7449-84d5-4f26-98de-a0be3bb77a1f

@hazzik
Copy link
Member

hazzik commented Dec 14, 2017

@bahusoid done.

@bahusoid
Copy link
Member Author

@hazzik Have nothing to complain about.

Though... )) Do we really need both SafeGetUserProvidedAliases/GetUserProvidedAliases? "Unsafe" version is used only twice and only if user aliases are provided. I think code would be cleaner with only safe version.

Just a thought - it's fine by me to keep it.

@hazzik hazzik merged commit aada800 into nhibernate:master Dec 15, 2017
@fredericDelaporte
Copy link
Member

fredericDelaporte commented Dec 15, 2017

Though... )) Do we really need both SafeGetUserProvidedAliases/GetUserProvidedAliases? "Unsafe" version is used only twice and only if user aliases are provided. I think code would be cleaner with only safe version.

This is always a tricky part with shortcuts/small code optimizations: up to which point should we do them? That is why I try to not debate much on them, unless some benchmark show concrete effects.

About the merge, I was thinking the point of splitting the commits was not only for review, but also for helping understanding the changes in the history. So merging them all squashed to a single commit was not what I was expecting.
Of course it would have required to have this PR commits squashed in a minimal set of commits doing only the final changes but still split between "code renamed, restyled and moved around" and "code logic changed".

public string[] SuffixedKeyAliases { get; }

// TODO: not visible to the user!
public string RowIdAlias => _rowIdAlias ?? (_rowIdAlias = Loadable.RowIdAlias + _suffix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think none of us have realized we were changing a class which instances can be used by concurrent threads. In a concurrent usage situation, this micro-optimization can instead be a micro-performance loss! We would have better not have done it in my opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants