Skip to content

[WIP] Intern string metadata in Session Factory #1503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

bahusoid
Copy link
Member

String interning is pretty powerful concept but it needs to be used carefully.
Intern methods should not be used in frequently called methods or in short living objects or for unique strings. But if properly used it can drastically reduce memory usage. What's really cool - string intern pool is shared across app domains (so multiple web-sites running in the same app pool will share all interned strings)

As we already shortly discussed here #1482 (comment) latest intern related refactoring b1500ea
is not very useful. Almost all string.Intern calls were replaced with string.IsIntern - and for most cases (like Collection.Role or entity aliases) those strings are internally calculated inside NHibernate framework. It's very unlikely that such strings will be interned by user code - so those calls affect performance with no added value.

So it's my attempt to implement this feature properly. I think interning should be configurable depending on usage scenario. For instance if it's simple app with single database - there's no need to intern unique session factory metadata like Collection.Role. But it might be very useful if multiple session factories/app domains are used.

I see configuration levels for interning in the following way:
Disabled - just to make people happy :). Every feature that might affect performance globally should be optional.

Minimal - Very close to current behaviour (make sure that entity names and class names are interned) except I removed some intern pool checks that I think is not useful (like string.IsIntern calls for Collection.Role)

Default - Interning level for single session factory scenario. I think it should be default level.
Currently I added property names and column names for this mode as I believe in most databases many column/property names are duplicated (like Id, Name, ParentId, ChildId...)
Also added Suffixes for EntityLoader.

SessionFactories - Interning level for multiple session factories scenario.
Added Entity Aliases for EntityLoader (interning is back for DefaultEntityAliases but only for EntityLoader), Collection.Role, PersistentClass.LoaderName (it was interned by default, but I'm not sure - is it really something that needs to be interned for single session factory?)

AppDomains - Multiple app domains scenario.
Added metadata for StaticProxyFactory.

It's currently just a draft (I didn't finish investigation about what else should be interned) - let me know what you think:
Other thoughts about how it should be done (configuration or anything else)?
What else should be interned?

@fredericDelaporte
Copy link
Member

For this kind of change, adding a setting changing (even slightly) the behavior in many places, I would rather have some practical measurement of the gain.

Do you think you could benchmark what impact this PR and its different option values have on both spatial and temporal performances? (Memory usage and execution time.)

You may try to see if it has a significant impact on Frans's benchmark, but I am not sure this benchmark is adequate for this.

@bahusoid
Copy link
Member Author

@fredericDelaporte Ok. Give me a few days. I will check out Frans's benchmark but I don't expect any significant memory drops in single factory scenario. Also I don't think it will affect execution time - my changes affect performance of session factory creation only which is not measured in those tests. But will see...
Also I will try to adjust code to measure things that affect this code.

@bahusoid
Copy link
Member Author

bahusoid commented Jan 5, 2018

I played with interning using mappings from RawDataBencher and a bit disappointed - with 2 app domains and 3 session factories in each app domain (so 6 session factories in total) savings between minimum(basically current code) and maximum interning is about 300 Kb. There is no noticeable performance penalties but the gain is also insignificant in the current state of PR.

Pushing my changes just for academic purposes : )

But I noticed a few things that I think should be addressed:
1) It's become clear to me why session factory creation in cycle from NH-902 lead to "out of memory" exceptions. It's because of Table class:

public Table()
{
uniqueInteger = tableCounter++;
}

[ThreadStatic]
private static int tableCounter;

This UniqueInteger is used in alias generation and that leads to different aliases for the same table in different session factories. And so alias interning is completely useless and dangerous.

I think we should make UniqueInteger scoped to session factory. So aliases should be the same for 2 session factories created for the same database. As now there is big possibility that for the same database different aliases will be generated if multiple apps are used. That's not very good as it fills DB server internal query cache with identical logically queries (at least I believe it's true for SQL Server).

2) NHibernate.Cfg.Environment class static initialization takes almost 2 Mb. And almost 1 Mb of it is taken by statically cached XML schemas in

private const string CfgSchemaResource = "NHibernate.nhibernate-configuration.xsd";
private const string MappingSchemaResource = "NHibernate.nhibernate-mapping.xsd";
private static readonly XmlSchemaSet ConfigSchemaSet = ReadXmlSchemaFromEmbeddedResource(CfgSchemaResource);
private static readonly XmlSchemaSet MappingSchemaSet = ReadXmlSchemaFromEmbeddedResource(MappingSchemaResource);

I personally think that both variables should not be static and scoped to session factory creation. But if you think that static caching is useful - hbm schema needs to be lazy initialized.

Let me know what you think - I can try to prepare a PRs for both points I mentioned if you agree with me.

@fredericDelaporte
Copy link
Member

fredericDelaporte commented Jan 5, 2018

  1. This tableCounter issue is already known but I was no more thinking about it. It was written about here, leading to adding Table counter for aliases should be stable #1417 issue. It is not an easy subject, and may be hard to do without breaking changes (which would then means it would require a major version). Session factory may not be the right place for storing it, see previous links comments. Of course fell free to add additional considerations in Table counter for aliases should be stable #1417 like the trouble it may cause if trying to intern strings, ...

    As now there is big possibility that for the same database different aliases will be generated if multiple apps are used.

    I do not get that point. Do you mean multiple instances of the same app? In such case it looks unlikely to me for applications using a single session factory, unless some randomness in the way they are bootstrapped has been introduced, but I do not see a reason for that. Or do you mean multiple different applications? In such case, if they do not share the same mappings, I do not think we can do much for still generating the same aliases.

  2. Well, maybe a good finding. But we have to make sure that finding a way to release the memory taken by those schema does cause multiple loading of them instead, at least not multiple ones in a short period of time.

@bahusoid
Copy link
Member Author

bahusoid commented Jan 5, 2018

I do not get that point. Do you mean multiple instance of the same app?

I meant the case multiple apps and multiple databases, same mapping for all databases. So for App1 Database1 was accessed first and session factory is generated with index 1. But for App2 Database2 was first accessed so for Database1 session factory is generated with index 2.

@hazzik
Copy link
Member

hazzik commented Jan 11, 2018

Shall we close this?

@bahusoid
Copy link
Member Author

Not yet. Let me play a little bit with stable table aliases (#1517)

@hazzik
Copy link
Member

hazzik commented Jan 30, 2018

Shall we close this now?

@bahusoid
Copy link
Member Author

Ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants