Description
We've been investigating our process stalling under load. Multiple crash dumps show a set of stack traces similar to this:
After some testing I put together a crude harness over NHibernate.Util.AsyncReaderWriterLock that reproduces the problem. On .NET Framework 4.8 the unit test still hasn't completed after an hour; on .NET Core 3.0 it finishes in under a minute.
[TestMethod]
public Task LockHammer53()
{
    var asyncReaderWriterLock = new AsyncReaderWriterLock();
    return Task.WhenAll(Enumerable.Range(1, 100).Select(x => Task.Run(() =>
    {
        for (var i = 0; i < 1000000; i++)
        {
            using (asyncReaderWriterLock.WriteLock())
            {
            }
        }
    })));
}
It looks like it might be related to this bug in .NET:
dotnet/runtime#28717
It was fixed in .NET Core 3.0 but remains unfixed in .NET Framework, and there appear to be no plans to fix it there.
Interestingly, StackExchange.Redis was heavily affected by this and worked around the issue by switching from SemaphoreSlim to MutexSlim:
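One way to check whether the stall comes from SemaphoreSlim itself rather than from the lock built on top of it would be to hammer a bare SemaphoreSlim in the same pattern. The sketch below is untested against the bug and makes some assumptions: `SemaphoreHammer` and `RunAsync` are hypothetical names, and it assumes the write-lock path ends up in a synchronous `SemaphoreSlim.Wait()`, which I have not confirmed. It returns false on timeout instead of hanging.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of NHibernate or .NET.
public static class SemaphoreHammer
{
    // Runs `tasks` workers that each take and release a single-permit
    // SemaphoreSlim `iterations` times. Returns true if all workers finish
    // before `timeout`, false otherwise (workers are left running).
    public static async Task<bool> RunAsync(int tasks, int iterations, TimeSpan timeout)
    {
        // Not disposed on purpose: workers may still be using it after a timeout.
        var semaphore = new SemaphoreSlim(1, 1);

        var work = Task.WhenAll(Enumerable.Range(0, tasks).Select(_ => Task.Run(() =>
        {
            for (var i = 0; i < iterations; i++)
            {
                semaphore.Wait();
                try
                {
                    // Empty critical section, as in LockHammer53.
                }
                finally
                {
                    semaphore.Release();
                }
            }
        })));

        var finished = await Task.WhenAny(work, Task.Delay(timeout));
        if (finished != work)
        {
            return false;
        }

        await work; // Propagate any exception thrown by a worker.
        return true;
    }
}
```

From a test it could be called as `Assert.IsTrue(await SemaphoreHammer.RunAsync(100, 1000000, TimeSpan.FromMinutes(5)));`. If this also stalls on .NET Framework 4.8 but not on .NET Core 3.0, that would point at SemaphoreSlim rather than at AsyncReaderWriterLock.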
StackExchange/StackExchange.Redis@6873941
So I would propose either:
a) Reverting the locking to the design in 5.2
b) Rewriting AsyncReaderWriterLock to avoid the bug in SemaphoreSlim