Move claimable_htlcs to separate lock #1772
Conversation
Codecov Report — Base: 90.77% // Head: 90.76% // Decreases project coverage by -0.01%.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
- Coverage   90.77%   90.76%    -0.01%
==========================================
  Files          87       87
  Lines       47595    47597        +2
==========================================
- Hits        43204    43203        -1
- Misses       4391     4394        +3
==========================================

☔ View full report at Codecov.
Force-pushed from f048a08 to f76d011
Basically LGTM.
lightning/src/ln/channelmanager.rs
Outdated
@@ -3650,7 +3655,9 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
true
});

channel_state.claimable_htlcs.retain(|payment_hash, (_, htlcs)| {
mem::drop(channel_state_lock);
Rather than drop'ing here, note that the scope this is in was explicitly for the channel_state lock, so we should just move the next hunk out of the scope.
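To make the suggestion concrete, here is a minimal standalone sketch (not the actual `ChannelManager` code; the maps and names are made up for illustration) of relying on the existing scope to release the `channel_state` lock, and moving the work that no longer needs it outside that scope, instead of calling `mem::drop` on the guard:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical maps standing in for the real channel state and claimable HTLCs.
fn process(channel_state: &Mutex<HashMap<u64, u32>>, claimable_htlcs: &Mutex<Vec<u32>>) {
    // Rely on the scope that already exists for the channel_state guard:
    // the lock is released when the scope ends, no mem::drop() needed.
    {
        let mut channel_state = channel_state.lock().unwrap();
        channel_state.retain(|_, amount| *amount > 0);
    } // channel_state guard dropped here

    // Work that no longer needs channel_state lives outside that scope.
    let mut claimable_htlcs = claimable_htlcs.lock().unwrap();
    claimable_htlcs.retain(|amount| *amount > 0);
}

fn main() {
    let channel_state = Mutex::new(HashMap::from([(1u64, 0u32), (2, 5)]));
    let claimable_htlcs = Mutex::new(vec![0u32, 7]);
    process(&channel_state, &claimable_htlcs);
    println!("{:?}", claimable_htlcs.lock().unwrap());
}
```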
Oh yeah, thanks! 736d403
lightning/src/ln/channelmanager.rs
Outdated
let peer_state = peer_state_mutex.lock().unwrap(); | ||
peer_state.latest_features.write(writer)?; | ||
} | ||
{ |
I don't think it's worth bothering with a bunch of scopes here.
Thanks, those were an attempt to not violate the "lock order branches" given my previous understanding of how we handle them. But given your clarification above there's no need for these scopes :)!
Force-pushed from efbd2f2 to 01e4912
Snap, will look into fixing the CI Windows build failure tomorrow.
Force-pushed from 0c48ede to 938769c
Super sorry for the long and complicated changelog here. Added acquiring of all locks at the beginning of the `ChannelManager::write` function. Still haven't fixed the CI issues with Windows.
lightning/src/ln/channelmanager.rs
Outdated
let forward_htlcs = self.forward_htlcs.lock().unwrap();
let channel_state = self.channel_state.lock().unwrap();
let per_peer_state = self.per_peer_state.write().unwrap();
let peer_states = per_peer_state.iter().map(|(peer_pubkey, peer_state_mutex)| (peer_pubkey, peer_state_mutex.lock().unwrap())).collect::<Vec<(&PublicKey, MutexGuard<'_, PeerState>)>>();
@TheBlueMatt are we okay with this creating a lock order for all the inner locks of the `PeerState` mutexes? In the "non-serialization" state (i.e. outside of `ChannelManager::write`) we should never acquire two `PeerState` locks at the same time, due to the deadlock possibility. IIUC this "debug lockorder" will be violated in production runtime when we move towards parallelization, but that should be ok as long as the same thread doesn't acquire multiple `PeerState` locks at the same time?

If we are not ok with this, I don't think we can take the locks at the beginning of the `ChannelManager::write` function, and we'd have to revert back to how this PR originally acquired the locks.
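For illustration, a toy sketch (plain `std` types, not LDK code; `PeerState` here is a made-up stand-in) of the two access patterns being discussed: a serialization-style pass that holds every per-peer lock at once, acquired in a stable map order, versus normal operation that only ever holds one per-peer lock at a time so two threads can never deadlock on a pair of peers:

```rust
use std::collections::BTreeMap;
use std::sync::{Mutex, MutexGuard};

// Stand-in for the per-peer state kept behind each peer's Mutex.
#[derive(Debug, Default)]
struct PeerState {
    latest_features: Vec<u8>,
}

fn main() {
    // Hypothetical per-peer map; BTreeMap gives a stable iteration order, so
    // any code path that locks several peers always does so in key order.
    let per_peer_state: BTreeMap<u8, Mutex<PeerState>> =
        (0u8..3).map(|id| (id, Mutex::new(PeerState::default()))).collect();

    // Serialization-style pass: hold every per-peer lock at once, acquired
    // in map (key) order, mirroring the quoted write()-style code above.
    let guards: Vec<(&u8, MutexGuard<'_, PeerState>)> = per_peer_state
        .iter()
        .map(|(id, m)| (id, m.lock().unwrap()))
        .collect();
    println!("holding {} peer locks for the write()-like pass", guards.len());
    drop(guards); // release them all before normal operation resumes

    // Normal operation: only ever one per-peer lock held at a time, which
    // avoids any possibility of two threads deadlocking on a pair of peers.
    for (_id, m) in per_peer_state.iter() {
        let peer = m.lock().unwrap();
        let _ = &peer.latest_features;
    }
}
```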
Hmmm, yea, that's super gross. Sorry for the confusion, I hadn't intended to suggest that we should always take all the locks at the top, only that it doesn't really matter where the locks go, and if it's easier to take them in the right order, don't bother trying to super-manually scope things but rather just move the locks around/up until we get the right order. We can scope (or drop) if we have to, but there's no reason to do it "by default" to try to ensure we meet "lock-order requirements" (unless it's gonna panic in tests).
Ok :)! I'll go ahead and remove the commit that takes all the locks in the beginning of the function.
Looks good.
/// failed/claimed by the user.
///
/// Note that, no consistency guarantees are made about the channels given here actually
/// existing anymore by the time you go to read them!
I think it could be made more precise what is meant by "channels given here actually existing anymore by the time". It could mean a) the channel is closed on-chain or funding has been aborted, or b) we don't have any reference in our local map (i.e. `ChannelHolder::by_id`).
Aren't those two equivalent? I'm not sure what would be more specific.
lightning/src/ln/channelmanager.rs
Outdated
@@ -691,6 +684,8 @@ pub type SimpleRefChannelManager<'a, 'b, 'c, 'd, 'e, M, T, F, L> = ChannelManage
// |  |
// |  |__`pending_inbound_payments`
// |  |
// |  |__`claimable_htlcs`
From my understanding this lock order tree is currently prescriptive (where we would like to go) but not descriptive (where we are today), in the sense that the `claimable_htlcs` lock is currently taken in `process_pending_htlc_forwards()` without taking `per_peer_state`. Or what do you aim to convey?
So the tree is "descriptive" today in the sense that none of the illustration is violated in any of our code (or well, after this PR we will violate the branch illustration, but just in the `ChannelManager::write` function).

But you are correct that the `claimable_htlcs` lock, for example, is never acquired at the same time as the `per_peer_state` lock in our codebase. However, the tree illustration is also meant to give you guidance on which order you are supposed to take the locks in, in case you implement new code in the future that needs to acquire both locks at the same time.

I deliberately chose to illustrate `per_peer_state` before `pending_inbound_payments` (and therefore also `claimable_htlcs`) even though that's not actually enforced currently by the "real" lock order, as `pending_inbound_payments` currently has a "real" lock order after `channel_state`. And I wanted to avoid new PRs before #1507 adding code that creates "real" lock orders for `pending_inbound_payments` and `pending_outbound_payments` before `per_peer_state` but after `channel_state`, as that would need to be changed once channels are moved to `per_peer_state` with #1507.

Though I've realized now that most of the places in the code that require `pending_inbound_payments` to be locked before `channel_state` are actually removed with this PR :). I actually created a draft PR #1773 that moves the `pending_inbound_payments` and `pending_outbound_payments` lock order to before `channel_state`.

Now that I think about it again, we should actually include that lock order change with this PR, as doing so will solve most of the issues I've faced in this PR with the `ChannelManager::write` function, and will allow us to take the `pending_inbound_payments`, `pending_outbound_payments` and `claimable_htlcs` locks and hold them throughout the function 🎉
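As a rough sketch of the acquisition order proposed above (toy types standing in for the real `ChannelManager` fields, which are far more involved), a `write()`-like pass could take the payment and claimable-HTLC locks first and hold them throughout, with `channel_state` acquired after them:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Toy stand-ins for the ChannelManager fields discussed above; the real
// types in lightning/src/ln/channelmanager.rs are far more involved.
struct Manager {
    pending_inbound_payments: Mutex<HashMap<u64, u64>>,
    pending_outbound_payments: Mutex<HashMap<u64, u64>>,
    claimable_htlcs: Mutex<HashMap<u64, Vec<u64>>>,
    channel_state: Mutex<HashMap<u64, u64>>,
}

impl Manager {
    // A write()-like pass that follows the proposed order: the payment and
    // claimable-HTLC locks are taken first and held throughout, with
    // channel_state acquired after them.
    fn serialize_like(&self) -> usize {
        let inbound = self.pending_inbound_payments.lock().unwrap();
        let outbound = self.pending_outbound_payments.lock().unwrap();
        let claimable = self.claimable_htlcs.lock().unwrap();
        let channels = self.channel_state.lock().unwrap();
        inbound.len() + outbound.len() + claimable.len() + channels.len()
    }
}

fn main() {
    let manager = Manager {
        pending_inbound_payments: Mutex::new(HashMap::new()),
        pending_outbound_payments: Mutex::new(HashMap::new()),
        claimable_htlcs: Mutex::new(HashMap::new()),
        channel_state: Mutex::new(HashMap::new()),
    };
    println!("serialized {} entries", manager.serialize_like());
}
```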
Last two commits look good! Will check out the first commit once it's updated
Force-pushed from 938769c to 30a77b7
Thanks for the reviews @ariard and @valentinewallace :)! Dropped the first commit that acquired all of the locks at the top of the `ChannelManager::write` function.

Edit: So locks are still getting poisoned on Windows builds only... I'm not really sure why this is yet. I would have hoped that this was fixed by the latest version. I'll look into it.
Force-pushed from c0813d3 to 6c94add
@@ -681,17 +674,19 @@ pub type SimpleRefChannelManager<'a, 'b, 'c, 'd, 'e, M, T, F, L> = ChannelManage
// |
// |__`forward_htlcs`
// |
// |__`channel_state`
// |__`pending_inbound_payments`
Just a general question. Would we actually, in the long run, like `pending_inbound_payments` and `pending_outbound_payments` to be peer-specific (i.e. be stored in the `PeerState`)? I haven't really researched all the implications that would have yet. From my first hunch it'd be pretty beneficial for parallelization (especially `pending_outbound_payments`), but it will be hard in terms of backwards compatibility. But if we would like it, maybe my last commit isn't such a great idea in the long run, though it might be ok for now.

Edit: Oh actually, as they'd be part of the `PeerState` mutex lock and not individual locks any longer, the lock order change by the last commit should be ok :)
So none of the tests fail on a local Windows build I set up. I'm thinking that it might help if I rebase on main. Please let me know if you're okay with me rebasing this on upstream :)
Go for it IMO. I tried to look into that error and it baffles me 😬
Force-pushed from 6c94add to 8318961
Thanks @valentinewallace. Rebased on upstream with no changes. Hopefully CI passes now 😅
Hmmm ok so the CI still doesn't pass. I'm really baffled as well @valentinewallace 😅... Will try to figure out what's going on 😁
Force-pushed from 8318961 to 8ce4756
Rebased on upstream to fix merge conflicts.
As I'm unable to reproduce the CI problems locally, I'm pushing a test commit to see if this is somehow related to how Windows treats the acquiring of the `RwLock` for `per_peer_state`.
Force-pushed from 93a64fc to c2037a1
Feel free to squash.
/// Map from payment hash to the payment data and any HTLCs which are to us and can be
/// failed/claimed by the user.
///
/// Note that, no consistency guarantees are made about the channels given here actually
nit: s/,//
// |  |__`per_peer_state`
// |  |
// |  |__`outbound_scid_aliases`
// |  |__`pending_outbound_payments`
I don't think this is true - `claim_funds_internal` takes a `pending_outbound_payments` lock while holding the `channel_state` mutex. Maybe just drop the mutex-order-swap commit for now and we can do it separately later?
Hmm, the `channel_state` lock is dropped (`mem::drop`) before `pending_outbound_payments` is locked in `claim_funds_internal`, no?

I can of course instead drop the mutex-order-swap commit for now if we'd prefer though :). That will require that we clone the `PaymentPurpose` for the `claimable_htlcs` in the `ChannelManager::write` function as per the #1772 (comment) discussion, or that we do the weird scoping I did with 266175d. Are we ok with either of those alternatives?
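For reference, an illustrative (non-LDK) sketch of the pattern described here for `claim_funds_internal`: the `channel_state` guard is explicitly `mem::drop`-ed before the `pending_outbound_payments` lock is taken, so no ordering edge between the two locks is created:

```rust
use std::collections::HashMap;
use std::mem;
use std::sync::Mutex;

// Illustrative only: a claim-like path that releases the channel_state guard
// before touching pending_outbound_payments, so no lock-order dependency
// between the two locks is ever created.
fn claim_like(
    channel_state: &Mutex<HashMap<u64, u64>>,
    pending_outbound_payments: &Mutex<HashMap<u64, u64>>,
    payment_id: u64,
) {
    let channel_state_lock = channel_state.lock().unwrap();
    let maybe_amount = channel_state_lock.get(&payment_id).copied();
    // Explicitly release channel_state before taking the payments lock.
    mem::drop(channel_state_lock);

    if let Some(amount) = maybe_amount {
        pending_outbound_payments.lock().unwrap().insert(payment_id, amount);
    }
}

fn main() {
    let channel_state = Mutex::new(HashMap::from([(42u64, 1_000u64)]));
    let pending_outbound_payments = Mutex::new(HashMap::new());
    claim_like(&channel_state, &pending_outbound_payments, 42);
    println!("{:?}", pending_outbound_payments.lock().unwrap());
}
```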
@ViktorTigerstrom OOC, how did you debug the Windows poison error? The fix seems totally unrelated to the PR...
Sorry for taking some time to continue this PR, I'm having visitors this week so I'm a bit strapped on time.
Well, I actually never managed to debug the error 😅... I tried debugging the error locally on a Windows machine, but weirdly enough no tests failed locally on that machine. I even tried both Windows 10 and 11 in case it was some weird OS-version-specific issue. My best guess so far for why the errors don't appear locally on my Windows machine is that lock orders assigned across different tests aren't checked on a local build, but only through CI (which may be related to some incorrect local configuration on my part or similar), as we have some tests that acquire the same locks in different orders.

Will let you know what I find once I've had time to dig into it deeper :)!
Thanks for the report, what a strange error. Can't see why this PR brought it out.
Feel free to squash. Re: things different on CI vs local - did you try
As the `channel_state` lock will be removed, we prepare for that by flipping the lock order for `pending_inbound_payments` and `pending_outbound_payments` locks to before the `channel_state` lock.
For the Windows build only, the `TestPersister::chain_sync_monitor_persistences` lock has a lock order before the `ChannelManager::per_peer_state` lock. This fix ensures that the `per_peer_state` lock isn't held before the `TestPersister::chain_sync_monitor_persistences` lock is acquired.
Force-pushed from c2037a1 to 782eb36
Squashed without changes. @TheBlueMatt are we okay with keeping 6b12117 as discussed in #1772 (comment), or would you prefer cloning the `PaymentPurpose` instead?
Otherwise LGTM
Closes #1732

Moves `claimable_htlcs` out from the `channel_state` lock into a separate lock. This simplifies the removal of the `channel_state` lock, as well as simplifies changing its lock order for #1507.
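A simplified before/after sketch of the change (field and type names are stand-ins, not the real `ChannelManager`/`ChannelHolder` definitions): `claimable_htlcs` moves out of the struct guarded by `channel_state` and into its own `Mutex`, so it can be locked without taking, or ordering against, `channel_state`:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Before: both maps live behind the single channel_state lock
// (a simplified stand-in for the real ChannelHolder).
#[allow(dead_code)]
struct ChannelHolderBefore {
    by_id: HashMap<[u8; 32], u64>,
    claimable_htlcs: HashMap<[u8; 32], Vec<u64>>,
}

#[allow(dead_code)]
struct ManagerBefore {
    channel_state: Mutex<ChannelHolderBefore>,
}

// After: claimable_htlcs sits behind its own Mutex, so it can be locked
// without taking (or ordering against) channel_state.
struct ManagerAfter {
    channel_state: Mutex<HashMap<[u8; 32], u64>>,
    claimable_htlcs: Mutex<HashMap<[u8; 32], Vec<u64>>>,
}

fn main() {
    let manager = ManagerAfter {
        channel_state: Mutex::new(HashMap::new()),
        claimable_htlcs: Mutex::new(HashMap::new()),
    };
    // The claimable HTLCs can now be touched without the channel_state lock.
    manager.claimable_htlcs.lock().unwrap().insert([0u8; 32], vec![1, 2, 3]);
    println!("{} claimable payment(s)", manager.claimable_htlcs.lock().unwrap().len());
    let _ = manager.channel_state.lock().unwrap();
}
```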