Skip to content

Commit 91a9fa4

Browse files
committed
Disallow dual-sync-async persistence without restarting
In general, we don't expect users to persist `ChannelMonitor[Update]`s both synchronously and asynchronously for a single `ChannelManager` instance. If a user has implemented asynchronous persistence, they should generally always use that, as there is then no advantage to them to occasionally persist synchronously. Even still, in 920d96e we fixed some bugs related to such operation, and noted that "there isn't much cost to supporting it". Sadly, this is not true. Specifically, the dual-sync-async persistence flow is ill-defined and difficult to define in away that a user can realistically implement. Consider the case of a `ChannelMonitorUpdate` which is persisted asynchronously and while it is still being persisted a new `ChannelMonitorUpdate` is created. If the second `ChannelMonitorUpdate` is persisted synchronously, the `ChannelManager` will be left with a single pending `ChannelMonitorUpdate` which is not the latest. If we were to then restart, the latest copy of the `ChannelMonitor` would be that without any updates, but the `ChannelManager` has a pending `ChannelMonitorUpdate` for the next update, but not the one after that. The user would then have to handle the replayed `ChannelMonitorUpdate` and then find the second `ChannelMonitorUpdate` on disk and somehow know to replay that one as well. Further, we currently have a bug in handling this scenario as we'll complete all pending post-update actions when the second `ChannelMonitorUpdate` gets persisted synchronously, even though the first `ChannelMonitorUpdate` is still pending. While we could rather trivially fix these issues, addressing the larger API question above is difficult and as we don't anticipate this use-case being important, we just disable it here. Note that we continue to support it internally as some 39 tests rely on it. Issue highlighted by (changes to the) chanmon_consistency fuzz target (in the next commit).
1 parent 83e9e80 commit 91a9fa4

File tree

3 files changed

+76
-32
lines changed

3 files changed

+76
-32
lines changed

fuzz/src/chanmon_consistency.rs

Lines changed: 44 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,8 @@ use bitcoin::secp256k1::{self, Message, PublicKey, Scalar, Secp256k1, SecretKey}
8282

8383
use lightning::io::Cursor;
8484
use lightning::util::dyn_signer::DynSigner;
85+
86+
use std::cell::RefCell;
8587
use std::cmp::{self, Ordering};
8688
use std::mem;
8789
use std::sync::atomic;
@@ -674,6 +676,9 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out, anchors: bool) {
674676
}};
675677
}
676678

679+
let default_mon_style = RefCell::new(ChannelMonitorUpdateStatus::Completed);
680+
let mon_style = [default_mon_style.clone(), default_mon_style.clone(), default_mon_style];
681+
677682
macro_rules! reload_node {
678683
($ser: expr, $node_id: expr, $old_monitors: expr, $keys_manager: expr, $fee_estimator: expr) => {{
679684
let keys_manager = Arc::clone(&$keys_manager);
@@ -746,6 +751,7 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out, anchors: bool) {
746751
Ok(ChannelMonitorUpdateStatus::Completed)
747752
);
748753
}
754+
*chain_monitor.persister.update_ret.lock().unwrap() = *mon_style[$node_id].borrow();
749755
res
750756
}};
751757
}
@@ -1393,28 +1399,22 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out, anchors: bool) {
13931399
// bit-twiddling mutations to have similar effects. This is probably overkill, but no
13941400
// harm in doing so.
13951401
0x00 => {
1396-
*monitor_a.persister.update_ret.lock().unwrap() =
1397-
ChannelMonitorUpdateStatus::InProgress
1402+
*mon_style[0].borrow_mut() = ChannelMonitorUpdateStatus::InProgress;
13981403
},
13991404
0x01 => {
1400-
*monitor_b.persister.update_ret.lock().unwrap() =
1401-
ChannelMonitorUpdateStatus::InProgress
1405+
*mon_style[1].borrow_mut() = ChannelMonitorUpdateStatus::InProgress;
14021406
},
14031407
0x02 => {
1404-
*monitor_c.persister.update_ret.lock().unwrap() =
1405-
ChannelMonitorUpdateStatus::InProgress
1408+
*mon_style[2].borrow_mut() = ChannelMonitorUpdateStatus::InProgress;
14061409
},
14071410
0x04 => {
1408-
*monitor_a.persister.update_ret.lock().unwrap() =
1409-
ChannelMonitorUpdateStatus::Completed
1411+
*mon_style[0].borrow_mut() = ChannelMonitorUpdateStatus::Completed;
14101412
},
14111413
0x05 => {
1412-
*monitor_b.persister.update_ret.lock().unwrap() =
1413-
ChannelMonitorUpdateStatus::Completed
1414+
*mon_style[1].borrow_mut() = ChannelMonitorUpdateStatus::Completed;
14141415
},
14151416
0x06 => {
1416-
*monitor_c.persister.update_ret.lock().unwrap() =
1417-
ChannelMonitorUpdateStatus::Completed
1417+
*mon_style[2].borrow_mut() = ChannelMonitorUpdateStatus::Completed;
14181418
},
14191419

14201420
0x08 => complete_all_monitor_updates(&monitor_a, &chan_1_id),
@@ -1724,19 +1724,8 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out, anchors: bool) {
17241724
// after we resolve all pending events.
17251725
// First make sure there are no pending monitor updates and further update
17261726
// operations complete.
1727-
*monitor_a.persister.update_ret.lock().unwrap() =
1728-
ChannelMonitorUpdateStatus::Completed;
1729-
*monitor_b.persister.update_ret.lock().unwrap() =
1730-
ChannelMonitorUpdateStatus::Completed;
1731-
*monitor_c.persister.update_ret.lock().unwrap() =
1732-
ChannelMonitorUpdateStatus::Completed;
1733-
1734-
complete_all_monitor_updates(&monitor_a, &chan_1_id);
1735-
complete_all_monitor_updates(&monitor_b, &chan_1_id);
1736-
complete_all_monitor_updates(&monitor_b, &chan_2_id);
1737-
complete_all_monitor_updates(&monitor_c, &chan_2_id);
1738-
1739-
// Next, make sure peers are all connected to each other
1727+
1728+
// First, make sure peers are all connected to each other
17401729
if chan_a_disconnected {
17411730
let init_1 = Init {
17421731
features: nodes[1].init_features(),
@@ -1769,42 +1758,65 @@ pub fn do_test<Out: Output>(data: &[u8], underlying_out: Out, anchors: bool) {
17691758
}
17701759

17711760
macro_rules! process_all_events {
1772-
() => {
1761+
() => { {
1762+
let mut last_pass_no_updates = false;
17731763
for i in 0..std::usize::MAX {
17741764
if i == 100 {
17751765
panic!("It may take may iterations to settle the state, but it should not take forever");
17761766
}
1767+
// Next, make sure no monitor updates are pending
1768+
complete_all_monitor_updates(&monitor_a, &chan_1_id);
1769+
complete_all_monitor_updates(&monitor_b, &chan_1_id);
1770+
complete_all_monitor_updates(&monitor_b, &chan_2_id);
1771+
complete_all_monitor_updates(&monitor_c, &chan_2_id);
17771772
// Then, make sure any current forwards make their way to their destination
17781773
if process_msg_events!(0, false, ProcessMessages::AllMessages) {
1774+
last_pass_no_updates = false;
17791775
continue;
17801776
}
17811777
if process_msg_events!(1, false, ProcessMessages::AllMessages) {
1778+
last_pass_no_updates = false;
17821779
continue;
17831780
}
17841781
if process_msg_events!(2, false, ProcessMessages::AllMessages) {
1782+
last_pass_no_updates = false;
17851783
continue;
17861784
}
17871785
// ...making sure any pending PendingHTLCsForwardable events are handled and
17881786
// payments claimed.
17891787
if process_events!(0, false) {
1788+
last_pass_no_updates = false;
17901789
continue;
17911790
}
17921791
if process_events!(1, false) {
1792+
last_pass_no_updates = false;
17931793
continue;
17941794
}
17951795
if process_events!(2, false) {
1796+
last_pass_no_updates = false;
17961797
continue;
17971798
}
1798-
break;
1799+
if last_pass_no_updates {
1800+
// In some cases, we may generate a message to send in
1801+
// `process_msg_events`, but block sending until
1802+
// `complete_all_monitor_updates` gets called on the next
1803+
// iteration.
1804+
//
1805+
// Thus, we only exit if we manage two iterations with no messages
1806+
// or events to process.
1807+
break;
1808+
}
1809+
last_pass_no_updates = true;
17991810
}
1800-
};
1811+
} };
18011812
}
18021813

1803-
// At this point, we may be pending quiescence, so we'll process all messages to
1804-
// ensure we can complete its handshake. We'll then exit quiescence and process all
1805-
// messages again, to resolve any pending HTLCs (only irrevocably committed ones)
1806-
// before attempting to send more payments.
1814+
// We may be pending quiescence, so first process all messages to ensure we can
1815+
// complete the quiescence handshake.
18071816
process_all_events!();
1817+
1818+
// Then exit quiescence and process all messages again, to resolve any pending
1819+
// HTLCs (only irrevocably committed ones) before attempting to send more payments.
18081820
nodes[0].exit_quiescence(&nodes[1].get_our_node_id(), &chan_a_id).unwrap();
18091821
nodes[1].exit_quiescence(&nodes[0].get_our_node_id(), &chan_a_id).unwrap();
18101822
nodes[1].exit_quiescence(&nodes[2].get_our_node_id(), &chan_b_id).unwrap();

lightning/src/chain/mod.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,12 @@ pub enum ChannelMonitorUpdateStatus {
209209
///
210210
/// This includes performing any `fsync()` calls required to ensure the update is guaranteed to
211211
/// be available on restart even if the application crashes.
212+
///
213+
/// If you return this variant, you cannot later return [`InProgress`] from the same instance of
214+
/// [`Persist`]/[`Watch`] without first restarting.
215+
///
216+
/// [`InProgress`]: ChannelMonitorUpdateStatus::InProgress
217+
/// [`Persist`]: chainmonitor::Persist
212218
Completed,
213219
/// Indicates that the update will happen asynchronously in the background or that a transient
214220
/// failure occurred which is being retried in the background and will eventually complete.
@@ -234,7 +240,12 @@ pub enum ChannelMonitorUpdateStatus {
234240
/// reliable, this feature is considered beta, and a handful of edge-cases remain. Until the
235241
/// remaining cases are fixed, in rare cases, *using this feature may lead to funds loss*.
236242
///
243+
/// If you return this variant, you cannot later return [`Completed`] from the same instance of
244+
/// [`Persist`]/[`Watch`] without first restarting.
245+
///
237246
/// [`InProgress`]: ChannelMonitorUpdateStatus::InProgress
247+
/// [`Completed`]: ChannelMonitorUpdateStatus::Completed
248+
/// [`Persist`]: chainmonitor::Persist
238249
InProgress,
239250
/// Indicates that an update has failed and will not complete at any point in the future.
240251
///

lightning/src/ln/channelmanager.rs

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2569,6 +2569,13 @@ where
25692569
#[cfg(any(test, feature = "_test_utils"))]
25702570
pub(super) per_peer_state: FairRwLock<HashMap<PublicKey, Mutex<PeerState<SP>>>>,
25712571

2572+
/// We only support using one of [`ChannelMonitorUpdateStatus::InProgress`] and
2573+
/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
2574+
/// otherwise directly enforce this, we enforce it in debug builds here by storing which one is
2575+
/// in use.
2576+
#[cfg(all(not(test), debug_assertions))]
2577+
monitor_update_type: AtomicUsize,
2578+
25722579
/// The set of events which we need to give to the user to handle. In some cases an event may
25732580
/// require some further action after the user handles it (currently only blocking a monitor
25742581
/// update from being handed to the user to ensure the included changes to the channel state
@@ -3312,11 +3319,19 @@ macro_rules! handle_new_monitor_update {
33123319
panic!("{}", err_str);
33133320
},
33143321
ChannelMonitorUpdateStatus::InProgress => {
3322+
#[cfg(all(not(test), debug_assertions))]
3323+
if $self.monitor_update_type.swap(1, Ordering::Relaxed) == 2 {
3324+
panic!("Cannot use both ChannelMonitorUpdateStatus modes InProgress and Completed without restart");
3325+
}
33153326
log_debug!($logger, "ChannelMonitor update for {} in flight, holding messages until the update completes.",
33163327
$channel_id);
33173328
false
33183329
},
33193330
ChannelMonitorUpdateStatus::Completed => {
3331+
#[cfg(all(not(test), debug_assertions))]
3332+
if $self.monitor_update_type.swap(2, Ordering::Relaxed) == 1 {
3333+
panic!("Cannot use both ChannelMonitorUpdateStatus modes InProgress and Completed without restart");
3334+
}
33203335
$completed;
33213336
true
33223337
},
@@ -3577,6 +3592,9 @@ where
35773592

35783593
per_peer_state: FairRwLock::new(new_hash_map()),
35793594

3595+
#[cfg(all(not(test), debug_assertions))]
3596+
monitor_update_type: AtomicUsize::new(0),
3597+
35803598
pending_events: Mutex::new(VecDeque::new()),
35813599
pending_events_processor: AtomicBool::new(false),
35823600
pending_background_events: Mutex::new(Vec::new()),
@@ -14747,6 +14765,9 @@ where
1474714765

1474814766
per_peer_state: FairRwLock::new(per_peer_state),
1474914767

14768+
#[cfg(all(not(test), debug_assertions))]
14769+
monitor_update_type: AtomicUsize::new(0),
14770+
1475014771
pending_events: Mutex::new(pending_events_read),
1475114772
pending_events_processor: AtomicBool::new(false),
1475214773
pending_background_events: Mutex::new(pending_background_events),

0 commit comments

Comments
 (0)