-
Notifications
You must be signed in to change notification settings - Fork 241
NMV2 metrics fanout and corrections #5763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost entirely nits apart from one thing:
let node_ids = topology.node_details().keys().cloned().collect::<Vec<_>>();
^ this will not work the way you want as it also retrieves gateways (which also have to be tested, but not as mixnodes : D )
layer3 INTEGER NOT NULL, -- NodeId of layer 3 mixnode | ||
gw INTEGER NOT NULL, -- NodeId of gateway | ||
success BOOLEAN NOT NULL, -- Whether the packet was delivered successfully | ||
timestamp INTEGER NOT NULL DEFAULT (unixepoch()) -- When the measurement was taken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if possible, perhaps store it with explicit type for easier queries, i.e. TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
so that it could be coerced into OffsetDateTime
with sqlx
Err(AxumErrorResponse::internal_msg( | ||
"failed to submit gateway monitoring results", | ||
)) | ||
match message.results() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is out of scope for this PR, but just wondering, does it still make sense to have separate results for gateways and mixnodes?
|
||
// Output struct for the calculated corrected reliability for an interval | ||
#[derive(Debug, serde::Serialize)] // Add utoipa::ToSchema if this struct is directly exposed via API | ||
pub struct CorrectedNodeIntervalReliability { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the correction be done by nym api or rather NM? (ideally, in stage 2, NM would be submitting corrected results to the contract)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that corrections need a lot of results to be useful, and NMs submit every 5 or 10 minutes, or similar cadence. So I've pushed the correction as close to the actual usage of the results as possible otherwise it would not be doing much
pub neg_samples_in_interval: u32, | ||
pub final_fail_seq_in_interval: u32, | ||
} | ||
|
||
// all SQL goes here | ||
impl StorageManager { | ||
pub(super) async fn get_all_avg_mix_reliability_in_last_24hr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another nit: imo StorageManager
should be concerned with just SQL queries and the logic for calculating all this stuff should go up one (or maybe even two) layers of abstractions. either NymApiStorage
or maybe even above it
r.layer2 as NodeId, | ||
r.layer3 as NodeId, | ||
r.gw as NodeId, | ||
r.success.unwrap_or_default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would success be missing? can't we set it to NOT NULL
and then avoid this whole iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, just being paranoid, if we set it to NOT NULL then inserts might start failing quietly if we have a problem somewhere else, better be explicit then sorry :)
let mut guilty_nodes_in_path: Vec<NodeId> = Vec::new(); | ||
for &node_id in ¤t_path_node_ids_for_blame { | ||
if let Some(state_after_update) = node_states.get(&node_id) { | ||
if state_after_update.fail_seq > 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you start distributing the blame before you accumulated all results?
|
||
// Attempt to fetch identity, first as mixnode, then as gateway if not found. | ||
// This assumes get_mixnode_identity_key and get_gateway_identity_key return Result<Option<String>, sqlx::Error> | ||
let mut identity: Option<String> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we even care about identity key at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of this is legacy, I think we just clean it all up once we get to the performance contract
let gateway_stats = monitor_gateway_results().await?; | ||
let client = reqwest::Client::new(); | ||
|
||
let node_submit_url = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than using hardcoded strings, can't we just construct a NymApiClient
and call methods on that guy instead?
b06d1ac
to
5d8bdc6
Compare
This change is
Summary
This PR enhances the Network Monitor with a "metrics fanout" capability, allowing monitoring metrics to be submitted to multiple API endpoints concurrently. It also introduces a new database schema for storing detailed
route performance data to improve network reliability tracking and analysis.
Key Features
nym-api
instances simultaneouslyTechnical Changes
NYM_API
endpoints via thenym_apis
CLI parameter andNYM_APIS
environment variableFuturesUnordered
Testing Notes
Deployment Considerations
This enhancement significantly improves the robustness of the Nym Network Monitor by allowing redundant metric submission to multiple endpoints, ensuring that network performance data is properly captured and available for
analysis even if individual API endpoints experience issues. The detailed route tracking will also provide valuable insights into network performance patterns and help identify potential bottlenecks.