NMV2 metrics fanout and corrections #5763

durch · 2025-05-15T11:11:54Z


Summary

This PR enhances the Network Monitor with a "metrics fanout" capability, allowing monitoring metrics to be submitted to multiple API endpoints concurrently. It also introduces a new database schema for storing detailed route performance data to improve network reliability tracking and analysis.

Key Features

Metrics Fanout System: Submit monitoring data to multiple nym-api instances simultaneously
Route Performance Tracking: Store route metrics in a new database table

Technical Changes

Added support for multiple NYM_API endpoints via the nym_apis CLI parameter and NYM_APIS environment variable
Created new routes table schema in the database to store detailed route metrics:
- Records node IDs for each layer (layer1, layer2, layer3, gateway)
- Tracks success/failure status for each packet route
- Includes created_at timestamps (received-at timestamps would be preferable, but that is out of scope for now)
- Added appropriate indexes for efficient querying
Implemented parallel submission of route data to all configured API endpoints using FuturesUnordered
Added support for efficient binary data copying to the database for metrics storage
Enhanced error handling and logging for metrics submission
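
The fanout behaviour described above can be sketched roughly as follows. This is a std-only illustration, not the code in this PR: it uses scoped threads instead of FuturesUnordered so that it has no external dependencies, and `RouteResult`, `submit`, and `fanout` are hypothetical names invented for the example. The point it shows is that each endpoint gets its own result, so one failing endpoint does not prevent submission to the others.

```rust
use std::thread;

/// Hypothetical route record, loosely mirroring the columns listed above.
#[derive(Debug, Clone, PartialEq)]
struct RouteResult {
    layer1: u32,
    layer2: u32,
    layer3: u32,
    gateway: u32,
    success: bool,
}

/// Stand-in for an HTTP submission (assumption: no real network call here).
/// Endpoints whose URL contains "down" simulate a failing nym-api instance.
fn submit(endpoint: &str, routes: &[RouteResult]) -> Result<usize, String> {
    if endpoint.contains("down") {
        Err(format!("{endpoint}: connection refused"))
    } else {
        Ok(routes.len())
    }
}

/// Submit the same routes to every endpoint concurrently and return one
/// result per endpoint, in the order the endpoints were given.
fn fanout(
    endpoints: &[String],
    routes: &[RouteResult],
) -> Vec<(String, Result<usize, String>)> {
    thread::scope(|scope| {
        // Spawn all submissions first so they run in parallel.
        let handles: Vec<_> = endpoints
            .iter()
            .map(|ep| (ep.clone(), scope.spawn(move || submit(ep, routes))))
            .collect();

        // Then join each one; a failure is recorded, not propagated.
        handles
            .into_iter()
            .map(|(ep, handle)| {
                let result = handle
                    .join()
                    .unwrap_or_else(|_| Err("submission thread panicked".to_string()));
                (ep, result)
            })
            .collect()
    })
}
```

A caller would iterate over the returned pairs, log each `Err` and carry on, which matches the "continue even if one endpoint fails" behaviour the description claims.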

Testing Notes

The metrics fanout functionality has been tested against multiple API endpoints to ensure proper data submission
Route performance data can be verified in the database by querying the new routes table
The implementation includes proper error handling to ensure metrics submission continues even if one endpoint fails
Performance impact is minimal as metrics submission happens in parallel using asynchronous requests

Deployment Considerations

To use this feature, configure multiple API endpoints using either:
- Command-line arguments: --nym-apis <API_URL1> <API_URL2> ...
- Environment variable: NYM_APIS=<API_URL1>,<API_URL2>,...
Database schema will need to be updated using the new migration (20250513104800_routes_table.sql)
No changes to existing functionality or backwards compatibility issues
For optimal performance, API endpoints should be geographically distributed
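
For illustration, a comma-separated NYM_APIS value could be turned into a list of endpoints along these lines. This is a minimal sketch using only the standard library; the PR itself may well rely on its CLI parser for this, and `parse_api_list` and `apis_from_env` are hypothetical helper names.

```rust
/// Split a comma-separated NYM_APIS-style value into trimmed, non-empty URLs.
/// No URL validation is attempted here.
fn parse_api_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Read the endpoint list from the NYM_APIS environment variable.
/// Returns an empty list when the variable is unset or not valid Unicode.
fn apis_from_env() -> Vec<String> {
    std::env::var("NYM_APIS")
        .map(|raw| parse_api_list(&raw))
        .unwrap_or_default()
}
```

Trimming and dropping empty entries makes the parser tolerant of trailing commas and stray spaces in the variable.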

This enhancement improves the robustness of the Nym Network Monitor by allowing redundant metric submission to multiple endpoints, so that network performance data is still captured and available for analysis even if individual API endpoints experience issues. The detailed route tracking should also provide insight into network performance patterns and help identify potential bottlenecks.

vercel · 2025-05-15T11:14:18Z

The latest updates on your projects.

Name	Status	Preview	Comments	Updated (UTC)
nym-explorer-v2	❌ Failed (Inspect)			Jun 2, 2025 10:55am

2 Skipped Deployments

Name	Status	Preview	Comments	Updated (UTC)
docs-nextra	⬜️ Ignored (Inspect)	Visit Preview		Jun 2, 2025 10:55am
nym-next-explorer	⬜️ Ignored (Inspect)	Visit Preview		Jun 2, 2025 10:55am

jstuczyn

almost entirely nits apart from one thing:
let node_ids = topology.node_details().keys().cloned().collect::<Vec<_>>();
^ this will not work the way you want as it also retrieves gateways (which also have to be tested, but not as mixnodes : D )
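
One way to address this would be to filter by role before collecting the IDs. The sketch below only illustrates the idea: `Role`, `NodeDetails`, and `mixnode_ids` are hypothetical, and the real topology type may expose node roles quite differently.

```rust
use std::collections::HashMap;

type NodeId = u32;

/// Hypothetical role tag; the actual topology type may model this differently.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Mixnode,
    Gateway,
}

/// Hypothetical stand-in for the per-node details held by the topology.
struct NodeDetails {
    role: Role,
}

/// Collect only the IDs of nodes acting as mixnodes, skipping gateways.
/// The result is sorted so the output order is deterministic.
fn mixnode_ids(details: &HashMap<NodeId, NodeDetails>) -> Vec<NodeId> {
    let mut ids: Vec<NodeId> = details
        .iter()
        .filter(|(_, d)| d.role == Role::Mixnode)
        .map(|(id, _)| *id)
        .collect();
    ids.sort_unstable();
    ids
}
```

Gateways would then be collected separately with the complementary filter and tested on their own path.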

jstuczyn · 2025-05-20T15:17:27Z

nym-api/migrations/20250513104800_routes_table.sql

+    layer3 INTEGER NOT NULL,   -- NodeId of layer 3 mixnode
+    gw INTEGER NOT NULL,       -- NodeId of gateway
+    success BOOLEAN NOT NULL,  -- Whether the packet was delivered successfully
+    timestamp INTEGER NOT NULL DEFAULT (unixepoch()) -- When the measurement was taken


nit: if possible, perhaps store it with explicit type for easier queries, i.e. TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP so that it could be coerced into OffsetDateTime with sqlx

jstuczyn · 2025-05-20T15:18:50Z

nym-api/src/node_status_api/handlers/without_monitor.rs

-            Err(AxumErrorResponse::internal_msg(
-                "failed to submit gateway monitoring results",
-            ))
+    match message.results() {


I guess this is out of scope for this PR, but just wondering, does it still make sense to have separate results for gateways and mixnodes?

jstuczyn · 2025-05-20T15:19:37Z