timeout and retry logic

I have an issue with the way the NTP library works around updating the time synchronization that I would like to discuss. I have a sketch that keeps getting the current time using NTP.datetime(). When the update interval (.next_sync) expires, the library attempts to get a new NTP based time. When that fails (ETIMEDOUT is common in my environment), it is normally valid to continue using the previous monotonic_start_ns, and retry the update later.

I have a local patch that does that naively. It just traps the time out, and returns, allowing the calling (datetime) method to continue with the existing offset information. The problem with that, is that the update is reattempted every time datetime is referenced, often resulting in a whole string of timeouts, until the referenced server gets caught up. Probably hitting the server way too often, even though the packet was not successfully retrieved.

Should it be the responsibility of the NTP library to manage retry logic? That would seem to need updating next_sync on failures, or other logic to prevent additional update attempts immediately after a failure.

Alternatively, without my patch logic, the caller (of NTP.datetime) could trap the exception, and choose how to recover. A method in NTP to increase next_sync by _cache_seconds would facilitate that.

With any failure, wait, retry logic, it would probably be good to have a method that allows callers to check how stale the current synchronization is. A public 'last_synchronized' time stamp. As a struct_time, monotonic_ns, or a delta time value in either format.
Additional retry options could include using a list of servers. Sequencing through the list until one works. Once an update is successful, that server could become permanent, or fall back to the initial (default) server for the next update attempt. That would need to consider the possibly different cache time for each server. Need to skip servers that have been queried (successful or not) since their published cache value. Which will not be known until the first successful packet retrieval for that server.

I think the NTP library should be responsible for managing retries at some level
-- return time values even when update fails (if a previous offset value is stored)
-- prevent repeat hits on an NTP server after a timeout (honor any known cache_seconds value)
-- provide staleness indicators: ns since last update, cache_seconds
More extreme would be to allow an 'Iterable' as well as a simple str for the server, and provided logic to use them for retries. Potentially that would need to have matching iterables for port and cache_seconds as well. Prioritizing them based on individual cache_seconds would be an option.

I think the first 2 are 'needed', or some alternative to them. The rest is potentially useful feature bloat. I can write the code, but some idea of what will be 'acceptable' is needed.

Discussion?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

timeout and retry logic #40

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

timeout and retry logic #40

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions