Description
Hardware:
Board: ESP32 Dev Module nodeMCU 32s
Core Installation version: latest git
IDE name: Arduino IDE
Flash Frequency: 40Mhz
PSRAM enabled: ?no? ?yes?
Upload Speed: default speed
Computer OS: Ubuntu
Description:
millis() is not working as expected, causing time out problems every 72 minutes in many functions of the core and user programs
Sketch: (leave the backquotes for code formatting)
void setup() {
// put your setup code here, to run once:
Serial.begin(115200);
Serial.println("\nDemonstrate often used timeout loop error\nHappens every about 72 minutes");
Serial.printf("0xFFFFFFFF + 100 is %lu\n", 0xFFFFFFFFUL + 100UL);
Serial.printf("Negatif will be %lu\n", -4294967);
}
void loop() {
// put your main code here, to run repeatedly:
unsigned long start = millis();
unsigned long duration;
unsigned long startMicros = micros();
unsigned long durationMicros;
for(;;) {
duration = millis() - start;
durationMicros = micros() - startMicros;
if(duration > 0x7FFFFFFFUL) {
Serial.printf("\r\nMillis() rolled over: %u. In micros / 1000: %u", duration, durationMicros / 1000UL);
for(;;) delay(1);
}
if(duration >= 3000) break;
delay(1);
}
Serial.printf("\r\nTimeout took %lu ms. Wait another about %lu minutes for the error to occur", duration, (0xFFFFFFFF - micros()) / 60000000UL);
}
Debug Messages:
15:33:49.694 -> Demonstrate often used timeout loop error
15:33:49.694 -> Happens every about 72 minutes
15:33:49.694 -> 0xFFFFFFFF + 100 is 99
15:33:49.694 -> Negatif will be 4290672329
15:33:52.682 -> Timeout took 3000 ms. Wait another about 71 minutes for the error to occur
15:33:55.700 -> Timeout took 3000 ms. Wait another about 71 minutes for the error to occur
15:33:58.688 -> Timeout took 3000 ms. Wait another about 71 minutes for the error to occur
…
16:45:16.791 -> Timeout took 3000 ms. Wait another about 0 minutes for the error to occur
16:45:19.778 -> Timeout took 3000 ms. Wait another about 0 minutes for the error to occur
16:45:22.800 -> Timeout took 3000 ms. Wait another about 0 minutes for the error to occur
16:45:24.726 -> Millis() rolled over: 4290674261. In micros / 1000: 1932
Issue #2330 was closed, but the issue is not solved in the release mentioned in that PR. In that release the type of micros() and millis() is changed to unsigned long, but this is not solving the issue!
In my honest opinion, this is a very serious error!
The problem is that many timeout loops depend on 2’s complement arithmetic and millis() being defined as:
unsigned long IRAM_ATTR millis()
{
return (unsigned long) (esp_timer_get_time() / 1000);
}
See
arduino-esp32/cores/esp32/esp32-hal-misc.c
Line 109 in 25a2d94
Take for instance in Stream.cpp timedRead
arduino-esp32/cores/esp32/Stream.cpp
Line 31 in 86fdb5b
int Stream::timedRead()
{
int c;
_startMillis = millis();
do {
c = read();
if(c >= 0) {
return c;
}
} while(millis() - _startMillis < _timeout);
return -1; // -1 indicates timeout
}
The default _timeout is 1000 ms.
Now assume that esp_timer_get_time() is about to roll over at the moment that _startMillis() = millis() is assigned. So assume esp_timer_get_time() is 0xFFFFFFFF, and millis() will be 0xFFFFFFFF / 1000 is 4294967295 / 1000 is 4294967. Thus _startMillis will be 4294967.
Next assume that one do { } while loop takes 100 microseconds (the exact duration is not important to demonstrate the effect.
At the time of the evaluation of the while condition, micros() will be 0xFFFFFFFF + 100 is 99, because of 2’s complement arithmetic. So millis() will be 99 / 1000 is 0.
The while condition at the loop will evaluate to 0 - 4294967 is -4294967. Also because of 2’s complement arithmetic this evaluates to 4290672329. The while condition evaluates to 4290672329 < 1000, which is obviously false. So the while loop stops after only 100 _micro_seconds of waiting (the assumed duration of one iteration of the while loop), instead of the intended 1000 _milli_seconds!!!!
In the above example the same thing happens in a real example. The last timeout, exactly after about 72 minutes of running lasted only 1932 ms instead of the wanted 3000 ms.
Stream is in the heart of many programs. And this is just one example. Many timeout loops use the same structure: millis() - start < timeout
@me-no-dev, could you take at look at this?