Skip to content

OTA and upload of larger files fails #9990

Closed
tasmota/arduino-esp32
#457
@TD-er

Description

@TD-er

Board

Any with large enough flash

Device Description

This is about larger uploads of either the sketch, or file to the filesystem.
So any board with large enough flash to have an upload of files > 2 MB

Hardware Configuration

Version

latest master (checkout manually)

IDE Name

PlatformIO

Operating System

Windows 11

Flash frequency

40MHz

PSRAM enabled

yes

Upload speed

115200

Description

OTA updates of larger sketches (typically > 2 MB in size) lately tend to fail.
The device then is inaccessible for 10 - 15 minutes.

On the serial console, you'll see the dots per 1436 bytes of upload and they typically stop at 1100 - 1600 dots.
Thus somewhere between 1.4 - 2.2 MB and never seem to make it to the full 2.5 MB needed for my "max" build bin files.

I found 2 issues here.

This function does no longer have a timeout implemented:

int WebServer::_uploadReadByte(NetworkClient &client) {
int res = client.read();
if (res < 0) {
while (!client.available() && client.connected()) {
delay(2);
}
res = client.read();
}
return res;
}

Related issue: #9163
Related PR: https://github.com/espressif/arduino-esp32/pull/9167/files

IMHO the old implementation of this function was just fine:

int WebServer::_uploadReadByte(WiFiClient& client){
  int res = client.read();
  if(res < 0) {
    // keep trying until you either read a valid byte or timeout
    unsigned long startMillis = millis();
    long timeoutIntervalMillis = client.getTimeout();
    boolean timedOut = false;
    for(;;) {
      if (!client.connected()) return -1;
      // loosely modeled after blinkWithoutDelay pattern
      while(!timedOut && !client.available() && client.connected()){
        delay(2);
        timedOut = millis() - startMillis >= timeoutIntervalMillis;
      }

      res = client.read();
      if(res >= 0) {
        return res; // exit on a valid read
      }
      // NOTE: it is possible to get here and have all of the following
      //       assertions hold true
      //
      //       -- client.available() > 0
      //       -- client.connected == true
      //       -- res == -1
      //
      //       a simple retry strategy overcomes this which is to say the
      //       assertion is not permanent, but the reason that this works
      //       is elusive, and possibly indicative of a more subtle underlying
      //       issue

      timedOut = millis() - startMillis >= timeoutIntervalMillis;
      if(timedOut) {
        return res; // exit on a timeout
      }
    }
  }

  return res;
}

(of course needs to have NetworkClient as function argument now)

The other issue is in WebServer::handleClient():

case HC_WAIT_READ:
// Wait for data from client to become available
if (_currentClient.available()) {
if (_parseRequest(_currentClient)) {
// because HTTP_MAX_SEND_WAIT is expressed in milliseconds,
// it must be divided by 1000
_currentClient.setTimeout(HTTP_MAX_SEND_WAIT); /* / 1000 removed, WifiClient setTimeout changed to ms */
_contentLength = CONTENT_LENGTH_NOT_SET;
_handleRequest();

setTimeout should be called before _parseRequest

I'm not 100% sure this is all that's needed to make OTA uploads work in all situations again.
But at least with these changes I managed to get the OTA working again on the 2nd attempt where it was nearly impossible before these changes.
Also it is nice the board will remain accessible and not hang for 10+ minutes in that _uploadReadByte function :)

Sketch

-

Debug Message

-

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions