Skip to content

bad double for loop optimization  #8261

Closed
@mhightower83

Description

@mhightower83

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: [ESP-12]
  • Core Version: [5f04fbb]
  • Development Env: [Arduino IDE]
  • Operating System: [Ubuntu]

Settings in IDE

  • Module: [Nodemcu]
  • Flash Mode: [dio]
  • Flash Size: [4MB]
  • lwip Variant: [v2 Higher Bandwidth]
  • Reset Method: [nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [80Mhz]
  • Upload Using: [SERIAL]
  • Upload Speed: [460800] (serial upload only)

Problem Description

With the newer "xtensa-lx106-elf-gcc (GCC) 10.3.0" I have had problems from time to time, that I have dismissed as being an error in my use of Extended ASM. It now appears that may not have been completely true. The MCVE below builds and executes correctly with -O0 or the old "xtensa-lx106-elf-gcc (GCC) 4.8.2" compiler with -O0 or -Os.
However, for the case of "xtensa-lx106-elf-gcc (GCC) 10.3.0" and -Os it generates bad code that does not properly update the array as it executes a double for loop.

This MCVE is an extreme minimization of the original bearssl experiment that was failing for me. There is always a chance I minimized out the failure case and added my own bug. However, I think I got everything cleaned up.

MCVE Sketch

uint16_t x[] = {1, 2, 3};

extern "C" void test(uint16_t *x);

void setup() {
  Serial.begin(115200);
  delay(200);
  Serial.println("\r\nTesting 1, 2, 3");

  test(x);

  for (size_t i = 0; i < sizeof(x)/sizeof(uint16_t); i++)
    ets_uart_printf("addr(%p), x[%u](0x%04X)\n", &x[i], i, x[i]);
}

void loop() {
}

Avoid the compiler defeating our test and optimizing to an answer by place this code with sketch In a separate .c file:

#include <Arduino.h>
#include <pgmspace.h>  // make sure dependencies catch debug edits to file

#if 1
// Used this one to get a simplified ASM for inspection
#undef pgm_read_word
#define pgm_read_word get_uint16

static inline __attribute__((always_inline))
uint16_t get_uint16(const uint16_t *p16) {
  uint32_t val;
  asm volatile ("l16ui %[val], %[base], 0;" :[val]"=r"(val) :[base]"r"(p16) :);
  return val;
}

#elif 0
//  Extended ASM is not an issue, this will also fail.
#undef pgm_read_word
#define pgm_read_word get_uint16

static inline __attribute__((always_inline))
uint16_t get_uint16(const uint16_t *p16) {
  uint32_t val = (*(uint32_t *)((uintptr_t)p16 & ~0x3u));
  uint32_t pos = ((uintptr_t)p16 & 0x3u) * 8u;
  val >>= pos;
  return (uint16_t)val;
}

#else
//  Use PROGMEM, the original context the failure was observed with.
#endif


// #pragma GCC optimize("O0")   // Works
#pragma GCC optimize("Os")   // Fails
void test(uint16_t *x) {
  size_t len = 3;
  for (size_t u = 0; u < len; u++) {
    uint16_t x1 = pgm_read_word(&x[0]);
    for (size_t v = 0; v < len; v++) {
      x[v] = pgm_read_word(&x[v]) + x1;
    }
  }
}

Debug Messages

Sketch output when working will look something like this:

  Testing 1, 2, 3
  addr(0x3ffe85c8), x[0](0x0008)
  addr(0x3ffe85ca), x[1](0x0009)
  addr(0x3ffe85cc), x[2](0x000A)

Sketch failing output will look something like this (-Os):

  Testing 1, 2, 3
  addr(0x3ffe85c8), x[0](0x0002)
  addr(0x3ffe85ca), x[1](0x0003)
  addr(0x3ffe85cc), x[2](0x0004)

Working code generated under Arduino ESP8266 Core 2.7.4
xtensa-lx106-elf-gcc (GCC) 4.8.2 with -Os option

Disassembly of section .text.test:

00000000 <test>:
   0:	340c                	movi.n	a4, 3
   2:	001272               	l16ui	a7, a2, 0
   5:	030c                	movi.n	a3, 0
   7:	f47070               	extui	a7, a7, 0, 16
   a:	523a                	add.n	a5, a2, a3
   c:	001562               	l16ui	a6, a5, 0
   f:	676a                	add.n	a6, a7, a6
  11:	005562               	s16i	a6, a5, 0
  14:	332b                	addi.n	a3, a3, 2
  16:	f06366               	bnei	a3, 6, a <test+0xa>
  19:	440b                	addi.n	a4, a4, -1
  1b:	fe3456               	bnez	a4, 2 <test+0x2>
  1e:	f00d                	ret.n

Failing code generated under Arduino ESP8266 Core current master
xtensa-lx106-elf-gcc (GCC) 10.3.0 with -Os option

Disassembly of section .text.test:

00000000 <test>:
   0:	822b                	addi.n	a8, a2, 2
   2:	04c272               	addi	a7, a2, 4
   5:	03a062               	movi	a6, 3
   8:	001232               	l16ui	a3, a2, 0
   b:	f43030               	extui	a3, a3, 0, 16
   e:	001252               	l16ui	a5, a2, 0
  11:	535a                	add.n	a5, a3, a5
  13:	f45050               	extui	a5, a5, 0, 16
  16:	001842               	l16ui	a4, a8, 0
  19:	434a                	add.n	a4, a3, a4
  1b:	f44040               	extui	a4, a4, 0, 16
  1e:	001792               	l16ui	a9, a7, 0
  21:	339a                	add.n	a3, a3, a9
  23:	660b                	addi.n	a6, a6, -1
  25:	f43030               	extui	a3, a3, 0, 16
  28:	fdc656               	bnez	a6, 8 <test+0x8>
  2b:	005252               	s16i	a5, a2, 0
  2e:	015242               	s16i	a4, a2, 2
  31:	025232               	s16i	a3, a2, 4
  34:	f00d                	ret.n

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions