Add workaround for STM32F4 hardfault in sleep mode #12662

Merged
merged 2 commits into from
Mar 30, 2020

Conversation

artokin
Contributor

@artokin artokin commented Mar 20, 2020

Summary of changes

This workaround is related to Mbed OS issue
#12294

We would like to temporarily disable sleep on STM32F4. The hardfault occurs so often that it is preventing us from continuing with large-scale network testing.

This workaround can be removed once the root cause of the hardfault is fixed.

Impact of changes

Migration actions required

Documentation


Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[x] No Tests required for this change (E.g docs only update)
[] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers

@TuomoHautamaki , @teetak01 , @mikter , @jeromecoutant


Collaborator

@jeromecoutant jeromecoutant left a comment

Not in master branch

Issue is pyOCD side

@MarceloSalazar

I agree with @jeromecoutant, we cannot reproduce this consistently and it seems to be caused by tools/infrastructure.
We need to continue to investigate to get to the bottom of the problem (I'm currently testing)

@TuomoHautamaki

Not disagreeing, but we need a workaround until the problem is fixed.

@MarceloSalazar

@TuomoHautamaki as a workaround the team can create a patch to apply locally after cloning the repo. This will let you proceed with testing until we understand the root cause of the problem.

@ciarmcom ciarmcom requested review from jeromecoutant and a team March 20, 2020 12:00
@ciarmcom
Member

@artokin, thank you for your changes.
@jeromecoutant @ARMmbed/mbed-os-maintainers please review.

@TuomoHautamaki

We don't want to adjust our test jobs when there are issues in the software under test, and this has been open for quite a long time now.

I understand this may look a bit harsh when disabling the whole feature. However, there are also other users reporting the same issue and I doubt they are using our CI system. Disabling this feature works for them as well.

We can enable sleep feature again once fix is available.

@MarceloSalazar MarceloSalazar self-requested a review March 20, 2020 12:04

@MarceloSalazar MarceloSalazar left a comment

Workaround not suitable for unknown problem

@jeromecoutant
Collaborator

You can disable DEVICE_HAS_SLEEP in your local mbed_app.json

@juhhei01
Contributor

If every application must disable sleep, that does not sound like a good idea. This problem is not new, and it typically appears when the binary size changes. With the workaround there is no problem, so when will this problem be fixed "properly"? An application-level change hides the issue; it does not fix it.

@0xc0170
Contributor

0xc0170 commented Mar 23, 2020

You can disable DEVICE_HAS_SLEEP in your local mbed_app.json

@jeromecoutant reading the referenced issue, the problem was confirmed by various users. Shouldn't we disable sleep for specific targets, rather than changing the driver code as this PR currently does? The problem is known: sleep triggers it, but the root cause is not yet known (it might be something else, but that needs further investigation). We saw reports for L4, odin and nucleo f429 at least.

Issue is pyOCD side

It was reported with IAR or openOCD as well, this seems more to be in drivers than debug tools?

Is this only with debug profile, release is all good?

I think yes until this is fixed. Adding device_has_remove: ['SLEEP'] for affected targets. What targets are affected, all F4 (every target with label STM32F4 should have it?) ?

Let's do it the other way around: remove sleep support for affected targets, and list a known issue in the next release if it's not fixed by then (hopefully it will be, so that won't be needed). That is better than keeping this enabled with a known issue like "it might fail for you, just disable it in your app config".

@0xc0170
Contributor

0xc0170 commented Mar 23, 2020

Set this to needs: review, more discussion ongoing how to proceed.

@jamesbeyond
Contributor

Hi @artokin,
I believe we found the cause of the issue,
please see my comments on #12294 (comment)

Also, could you try the suggested solutions? They work for me and 2 other users. If the solution works for you, can you update your PR to use it as the fix?

@mergify mergify bot dismissed MarceloSalazar’s stale review March 27, 2020 11:59

Pull request has been modified.

@MarceloSalazar MarceloSalazar changed the title from "Disable sleep on STM32F4 as an workaround for stability issues." to "Add workaround for STM32F4 hardfault in sleep mode" Mar 27, 2020
@MarceloSalazar

@jeromecoutant we discussed and found this workaround of adding 3 x NOPs seems to be the best option as it's independent of tools and debug/release profiles.
Can you please check and confirm you're ok with this so we can proceed with CI testing? Thanks!

@LMESTM
Contributor

LMESTM commented Mar 27, 2020

@jeromecoutant we discussed and found this workaround of adding 3 x NOPs seems to be the best option as it's independent of tools and debug/release profiles.
Can you please check and confirm you're ok with this so we can proceed with CI testing? Thanks!

Question: looks like a good way forward.
Extra question: Is the testing environment using DEBUG profile ? If so, can we have the NOP workarounds under MBED_DEBUG compilation switch so that the RELEASE builds aren't impacted ?

@jeromecoutant
Collaborator

ST_INTERNAL_REF 83447

@mergify mergify bot added needs: CI and removed needs: work labels Mar 27, 2020
Collaborator

@jeromecoutant jeromecoutant left a comment

ST CI OK

@jamesbeyond
Contributor

Question: looks like a good way forward.
Extra question: Is the testing environment using DEBUG profile ? If so, can we have the NOP workarounds under MBED_DEBUG compilation switch so that the RELEASE builds aren't impacted ?

Hi @LMESTM, I was thinking about your suggestion. But I feel maybe we should leave it like this.

The reason is that the errata sheet doesn't claim the crash has anything to do with the compiler debug options. So my theory is that the app will crash regardless of which build profile the user picks. It's just that the debug tools will most likely set the DBGMCU_CR flags when a debug build image is being flashed, hence we see this issue more frequently.

Also, our client team reported randomly seeing similar crashes on develop/release profile builds, which look like they might be related.

Furthermore, I don't feel that adding 3 x NOP to the code is a huge impact or compromise, even in the release profile. If you have other concerns, feel free to let us know.
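
For comparison, the MBED_DEBUG-guarded variant suggested above could be sketched roughly as follows. This only illustrates the alternative that was discussed and not adopted; it is not code from the PR. The `cpu_wfi` and `cpu_nop` stubs and the `enter_sleep_wfi` name are hypothetical host-side stand-ins for the CMSIS `__WFI()` and `__NOP()` intrinsics, so the control flow can be compiled off-target.

```c
/* Hypothetical host-side stand-ins for the CMSIS intrinsics __WFI() and
 * __NOP(); each one only counts how often it was called. */
static int wfi_count = 0;
static int nop_count = 0;
static void cpu_wfi(void) { wfi_count++; }
static void cpu_nop(void) { nop_count++; }

/* Sketch of the discussed alternative: pad WFI with three NOPs only
 * when the build defines MBED_DEBUG, leaving release builds unchanged. */
static void enter_sleep_wfi(void)
{
    cpu_wfi();
#ifdef MBED_DEBUG
    cpu_nop();
    cpu_nop();
    cpu_nop();
#endif
}
```

With this shape, a release image flashed through the debug port would get no padding at all, which is the concern raised against the switch in this thread.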

@@ -392,6 +392,9 @@ void HAL_PWR_EnterSLEEPMode(uint32_t Regulator, uint8_t SLEEPEntry)
   {
     /* Request Wait For Interrupt */
     __WFI();
+    __NOP(); // Workaround for STM32F4 errata
+    __NOP(); // see chapter 2.1.3 - Debugging Sleep/Stop mode with WFE/WFI entry
+    __NOP(); // https://www.st.com/resource/en/errata_sheet/dm00037591-stm32f405-407xx-and-stm32f415-417xx-device-limitations-stmicroelectronics.pdf
Contributor

What about the WFE() calls in the lines below? Do they need to be patched as well?

Contributor

Mbed OS doesn't use that mode, so not required for us - would just be an unnecessary image size increase. But if the change is being upstreamed to the HAL, seems like you'd want them there.
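
As a rough illustration of the point above, the patched entry path pads only the WFI branch and leaves the WFE branch alone. The sketch below is a simplified host-side model, not the HAL source: the `stub_*` functions, the `sleep_entry` enum and `enter_sleep` are hypothetical names standing in for the CMSIS intrinsics and for `HAL_PWR_EnterSLEEPMode`, and the WFE branch is reduced to a single call.

```c
/* Hypothetical host-side stand-ins for __WFI(), __WFE() and __NOP();
 * each one only counts calls so the branches can be checked off-target. */
static int wfi_calls = 0;
static int wfe_calls = 0;
static int nop_calls = 0;
static void stub_wfi(void) { wfi_calls++; }
static void stub_wfe(void) { wfe_calls++; }
static void stub_nop(void) { nop_calls++; }

/* Illustrative selector; the real HAL takes a SLEEPEntry argument. */
enum sleep_entry { SLEEP_ENTRY_WFI, SLEEP_ENTRY_WFE };

/* Simplified model of the patched sleep entry. */
static void enter_sleep(enum sleep_entry entry)
{
    if (entry == SLEEP_ENTRY_WFI) {
        stub_wfi();
        stub_nop();   /* three NOPs after WFI, as added in this PR */
        stub_nop();
        stub_nop();
    } else {
        /* WFE path (simplified here): not padded, since Mbed OS does
         * not use this mode and the NOPs would only add image size. */
        stub_wfe();
    }
}
```

If the change were upstreamed to the HAL, the same three-NOP padding would presumably be wanted after the WFE path too, as noted above.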

Contributor

As the upstream comes to our repo, this will be updated once driver is updated.

@jeromecoutant We are fine with it as it is here now, correct?

Collaborator

I am not sure that the ST driver team will accept this change :-(
But as it solves some mbed issues, we are ok to merge this workaround.

Contributor

Is there a better alternative, given the errata description? That is the most universally-feasible workaround in your errata sheet.

Contributor

As soon as confirmed, I'll merge this. It would be good to have this in today.

Contributor

@LMESTM if you don't mind, we'll proceed with this change for both debug and release for now, as there are other random failures in release mode that we're seeing in CI and need to investigate and rule out for the limited time we have.
Then we can introduce optimizations once we have more information.

Ok. If RELEASE build is used in CI as well, then the same loading mechanism will apply (thru debug port) and the same issue may arise. So we can't put the proposed compilation switches - please forget my comment and feel free to proceed ...

Contributor

There is a separate argument that something should be done to get that debug/sleep control register back into its default state anyway, otherwise the just-flashed release build in test is not representative of the deployed image in terms of sleep, whether that comes down to power measurements, timing, hitting bugs like this. Arguably release builds should be manually putting that back to its zero state. Flasher should also have taken more care...

Contributor

We think it's the flasher's role to do so. If you power cycle the product, the release build will work just fine ... I'm not sure we should add an explicit reset of those bits, because even the release build should be debugged and in this case the flasher or debugger (or flasher of the debugger) should set the bits.

Contributor

@kjbracey kjbracey Mar 30, 2020

I know bigger systems like Linux are intensely paranoid about chip state post-bootloader, so tend not to rely on any default state. But we can't really afford the ROM space for that level of paranoia in an embedded image.

And yes, in this specific case, I see your point that a debugger may obviously have deliberately set those bits before running us, for a good reason, so an image should not reset any bits there, regardless of the general "trust default state or not" philosophy.
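
To make the register discussion concrete, here is a minimal sketch of how an image could at least detect (without clearing anything) that a debugger or flasher left the low-power debug bits set. It assumes the DBGMCU_CR layout from the STM32F4 reference manual, with DBG_SLEEP, DBG_STOP and DBG_STANDBY in bits 0 to 2; the macro names and the `lowpower_debug_enabled` helper are illustrative, and nothing like this is part of the PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the STM32F4 reference manual (DBGMCU_CR). */
#define DBG_SLEEP_BIT     (1u << 0)
#define DBG_STOP_BIT      (1u << 1)
#define DBG_STANDBY_BIT   (1u << 2)
#define DBG_LOWPOWER_MASK (DBG_SLEEP_BIT | DBG_STOP_BIT | DBG_STANDBY_BIT)

/* Hypothetical helper: returns true if any low-power debug bit is set.
 * The register value is passed in so the logic can be tested on a host;
 * on target it would be read from DBGMCU_CR. */
static bool lowpower_debug_enabled(uint32_t dbgmcu_cr)
{
    return (dbgmcu_cr & DBG_LOWPOWER_MASK) != 0u;
}
```

A check like this only reports the state. Whether a release image should ever clear the bits itself is exactly the open question in the comments above.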

@0xc0170
Contributor

0xc0170 commented Mar 30, 2020

@artokin This should also be in 5.15.2 (please send another PR directly to the branch) ?

@MarceloSalazar MarceloSalazar self-requested a review March 30, 2020 12:07
@0xc0170 0xc0170 merged commit 92cdcfb into ARMmbed:master Mar 30, 2020
@mergify mergify bot removed the ready for merge label Mar 30, 2020
@0xc0170
Contributor

0xc0170 commented Mar 30, 2020

@artokin This should also be in 5.15.2 (please send another PR directly to the branch) ?

I've got a branch, I'll create PR shortly

@0xc0170
Contributor

0xc0170 commented Mar 30, 2020

#12717 Created

@0xc0170
Contributor

0xc0170 commented Mar 30, 2020

@Mergifyio backport mbed-os-5.15

@0xc0170
Contributor

0xc0170 commented Mar 31, 2020

That did not work, I assume because I had already created the PR (can't create a duplicate). I'll test this on another PR.
