Add workaround for STM32F4 hardfault in sleep mode #12662
This workaround is related to Mbed OS issue ARMmbed#12294
Not in master branch
Issue is pyOCD side
I agree with @jeromecoutant, we cannot reproduce this consistently and it seems to be caused by tools/infrastructure.
Not disagreeing, but we need a workaround until the problem is fixed.
@TuomoHautamaki as a workaround the team can create a patch to apply locally after cloning the repo. This will let you proceed with testing until we understand the root cause of the problem.
@artokin, thank you for your changes.
We don't want to adjust our test jobs when there are issues in the SW under test, and this has been open for quite a long time now. I understand this may look a bit harsh, as it disables the whole feature. However, there are also other users reporting the same issue and I doubt they are using our CI system. Disabling this feature works for them as well. We can enable the sleep feature again once a fix is available.
Workaround not suitable for unknown problem
You can disable DEVICE_HAS_SLEEP in your local mbed_app.json
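For readers who want to try this locally, a minimal sketch of such an override is shown below. It assumes the standard `target.device_has_remove` override of the Mbed OS configuration system and applies it to all targets with `"*"`; the exact target key to use for a given board is an assumption left to the reader.

```json
{
    "target_overrides": {
        "*": {
            "target.device_has_remove": ["SLEEP"]
        }
    }
}
```

Replacing `"*"` with a specific target name would limit the change to the affected board only.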
Requiring every application to disable sleep does not sound like a good idea. This problem is not new, and it typically appears when the binary size changes. With the workaround there is no problem until this is fixed "properly". An application-level change only hides the issue, it does not fix it.
@jeromecoutant reading the referenced issue, the problem was confirmed by various users. Shouldn't we disable sleep for specific targets, rather than changing the driver code as in this PR at the moment? The problem is known (sleep is causing it) but the root cause is not yet known (it might be something else, but that needs further investigation). We saw reports for L4, odin and nucleo f429 at least.
It was reported with IAR or openOCD as well, so this seems to be more in the drivers than in the debug tools? Is this only with the debug profile, and release is all good? I think yes until this is fixed. Adding Let's do it the other way around: remove sleep support for the affected targets, and have a known issue in the next release if it's not fixed by then (hopefully it will be, so not needed), rather than keeping this enabled with a known issue like "it might fail for you, just disable it in your app config".
Set this to needs: review, more discussion is ongoing on how to proceed.
Hi @artokin, Also, could you try the suggested solutions? They work for me and 2 other users. If the solution works for you, can you update your PR to use it as the fix?
Pull request has been modified.
@jeromecoutant we discussed and found that this workaround of adding 3 x NOPs seems to be the best option, as it's independent of tools and debug/release profiles.
Question: looks like a good way forward.
ST_INTERNAL_REF 83447
ST CI OK
Hi @LMESTM, I was thinking about your suggestion, but I feel maybe we should leave it like this. The reason is that the errata sheet does not claim the crash has anything to do with the compiler debug options. So my theory is that the app will crash regardless of which build profile the user uses. It is just because the debug tools will most likely set the Also, our client team did report that they are randomly seeing similar crashes on develop/release profile builds, which look like they might be related. Furthermore, I don't feel it is a huge impact/compromise for adding 3x
@@ -392,6 +392,9 @@ void HAL_PWR_EnterSLEEPMode(uint32_t Regulator, uint8_t SLEEPEntry)
   {
     /* Request Wait For Interrupt */
     __WFI();
+    __NOP(); // Workaround for STM32F4 errata
+    __NOP(); // see chapter 2.1.3 - Debugging Sleep/Stop mode with WFE/WFI entry
+    __NOP(); // https://www.st.com/resource/en/errata_sheet/dm00037591-stm32f405-407xx-and-stm32f415-417xx-device-limitations-stmicroelectronics.pdf
What about the WFE() calls in the lines below? Do they need to be patched as well?
Mbed OS doesn't use that mode, so not required for us - would just be an unnecessary image size increase. But if the change is being upstreamed to the HAL, seems like you'd want them there.
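For illustration, an upstream version that also covers the WFE path might look roughly like the sketch below. This is a host-buildable illustration, not the HAL code: the `__WFI`, `__WFE`, `__SEV` and `__NOP` macros are stand-ins that only count calls, `enter_sleep_mode` and the `SLEEPENTRY_*` values are hypothetical names, and placing three NOPs after each WFE is an assumption based on the errata workaround used for WFI in this PR.

```c
#include <stdint.h>

/* Host-side stand-ins for the CMSIS intrinsics (assumption: on a real
 * Cortex-M build these come from the CMSIS headers instead). */
static unsigned wfi_count, wfe_count, sev_count, nop_count;
#define __WFI() ((void)++wfi_count)
#define __WFE() ((void)++wfe_count)
#define __SEV() ((void)++sev_count)
#define __NOP() ((void)++nop_count)

/* Hypothetical entry selectors, modelled on the HAL's WFI/WFE options. */
#define SLEEPENTRY_WFI 0x01u
#define SLEEPENTRY_WFE 0x02u

/* Hypothetical helper: enter sleep and pad every wait instruction with
 * three NOPs, as the PR does for the WFI case. */
void enter_sleep_mode(uint8_t sleep_entry)
{
    if (sleep_entry == SLEEPENTRY_WFI) {
        __WFI();
        __NOP();
        __NOP();
        __NOP();
    } else {
        /* Set the event, consume it with the first WFE, then wait. */
        __SEV();
        __WFE();
        __NOP();
        __NOP();
        __NOP();
        __WFE();
        __NOP();
        __NOP();
        __NOP();
    }
}
```

The WFE branch costs six extra instructions, which is the image size increase mentioned above for a mode Mbed OS does not use.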
As the upstream code comes to our repo, this will be updated once the driver is updated.
@jeromecoutant We are fine with it as it is here for now, correct?
I am not sure that the ST driver team will accept this change :-(
But as it solves some mbed issues, we are ok to merge this workaround.
Is there a better alternative, given the errata description? That is the most universally-feasible workaround in your errata sheet.
As soon as confirmed, I'll merge this. It would be good to have this in today.
@LMESTM if you don't mind, we'll proceed with this change for both debug and release for now, as there are other random failures in release mode that we're seeing in CI and need to investigate and rule out for the limited time we have.
Then we can introduce optimizations once we have more information.
Ok. If the RELEASE build is used in CI as well, then the same loading mechanism will apply (through the debug port) and the same issue may arise. So we can't add the proposed compilation switches. Please forget my comment and feel free to proceed ...
There is a separate argument that something should be done to get that debug/sleep control register back into its default state anyway, otherwise the just-flashed release build in test is not representative of the deployed image in terms of sleep, whether that comes down to power measurements, timing, hitting bugs like this. Arguably release builds should be manually putting that back to its zero state. Flasher should also have taken more care...
We think it's the flasher's role to do so. If you power cycle the product, the release build will work just fine ... I'm not sure we should add an explicit reset of those bits, because even the release build should be debuggable, and in that case the flasher or debugger (or flasher of the debugger) should set the bits.
I know bigger systems like Linux are intensely paranoid about chip state post-bootloader, so tend not to rely on any default state. But we can't really afford the ROM space for that level of paranoia in an embedded image.
And yes, in this specific case, I see your point that a debugger may obviously have deliberately set those bits before running us, for a good reason, so an image should not reset any bits there, regardless of the general "trust default state or not" philosophy.
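For reference, the explicit reset discussed in this thread would amount to clearing the low-power debug bits. The sketch below assumes the register in question is DBGMCU_CR with DBG_SLEEP, DBG_STOP and DBG_STANDBY in bits 0 to 2, as on STM32F4 (the thread itself does not name the register). A plain variable, `fake_dbgmcu_cr`, stands in for the memory-mapped register so the code runs on a host, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit layout of the debug control register (STM32F4 DBGMCU_CR). */
#define DBG_SLEEP          (1u << 0)
#define DBG_STOP           (1u << 1)
#define DBG_STANDBY        (1u << 2)
#define DBG_LOW_POWER_MASK (DBG_SLEEP | DBG_STOP | DBG_STANDBY)

/* Stand-in for the memory-mapped register; on target this would be the
 * device header's register definition instead. */
static volatile uint32_t fake_dbgmcu_cr;

/* Returns non-zero if a debugger or flasher left any low-power debug bit set. */
static int low_power_debug_enabled(const volatile uint32_t *cr)
{
    return (*cr & DBG_LOW_POWER_MASK) != 0u;
}

/* Clears only the low-power debug bits and leaves all other bits untouched. */
static void clear_low_power_debug(volatile uint32_t *cr)
{
    *cr &= ~DBG_LOW_POWER_MASK;
}
```

As the comments above conclude, an image that does this unconditionally would also undo settings a debugger made on purpose, so this is shown only to make the trade-off concrete.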
@artokin Should this also be in 5.15.2 (please send another PR directly to the branch)?
I've got a branch, I'll create a PR shortly.
#12717 Created
@Mergifyio backport mbed-os-5.15
That did not work, I assume because I had already created the PR (it can't create a duplicate). I'll test this on another PR.
Summary of changes
This workaround is related to Mbed OS issue #12294
We would like to temporarily disable sleep on STM32F4. The hardfault occurs so often that it is preventing us from continuing with large-scale network testing.
This workaround can be removed once the root cause of the hardfault is corrected.
Impact of changes
Migration actions required
Documentation
Pull request type
Test results
Reviewers
@TuomoHautamaki , @teetak01 , @mikter , @jeromecoutant