Description
arduino/ArduinoCore-avr#42
@matthijskooijman
I've been struggling with exactly what you describe in your comments here. On my one-master, one-slave bus, the slave occasionally (roughly 1 time in 100,000) sends its ACK one clock pulse too soon; my calls to the stock Wire library then never return and my firmware is toast. I've dropped the code from this repo in place of my Wire library, but I'm still getting the lockups! I'd be grateful for any advice you can offer. I've been in #arduino on freenode.net IRC for a few days trying to figure this out, and I've gained a ton of conceptual understanding of the lockups and why they might happen, but haven't yet arrived at a solution. Join if you can!
The comment here says you think the master/slave desync is caused by noise on the bus. I'm not so sure it's noise. My leading theory right now is that the desync is caused by glitches (tall, thin voltage spikes) that I've seen on SDA just before or after many of the ACK/NAK bit handoffs, where control of SDA passes between the slave and the master. I often see ~200ns gaps (with 400kHz SCL) during these handoffs where neither the master nor the slave is actually pulling SDA down, even though both agree that it should be low. During these gaps, my pullups cause little spikes in the waveform. Depending on what SCL is doing at the time, these could be interpreted by either member of the bus as start/stop/repeated start conditions, or as something else I haven't thought of. Here's what I'm talking about:
- SDA is channel 1 (yellow)
- SCL is channel 2 (magenta)
- The low-to-high clock transition centered on the capture is for the ACK/NAK bit, which in this case is sent by the master. So here, I think the master (Arduino MEGA2560) waits 200ns too long to take control of the SDA line from the slave for its NAK.
Similarly, for the opposite ACK/NAK handoff, where the master has just sent 8 bits to the slave and the slave takes the bus to send its ACK/NAK bit to the master, the master (Arduino) waits 200ns too long to take SDA back afterwards, causing a 4.2V spike that just should not be there:
- Here again, the low-to-high clock transition centered on the trace is for the slave to send its ACK/NAK bit to the master.
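To make the start/stop worry concrete, here is a minimal sketch in plain C++ (not Wire library code) of how a receiver would classify a single SDA edge. The `BusEvent` and `classifySdaEdge` names are made up for illustration. It only encodes the basic I2C rule that SDA may change while SCL is low, and that an SDA change while SCL is high is a START (falling) or STOP (rising).

```cpp
#include <cassert>

// What an I2C receiver sees when SDA changes level (hypothetical names).
enum class BusEvent { None, DataChange, Start, Stop };

// Classify one SDA transition given the SCL level at that instant.
// SDA changing while SCL is low is an ordinary data change.
// SDA changing while SCL is high is START (falling) or STOP (rising).
BusEvent classifySdaEdge(bool sclHigh, bool sdaBefore, bool sdaAfter) {
    if (sdaBefore == sdaAfter) {
        return BusEvent::None;        // no edge at all
    }
    if (!sclHigh) {
        return BusEvent::DataChange;  // legal place for SDA to move
    }
    return sdaAfter ? BusEvent::Stop : BusEvent::Start;
}
```

By this rule, a spike on SDA (low, high, low) that lands while SCL is low is harmless, but the same spike while SCL is high would read as a STOP immediately followed by a START, assuming the receiver's input filter does not reject it.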
In summary: My slave is taking control of SDA pretty much right at the falling edge of SCL, which looks like it makes sense to me (though I actually can't find the requirement for this in the spec). The Arduino is waiting 340ns after SCL has fallen to take control of SDA after a NAK/ACK, which leaves SDA uncontrolled for 200ns and causes unwanted glitches/spikes. These might be the root cause of the lockups that the timeout approach in this repo is meant to recover from.
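As a rough sanity check on the spike height, here is a minimal sketch of how far an undriven SDA line would rise during a gap, using the simple RC charging model V(t) = Vcc * (1 - exp(-t / RC)). The function name is hypothetical, and the model assumes SDA starts at 0V and that the bus behaves as a single pullup resistor charging a lumped capacitance, which is only an approximation.

```cpp
#include <cassert>
#include <cmath>

// Estimate the SDA voltage after the line has been left undriven for
// gapSeconds, assuming it starts at 0V and is charged through the pullup.
// Simple first-order RC model; real buses will deviate from this.
double undrivenSdaVolts(double vcc, double pullupOhms,
                        double busFarads, double gapSeconds) {
    assert(pullupOhms > 0.0 && busFarads > 0.0 && gapSeconds >= 0.0);
    const double tau = pullupOhms * busFarads;  // RC time constant in seconds
    return vcc * (1.0 - std::exp(-gapSeconds / tau));
}
```

With assumed example values of a 5V supply, a 4.7k pullup and 100pF of bus capacitance (none of these are measured from my setup), `undrivenSdaVolts(5.0, 4700.0, 100e-12, 200e-9)` comes out at about 1.7V. Reaching the 4.2V seen in the capture within 200ns would need a noticeably shorter time constant than those example values give, so the real pullup and capacitance are worth measuring.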