Skip to content

I2C Slave race condition leads to handler being out of sync  #3699

Closed
@shahokun

Description

@shahokun

We have an Arduino Due acting as the slave device on an I2C bus. Rarely, we will see the Arduino get "stuck" and be unresponsive. When we attached a debugger to the running processor, we saw that it was constantly getting bombarded with I2C interrupts (WIRE_ISR_HANDLER). The status flag indicated that the Arduino was in SLAVE_SEND mode and had already transmitted data (TWI TXRDY bit = 0). However, the I2C transmission would never complete, the I2C clock was pulled low, and the processor was stuck in an infinite loop of servicing an interrupt but not doing anything (i.e. none of the if blocks were entered).

The confounding factor was that the SVREAD bit of the TWI status register was 0, indicating a Master Write. If this were the case, then the correct status should be SLAVE_RECV. Working backwords from this, the likely culprit would result from a race condition at the following lines (339-343) in TwoWire::onService()

// Transfer completed
TWI_EnableIt(twi, TWI_SR_SVACC);
TWI_DisableIt(twi, TWI_IDR_RXRDY | TWI_IDR_GACC | TWI_IDR_NACK
    | TWI_IDR_EOSACC | TWI_IDR_SCL_WS | TWI_IER_TXCOMP);
status = SLAVE_IDLE;

Assume a Master Read operation just completed and the Arduino has status = SLAVE_SEND. The Arduino is wrapping up the I2C transaction and enables the SVACC interrupt (TWI_EnableIt(twi, TWI_SR_SVACC);). After this happens, but before the status = SLAVE_IDLE; line is executed, a new I2C transaction comes in and triggers an interrupt. This time, it is a Master Write operation. It jumps into the handler, but because status has not been set to SLAVE_IDLE, it does not execute the first if block which would properly set status to SLAVE_RECV. Moreover, all of our Master Write transactions are two or more bytes, so the TWI would stretch the clock indefinitely because it never reads the first byte due to status being incorrect.

This is a bit theoretical because we will need to do extended testing to see if the problem goes away. The proposed solution is to reverse the order of the cleanup steps:

// Transfer completed
status = SLAVE_IDLE;
TWI_DisableIt(twi, TWI_IDR_RXRDY | TWI_IDR_GACC | TWI_IDR_NACK
    | TWI_IDR_EOSACC | TWI_IDR_SCL_WS | TWI_IDR_TXCOMP);
TWI_EnableIt(twi, TWI_IER_SVACC);

This keeps the status flag protected against synchronization issues. It also reverses the order of the DisableIt()/EnableIt() at lines 301-303 so that you never have both groups of interrupts enabled at once. I would change the flags to use TWI_IDR_* for all DisableIt() and TWI_IER_* for all EnableIt() operations, for posterity; however, TWI_IDR_x, TWI_IER_x, and TWI_SR_x are the same value for all x, so this does not lead to any incorrect behavior.

Could anyone provide advice on whether these sorts of nested interrupts could occur in the TWI peripheral? I wasn't able to find explicit information about how the NVIC on the Due interacts with the TWI interrupt registers. I assume that since specific TWI flags are explicitly enabled and disabled that they are treated separately. In the proposed scenario, the Due would be in the handler for a TXCOMP interrupt, then get interrupted by a SVACC interrupt.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions