Skip to content

rework pulseIn function to solve issue #2409 #2495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 29, 2015

Conversation

facchinm
Copy link
Member

This patchset reworks pulseIn library function to get a fixed behavior, untouched by GCC/libc optimizations.
It also adds an helper function (pulseInLong) that uses a millis() implementation to handle in a better way IRQ affected and long pulse scenarios.
Tested against currently provided GCC and 4.9.1 version on all available pins with an external pulse generator.
Comments/enhancements (especially in the ASM part) are very welcome.

@cmaglie cmaglie added this to the Release 1.5.9 milestone Dec 12, 2014
@cmaglie cmaglie self-assigned this Dec 12, 2014
@cmaglie
Copy link
Member

cmaglie commented Jan 6, 2015

Hi @facchinm

Thanks for the patch!

I see that there are three different asm loops (that checks the pins respectively on PORTA, PORTB and PORTC), is it possible to refactor them into a single loop and pass the PORT register as a parameter? I see that the port register is actually passed as parameter, but ignored in the asm subroutine. I'm thinking about the Mega boards that uses also PORTD and PORTE.

Another question: have you made the assembly subroutine from scratch? or have you generated them with gcc?

@facchinm
Copy link
Member Author

facchinm commented Jan 7, 2015

Hi @cmaglie

I've been struggling with the problem of making the code more generic by passing the port register as a parameter (as you saw in the function signature) for many days but no solution was enough portable (all involved Z register). Also, GCC refuses to compile the line if the in opcode's second parameter was not static.

Since the common use case does not involve changing the port, the additional loops could be purged by the compiler (I'm thinking about the dead-strip flag on ARM architecture but I don't know if it applies also to newer avr-gcc)

The routine(s) are all handwritten and the introduction of in opcode gives a good granularity boost (2x) over the previous version, so I thought that the (maybe removable) increased code size could be an acceptable downside.

I also noticed some typos in the ASM section looking back at the code, so I'll make an updated pull request including PORTE handling and these fixes

@cmaglie cmaglie added Component: Core Related to the code for the standard Arduino API Type: Regression Something that used to work and now doesn't Version: 1.5.x labels Jan 7, 2015
@cmaglie
Copy link
Member

cmaglie commented Jan 7, 2015

Uhmm... are you sure that the asm part is keeping the original behaviour?
I can see at least two discrepancies in your implementation:

  1. width is an unsigned long (== uint32_t) but you are using 16 bit operations in the asm loops.

  2. the timeout in the original pulseIn is intended for the whole pulseIn operation (including the time spent waiting for the start pulse) instead you seems to discard the elapsed time because you don't pass the variable numloops to the asm subroutine.

This from a raw analysis of the patch. I guess that the asm part, being handwritten from scratch, should be carefully checked.

Instead, what do you think about generating the asm with gcc and brutally copying it into a .s file? or even better as inline __asm__ but I guess that this requires much more work.

@facchinm
Copy link
Member Author

Hi @cmaglie, you are right on (almost :) ) everything so the expanded patcheset includes fixes for:

  1. width as uint32_t -> I only tested the routine with short pulses ( > 100 HZ - my fault) and it was broken with for pulses < 80Hz (fixed in adca790)

  2. the timeout now works as intended (patch a72e1b7), passing to the asm function only the "remaining" timeout

  3. additional ports support (for ATMega 2560 powered boards) added in ca18193 (untested for lack of hardware)

    Now the philosophical questions:

  4. __inline asm is not guaranteed to leave everything as-is (I tried it at first and it gave very different results on gcc 4.3 and 4.9)

  5. copy-paste gcc-generated asm code would lead (in my opinion) to unreadable and non understandable code hanging there forever... The call does not involve 4 instructions that can be left there as black working magic but if far more complicated and writing it by hand with good comments can leave space to other people for enhancements.

At the end, I macroified the ASM call (patch f06cb4f) to be maintainable without going crazy

If someone could test it on ArduinoMega it would be great!

@cmaglie
Copy link
Member

cmaglie commented Jan 18, 2015

Hi @facchinm

I tried to compile the following sketch for a Uno with your patch:

void setup() {
  pulseIn(9, HIGH, 10000L);
}
void loop() {
}

There is an increase in size from 832 to 1142 (+310 bytes), and this is not acceptable. I suspect that the big switch/case force the linker to keep all the instances of the macro.

I beg you to give the compiler a go, you may be surprised by the quality of produced code.
I usually use the following command line options to generate the asm listing:
-gstabs -Wa,-ahlmsd=output.lst -dp -fverbose-asm

To avoid the need to define macros we can obtain the port address once and use it into the compiled function using a volatile register definition like:

unsigned long ASM_busyLoop(uint8_t bit, volatile uint8_t *port,
                       uint8_t stateMask, unsigned long maxloops) {
    ...
    while ((*port & bit) == stateMask)
    ...
}

and call it from pulseIn(..) with:

unsigned long width = busyLoop(bit, portInputRegister(port), stateMask, maxloops);

doing this way the compiler will take care of all the boring portInputRegister(...) stuff.

Finally, I'd like also to have all the three loops as asm code not only the last one.

@facchinm
Copy link
Member Author

Hi @cmaglie,
I was aware that the increased size could be an issue, so I patched the routine to pass the pointer to the port as a parameter (thank you for the tip, really) and the main problem should be solved (150d430)

By the way, I also tried to produce a readable generated asm listing using your command line options and the result was great (apparently) but not enough optimized.
In fact, the C routine

unsigned long pulseInSimpl(volatile uint8_t *port, uint8_t bit, uint8_t stateMask, unsigned long maxloops);
{
    unsigned long width = 0;
    // wait for any previous pulse to end
    while ((*port & bit) == stateMask)
        if (--maxloops == 0)
            return 0;

    // wait for the pulse to start
    while ((*port & bit) != stateMask)
        if (--maxloops == 0)
            return 0;

    // wait for the pulse to stop
    while ((*port & bit) == stateMask) {
        if (++width == maxloops)
            return 0;
    }
    return width;
}

performed very well with GCC 4.8, using only 16 instructions per cycle, while the readable ASM listing takes more than 48. Of course the objdumped object file also takes 16 instructions but is completely unreadable. So I decided to give another chance to the handwritten asm (which takes 17).

2- I understand your request to rewrite the setup loops in assembly, because of course also the timeout calculation is compiler dependent.

As they are not critical code I decided to let them as they are for the moment, only changing the maxloops calculation divider to be something more realistic.

3- I noticed that the new git HEAD contains another solution. As this pull request has gone way too long I'd like to simplify the history and rebase it upon the current head. Then I'd open another pull request and close this one. Is it ok?

@cmaglie cmaglie modified the milestones: Release 1.6.0, Release 1.6.1 Feb 18, 2015
@facchinm facchinm force-pushed the test-pulseIn-pullreq2 branch from 19e108a to c437761 Compare March 2, 2015 12:02
@facchinm
Copy link
Member Author

facchinm commented Mar 2, 2015

Fixed as requested with GCC generated asm code including all three loops

@cmaglie
Copy link
Member

cmaglie commented Mar 2, 2015

@ArduinoBot build this please

@cmaglie cmaglie added the Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.) label Apr 15, 2015
@cmaglie cmaglie assigned facchinm and unassigned cmaglie May 12, 2015
@ffissore ffissore modified the milestones: Release 1.6.1, Release 1.6.5 May 20, 2015
This reverts commit 8ddc519.
To be substituted by ASM generated code
@facchinm facchinm force-pushed the test-pulseIn-pullreq2 branch 2 times, most recently from 41b9e05 to a7d81d0 Compare May 29, 2015 13:04
@facchinm
Copy link
Member Author

To address issue #3112 , I've added two commits (bb3963c and a7d81d0) to

  • fix timeout definition for Due
  • have a sort of "common" implementation (GCC agnostic)
    Resolution gets a nice 2x boost also 😄

facchinm added a commit that referenced this pull request May 29, 2015
rework pulseIn function to solve issue #2409
@facchinm facchinm merged commit 3859fe6 into arduino:ide-1.5.x May 29, 2015
@facchinm facchinm deleted the test-pulseIn-pullreq2 branch January 4, 2017 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.) Component: Core Related to the code for the standard Arduino API Type: Regression Something that used to work and now doesn't
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants