Skip to content
This repository was archived by the owner on Jan 28, 2021. It is now read-only.

Added better packet validity checking #65

Closed
wants to merge 5 commits into from
Closed

Added better packet validity checking #65

wants to merge 5 commits into from

Conversation

PaulZC
Copy link
Collaborator

@PaulZC PaulZC commented Dec 28, 2019

Hi Nathan (@nseidle),
OK. I've made some good progress.
I'm concentrating on getting the GeoFence example working again, and I seem to have done that but it is a bodge at present.
Please don't merge this PR as-is. But it does contain useful clues on where to head next...
I'm testing the code on an ATmega328 RedBoard (Qwiic) and a ZOE-M8Q breakout (Qwiic) but am using SoftwareSerial: RX on Pin 2; TX on Pin 3.
To help, I'm sniffing the packets using two copies of u-center tee'd into the Tx and Rx lines via two FTDI cables.

First up, the "Uno" doesn't seem to like debugPrint((char*)F("")); so I've changed all of those to _debugSerial->print(F("")); instead. The RAM usage is the same.

Next, the good stuff:

I really like what you've done with the ACK strategy. It's all good.

We have this dilemma of what to do when packetCfg and packetAck can be overwritten whatever is next in the buffer. I was going to suggest that for Serial, process should return as soon as it has received a valid packet, even if there is still data in the buffer. But, as you have pointed out, that is way more tricky for I2C. Maybe we could push the unprocessed I2C data into a RingBufferN and come back to it. But RingBufferN isn't supported on all architectures and, as you say, it means using more RAM. It is a good trick though.
So, I suggest we stick with what you have. It is very nearly there.

I have spotted a couple of gremlins:

In waitForNoACKResponse packetCfg.cls and .id will appear valid as soon as they are process'd, way before the whole message has been received. Maybe this is limited to serial where the data can come in slowly enough for checkUblox to return part way though a packet? But I was seeing multiple CLS/ID match but failed CRC errors when really it was just because the packet is still being processed. My bodge is to check that currentSentence == NONE i.e. it is no longer UBX - because the full packet has been received. Now this is just a bodge. There will be many ways it can fail if more data arrives. But at least it checks that the packet is not still being processed.

The other possible gremlin is where waitForACKResponse checks if packetCfg.len == 1. I see where you're going with that, and I like it, but I think it misses the point that most gets are zero length. The CFG PRT is an odd-ball with its length of 1.

Anyway, if you do try this PR, it does appear to let the GeoFence example work correctly again - on SoftwareSerial on the ZOE (I haven't tried any of the other permutations - yet). But, again, don't use it as-is. You'll need a more robust way of checking that the end of packet has been reached before checking the .cls and .id.

All the best,

Paul

@PaulZC PaulZC mentioned this pull request Dec 28, 2019
@PaulZC PaulZC changed the title Do Not Merge As-Is! But contains useful things! Added better packet validity checking Jan 3, 2020
@PaulZC
Copy link
Collaborator Author

PaulZC commented Jan 3, 2020

Hi Nathan (@nseidle),

OK. I think I've got it working nicely. The geofence example seems to work fine on both I2C and SoftwareSerial on the NEO and ZOE when connected to an ATmega328 RedBoard. I've only got a couple of boards with me, and so I haven't yet tested all of the permutations, but my confidence in this PR is high...

The second commit (fc73deb) adds three states for packetCfg.valid (and packetAck.valid): NOT_DEFINED, VALID and NOT_VALID (they're defined in an enum and are prefixed with SFE_UBLOX_PACKET_VALIDITY_). Where we would normally initialize .valid to false, we now set it to NOT_DEFINED. It becomes VALID or NOT_VALID once the full packet has been received and the checksum checked. .valid will only be NOT_VALID if the checksum fails.
In WaitForNoACKResponse, we check that .valid is either VALID or NOT_VALID when checking the class and ID. This prevents the problem I identified earlier (in e16a266) that class and ID can appear valid as soon as they arrive and can allow the function to exit before all the bytes have arrived if they are coming in slowly.

I'm afraid I really didn't follow your logic about packetCfg.len needing to be 1 in WaitForACKResponse. I've added my explanation in the comments and have tweaked the code there. Please have a look and let me know if I've missed anything.

One final change is that factoryReset now copies the default config to the permanent config, and loads the permanent config into the current config. u-center seems to recommend doing it this way. I changed it because I'm not 100% convinced that hardReset loads them when doing a cold start. (I was having problems with the low power setting once it had been set in the permanent config and needed a reliable way to clear it. If you try to power up the NEO with low power mode enabled in the permanent config, the module really doesn't like it and resets periodically... We should probably add a warning to the example...)

Enjoy!
Paul

@PaulZC
Copy link
Collaborator Author

PaulZC commented Jan 4, 2020

Ah... Hang on! We're not done yet!

There are many places where we still assume sendCommand will return a boolean:

if (sendCommand(packetCfg, maxWait) == false)

We need to update all of those for sfe_ublox_status_e...

Another commit will follow shortly!

sendCommand now returns sfe_ublox_status_e but many functions were still assuming it returned a boolean. This commit fixes that...
**It hasn't yet been tested on the ZED!**
@PaulZC
Copy link
Collaborator Author

PaulZC commented Jan 4, 2020

Commit 9eccda5 now correctly checks the sfe_ublox_status_e values returned by sendCommand (previously they were boolean).
It is not yet fully tested - the SETVAL and GETVAL functions need to be tested on the ZED.
So far, it has only been tested on the NEO using I2C and SoftwareSerial connections to an ATmega328 RedBoard.
Enjoy!
Paul

@PaulZC
Copy link
Collaborator Author

PaulZC commented Jan 5, 2020

With hindsight, might it be better if sendCommand went back to returning a Boolean? At the moment, we need to know if it will return DATA_SENT or DATA_RECEIVED when we call it, depending on what we are doing...
I’m happy to redo this PR as needed - let me know.
Also, I should probably split the debugPrints and factoryReset into separate PRs?
All the best,
Paul

@nseidle
Copy link
Member

nseidle commented Jan 6, 2020

  1. Thank you for this work and all the comments. Looks really good.

  2. I think it's fine to have a return of DATA_SENT or DATA_RECEIVED based on what we are doing. Requires the caller to think, but that's a good thing.

  3. It's ok to combine a few things. I can see the debug changes and factoryReset changes. All good.

  4. Testing with the ZED with your changes, everything checks out. VALGET works. The only thing that doesn't work is VALSET example. It looks like VALSET only returns an ACK (image below). However, it's failing repeatedly...
    image
    The PVT packet is coming in at the same time and the logic within waitForACKResponse is getting cross threaded. A valid config packet is coming in (PVT) but it doesn't match the CLS/ID of the VALSET so the waitForACKResponse routine continues to wait until it times out. We need to tell waitForACKResponse that we only want a response.

Perhaps we need to take a different approach: create getData and sendData functions to replace or augment the sendCommand in some functions (like SETVAL). We would make it clear from the calling if we expect an ACK, an ACK with Data, or just Data.

@sslupsky
Copy link

I've been reviewing some of the changes.

One thing comes to mind as I reviewed one of the comments in the source:

//Returns false if we timed out, got a NACK (command unknown), or had a CLS/ID mismatch

and
return (SFE_UBLOX_STATUS_COMMAND_UNKNOWN); //Received a NACK

If I recall corrected, a NACK doesn't necessarily mean the command is unknown. A NACK can be returned for a valid command. I think the NACK means that the module is busy and cannot respond.

@PaulZC PaulZC closed this Feb 24, 2020
@PaulZC PaulZC deleted the PaulZC_testing_#64 branch February 24, 2020 07:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants