Skip to content

Fix #74264: grapheme_sttrpos() broken for negative offsets #7189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Jun 23, 2021

We must not assume that usearch_last() gives the proper result for
negative offsets. Instead we'd need to continue to search backwards
(usearch_previous) until we find a proper match. However, apparently
searching backwards is broken, so we work around by searching forward
from the start of the string until we pass the offset_pos, and then
use the previous result.

We must not assume that `usearch_last()` gives the proper result for
negative offsets.  Instead we'd need to continue to search backwards
(`usearch_previous`) until we find a proper match.  However, apparently
searching backwards is broken, so we work around by searching forward
from the start of the string until we pass the `offset_pos`, and then
use the previous result.
@cmb69 cmb69 added the Bug label Jun 23, 2021
@nikic
Copy link
Member

nikic commented Jun 28, 2021

However, apparently searching backwards is broken

Broken how?

@cmb69
Copy link
Member Author

cmb69 commented Jun 28, 2021

For the test string "déjàààà", usearch_last() for "à" reports the last character (6) which is correct. A following usearch_previous() reports the third but last character (4), though. The same happens with usearch_preceding() passing 6 as position. I tested with recent ICU 66.1 and 68.2 on Windows.

@nikic
Copy link
Member

nikic commented Jul 5, 2021

@cmb69 Looking through the docs, I believe this should be a matter of setting the USEARCH_OVERLAP attribute.

@cmb69
Copy link
Member Author

cmb69 commented Jul 5, 2021

Thanks, that improves the backward search, but there is still an issue at the end of the string; e.g. grapheme_strrpos('déjàààà', 'à', -2) returns 6, but that should be 5.

@nikic
Copy link
Member

nikic commented Jul 5, 2021

I've tried a lot of different variants, but in the end I can only agree with your conclusion that usearch_previous/preceding are buggy. It looks like generally the starting offset of the match must be strictly lower than the starting position, but if the match is at the very end of the string, then the match can be at the starting position. I think we'll have to go back to your first implementation :(

@cmb69 cmb69 closed this in 28c9376 Jul 5, 2021
@cmb69 cmb69 deleted the cmb/74264 branch July 5, 2021 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants