Skip to content

Fix CFCharacterSet equality detection #1308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Fix CFCharacterSet equality detection #1308

wants to merge 1 commit into from

Conversation

sashabelonogov
Copy link
Contributor

@sashabelonogov sashabelonogov commented Nov 9, 2017

I was investigating why the test checking the CharacterSet equality was failing.

Firstly, I found out that the _hashValue of the CFCharacterSet is always 0. The method that checks equality(__CFCharacterSetEqual) is checking the _hashValue, which is correct and the method calculating hash(__CFCharacterSetHash) seemed to do it properly but it seemed that it was simply never called to set the _hashValue field.
So, as the solution for it, I decided to add this method call at the step of CFCharacterSet initialization when _hashValue should not be 0. After that the tests started to execute correctly, but I'm not sure if it's a correct solution, maybe there is a better way to do it.

Nevertheless, __CFCharacterSetEqual shouldn't rely only on hash but it also should check the contents, which it did, but most likely not correctly. I found out that here:

const UniChar *buf1 = __CFCSetStringBuffer((CFCharacterSetRef)cf1);
const UniChar *buf1End = buf1 + __CFCSetStringLength((CFCharacterSetRef)cf1);
const UniChar *buf2 = __CFCSetStringBuffer((CFCharacterSetRef)cf2);
const UniChar *buf2End = buf2 + __CFCSetStringLength((CFCharacterSetRef)cf2);

 while ((buf1 < buf1End) && (buf2 < buf2End)) {
                        UniChar char1 = *buf1;
                        UniChar char2 = *buf2;

                        if (char1 != char2) return false;

                        do { ++buf1; } while ((buf1 < buf1End) && (char1 == *buf1));
                        do { ++buf2; } while ((buf2 < buf2End) && (char2 == *buf2));
                    }

because of && operator in while ((buf1 < buf1End) && (buf2 < buf2End))
it was not comparing all the characters in both string buffers and replacing it with || fixes the issue.
Also, for some cases we can make the comparison here easier simply by checking the length of the strings beforehand, which is also implemented in this PR.

Please, let me know what you think

Copy link
Contributor

@alblue alblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor observations; plus, I'd like @phausler to have a look as well.

@@ -1365,7 +1371,9 @@ Boolean _CFCharacterSetInitWithCharactersInString(CFMutableCharacterSetRef cset,

if (0 == length) {
__CFCSetPutHasHashValue(cset, true); // _hashValue is 0
} else if (length > 1) { // Check for surrogate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't move the comments as it increases noise in the delta. Can you move that back here, and then it will be clearly seen you're just adding a new line in.

Copy link
Contributor Author

@sashabelonogov sashabelonogov Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this comment is not related to the line that I was adding, so I decided to move it below, if you think it's still will be clear when it will "Check for surrogate" - I can leave it where it was.

@@ -1406,7 +1414,9 @@ CFCharacterSetRef CFCharacterSetCreateWithCharactersInString(CFAllocatorRef allo

if (0 == length) {
__CFCSetPutHasHashValue(cset, true); // _hashValue is 0
} else if (length > 1) { // Check for surrogate
} else if (length > 1) {
__CFCharacterSetHash(cset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, plus the indentation of the _CFCharacterSetHash doesn't appear to be indented correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be fixed, thanks for noticing

@@ -1547,7 +1558,7 @@ CFCharacterSetRef CFCharacterSetCreateInvertedSet(CFAllocatorRef alloc, CFCharac
Boolean _CFCharacterSetInitMutable(CFMutableCharacterSetRef cset) {
if (!__CFCSetGenericInit(cset, __kCFCharSetClassBitmap| __kCFCharSetIsMutable)) return false;
__CFCSetPutBitmapBits(cset, NULL);
__CFCSetPutHasHashValue(cset, true);
__CFCSetPutHasHashValue(cset, true); // Hash value is 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this comment copied and pasted from above? I thought the comment was regarding the condition rather than the action of the line, so I'm not sure it makes sense to add it here - but @phausler would know more, I suspect.

Copy link
Contributor Author

@sashabelonogov sashabelonogov Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply added it for the consistency, in all the similar places the same comment exist.

@alblue alblue requested a review from phausler November 10, 2017 11:25
@parkera
Copy link
Contributor

parkera commented Nov 16, 2017

I think that #1305 finally has the other character set fixes we needed. We should check the behavior of this bug vs that once it lands too.

@sashabelonogov
Copy link
Contributor Author

sashabelonogov commented Nov 16, 2017

Yes, the tests will execute successfully with the changes done in #1305, surely the comparison of buffers with memcmp idea is better than what is implemented in this PR. The simple comparison of lengths is also implemented in the other PR. I will check the situation with _hashValue later.

But anyways, it means that most (if not all of them) of the changes implemented in this PR are not relevant anymore.

@alblue
Copy link
Contributor

alblue commented Nov 28, 2017

Since #1305 has been merged (then reverted in #1328 and then re-reverted in #1338) I don't think this PR is necessary as is. However, it might be worth testing whether or not the hashValue change can be applied as a subsequent PR if needed.

@alblue alblue closed this Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants