-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix CFCharacterSet equality detection #1308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CFCharacterSet equality detection #1308
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor observations; plus, I'd like @phausler to have a look as well.
@@ -1365,7 +1371,9 @@ Boolean _CFCharacterSetInitWithCharactersInString(CFMutableCharacterSetRef cset, | |||
|
|||
if (0 == length) { | |||
__CFCSetPutHasHashValue(cset, true); // _hashValue is 0 | |||
} else if (length > 1) { // Check for surrogate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't move the comments as it increases noise in the delta. Can you move that back here, and then it will be clearly seen you're just adding a new line in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this comment is not related to the line that I was adding, so I decided to move it below, if you think it's still will be clear when it will "Check for surrogate" - I can leave it where it was.
@@ -1406,7 +1414,9 @@ CFCharacterSetRef CFCharacterSetCreateWithCharactersInString(CFAllocatorRef allo | |||
|
|||
if (0 == length) { | |||
__CFCSetPutHasHashValue(cset, true); // _hashValue is 0 | |||
} else if (length > 1) { // Check for surrogate | |||
} else if (length > 1) { | |||
__CFCharacterSetHash(cset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, plus the indentation of the _CFCharacterSetHash
doesn't appear to be indented correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be fixed, thanks for noticing
@@ -1547,7 +1558,7 @@ CFCharacterSetRef CFCharacterSetCreateInvertedSet(CFAllocatorRef alloc, CFCharac | |||
Boolean _CFCharacterSetInitMutable(CFMutableCharacterSetRef cset) { | |||
if (!__CFCSetGenericInit(cset, __kCFCharSetClassBitmap| __kCFCharSetIsMutable)) return false; | |||
__CFCSetPutBitmapBits(cset, NULL); | |||
__CFCSetPutHasHashValue(cset, true); | |||
__CFCSetPutHasHashValue(cset, true); // Hash value is 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this comment copied and pasted from above? I thought the comment was regarding the condition rather than the action of the line, so I'm not sure it makes sense to add it here - but @phausler would know more, I suspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply added it for the consistency, in all the similar places the same comment exist.
I think that #1305 finally has the other character set fixes we needed. We should check the behavior of this bug vs that once it lands too. |
Yes, the tests will execute successfully with the changes done in #1305, surely the comparison of buffers with memcmp idea is better than what is implemented in this PR. The simple comparison of lengths is also implemented in the other PR. I will check the situation with _hashValue later. But anyways, it means that most (if not all of them) of the changes implemented in this PR are not relevant anymore. |
I was investigating why the test checking the CharacterSet equality was failing.
Firstly, I found out that the _hashValue of the CFCharacterSet is always 0. The method that checks equality(__CFCharacterSetEqual) is checking the _hashValue, which is correct and the method calculating hash(__CFCharacterSetHash) seemed to do it properly but it seemed that it was simply never called to set the _hashValue field.
So, as the solution for it, I decided to add this method call at the step of CFCharacterSet initialization when _hashValue should not be 0. After that the tests started to execute correctly, but I'm not sure if it's a correct solution, maybe there is a better way to do it.
Nevertheless, __CFCharacterSetEqual shouldn't rely only on hash but it also should check the contents, which it did, but most likely not correctly. I found out that here:
because of
&&
operator inwhile ((buf1 < buf1End) && (buf2 < buf2End))
it was not comparing all the characters in both string buffers and replacing it with
||
fixes the issue.Also, for some cases we can make the comparison here easier simply by checking the length of the strings beforehand, which is also implemented in this PR.
Please, let me know what you think