Description
NOTE: I have filed this both here and as feedback FB16417968, unsure which was the most relevant place.
CFStringGetCStringPtr is documented to return a string pointer in the given encoding and to be "simply an optimization", but it does not actually check whether the string is stored in that encoding, only whether it can be represented in that encoding.
This means that, for example, the string "♥", which is represented in UTF-16LE as the hex bytes "65 26", can end up being interpreted as the same UTF-8 hex bytes "65 26", which mean something completely different, namely "e&". The expected result would be for CFStringGetCStringPtr to return NULL in this case.
Alternatively, instead of fixing the behavior, the documentation could be updated to warn about this footgun.
Reproducer
The example code below shows the discrepancy between CFStringGetCStringPtr and CFStringGetCString:
#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

int main(void) {
    // Supposed to contain a "♥" (U+2665).
    const char *heart_utf16le = "\x65\x26";
    // The UTF-8 encoding of "♥", for reference.
    const char *heart_utf8 = "\xE2\x99\xA5";
    (void)heart_utf8;
    // Create a CFString with the heart from UTF-16 / Unicode.
    CFStringRef s = CFStringCreateWithCString(kCFAllocatorDefault, heart_utf16le,
                                              kCFStringEncodingUTF16LE);
    // CFStringGetCString converts to UTF-8.
    char buf[20];
    CFStringGetCString(s, buf, sizeof(buf), kCFStringEncodingUTF8);
    printf("%s\n", buf); // prints "♥"
    // But CFStringGetCStringPtr completely ignores the UTF-8 conversion
    // we asked it to do, i.e. a huge correctness footgun!
    const char *ptr = CFStringGetCStringPtr(s, kCFStringEncodingUTF8);
    printf("%s\n", ptr); // prints "e&"
    return 0;
}
Run with:
clang -framework CoreFoundation example.c && ./a.out
Expected result:
♥
(null)
Actual result:
♥
e&
Occurs on:
- Mac OS X 10.12.6 on x86_64.
- macOS 14.7.1 on AArch64.
- macOS 15.1.1 in VM.