Description
Hey
Not sure whether this is a bug or a hole in knowledge...
I have a PHP function invoked from V8Js which returns raw binary data. The idea is that a user writing JS scripts in their sandbox can pass this data to another V8Js-invoked PHP function, which sends it to an API.
The issue is that the file appears to undergo some conversion on the way, and I'm not sure of the best one-size-fits-all approach that'll work whether the file is binary or not. Ideally, no conversion at all would be best :)
For simplicity, we'll call the function invoked from JS PHP.get_file() and assume that it does a basic file_get_contents() in PHP, returned directly.
- Original file is a 32404-byte PDF.
- When examining the length of the data to return, in PHP, it gives me the correct value of 32404.
- When examining the length of the data in the JS variable I assign the return value to, it gives me 31115.
- Using PHP-side functions called from JS to encode/decode base64 is pointless, as the problem comes back as soon as I decode and return the value.
- If I return a base64-encoded string from my PHP function instead, take the value output from JS and decode it manually, that's fine too: 32404.
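For what it's worth, the shrinking length looks like what happens when arbitrary bytes are interpreted as UTF-8. Here is a minimal sketch of that effect, assuming (not confirmed for V8Js) that the PHP string is decoded as UTF-8 at the boundary. It uses TextDecoder/TextEncoder from Node or a browser as a stand-in for that conversion, and the sample bytes are made up:

```javascript
// Hypothetical sample: "%PDF" followed by three non-ASCII bytes.
// 0xC3 0xA9 happens to be a valid UTF-8 sequence, 0xFF is never valid.
var original = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0xC3, 0xA9, 0xFF]);

// Decode as UTF-8: the valid pair collapses into one character,
// the invalid byte becomes the replacement character U+FFFD.
var decoded = new TextDecoder('utf-8').decode(original);

// Re-encoding does not give the original bytes back.
var reencoded = new TextEncoder().encode(decoded);

console.log(original.length);  // 7 bytes in
console.log(decoded.length);   // 6 characters in the JS string
console.log(reencoded.length); // 9 bytes out
```

If something like this is going on, the data is already damaged by the time the JS variable is assigned, which would also explain the checksum mismatches.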
The only thing that worked in this case after randomly trying stuff was, instead of returning the value directly, doing a quick conversion first:
function get_file(string $filename): string
{
    // .... stuff ....
    $content = file_get_contents($filename);
    return mb_convert_encoding($content, 'UTF-8', 'ASCII');
}
This seems a little odd, given that PHP tells me the result of mb_convert_encoding is 46013 bytes, but the JS variable it's directly assigned to now reports 32404, the correct number. mb_detect_encoding at this stage gives me nothing, so I hard-coded "ASCII" for testing.
That holds until I pass the binary data back into another PHP function from my JS. Now it's back to 46013, and a quick mb_convert_encoding the other way (to ASCII from UTF-8) gives me back my original 32404-byte file. mb_detect_encoding before that gives me UTF-8.
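One way to read these numbers, as a guess rather than a confirmed explanation: if the conversion turns every byte into one code point (0 to 255), the JS string has one character per original byte, while its UTF-8 form needs two bytes for every value of 0x80 and above. Under that assumption, 46013 - 32404 = 13609 would be the number of such bytes in the file. A small sketch with hypothetical helper names and made-up sample bytes:

```javascript
// Build a "binary string": one character per byte, code points 0..255.
function bytesToBinaryString(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i] & 0xFF);
  }
  return out;
}

// Turn a binary string back into bytes, refusing anything above 0xFF.
function binaryStringToBytes(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code > 0xFF) throw new Error('Not a binary string at index ' + i);
    bytes[i] = code;
  }
  return bytes;
}

// Number of bytes the string would occupy once encoded as UTF-8.
function utf8ByteLength(str) {
  var total = 0;
  for (var i = 0; i < str.length; i++) {
    var cp = str.codePointAt(i);
    if (cp < 0x80) total += 1;
    else if (cp < 0x800) total += 2;
    else if (cp < 0x10000) total += 3;
    else { total += 4; i++; } // astral code point: skip the low surrogate
  }
  return total;
}

// Hypothetical sample: 4 ASCII bytes plus 3 bytes >= 0x80.
var sample = [0x25, 0x50, 0x44, 0x46, 0xC3, 0xA9, 0xFF];
var binary = bytesToBinaryString(sample);
console.log(binary.length);          // 7, same as the byte count
console.log(utf8ByteLength(binary)); // 10 = 4 + 3 * 2
```

That would match what I see: the "right" length in JS, and the larger length on the PHP side in both directions.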
Encodings give me headaches on good days, so apologies if that was a long-winded way to explain a simple thing. I'm hoping that there is a way this can be resolved WITHOUT the JS code writer having to change their stuff.
Is this a bug? Or something that's easily solvable? Hoping that there's:
- a setting/flag somewhere to prevent conversion taking place at all
- a solution that works regardless of whether the file is binary or not
- maybe even a decent, simple base64 decode function for JS. I tried a couple, but all changed the file in some manner or other - not great when you're also using checksums :-p
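On the last point, here is a minimal sketch of the kind of decoder I mean: plain JS, no atob or Buffer (I'm assuming neither is available in the sandbox), decoding into a Uint8Array so that no string encoding can touch the bytes. The name base64ToBytes is made up, and it only handles the standard alphabet:

```javascript
var B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode standard base64 into raw bytes.
// Lenient on purpose: whitespace and '=' padding are simply stripped.
function base64ToBytes(input) {
  var clean = input.replace(/[\s=]/g, '');
  var bytes = new Uint8Array(Math.floor(clean.length * 6 / 8));
  var buffer = 0; // bits collected so far
  var bits = 0;   // how many bits in buffer are still unused
  var out = 0;    // next write position in bytes

  for (var i = 0; i < clean.length; i++) {
    var value = B64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error('Invalid base64 character at index ' + i);
    }
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes[out++] = (buffer >> bits) & 0xFF;
      buffer &= (1 << bits) - 1; // keep only the leftover bits
    }
  }
  return bytes;
}

var hello = base64ToBytes('SGVsbG8=');
console.log(hello.length);                             // 5
console.log(String.fromCharCode.apply(null, hello));   // Hello
```

This keeps the data intact inside JS, but it doesn't by itself solve handing the bytes back to PHP as a string.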