Skip to content

Allow variable strings to be defined for a string offset #5013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Girgias
Copy link
Member

@Girgias Girgias commented Dec 15, 2019

This stops truncating strings with multiple bytes to one byte and using the first byte as the char to replace the string offset.

At the same time allow to pass an empty string to remove the byte from the string.

This partially invalidates Bug 71572.

@nikic
Copy link
Member

nikic commented Dec 15, 2019

This is an interesting idea... implicit truncation is definitely not good, but I'm not totally sure if I like these semantics better than making the assignment of multi-char strings an error.

@Girgias
Copy link
Member Author

Girgias commented Dec 15, 2019

This is an interesting idea... implicit truncation is definitely not good, but I'm not totally sure if I like these semantics better than making the assignment of multi-char strings an error.

I'll bring it up on internals for discussion seems like the best way forward to see what people think.

@Girgias Girgias force-pushed the variable-str-length-str-offset-replacement branch 3 times, most recently from 855242b to f5efc92 Compare December 16, 2019 05:57
This stops truncating strings with multiple bytes to one byte and using the first byte as the char to replace the string offset.

At the same time allow to pass an empty string to remove the byte from the string.

This partially invalidates Bug 71572.
Because I don't know how to fix the error with the char * defintion
This reverts commit 1c9a8aa.
Due to some null bytes disapering and I have no idea why.
This reverts commit 236c583.
@Girgias Girgias force-pushed the variable-str-length-str-offset-replacement branch from f5efc92 to 0b6d52c Compare December 16, 2019 09:17
@claudepache
Copy link
Contributor

Problem: What is the expected result of

$str = "あello world"; // utf-8-encoded
$str[0] = "H";

? I guess, a non-surprising result is not really possible, because it would need the knowledge of the expected encoding of the string.

@Girgias
Copy link
Member Author

Girgias commented Dec 16, 2019

Problem: What is the expected result of

$str = "あello world"; // utf-8-encoded
$str[0] = "H";

? I guess, a non-surprising result is not really possible, because it would need the knowledge of the expected encoding of the string.

You'll just get a byte stream which doesn't have a correct encoding. This is the case currently and has been since forever. This doesn't change any of the behaviour on write.

Now if one day we have unicode (or some sort of it) support then I would expect the codepoint to be altered and not the byte.

@cmb69
Copy link
Member

cmb69 commented Dec 19, 2019

You'll just get a byte stream which doesn't have a correct encoding.

That. And I'd rather not "improve" support for something that can't generally work; instead I'd let multibyte assignments fail.

@Girgias
Copy link
Member Author

Girgias commented Dec 19, 2019

You'll just get a byte stream which doesn't have a correct encoding.

That. And I'd rather not "improve" support for something that can't generally work; instead I'd let multibyte assignments fail.

ACK.

Closing this in favour of an implementation which adds an explicit warning for multi bytes values.
Will try to code this when I've got time.

@Girgias Girgias closed this Dec 19, 2019
@Girgias Girgias deleted the variable-str-length-str-offset-replacement branch January 7, 2020 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants