Skip to content

Fix parsing of semi-reserved tokens at offset > 4GB #6616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

Possibly related to GH-5668
Also affects php 8.0 but this changes the
datastructures in public(?) header files.
(binary ABI change)
PHP 7.4 wouldn't even parse a class with the below example.

  1. Fix parsing semi-reserved tokens such as namespace
    with an offset that's larger than 2**32
  2. Fix parsing some tokens when their length is larger than 2**32

Offset is relative to yy_start(the start of the file?)

Low priority because 4GB php files and the configured memory limits to support them are extremely rare - this was only noticed when looking into edge cases in the parser - this test case is artificial

Instead of parsing the token 'namespace', the character range from 4GB before is used due to truncation to 32 bits

php > ini_set('memory_limit', '12G');
php > eval(str_repeat(' ' , 2**32) . 'class MyClass{ public static function namespace() {}}');
php > MyClass::namespace();

Warning: Uncaught Error: Call to undefined method MyClass::namespace() in php shell code:1
Stack trace:
  thrown in php shell code on line 1
php > var_export((new ReflectionClass('MyClass'))->getMethods());
array (
  0 =>
  ReflectionMethod::__set_state(array(
     'name' => '         ',
     'class' => 'MyClass',
  )),
)

@nikic
Copy link
Member

nikic commented Jan 18, 2021

This used 32-bit integers do avoid doubling the size of the parser stack elements...

Related to phpGH-5668
Also affects php 8.0 but this changes the
datastructures in public header files.
(binary ABI change)
PHP 7.4 wouldn't even parse a class with the below example.

1. Fix parsing semi-reserved tokens such as `namespace`
   with an offset that's larger than 2**32
2. Fix parsing some tokens when their length is larger than 2**32

Offset is relative to yy_start(the start of the file?)

Low priority because 4GB php files are extremely rare

```
php > ini_set('memory_limit', '12G');
php > eval(str_repeat(' ' , 2**32) . 'class MyClass{ public static function namespace() {}}');
php > MyClass::namespace();

Warning: Uncaught Error: Call to undefined method MyClass::namespace() in php shell code:1
Stack trace:
  thrown in php shell code on line 1
php > var_export((new ReflectionClass('MyClass'))->getMethods());
array (
  0 =>
  ReflectionMethod::__set_state(array(
     'name' => '         ',
     'class' => 'MyClass',
  )),
)
```
@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jan 18, 2021

This can probably use 64-bit integers just for the offset and continue using 32-bit integers for the length?

Is it possible to reuse the memory for zend_string * elem->str (and possibly num) instead from stack elements?
For example, make it a union of zend_string * and size_t offset, and make the token type determine whether it's used as an ident or a zend_string?
(EDIT: missed that this was already a union: typedef union _zend_parser_stack_elem {)

emit_token_with_ident:
	if (PARSER_MODE()) {
		elem->ident.offset = SCNG(yy_text) - SCNG(yy_start);
		elem->ident.len = SCNG(yy_leng);
	}
	if (SCNG(on_event)) {
		SCNG(on_event)(ON_TOKEN, token, start_line, yytext, yyleng, SCNG(on_event_context));
	}
	return token;

@nikic
Copy link
Member

nikic commented Jan 19, 2021

This can probably use 64-bit integers just for the offset and continue using 32-bit integers for the length?

That makes no difference, due to padding.

A way to preserve the current size would be to store a char* to the start of the keyword, and then determine the length in zend_lex_tstring. After all, keywords are [a-zA-Z_], so it's not hard to determine the length from that.

TBH I have no idea how important it is to preserve the size, it's just why I did this at the time.

@nikic
Copy link
Member

nikic commented Jan 25, 2021

I committed a fix using the approach I mentioned in my previous comment in cc3e03c. Please tell me if you see any further issues with this.

@nikic nikic closed this Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants