-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix #70962: xml_parse_into_struct strips embedded whitespace with XML… #7493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_OPTION_SKIP_WHITE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! The patch looks generally good to me. I left some comments on details to improve.
One thing that makes me wonder is the description of XML_OPTION_SKIP_WHITE
in the manual:
Whether to skip values consisting of whitespace characters.
That makes me think that not only the value
element of the respective $values
would be skipped, but the complete $values
element (i.e. for the given test script count($values) === 3
). However, changing this would be a BC break, and given the long-standing behavior, it might be better to clarify the documentation.
ext/xml/tests/bug70962.phpt
Outdated
--EXPECT-- | ||
<d> | ||
<e> | ||
bool(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing line break
ext/xml/xml.c
Outdated
} else { | ||
zend_string_release_ex(decoded_value, 0); | ||
} | ||
if (parser->lastwasopen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use tabs (as four spaces) for indentation.
ext/xml/xml.c
Outdated
|
||
/* check if the current tag already has a value - if yes append to that! */ | ||
if ((myval = zend_hash_str_find(Z_ARRVAL_P(parser->ctag), "value", sizeof("value") - 1))) { | ||
int newlen = Z_STRLEN_P(myval) + ZSTR_LEN(decoded_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int newlen = Z_STRLEN_P(myval) + ZSTR_LEN(decoded_value); | |
size_t newlen = Z_STRLEN_P(myval) + ZSTR_LEN(decoded_value); |
That was int
before this patch, but I think we should fix that right away.
I just noticed that empty "cdata" nodes have an empty "value" even if PS: it is actually different, see https://3v4l.org/ue9dU and https://3v4l.org/GTZtn. So, if |
Unfortunately I was not able to find any information regarding XML_OPTION_SKIP_WHITE behavior as well. I may try to work on skipping the whole value object if it makes sense, but it will introduce BC break as you have already mentioned. |
I suggest to leave this as is for now, but I noticed a change due to this PR regarding cdata element. Please see https://3v4l.org/L4vTC; this has no "cdata" element, but with this PR there is one now. I think that needs to fixed. This extension's test suite is definitely not comprehensive enough. :( |
Ok I will leave this PR as it is. Does it make sense to work on a separate PR for skipping an empty "cdata" element as it was before? |
I think this needs to be fixed in this PR; otherwise we would introduce a BC break. By the way, do you test locally with libxml2 or libexpat? I don't think that we have CI testing with libexpat, but the behavior may be different. |
Ok, will try to fix this issue in the current PR. Based on compile info it is libxml2. Sorry, I am really new in C and do not know how to change library and test using libexpat. I need to dig deeper into code to find a way. |
No need for you to check with libexpat; I can do that. :) |
Thank you :) |
Something like the following might be sufficient to solve the cdata issue: ext/xml/xml.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/ext/xml/xml.c b/ext/xml/xml.c
index 45fd06971d..273e523d94 100644
--- a/ext/xml/xml.c
+++ b/ext/xml/xml.c
@@ -948,7 +948,9 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
add_assoc_string(&tag, "tag", SKIP_TAGSTART(parser->ltags[parser->level-1]));
add_assoc_str(&tag, "value", decoded_value);
- add_assoc_string(&tag, "type", "cdata");
+ if (!parser->skipwhite) {
+ add_assoc_string(&tag, "type", "cdata");
+ }
add_assoc_long(&tag, "level", parser->level);
zend_hash_next_index_insert(Z_ARRVAL(parser->data), &tag); This diff won't cleanly apply, because it's based on an earlier version of this PR, but you can apply manually. :) PS: Nope, disregard this. Sorry for the noise. |
Maybe we should go for a simpler fix? ext/xml/xml.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/ext/xml/xml.c b/ext/xml/xml.c
index fd8aebe03a..df052b8910 100644
--- a/ext/xml/xml.c
+++ b/ext/xml/xml.c
@@ -900,7 +900,6 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
break;
}
}
- if (doprint || (! parser->skipwhite)) {
if (parser->lastwasopen) {
zval *myval;
@@ -915,7 +914,7 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
add_assoc_str(parser->ctag, "value", decoded_value);
}
- } else {
+ } else if (doprint || (! parser->skipwhite)) {
zval tag;
zval *curtag, *mytype, *myval;
@@ -949,7 +948,6 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
} else if (parser->level == (XML_MAXLEVEL + 1)) {
php_error_docref(NULL, E_WARNING, "Maximum depth exceeded - Results truncated");
}
- }
} else {
zend_string_release_ex(decoded_value, 0);
} And then fix the indentation, and |
Tag |
Oh, right! Just when I thought I understand what the function is currently doing, I have to learn more about it. :) |
Latest commit returns an original behavior for skipping empty "cdata" elements. I think all cases are covered now and there is no BC break anymore :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is good now, and also works fine with libexpat. I'm going to commit.
@Flashwade1990 - This introduces a bug, a CRLF and multiple spaces gets added to the structure. using the following code snipet <title>Dynamic Allocations</title> Configuring_Account_Allocation$p = xml_parser_create('UTF-8'); echo $errcode; having the next tag on the same line as the ending CDATA tag is a workaround: |
@radu-barbu-sage, this is deliberate. |
…_OPTION_SKIP_WHITE.