Skip to content

Fix #77686: Immediately remove ID from DOMDocument when removing child element #6936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

trowski
Copy link
Member

@trowski trowski commented May 3, 2021

In the code below, the current behavior of the last two calls to DOMDocument::getElementById() is returning the DOMElement objects that have been removed from the document unless the DOMElement object is destroyed.

With this patch, null is returned from getElementById() immediately after an element is removed from the document even if a reference remains to the removed DOMElement object.

$doc = new DOMDocument;
$doc->loadHTML('<html><body id="x"><div id="y"><span id="z"></span></div></body></html>');
$body = $doc->getElementById('x');
$div = $doc->getElementById('y');
$span = $doc->getElementById('z');
$body->removeChild($div);

// DOMElement now, NULL after patch
var_dump($doc->getElementById('y'));
var_dump($doc->getElementById('z'));

This similar code below demonstrates the current inconsistency, where null is returned for the last two calls to DOMDocument::getElementById() because no reference is kept to the removed DOMElement.

$doc = new DOMDocument;
$doc->loadHTML('<html><body id="x"><div id="y"><span id="z"></span></div></body></html>');
$body = $doc->getElementById('x');
$body->removeChild($doc->getElementById('y'));

// NULL
var_dump($doc->getElementById('y'));
var_dump($doc->getElementById('z'));

Prior a removed node was still returned from DOMDocument::getElementById() until the DOMElement object was destroyed. Now null is returned from getElementById() even if a reference remains to the removed DOMElement object.
@nikic
Copy link
Member

nikic commented May 4, 2021

cc @beberlei

@trowski
Copy link
Member Author

trowski commented May 4, 2021

I need to dig into this a little further to see what's going on, as re-appending the child then does not find the element by Id. Seems PHP is associating the ID with the document at element creation, however, I'm not sure this is the correct behavior.

JavaScript behavior:

let doc = document.implementation.createHTMLDocument();
let body = doc.getElementsByTagName('body')[0];
let div = doc.createElement('div');
div.setAttribute('id', 'y');

console.log(doc.getElementById('y')); // null

body.appendChild(div);

console.log(doc.getElementById('y').tagName); // "DIV"

body.removeChild(div);

console.log(doc.getElementById('y')); // null

Compared to the current PHP behavior:

$doc = new DOMDocument;
$doc->loadHTML('<html><body></body></html>');
$body = $doc->getElementsByTagName('body')[0];
$div = $doc->createElement('div');
$div->setAttribute('id', 'y');

var_dump($doc->getElementById('y')->tagName); // "div"

$body->appendChild($div);

var_dump($doc->getElementById('y')->tagName); // "div"

$body->removeChild($div);

var_dump($doc->getElementById('y')->tagName); // "div"

@trowski trowski marked this pull request as draft May 4, 2021 15:11
@cmb69
Copy link
Member

cmb69 commented May 4, 2021

Seems PHP is associating the ID with the document at element creation, […]

This might be intrinsic libxml2 behavior. :(

Anyhow, we should also check the behavior wrt. duplicate IDs (cf. https://bugs.php.net/79701).

@trowski
Copy link
Member Author

trowski commented May 4, 2021

Yes, I think it is intrinsic to libxml2. I was hoping to fix the behavior noted in this test in a Wikimedia project. At first glance it seemed simple, but digging further into the code it looks like libxml2 is doing this automatically. I could match JavaScript with a lot of extra work, but that's probably not a good idea both for maintainability and backward compatibility.

Bug 79701 looks like a completely separate issue. Perhaps setIdAttribute should be throwing in that case?

@beberlei
Copy link
Contributor

beberlei commented May 4, 2021

I am honestly still a bit overwhelmed by libxml inner workings and lucky to have found one corner that I did master, there are a bunch of thing sin ext/dom just to make libxml happy, for example DOMNameSpaceNode not being a DOMNode and others.

@trowski
Copy link
Member Author

trowski commented May 5, 2021

@beberlei Interesting that you mention DOMNamespaceNode not extending DOMNode, since someone else recently made a pull request to change exactly that: #6939.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants