UTF-8 corruption in `\Dom\HTMLDocument`

### Description

The following code:

```php
<?php
$Repeated = str_repeat( '–', 4096 );
//$Repeated = str_repeat( '😏', 4096 );
$Data = '<!DOCTYPE HTML><html>' . $Repeated . '</html>';
$Document = \Dom\HTMLDocument::createFromString( $Data, 0, 'UTF-8' );

echo $Document->saveHTML();
// var_dump($Document->body->textContent);
```

The resulting string contains invalid UTF-8 sequences at seemingly random positions, which render as the � replacement character. With the emoji variant (the commented-out line), the emojis become corrupted instead. Repeating the string more times produces more corrupted bytes, again in random places.
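
The corruption can also be checked programmatically rather than by eye. The following is only a minimal sketch: `summarizeUtf8()` is a hypothetical helper that is not part of the original reproduction, and the `class_exists()` guard is there only so the script does not fail on PHP versions older than 8.4.

```php
<?php
// Hypothetical helper (not part of the original report): reports whether
// $output is valid UTF-8 and how many intact copies of $char it contains.
function summarizeUtf8(string $output, string $char): array
{
    return [
        // With the /u modifier, preg_match() returns false when the subject
        // is not valid UTF-8, and 1 when the empty pattern matches.
        'valid_utf8' => preg_match('//u', $output) === 1,
        // Byte-wise count of complete, uncorrupted copies of the character.
        'intact'     => substr_count($output, $char),
    ];
}

// \Dom\HTMLDocument requires PHP 8.4+, so skip the check on older versions.
if (class_exists('Dom\HTMLDocument')) {
    $char  = '–';
    $count = 4096;
    $data  = '<!DOCTYPE HTML><html>' . str_repeat($char, $count) . '</html>';
    $document = \Dom\HTMLDocument::createFromString($data, 0, 'UTF-8');

    $summary = summarizeUtf8($document->saveHTML(), $char);
    printf(
        "valid UTF-8: %s, intact characters: %d of %d\n",
        $summary['valid_utf8'] ? 'yes' : 'no',
        $summary['intact'],
        $count
    );
}
```

On a build without the bug I would expect this to report valid UTF-8 with all 4096 characters intact; on an affected build either value may differ.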

I initially spotted this bug while parsing a real HTML document and reading `textContent` (`innerHTML` produces the same issue) on an element I had found with XPath.

![image](https://github.com/user-attachments/assets/03b9f13c-708c-4fa6-9d6e-4143b7519adc)


### PHP Version

PHP 8.4.2

### Operating System

Windows 11 and Debian 12

UTF-8 corruption in `\Dom\HTMLDocument` #17481