Optimize size of ext/dom #12830

Merged 3 commits on Nov 29, 2023
10 changes: 5 additions & 5 deletions ext/dom/lexbor/lexbor/core/sbst.h
@@ -19,12 +19,12 @@ extern "C" {
 typedef struct {
     lxb_char_t key;

-    void *value;
-    size_t value_len;
+    lxb_char_t value[6];
+    unsigned char value_len;

-    size_t left;
-    size_t right;
-    size_t next;
+    unsigned short left;
+    unsigned short right;
+    unsigned short next;
 }
 lexbor_sbst_entry_static_t;
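
To make the saving concrete, here is a minimal sketch that compares the two entry layouts side by side. The names `old_entry_t`, `new_entry_t`, and `entry_size_saving` are hypothetical and exist only for this illustration; `lxb_char_t` is assumed to be `unsigned char`, as in Lexbor. Exact sizes depend on the ABI, so the sketch relies only on the new layout being smaller.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors Lexbor's definition of lxb_char_t. */
typedef unsigned char lxb_char_t;

/* Entry layout before the patch (hypothetical name). */
typedef struct {
    lxb_char_t key;

    void *value;
    size_t value_len;

    size_t left;
    size_t right;
    size_t next;
} old_entry_t;

/* Entry layout after the patch (hypothetical name). */
typedef struct {
    lxb_char_t key;

    lxb_char_t value[6];
    unsigned char value_len;

    unsigned short left;
    unsigned short right;
    unsigned short next;
} new_entry_t;

/* Bytes saved per table entry on the platform this is compiled for. */
static size_t
entry_size_saving(void)
{
    return sizeof(old_entry_t) - sizeof(new_entry_t);
}
```

On a typical 64-bit target the old layout occupies 48 bytes per entry and the new one 14, although padding rules vary by ABI. The narrower fields only work when every index fits in an `unsigned short` and every value fits in six bytes.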

9,442 changes: 4,721 additions & 4,721 deletions ext/dom/lexbor/lexbor/html/tokenizer/res.h

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion ext/dom/lexbor/lexbor/html/tokenizer/state.c
@@ -1815,7 +1815,7 @@ lxb_html_tokenizer_state_char_ref_named(lxb_html_tokenizer_t *tkz,
     goto done;
 }

-if (entry->value != NULL) {
+if (entry->value[0] != 0) {
     tkz->entity_end = (tkz->pos + (data - begin)) - tkz->start;
     tkz->entity_match = entry;
 }
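
Because `value` is now an inline array rather than a pointer, it can no longer be compared against `NULL`; the hunk above treats a leading zero byte as "no value". Below is a minimal sketch of that convention. The reduced `entry_t` struct and the helpers `entry_has_value` and `entry_make` are hypothetical and not part of Lexbor.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: mirrors Lexbor's definition of lxb_char_t. */
typedef unsigned char lxb_char_t;

/* Reduced, hypothetical stand-in for lexbor_sbst_entry_static_t. */
typedef struct {
    lxb_char_t key;
    lxb_char_t value[6];
    unsigned char value_len;
} entry_t;

/* An entry carries a replacement only if its first value byte is non-zero. */
static bool
entry_has_value(const entry_t *entry)
{
    return entry->value[0] != 0;
}

/* Build an entry for illustration; len is clamped to the array size. */
static entry_t
entry_make(lxb_char_t key, const void *value, unsigned char len)
{
    entry_t entry;

    /* Zero everything so an entry without a value starts with a 0 byte. */
    memset(&entry, 0, sizeof(entry));
    entry.key = key;

    if (len > sizeof(entry.value)) {
        len = (unsigned char) sizeof(entry.value);
    }
    if (len > 0) {
        memcpy(entry.value, value, len);
    }
    entry.value_len = len;

    return entry;
}
```

The convention assumes that no real replacement begins with a zero byte; an entry whose value did start with one would be indistinguishable from an entry without a value.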
2 changes: 2 additions & 0 deletions ext/dom/lexbor/lexbor/tag/res.h
@@ -224,6 +224,7 @@ static const lxb_tag_data_t lxb_tag_res_data_default[LXB_TAG__LAST_ENTRY] =
     {{.u.short_str = "xmp", .length = 3, .next = NULL}, LXB_TAG_XMP, 1, true}
 };

+#if 0
 static const lxb_tag_data_t lxb_tag_res_data_upper_default[LXB_TAG__LAST_ENTRY] =
 {
     {{.u.short_str = "#UNDEF", .length = 6, .next = NULL}, LXB_TAG__UNDEF, 1, true},
@@ -423,6 +424,7 @@ static const lxb_tag_data_t lxb_tag_res_data_upper_default[LXB_TAG__LAST_ENTRY]
     {{.u.short_str = "WBR", .length = 3, .next = NULL}, LXB_TAG_WBR, 1, true},
     {{.u.short_str = "XMP", .length = 3, .next = NULL}, LXB_TAG_XMP, 1, true}
 };
+#endif

static const lexbor_shs_entry_t lxb_tag_res_shs_data_default[] =
{
2 changes: 2 additions & 0 deletions ext/dom/lexbor/lexbor/tag/tag.c
@@ -92,6 +92,7 @@ lxb_tag_data_by_name(lexbor_hash_t *hash, const lxb_char_t *name, size_t len)
                                        lexbor_hash_search_lower, name, len);
 }

+#if 0
 const lxb_tag_data_t *
 lxb_tag_data_by_name_upper(lexbor_hash_t *hash,
                            const lxb_char_t *name, size_t len)
@@ -114,6 +115,7 @@ lxb_tag_data_by_name_upper(lexbor_hash_t *hash,
     return (const lxb_tag_data_t *) lexbor_hash_search(hash,
                                        lexbor_hash_search_upper, name, len);
 }
+#endif

/*
* No inline functions for ABI.
12 changes: 10 additions & 2 deletions ext/dom/lexbor/patches/README.md
@@ -17,15 +17,23 @@ This contains the following patch files in mailbox format.
   A PHP specific patch to patch utilities and data structure to be able to generate smaller lookup tables.
   This patch won't be upstreamed because it breaks generality of those data structures, i.e. it only works
   because we only use it for character encoding.
+* 0001-Remove-unused-upper-case-tag-static-data.patch
+  A PHP specific patch to remove unused upper case tag static data. This shrinks the static data size.
+* 0001-Shrink-size-of-static-binary-search-tree.patch
+  A PHP specific patch to shrink the size of the static binary search tree for entities.
+  This shrinks the static data size and reduces data cache pressure.

 **Note** for this patch the utilities to generate the tables are also patched.
 Make sure to apply on a fresh Lexbor clone and run (in `lexbor/utils/encoding`): `python3 single-byte.py` and `python3 multi-byte.py` to generate the tables.
+Also run `python3 tokenizer_entities_bst.py` to generate the static binary search tree for entities.

 ## How to apply

-* cd into `ext/dom/lexbor_bridge/lexbor`
+* cd into `ext/dom/lexbor/lexbor`
 * `git am -3 ../patches/0001-Expose-line-and-column-information-for-use-in-PHP.patch`
 * `git am -3 ../patches/0001-Track-implied-added-nodes-for-options-use-in-PHP.patch`
 * `git am -3 ../patches/0001-Patch-out-CSS-parser.patch`
 * `git am -3 ../patches/0001-Patch-utilities-and-data-structure-to-be-able-to-gen.patch`
-* `git reset HEAD~4` # 4 is the number of commits created by the above commands
+* `git am -3 ../patches/0001-Remove-unused-upper-case-tag-static-data.patch`
+* `git am -3 ../patches/0001-Shrink-size-of-static-binary-search-tree.patch`
+* `git reset HEAD~6` # 6 is the number of commits created by the above commands