Commit 07a5e57

Replace sax with @rgrove/parse-xml
Closes GH-3. This commit switches from `sax`, a lax parser, to `@rgrove/parse-xml`, a strict and fast parser. Invalid XML is no longer accepted, so this is a breaking change: for example, every document now requires a root element, `DOCTYPE` must be written in uppercase, and CDATA around the root element is no longer allowed. Whitespace around the root element is also no longer present in the tree, because XML requires stripping it. In return, positional info is now much better, processing instructions are supported, and the size is cut by a lot.
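
The whitespace change affects any expected trees written against the old output. As a rough illustration only, the sketch below normalizes an old-style tree by dropping whitespace-only text nodes that sit directly in the root. `dropRootWhitespace` is a hypothetical helper, not part of this package, and the sample tree is made up for the example; positional info is left untouched.

```javascript
// Hypothetical migration helper (not part of xast-util-from-xml): returns a
// shallow copy of a root node without whitespace-only text children.
function dropRootWhitespace(root) {
  return {
    ...root,
    children: root.children.filter(
      (child) => !(child.type === 'text' && /^[\t\n\r ]*$/.test(child.value))
    )
  }
}

// A tree shaped like the old output: a line feed follows the root element.
const before = {
  type: 'root',
  children: [
    {type: 'element', name: 'album', attributes: {}, children: []},
    {type: 'text', value: '\n'}
  ]
}

const after = dropRootWhitespace(before)
console.log(after.children.map((child) => child.type))
```

Text nodes inside elements are deliberately left alone, since only whitespace around the root element is affected by this commit.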
1 parent 2c32bd5 commit 07a5e57

35 files changed: +812 -1014 lines changed

lib/index.js

Lines changed: 248 additions & 548 deletions
Large diffs are not rendered by default.

package.json

Lines changed: 6 additions & 8 deletions
@@ -38,10 +38,9 @@
     "index.js"
   ],
   "dependencies": {
-    "@types/sax": "^1.0.0",
-    "@types/unist": "^2.0.0",
+    "@rgrove/parse-xml": "^4.1.0",
     "@types/xast": "^1.0.0",
-    "sax": "^1.0.0",
+    "vfile-location": "^4.0.0",
     "vfile-message": "^3.0.0"
   },
   "devDependencies": {
@@ -75,26 +74,25 @@
   "xo": {
     "prettier": true,
     "rules": {
-      "unicorn/prefer-code-point": "off"
+      "unicorn/prefer-switch": "off"
     },
     "overrides": [
       {
         "files": "test/**/*.js",
         "rules": {
-          "no-await-in-loop": 0
+          "no-await-in-loop": "off"
         }
       }
     ]
   },
   "remarkConfig": {
     "plugins": [
-      "preset-wooorm"
+      "remark-preset-wooorm"
     ]
   },
   "typeCoverage": {
     "atLeast": 100,
     "detail": true,
-    "strict": true,
-    "ignoreCatch": true
+    "strict": true
   }
 }

readme.md

Lines changed: 10 additions & 12 deletions
@@ -27,15 +27,14 @@
 
 ## What is this?
 
-This package is a utility that takes XML input and turns it into a [xast][]
-syntax tree.
-It uses [`sax`][sax], which turns XML into events, while it turns those events
-into nodes.
+This package is a utility that takes serialized XML as input and turns it into
+a [xast][] syntax tree.
+It uses [`@rgrove/parse-xml`][parse-xml], which is a good and fast XML parser,
+and turns its results into xast.
 
 ## When should I use this?
 
-If you want to handle syntax trees, use this.
-Use [`sax`][sax] itself instead when you want to do other things.
+If you want to use xast syntax trees, use this.
 
 The utility [`xast-util-to-xml`][xast-util-to-xml] does the inverse of this
 utility.
@@ -84,7 +83,7 @@ import {fromXml} from 'xast-util-from-xml'
 
 const tree = fromXml(await fs.readFile('example.xml'))
 
-console.log(tree)
+console.dir(tree, {depth: null})
 ```
 
 …now running `node example.js` yields (positional info removed for brevity):
@@ -121,15 +120,14 @@ console.log(tree)
         },
         {type: 'text', value: '\n'}
       ]
-    },
-    {type: 'text', value: '\n'}
+    }
   ]
 }
 ```
 
 ## API
 
-This package exports the identifier [`fromXml`][fromxml].
+This package exports the identifier [`fromXml`][api-from-xml].
 There is no default export.
 
 ### `fromXml(value)`
@@ -236,8 +234,8 @@ abide by its terms.
 
 [root]: https://github.com/syntax-tree/xast#root
 
-[sax]: https://github.com/isaacs/sax-js
+[parse-xml]: https://github.com/rgrove/parse-xml
 
 [xast-util-to-xml]: https://github.com/syntax-tree/xast-util-to-xml
 
-[fromxml]: #fromxmlvalue
+[api-from-xml]: #fromxmlvalue

test/fixtures/attribute/index.json

Lines changed: 12 additions & 16 deletions
@@ -109,22 +109,18 @@
           "offset": 144
         }
       }
+    }
+  ],
+  "position": {
+    "start": {
+      "line": 1,
+      "column": 1,
+      "offset": 0
     },
-    {
-      "type": "text",
-      "value": "\n",
-      "position": {
-        "start": {
-          "line": 10,
-          "column": 8,
-          "offset": 144
-        },
-        "end": {
-          "line": 11,
-          "column": 1,
-          "offset": 145
-        }
-      }
+    "end": {
+      "line": 11,
+      "column": 1,
+      "offset": 145
     }
-  ]
+  }
 }
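
The fixture stores each point as a 1-based `line` and `column` plus a 0-based `offset`. The commit adds `vfile-location` as a dependency, presumably to derive such points from offsets; the rendered diff does not show how `lib/index.js` does it. The sketch below is a standalone illustration of that conversion under those assumptions. `createLocator` and `toPoint` are hypothetical names, not the `vfile-location` API, and the sample document is invented.

```javascript
// Hypothetical helper (not the vfile-location API): builds a function that
// maps a 0-based offset to a point with 1-based `line` and `column`.
function createLocator(source) {
  // Offsets at which each line starts; line 1 always starts at offset 0.
  const lineStarts = [0]
  for (let index = 0; index < source.length; index++) {
    if (source.charCodeAt(index) === 10 /* line feed */) {
      lineStarts.push(index + 1)
    }
  }

  return function (offset) {
    if (!Number.isInteger(offset) || offset < 0 || offset > source.length) {
      return undefined
    }

    // Binary search for the last line start that is <= offset.
    let low = 0
    let high = lineStarts.length - 1
    while (low < high) {
      const middle = Math.ceil((low + high) / 2)
      if (lineStarts[middle] <= offset) low = middle
      else high = middle - 1
    }

    return {line: low + 1, column: offset - lineStarts[low] + 1, offset}
  }
}

const toPoint = createLocator('<a>\n  <b/>\n</a>\n')
console.log(toPoint(0)) // Start of the document.
console.log(toPoint(16)) // End of the document, just past the final line feed.
```

An offset just past a trailing line feed lands on column 1 of the following line, which matches the shape of the root `end` point in the fixture above (line 11, column 1).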
