Commit 1d21f83

improve
1 parent 87b6313 commit 1d21f83

1 file changed

+82
-62
lines changed

website/blog/typed-napi.md

Lines changed: 82 additions & 62 deletions
@@ -122,13 +122,9 @@ Thanks to Tree-Sitter's design, we can leverage this rich type information to bu
122122
123123
Our new API follows a progressive enhancement approach to type safety:
124124
125-
**Preserve untyped AST access**
125+
**Preserve untyped AST access**. The existing untyped API remains available by default, ensuring backward compatibility
126126
127-
The existing untyped API remains available by default, ensuring backward compatibility
128-
129-
**Optional type safety on demand**
130-
131-
Users can opt into typed AST nodes either manually or automatically for enhanced type checking and autocompletion
127+
**Optional type safety on demand**. Users can opt into typed AST nodes either manually or automatically for enhanced type checking and autocompletion
132128
133129
However, it is a bumpy ride to transition to a new typed API via the path of Tree-sitter's static type.
134130
@@ -138,28 +134,31 @@ Second, the JSON contains a lot of unnamed kinds, which are not useful to users.
138134
139135
Finally, as mentioned earlier, the JSON contains alias types. We need to resolve the alias type to its concrete type, which is also covered in the next section.
140136
141-
## Define Type
137+
## Define Types
138+
139+
The core of the new API consists of several new types and extensions to existing types.
142140
143-
### Give `SgNode` its type
141+
### Let `SgNode` Have Type
144142
145-
We add two type parameters to `SgNode` to represent the language type map and the node's kind.
146-
`SgNode<M, K>` is the main type in the new API. It is a generic type that represents a node with kind `K` of language type map `M`. By default, it is a union of all possible kinds of nodes.
143+
The `SgNode` class, the cornerstone of our new API, now accepts two new optional type parameters.
147144
148145
```typescript
149-
class SgNode<M extends TypesMap, K extends keyof M = Kinds<M>> {
146+
class SgNode<M extends TypesMap, K extends Kinds<M> = Kinds<M>> {
150147
kind: K
151148
fields: M[K]['fields'] // demo definition, real one is more complex
152149
}
153150
```
154151

155-
It provides a **correct** interface for an AST node in a specific language. While it is still **robust** enough to not trigger compiler error when no type information is available.
152+
It represents a node of kind `K` in a language with type map `M`. For example, `SgNode<TypeScript, "function_declaration">` is a function declaration node in TypeScript. When used without a specific kind parameter, `SgNode` defaults to accepting any valid node kind in the language.
153+
154+
`SgNode` provides a **correct** AST interface for a specific language, while remaining **robust** enough not to trigger compiler errors when no type information is available.
156155

157156

158157
### `ResolveType<M, T>`
159158

160-
TreeSitter's type alias is helpful to reduce the generated JSON file size but it is not useful to users because the alias is never directly used as a node's kind nor is used as `kind` in ast-grep rule. For example, `declaration` mentioned above can never be used as `kind` in ast-grep rule.
159+
While Tree-sitter's type aliases help keep the JSON type definitions compact, they present a challenge: these aliases never appear as actual node kinds in ast-grep rules.
161160

162-
We need to use a type alias to **correctly** resolve the alias type to its concrete type.
161+
To handle this, we created `ResolveType` to **correctly** map aliases to their concrete kinds:
163162

164163
```typescript
165164
type ResolveType<M, T extends keyof M> =
@@ -168,122 +167,143 @@ type ResolveType<M, T extends keyof M> =
168167
: T
169168
```
170169
170+
This type recursively resolves aliases until it reaches actual node types that developers work with.
171+
171172
### `Kinds<M>`
172173
173-
Having a collection of possible AST node kinds is awesome, but it is sometime too clumsy to use a big string literal union type.
174-
Using a type alias to **concisely** represent all possible kinds of nodes is a huge UX improvement.
174+
Having access to all possible AST node types is powerful, but it is unwieldy to work with large string literal union types. It can be a huge UX improvement to use a type alias to **concisely** represent all possible kinds of nodes.
175175
176-
Also, TreeSitter's static type contains a lot of unnamed kinds, which are not useful to users. Including them in the union type is too noisy. We need to allow users to opt-in to use the kind, and fallback to a plain `string` type, creating a more **robust** API.
176+
Additionally, Tree-sitter's static types contain many noisy unnamed kinds, yet excluding them from the union type would leave the type signature incomplete. ast-grep instead bundles them into a plain `string` type, creating a more **robust** API.
177177
178178
```typescript
179-
type Kinds<M> = keyof M & LowPriorityString
179+
type Kinds<M> = ResolveType<M, keyof M> | LowPriorityString
180180
type LowPriorityString = string & {}
181181
```
182182
183-
The above type is a linient string type that is compatible with any string type. But it also uses a well-known trick to take advantage of TypeScript's type priority to prefer the `keyof M` type in completion over the `string & {}` type. To make it more self-explanatory, the `stirng & {}` type is aliased to `LowPriorityString`.
183+
The above type is a lenient string type that is compatible with any string. It also uses a [well-known trick](https://stackoverflow.com/a/61048124/2198656) that takes advantage of TypeScript's type priority, so that completion prefers the `ResolveType` kinds over the `string & {}` type.
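
The completion trick can be tried in isolation. The following is a minimal sketch, assuming the open-ended union form of the trick; `ToyKinds` and `describeKind` are hypothetical names used only for illustration and are not part of ast-grep's API.

```typescript
// `string & {}` accepts every string, but TypeScript does not collapse it
// into the literal members of a union the way a plain `string` would.
type LowPriorityString = string & {}

// Hypothetical kinds, for illustration only.
type ToyKinds = 'identifier' | 'function_declaration' | LowPriorityString

// Editors can suggest the two literals for `kind`,
// while any other string still type-checks.
function describeKind(kind: ToyKinds): string {
  return `kind: ${kind}`
}

const known = describeKind('identifier')
const unnamed = describeKind('some_unnamed_kind')
```

Replacing `LowPriorityString` with plain `string` in this sketch would widen the whole union to `string` and lose the literal suggestions.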
184+
184185
185-
Problem? open-ended union is not [well](https://github.com/microsoft/TypeScript/issues/33471) [supported](https://github.com/microsoft/TypeScript/issues/26277) in TypeScript.
186+
We alias `string & {}` to `LowPriorityString` to make the code's intent clearer. This approach creates a more intuitive developer experience, though it does run into [some limitations](https://github.com/microsoft/TypeScript/issues/33471) with TypeScript's handling of [open-ended unions](https://github.com/microsoft/TypeScript/issues/26277).
186187
187-
We need other tricks to make it work better. Introducing `RefineNode` type.
188+
We need other tricks to address these limitations, which is where the `RefineNode` type comes in.
188189
189190
### Bridging general nodes and specific nodes via `RefineNode`
190191
191-
There are two categories of nodes:
192-
* general `string`ly typed SgNode
193-
* precisely typed SgNode
192+
A key challenge in our type system was handling two distinct categories of nodes:
194193
195-
general node is like the untyped old API (but with better completion)
196-
precisely typed node is a union type of all possible kinds of nodes
194+
1. **General Nodes**: String-based typing (like our original API, but with enhanced completion), `SgNode<M, Kinds<M>>`.
195+
2. **Specific Nodes**: Precisely typed nodes with known kinds, `SgNode<M, 'specific_kind'>`.
197196
198-
The previous general node is typed as `SgNode<M, Kinds<M>>`, the later is typed as `SgNode<M, 'specific_kind'>`.
197+
When dealing with nodes that could be several specific kinds, we faced an interesting type system challenge. Consider these two approaches:
199198
200-
when it comes to a node that can have several specific kinds, it is better to use a union type of all possible kinds of nodes.
199+
```typescript
200+
// Approach 1: Union in the type parameter
201+
let single: SgNode<'expression' | 'type'>
201202

202-
Which kind of union should we use?
203+
// Approach 2: Union of specific nodes
204+
let union: SgNode<'expression'> | SgNode<'type'>
205+
```
203206
204-
Note `SgNode<'expression' | 'type'>` is different from `SgNode<'expression'> | SgNode<'type'>`
205-
TypeScript has difficulty in narrowing the previous type, because it not safe to assume the former is equivalent to the later.
207+
These approaches behave differently in TypeScript, for a [good reason](https://x.com/hd_nvim/status/1868706176281854151):
206208
207209
```typescript
208210
let single: SgNode<'expression' | 'type'>
209211
if (single.kind === 'expression') {
210-
single // Still SgNode<'expression' | 'type'>, not narrowed
212+
single // Remains SgNode<'expression' | 'type'> - not narrowed!
211213
}
214+
212215
let union: SgNode<'expression'> | SgNode<'type'>
213216
if (union.kind === 'expression') {
214-
union // SgNode<'expression'>, narrowed
217+
union // Successfully narrowed to SgNode<'expression'>
215218
}
216219
```
217220

218-
However, `SgNode` is covariant in the kind parameter and this means it is okay.
219-
it is general okay to distribute the type constructor over union type if the parameter is covariant.
220-
but TypeScript does not support this feature.
221-
222-
So ast-grep uses a trick via the type `RefineNode<M, K>` to let you refine the former one to the later one.
221+
`SgNode` is technically covariant in its kind parameter, meaning it's safe to distribute the type constructor over unions. However, TypeScript doesn't support this automatically. (We will not go down the rabbit hole of type constructor variance here, but interested readers can check out [this wiki](https://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)).)
223222

224-
If we don't have confidence to narrow the type, that is, the union type `K` contains a constituent of `string` type, it is equivalent to `SgNode<M, Kinds<M>>`.
225-
Otherwise, we can refine the node to a union type of all possible kinds of nodes.
223+
To bridge this gap, we introduced the `RefineNode` type:
226224

227225
```typescript
228-
type RefineNode<M, K> = string extends K ? SgNode<M, K> :
229-
K extends keyof M ? SgNode<M, K> : never // this conditional type unpack the string union to Node union
226+
type RefineNode<M, K extends Kinds<M>> =
227+
  string extends K ? SgNode<M, K> : // one SgNode
228+
K extends keyof M ? SgNode<M, K> : never // distribute over union
230229
```
231-
it is like biome / rowan's API where you can refine the node to a specific kind.
232230
233-
https://github.com/biomejs/biome/blob/09a04af727b3cdba33ac35837d112adb55726add/crates/biome_rowan/src/ast/mod.rs#L108-L120
231+
This utility type provides two key behaviors:
232+
1. When `K` includes a string type, it preserves the general node behavior
233+
2. Otherwise, it refines the node into a union of specific types, using TypeScript's [distributive conditional types](https://www.typescriptlang.org/docs/handbook/2/conditional-types.html#distributive-conditional-types).
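
The two behaviors can be reproduced with toy types. This is only a sketch: `ToyNode`, `ToyRefine`, and `label` are hypothetical stand-ins, and ast-grep's real definitions are more involved.

```typescript
// Toy stand-in for SgNode; not ast-grep's real definition.
interface ToyNode<K extends string> {
  kind: K
}

type ToyRefine<K extends string> =
  string extends K ? ToyNode<K>     // K is open-ended: keep one general node
  : K extends string ? ToyNode<K>   // K is a literal union: distribute over it
  : never

type General = ToyRefine<string>                  // ToyNode<string>
type Specific = ToyRefine<'expression' | 'type'>  // ToyNode<'expression'> | ToyNode<'type'>

// Because `Specific` is a union of nodes, checking `kind` narrows it.
function label(node: Specific): string {
  if (node.kind === 'expression') {
    const narrowed: ToyNode<'expression'> = node
    return `expr:${narrowed.kind}`
  }
  return `type:${node.kind}`
}

const general: General = { kind: 'anything' }
const labeled = label({ kind: 'expression' })
```

Only the second branch distributes, because `K` is the checked type there; in `string extends K` the checked type is `string`, so the union is examined as a whole.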
234234
235-
Again, having both untyped and typed API is a good trade-off between **correct** and **robust** type checking. You want the compiler to infer as much as possible if a clue of the node type is given, but you also want to allow writing code without type.
235+
This approach, inspired by [Biome's Rowan API](https://github.com/biomejs/biome/blob/09a04af727b3cdba33ac35837d112adb55726add/crates/biome_rowan/src/ast/mod.rs#L108-L120), achieves our dual goals: it remains **correct** by preserving proper type relationships and stays **robust** by gracefully handling both typed and untyped usage.
236236
237+
This hybrid approach gives developers the best of both worlds: strict type checking when types are known, with the flexibility to fall back to string-based typing when needed.
237238
238239
## Refine Type
239240
240241
Now let's talk about how to refine the general node to a specific node in ast-grep/napi.
241-
242-
Both manual and automatic refinement are **concise** and idiomatic in TypeScript.
242+
We've implemented two concise and idiomatic approaches in TypeScript: manual and automatic refinement.
243243
244244
### Refine Node, Manually
245245
246-
You can do runtime checking via `sgNode.is("kind")`
246+
#### Runtime Type Checking
247+
248+
The first manual approach uses runtime verification through the `is` method:
249+
247250
```typescript
248251
class SgNode<M, K> {
249252
is<T extends K>(kind: T): this is SgNode<M, T>
250253
}
251254
```
252255

253-
It can offer one time type narrowing
256+
This enables straightforward type narrowing:
254257

255258
```typescript
256259
if (sgNode.is("function_declaration")) {
257260
sgNode.kind // narrow to 'function_declaration'
258261
}
259262
```
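
For readers who want to experiment, here is a self-contained sketch of the same `this is` type-guard pattern. `ToyNode` and `summarize` are hypothetical; the real `SgNode` is backed by native code and is not implemented this way.

```typescript
// Toy node class, for illustration only.
class ToyNode<K extends string = string> {
  readonly kind: K
  readonly text: string

  constructor(kind: K, text: string) {
    this.kind = kind
    this.text = text
  }

  // Runtime check that also narrows the static type of `this`.
  is<T extends K>(kind: T): this is ToyNode<T> {
    const actual: string = this.kind
    return actual === kind
  }
}

function summarize(node: ToyNode<'function_declaration' | 'identifier'>): string {
  if (node.is('function_declaration')) {
    // Inside this branch the kind is known statically.
    const kind: 'function_declaration' = node.kind
    return `function ${node.text} (${kind})`
  }
  return `other ${node.text}`
}

const fn = new ToyNode<'function_declaration' | 'identifier'>('function_declaration', 'foo')
const id = new ToyNode<'function_declaration' | 'identifier'>('identifier', 'bar')
```

The check happens at runtime, so the narrowing is only as wide as the branch it guards.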
260263

261-
Another way is to provide an optional type parameter to the traversal method to refine the node to a specific kind, in case you are confident that the node is always of a specific kind and want to skip runtime check.
264+
#### Type Parameter Specification
262265

263-
This is like the `document.querySelector<T>` method in the [DOM API](https://www.typescriptlang.org/docs/handbook/dom-manipulation.html#the-queryselector-and-queryselectorall-methods). It returns a general `Element` type, but you can refine it to a specific type like `HTMLDivElement` by providing generic argument.
266+
Another manual approach lets you explicitly specify node types through type parameters. This is particularly useful when you're certain about a node's kind and want to skip runtime checks for better performance.
264267

265-
For example `sgNode.parent<"program">()`. This will refine the node to a specific kind `SgNode<TS, "program">`.
268+
This pattern may feel familiar if you've worked with the [DOM API](https://www.typescriptlang.org/docs/handbook/dom-manipulation.html#the-queryselector-and-queryselectorall-methods)'s `querySelector<T>`. Just as `querySelector` can be refined from a general `Element` to a specific `HTMLDivElement`, we can refine our nodes:
269+
270+
```typescript
271+
sgNode.parent<"program">() // Returns SgNode<TS, "program">
272+
```
266273

267-
This uses the interesting overloading feature of TypeScript
274+
275+
The type parameter approach uses an interesting overloading signature:
268276

269277
```typescript
270278
interface NodeMethod<M, K> {
271-
(): SgNode
272-
<T extends K>(): RefineNode<M, T>
279+
(): SgNode<M> // Untyped version
280+
<T extends K>(): RefineNode<M, T> // Typed version
273281
}
274282
```
275-
If no type is provided, it returns a general node, `SgNode<M>`.
276-
If a type is provided, it returns a specific node, `SgNode<M, K>`.
277283

278-
The reason why we use two overloading signatures here is to distinguish the two cases. If we use a single generic signature, TypeScript will always return the single version `SgNode<M, K1|K2>` or always returns a union of different `SgNode`s.
284+
If no type is provided, it returns a general node, `SgNode<M>`. If a type is provided, it returns a specific node, `SgNode<M, K>`.
285+
286+
This dual-signature typing avoids the limitations of a single generic signature, which would either always return `SgNode<M, K1|K2>` or always produce a union of `SgNode`s.
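
The effect of the two call signatures can be seen with toy types. In this sketch, `ToyNode`, `ToyRefine`, `ToyNodeMethod`, and `parent` are hypothetical, and the cast merely stands in for a real traversal implementation.

```typescript
// Toy stand-ins; not ast-grep's real definitions.
interface ToyNode<K extends string = string> {
  kind: K
}

type ToyRefine<K extends string> =
  string extends K ? ToyNode<K> : K extends string ? ToyNode<K> : never

interface ToyNodeMethod<K extends string> {
  (): ToyNode                    // no type argument: general node
  <T extends K>(): ToyRefine<T>  // type argument: refined node(s)
}

// Hypothetical traversal method that always returns a 'program' node.
const parent = (() => ({ kind: 'program' })) as unknown as ToyNodeMethod<
  'program' | 'statement_block'
>

const general = parent()             // ToyNode<string>
const program = parent<'program'>()  // ToyNode<'program'>
const programKind: 'program' = program.kind
```

Nothing is verified at runtime in the second call: the type argument is taken on trust, which is why it needs more care than `is`.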
287+
288+
#### Choosing the Right Type
289+
290+
When should you use each manual refinement method? Here are some guidelines:
291+
292+
✓ Use `is()` when:
293+
* You need a runtime type check
294+
* Node types might vary
295+
* Type safety is crucial
279296

297+
✓ Use type parameters when:
280298

281-
:::tip When to use type parameter and when `is`?
299+
* You're completely certain of the node type
300+
* Performance is critical
301+
* The node type is fixed
282302

283-
If you cannot guarantee the node kind and want to do runtime check, use `is` method.
303+
:::tip Safety Tip
284304

285-
If you are 100% sure about the node kind and want to avoid the runtime check overhead, use type parameter.
286-
Note this option can break type safety if misused. This command can help you to audit.
305+
Be cautious with type parameters, as they bypass runtime checks and can break type safety if misused.
306+
You can audit their usage with the command:
287307

288308
```bash
289309
ast-grep -p '$NODE.$METHOD<$K>($$$)'
