
proposal: encoding/json: garbage-free reading of tokens #40128

Open

@rogpeppe

Description

As @bradfitz noted in the reviews of the original API, the Decoder.ReadToken API is a garbage factory. Although, as @rsc noted at the time, "a clean API ... is more important here. I expect people to use it to get to the position they want in the stream and then call Decode", the inefficiency is a problem in practice for anyone who wishes to use the encoding/json tokenizer as a basis for some other kind of decoder.

Dave Cheney's "Building a high performance JSON parser" details some of the issues involved. He comes to the conclusion that the interface-based nature of json.Token is a fundamental obstacle. I like the current interface-based API, but it does indeed make it impossible to return arbitrary tokens without creating garbage. Dave suggests a new, somewhat more complex Scanner API, which is also not backward compatible with the current API in encoding/json.

I propose instead that the following method be added to the encoding/json package:

// TokenBytes is like Token, except that for strings and numbers it returns
// a static Token value with the actual data payload in the returned []byte,
// which is only valid until the next call to Token, TokenBytes, or Decode.
// For strings, the returned Token is ""; for a number, it is
// Number("0"); for all other kinds of token, the Token is returned as by the
// Token method and the []byte value is nil.
//
// This is more efficient than using Token because it avoids the
// allocations required by that API.
func (dec *Decoder) TokenBytes() (Token, []byte, error)

Token can be implemented in terms of TokenBytes as follows:

func (dec *Decoder) Token() (Token, error) {
	tok, data, err := dec.TokenBytes()
	if err != nil || data == nil {
		return tok, err
	}
	switch tok {
	case "":
		// String token: the payload holds the decoded string.
		return string(data), nil
	case Number("0"):
		// Number token: the payload holds the literal digits.
		return Number(data), nil
	}
	panic("unreachable")
}
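
For illustration, here is a minimal sketch of a consumer built on the proposed method: it counts the string tokens in a stream without allocating per token. It is hypothetical, since TokenBytes does not exist yet, and countStrings is an invented name.

package main

import (
	"encoding/json"
	"io"
)

// countStrings counts string tokens in a JSON stream. With the proposed
// TokenBytes, the loop itself performs no per-token allocations.
func countStrings(r io.Reader) (int, error) {
	dec := json.NewDecoder(r)
	n := 0
	for {
		tok, data, err := dec.TokenBytes()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		// A string token is reported as tok == "" with the payload in
		// data, valid only until the next Token/TokenBytes/Decode call.
		if data != nil && tok == "" {
			n++
		}
	}
}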

Discussion

This proposal relies on the observation that the Decoder.Token API only generates garbage for two kinds of tokens: numbers and strings. For all other token types, no garbage need be generated, because small values (json.Delim, which is a rune, and bool) do not incur an allocation when boxed in an interface.
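
The no-allocation claim for the other token kinds can be checked with testing.AllocsPerRun; a small sketch, assuming the runtime's usual static boxing of small constants (sink is an invented name):

package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

var sink json.Token // package-level so escape analysis cannot hide allocations

func main() {
	data := []byte("3.14")
	// Delims and bools box into static interface values: zero allocations.
	fmt.Println(testing.AllocsPerRun(100, func() { sink = json.Delim('{') }))
	fmt.Println(testing.AllocsPerRun(100, func() { sink = true }))
	// Strings and numbers must be copied to the heap: at least one allocation.
	fmt.Println(testing.AllocsPerRun(100, func() { sink = json.Number(data) }))
}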

It maintains the current API as-is. Users can opt in to the new API if they require efficiency, at some risk of incorrectness (the caller could hold onto the data slice after the next call to Token, TokenBytes, or Decode). The cognitive overhead of TokenBytes is arguably low because of its similarity to the existing API.
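
When a caller does need to keep the payload, copying it restores safety, at exactly the cost the API otherwise avoids. A hypothetical helper (readString is an invented name; it assumes the proposed TokenBytes):

package main

import (
	"encoding/json"
	"fmt"
)

// readString returns the next token, which must be a string, copying the
// payload so that it remains valid across later decoder calls.
func readString(dec *json.Decoder) (string, error) {
	tok, data, err := dec.TokenBytes()
	if err != nil {
		return "", err
	}
	if data == nil || tok != "" {
		return "", fmt.Errorf("expected string token, got %v", tok)
	}
	return string(data), nil // string(data) copies, so it is safe to retain
}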

If this proposal is accepted, an Encoder.EncodeTokenBytes could easily be added to provide garbage-free streaming JSON generation too.
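
For symmetry with TokenBytes, such a method might take the following shape. This is purely illustrative; the proposal does not define the encoder side:

// EncodeTokenBytes is a hypothetical counterpart to TokenBytes: it writes
// tok to the stream, except that for string and number tokens the payload
// is taken from data rather than from tok itself, allowing callers to
// reuse a single buffer across calls.
func (enc *Encoder) EncodeTokenBytes(tok Token, data []byte) error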
