Description
As @bradfitz noted in the reviews of the original API, the Decoder.Token API is a garbage factory. Although, as @rsc noted at the time, "a clean API ... is more important here. I expect people to use it to get to the position they want in the stream and then call Decode", the inefficiency is a problem in practice for anyone who wishes to use the encoding/json tokenizer as the basis for some other kind of decoder.
Dave Cheney's "Building a high performance JSON parser" details some of the issues involved. He concludes that the interface-based nature of json.Token is a fundamental obstacle. I like the current interface-based API, but it does indeed make it impossible to return arbitrary tokens without creating garbage. Dave suggests a new Scanner API, somewhat more complex, that is also not backward compatible with the current API in encoding/json.
I propose instead that the following method be added to the encoding/json package:
// TokenBytes is like Token, except that for strings and numbers it returns
// a static Token value with the actual data payload in the []byte result,
// which is only valid until the next call to Token, TokenBytes, or Decode.
// For a string, the returned Token will be ""; for a number, it will be
// Number("0"); for all other kinds of token, the Token will be returned
// just as by the Token method and the []byte value will be nil.
//
// This is more efficient than using Token because it avoids the
// allocations required by that API.
func (dec *Decoder) TokenBytes() (Token, []byte, error)
Token can be implemented in terms of TokenBytes as follows:
func (dec *Decoder) Token() (Token, error) {
	tok, data, err := dec.TokenBytes()
	if err != nil || data == nil {
		return tok, err
	}
	switch tok {
	case "":
		return string(data), nil
	case Number("0"):
		return Number(data), nil
	}
	panic("unreachable")
}
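For illustration, here is a sketch of the kind of allocation-free scanning the new method would enable. It assumes the proposed TokenBytes method exists; totalStringBytes is just an example, not part of the proposal:

import (
	"encoding/json"
	"io"
)

// totalStringBytes tallies the total length of all string tokens in the
// stream without boxing each string's contents in an interface.
func totalStringBytes(dec *json.Decoder) (int, error) {
	total := 0
	for {
		tok, data, err := dec.TokenBytes()
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return 0, err
		}
		// data is non-nil only for string and number tokens, and is
		// valid only until the next call, so it is used immediately.
		if _, ok := tok.(string); ok {
			total += len(data)
		}
	}
}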
Discussion
This proposal relies on the observation that the Decoder.Token API only generates garbage for two kinds of token: numbers and strings. For all other token types, no garbage need be generated, as small values (json.Delim and bool) do not incur an allocation when boxed in an interface.
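That claim can be checked directly with testing.AllocsPerRun; a minimal sketch (the sink variable exists only to keep the compiler from optimizing the conversions away):

import (
	"encoding/json"
	"testing"
)

var sink json.Token

// boxingAllocs reports the allocations per interface conversion for a bool
// and a json.Delim; both report zero, because the runtime interns small
// values when converting them to an interface.
func boxingAllocs(b bool, d json.Delim) (float64, float64) {
	boolAllocs := testing.AllocsPerRun(1000, func() { sink = b })
	delimAllocs := testing.AllocsPerRun(1000, func() { sink = d })
	return boolAllocs, delimAllocs
}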
It maintains the current API as-is. Users can opt in to the new API if they require efficiency, at some risk of incorrectness (the caller could hold onto the data slice after the next call to Decode), as shown in the sketch below. The cognitive overhead of TokenBytes is arguably low because of its similarity to the existing API.
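A caller that does need to retain the payload must copy it before the next call; for example:

// Retaining the payload past the next Token, TokenBytes, or Decode call
// requires a copy:
name := string(data)                // copy as a string
buf := append([]byte(nil), data...) // or copy as a []byte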
If this proposal is accepted, an Encoder.EncodeTokenBytes method could easily be added to provide garbage-free streaming JSON generation too.
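A plausible signature for such a method, purely as a sketch (it is not part of this proposal's concrete API):

// EncodeTokenBytes would be the encoding analogue of TokenBytes: for string
// and number tokens, the payload is taken from data rather than from the
// Token value itself, so no interface boxing of the payload is needed.
func (enc *Encoder) EncodeTokenBytes(tok Token, data []byte) error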