Improve compression of pickled quotes

We need to improve the encoding performed in [TastyString](https://github.com/lampepfl/dotty/blob/master/compiler/src/dotty/tools/dotc/core/tasty/TastyString.scala) by switching to the encoding described in the [discussion](https://github.com/lampepfl/dotty/pull/3662#discussion_r161507417).

> @lrytz wrote:
> It took me a while to find it. This needs to be cleaned up and documented. Method `parseScalaSigBytes` calls `ConstantPool.getBytes`, which goes through `ByteCodecs.decode`.
> 
> The encoding is explained in [Storage of pickled Scala signatures in class files](http://www.scala-lang.org/old/sites/default/files/sids/dubochet/Mon,%202010-05-31,%2015:25/Storage%20of%20pickled%20Scala%20signatures%20in%20class%20files.pdf):
> 
> 1. First, map all 8-bit bytes to 7-bit values (shifting the remaining bits into the following values).
> 2. Then increment every value by 1 (in 7 bits), so 0x7f becomes 0x00.
> 3. Then encode 0x00 as 0xc0 0x80, which is an overlong UTF-8 encoding of zero. This is what the JVM class file spec uses to avoid having 0x00 in strings; it is called "modified UTF-8".
>
> The reason for incrementing by 1 is that 0x7f is expected to be less common than 0x00, so the two-byte encoding is needed less often.
> 
> The confusing part is that the class `ScalaSigBytes`, used in the backend to encode the signature, calls `ByteCodecs.encode8to7` but does the +1 itself. It doesn't need to map 0x00 to the two-byte version, because ASM does that when writing the annotation to the class file. In the unpickler, however, we don't use ASM to read the annotation; we get the bytes from the class file directly, so there we see the two-byte encoding. `ByteCodecs.decode` does the necessary work.
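The three steps above, and their reversal on the unpickling side, can be sketched in Scala as follows. This is a minimal illustration under our own assumptions, not the `ByteCodecs` or `TastyString` code: the object `SigBytesSketch` and all of its methods are hypothetical (named after the steps they mirror, not after the real API), and we assume the 8-to-7 step packs bits least-significant-first. Decoding undoes the steps in reverse order, which is the work the comment attributes to `ByteCodecs.decode`.

```scala
// Hypothetical sketch of the 8-to-7 / +1 / modified-UTF-8 scheme; not the real ByteCodecs API.
object SigBytesSketch {

  /** Step 1: repack 8-bit bytes into 7-bit values, least significant bit first (assumed order). */
  def encode8to7(src: Array[Byte]): Array[Byte] = {
    val dst = new Array[Byte]((src.length * 8 + 6) / 7) // ceil(8n / 7) output values
    var buffer = 0 // pending bits, least significant bit first
    var bits = 0   // number of valid bits in `buffer`
    var j = 0
    for (b <- src) {
      buffer |= (b & 0xff) << bits
      bits += 8
      while (bits >= 7) {
        dst(j) = (buffer & 0x7f).toByte
        buffer >>>= 7
        bits -= 7
        j += 1
      }
    }
    if (bits > 0) dst(j) = (buffer & 0x7f).toByte // leftover high bits, zero-padded
    dst
  }

  /** Steps 2 and 3: add 1 modulo 128, then write the resulting 0x00 as 0xc0 0x80. */
  def avoidZero(src: Array[Byte]): Array[Byte] = {
    val out = Array.newBuilder[Byte]
    for (b <- src) {
      val v = (b + 1) & 0x7f // 0x7f wraps around to 0x00
      if (v == 0) { out += 0xc0.toByte; out += 0x80.toByte }
      else out += v.toByte
    }
    out.result()
  }

  def encode(bytes: Array[Byte]): Array[Byte] = avoidZero(encode8to7(bytes))

  /** Undo steps 3 and 2: read 0xc0 0x80 as 0x00, then subtract 1 modulo 128. */
  def undoAvoidZero(src: Array[Byte]): Array[Byte] = {
    val out = Array.newBuilder[Byte]
    var i = 0
    while (i < src.length) {
      val twoByteZero =
        (src(i) & 0xff) == 0xc0 && i + 1 < src.length && (src(i + 1) & 0xff) == 0x80
      val v = if (twoByteZero) 0 else src(i) & 0x7f
      i += (if (twoByteZero) 2 else 1)
      out += ((v + 127) & 0x7f).toByte // 0x00 maps back to 0x7f
    }
    out.result()
  }

  /** Undo step 1: concatenate the 7-bit values and cut the bit stream into bytes. */
  def decode7to8(src: Array[Byte]): Array[Byte] = {
    val dst = new Array[Byte](src.length * 7 / 8) // trailing padding bits are dropped
    var buffer = 0
    var bits = 0
    var j = 0
    for (b <- src) {
      buffer |= (b & 0x7f) << bits
      bits += 7
      if (bits >= 8) {
        dst(j) = (buffer & 0xff).toByte
        buffer >>>= 8
        bits -= 8
        j += 1
      }
    }
    dst
  }

  def decode(encoded: Array[Byte]): Array[Byte] = decode7to8(undoAvoidZero(encoded))
}
```

In this sketch, encoding a single `0x7f` byte yields `0xc0 0x80 0x01`: the 8-to-7 step produces `0x7f 0x00`, and the +1 step turns the `0x7f` into the two-byte zero. The 8-to-7 step on its own costs one extra byte per 7 input bytes, since n bytes become ⌈8n/7⌉ values.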

Improve compression of pickled quotes #3877

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve compression of pickled quotes #3877

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions