Commit a24f4db

Mark Freeman authored and gopherbot committed
internal/pkgbits, cmd/compile/internal/noder: document string section

To understand this change, we begin with a short description of the UIR
file format. Every file is a header followed by a series of sections.
Each section has a kind, which determines the type of elements it
contains. An element is just a collection of one or more primitives, as
defined by package pkgbits.

Strings have their own section. Elements in the string section contain
only string primitives. To use a string, elements in other sections
encode a reference to the string section.

To illustrate, consider a simple file which exports nothing at all.

    package p

In the meta section, there is an element representing a package stub.
In that package stub, a string ("p") represents both the path and name
of the package. Again, these are encoded as references.

To manage references, every element begins with a reference table.
Instead of writing the bytes for "p" directly, the package stub encodes
an index in this reference table. At that index, a pair of numbers is
stored, indicating:

    1. which section
    2. which element index within the section

Effectively, elements always use *2* layers of indirection; first to the
reference table, then to the bytes themselves.

With some minor hand-waving, an encoding for the above package is given
below, with (S)ections, (E)lements and (P)rimitives denoted.

    + Header
    |   + Section Ends                 // each section has 1 element
    |   |   + 1                        // String is elements [0, 1)
    |   |   + 2                        // Meta is elements [1, 2)
    |   + Element Ends
    |   |   + 1                        // "p" is bytes [0, 1)
    |   |   + 6                        // stub is bytes [1, 6)
    + Payload
    |   + (S) String
    |   |   + (E) String
    |   |   |   + (P) String  { byte }  0x70   // "p"
    |   + (S) Meta
    |   |   + (E) Package Stub
    |   |   |   + Reference Table
    |   |   |   |   + (P) Entry Count  uvarint  1   // there is a single entry
    |   |   |   |   + (P) 0th Section  uvarint  0   // to String, 0th section
    |   |   |   |   + (P) 0th Index    uvarint  0   // to 0th element in String
    |   |   |   + Internals
    |   |   |   |   + (P) Path         uvarint  0   // 0th entry in table
    |   |   |   |   + (P) Name         uvarint  0   // 0th entry in table

Note that string elements do not have reference tables like other
elements. They behave more like a primitive.

As this is a bit complicated and getting into details of the UIR file
format, we omit some details in the documentation here. The structure
will become clearer as we continue documenting.

Change-Id: I12a5ce9a34251c5358a20f2f2c4d0f9bd497f4d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/671997
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <mark@golang.org>
TryBot-Bypass: Mark Freeman <mark@golang.org>

1 parent adcad7b commit a24f4db

2 files changed: +39 -22 lines changed

src/cmd/compile/internal/noder/doc.go

Lines changed: 7 additions & 1 deletion

@@ -20,7 +20,7 @@ The payload is a series of sections. Each section has a kind which determines
 its index in the series.
 
     SectionKind = Uint64 .
-    Payload = SectionString // TODO(markfreeman) Define.
+    Payload = SectionString
               SectionMeta
               SectionPosBase // TODO(markfreeman) Define.
               SectionPkg // TODO(markfreeman) Define.
@@ -40,6 +40,12 @@ accessed using an index relative to the start of the section.
     // TODO(markfreeman): Rename to SectionIndex.
     RelIndex = Uint64 .
 
+## String Section
+String values are stored as elements in the string section. Elements outside
+the string section access string values by reference.
+
+    SectionString = { String } .
+
 ## Meta Section
 The meta section provides fundamental information for a package. It contains
 exactly two elements — a public root and a private root.
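
The new String Section text says that string values live in one section and are accessed by reference from elsewhere. A minimal Go sketch of that idea follows. The type `stringSection` and its methods `intern` and `lookup` are hypothetical names chosen for illustration; this models the concept in memory and is not the noder's actual implementation.

```go
package main

import "fmt"

// stringSection is a hypothetical in-memory model of a string section:
// an ordered list of string elements plus a table used to deduplicate.
type stringSection struct {
	elems []string
	index map[string]uint64
}

func newStringSection() *stringSection {
	return &stringSection{index: make(map[string]uint64)}
}

// intern returns the element index for s, appending a new element only
// the first time s is seen.
func (ss *stringSection) intern(s string) uint64 {
	if i, ok := ss.index[s]; ok {
		return i
	}
	i := uint64(len(ss.elems))
	ss.elems = append(ss.elems, s)
	ss.index[s] = i
	return i
}

// lookup resolves an element index back to its string value.
func (ss *stringSection) lookup(i uint64) string {
	return ss.elems[i]
}

func main() {
	ss := newStringSection()
	a := ss.intern("p")
	b := ss.intern("fmt")
	c := ss.intern("p") // same string, same index; no new element
	fmt.Println(a, b, c, len(ss.elems), ss.lookup(b)) // 0 1 0 2 fmt
}
```

Elements in other sections would then store only the returned index (through their reference table) rather than the string bytes.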

src/internal/pkgbits/doc.go

Lines changed: 32 additions & 21 deletions

@@ -16,51 +16,62 @@ zvarint = (* a zig-zag encoded signed variable-width integer *) .
     uvarint = (* an unsigned variable-width integer *) .
 
 # Strings
-Strings are not encoded directly. Rather, they are deduplicated during encoding
-and referenced where needed.
+A string is a series of bytes.
 
-    String = [ Sync ] StringRef .
-    StringRef = [ Sync ] Uint64 . // TODO(markfreeman): Document.
+    // TODO(markfreeman): Does this need a marker?
+    String = { byte } .
 
-    StringSlice = Uint64 // the number of strings in the slice
-                  { String }
-                  .
+Strings are typically not encoded directly. Rather, they are deduplicated
+during encoding and referenced where needed; this process is called interning.
 
-// TODO(markfreeman) It is awkward to discuss references (and by extension
-// strings and constants). We cannot explain how they resolve without mention
-// of foreign concepts. Ideally, references would be defined in familar terms —
-// perhaps using an index on the byte array.
+    StringRef = [ Sync ] Ref[String] .
+
+Note that StringRef is *not* equivalent to Ref[String] due to the extra marker.
+
+# References
+References specify the location of a value. While the representation here is
+fixed, the interpretation of a reference is left to other packages.
+
+    Ref[T] = [ Sync ] Uint64 . // points to a value of type T
+
+# Slices
+Slices are a convenience for encoding a series of values of the same type.
+
+    // TODO(markfreeman): Does this need a marker?
+    Slice[T] = Uint64 // the number of values in the slice
+               { T } // the values
+               .
 
 # Constants
 Constants appear as defined via the package constant.
 
     Constant = [ Sync ]
-               Bool // whether the constant is a complex number
-               Scalar // the real part
-               [ Scalar ] // if complex, the imaginary part
+               Bool // whether the constant is a complex number
+               Scalar // the real part
+               [ Scalar ] // if complex, the imaginary part
                .
 
 A scalar represents a value using one of several potential formats. The exact
 format and interpretation is distinguished by a code preceding the value.
 
     Scalar = [ Sync ]
-             Uint64 // the code
+             Uint64 // the code indicating the type of Val
              Val
              .
 
     Val = Bool
         | Int64
-        | String
-        | Term // big integer
-        | Term Term // big ratio, numerator / denominator
-        | BigBytes // big float, precision 512
+        | StringRef
+        | Term // big integer
+        | Term Term // big ratio, numerator / denominator
+        | BigBytes // big float, precision 512
         .
 
     Term = BigBytes
-           Bool // whether the term is negative
+           Bool // whether the term is negative
            .
 
-    BigBytes = String . // bytes of a big value
+    BigBytes = StringRef . // bytes of a big value
 
 # Markers
 Markers provide a mechanism for asserting that encoders and decoders are
