src/tokens.md
Lines changed: 30 additions & 22 deletions
@@ -37,6 +37,8 @@ Literals are tokens used in [literal expressions].
 
 [^nsets]: The number of `#`s on each side of the same literal must be equivalent.
 
+> **Note**: Character and string literal tokens never include the sequence of `U+000D` (CR) immediately followed by `U+000A` (LF): this pair would have been previously transformed into a single `U+000A` (LF).
+
 #### ASCII escapes
 
 |   | Name |
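The note in this hunk depends on the input normalization that happens before tokenization, where each CR LF pair becomes a single LF. The sketch below models that step on an in-memory `&str`. `normalize_crlf` is a hypothetical helper written for illustration, not a rustc or standard-library API.

```rust
/// Hypothetical helper: replace each CR LF pair with a single LF in one
/// left-to-right pass. An isolated CR (not followed by LF) is left untouched.
fn normalize_crlf(src: &str) -> String {
    src.replace("\r\n", "\n")
}

fn main() {
    // A string literal spanning two lines in a file saved with CR LF endings.
    let source = "let s = \"one\r\ntwo\";\r\n";
    let normalized = normalize_crlf(source);
    // After normalization the literal's line break is a single LF.
    assert_eq!(normalized, "let s = \"one\ntwo\";\n");
    println!("{normalized:?}");
}
```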
@@ -156,13 +158,10 @@ A _string literal_ is a sequence of any Unicode characters enclosed within two
 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
 which must be _escaped_ by a preceding `U+005C` character (`\`).
 
-Line-breaks are allowed in string literals.
-A line-break is either a newline (`U+000A`) or a pair of carriage return and newline (`U+000D`, `U+000A`).
-Both byte sequences are translated to `U+000A`.
-
+Line-breaks, represented by the character `U+000A` (LF), are allowed in string literals.
 When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
 See [String continuation escapes] for details.
-
+The character `U+000D` (CR) may not appear in a string literal other than as part of such a string continuation escape.
 
 #### Character escapes
 
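The following minimal Rust sketch illustrates the reworded string-literal rules. It assumes a source file with LF line endings (or ones already normalized from CR LF), and the helper names are illustrative only. The first helper shows that an unescaped line break stays in the value as one LF. The second shows that a string continuation escape drops both the LF and the next line's leading whitespace. A bare CR inside the literal would be rejected at compile time, so compiling code cannot show it. The third helper shows that the `\r` escape remains the way to put a CR in the value.

```rust
/// Unescaped line break: the value contains exactly one LF (U+000A).
fn with_line_break() -> &'static str {
    "first
second"
}

/// String continuation escape: the `\` before the line break removes the
/// LF and the indentation that follows it.
fn with_continuation() -> &'static str {
    "first \
     second"
}

/// A CR in the value has to be written with the `\r` escape.
fn with_cr_escape() -> &'static str {
    "first\r\nsecond"
}

fn main() {
    assert_eq!(with_line_break(), "first\nsecond");
    assert_eq!(with_continuation(), "first second");
    // 5 + 1 (CR) + 1 (LF) + 6 bytes.
    assert_eq!(with_cr_escape().len(), 13);
    println!("{:?} {:?}", with_line_break(), with_continuation());
}
```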
@@ -198,10 +197,10 @@ following forms:
 
 Raw string literals do not process any escapes. They start with the character
 `U+0072` (`r`), followed by fewer than 256 of the character `U+0023` (`#`) and a
-`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
-of Unicode characters and is terminated only by another `U+0022` (double-quote)
-character, followed by the same number of `U+0023` (`#`) characters that preceded
-the opening `U+0022` (double-quote) character.
+`U+0022` (double-quote) character.
+
+The _raw string body_ can contain any sequence of Unicode characters other than `U+000D` (CR).
+It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character.
 
 All Unicode characters contained in the raw string body represent themselves,
 the characters `U+0022` (double-quote) (except when followed by at least as
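The raw string rules can be checked with a few literals. This is a sketch with illustrative function names. In a raw string, a `\` before a line break is ordinary body text, so the continuation behaviour of non-raw literals does not apply.

```rust
/// `\` is not an escape in a raw string: this is a backslash followed by `n`.
fn no_escapes() -> &'static str {
    r"C:\new"
}

/// One `#` lets the body contain `"`; only `"#` ends the literal.
fn with_quotes() -> &'static str {
    r#"say "hi""#
}

/// A line break in the body is kept as one LF, and the `\` before it stays a
/// plain backslash (raw strings have no continuation escape).
fn multi_line() -> &'static str {
    r"a\
b"
}

fn main() {
    assert_eq!(no_escapes(), "C:\\new");
    assert_eq!(with_quotes(), "say \"hi\"");
    assert_eq!(multi_line(), "a\\\nb");
    println!("{:?} {:?} {:?}", no_escapes(), with_quotes(), multi_line());
}
```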
@@ -259,6 +258,11 @@ the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
 Alternatively, a byte string literal can be a _raw byte string literal_, defined
 below.
 
+Line-breaks, represented by the character `U+000A` (LF), are allowed in byte string literals.
+When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
+See [String continuation escapes] for details.
+The character `U+000D` (CR) may not appear in a byte string literal other than as part of such a string continuation escape.
+
 Some additional _escapes_ are available in either byte or non-raw byte string
 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
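Byte string literals now get the same line-break wording as string literals. Here is a sketch with illustrative names, assuming LF line endings in the source:

```rust
/// Unescaped line break in a byte string: one `b'\n'` byte in the value.
fn bytes_with_line_break() -> &'static [u8] {
    b"ab
cd"
}

/// String continuation escape in a byte string: the LF and the following
/// indentation are dropped.
fn bytes_with_continuation() -> &'static [u8] {
    b"ab\
      cd"
}

fn main() {
    assert_eq!(bytes_with_line_break(), &b"ab\ncd"[..]);
    assert_eq!(bytes_with_continuation(), &b"abcd"[..]);
    // A CR byte can still be produced with the `\r` escape.
    assert_eq!(b"\r\n".len(), 2);
    println!("{:?}", bytes_with_line_break());
}
```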
> _any ASCII (i.e. 0x00 to 0x7F) except IsolatedCR_
 
 Raw byte string literals do not process any escapes. They start with the
 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by fewer than 256
-of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw string body_ can contain any sequence of ASCII characters and is terminated
-only by another `U+0022` (double-quote) character, followed by the same number of
-`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
-character. A raw byte string literal can not contain any non-ASCII byte.
+of the character `U+0023` (`#`), and a `U+0022` (double-quote) character.
+
+The _raw string body_ can contain any sequence of ASCII characters other than `U+000D` (CR).
+It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character.
+A raw byte string literal can not contain any non-ASCII byte.
 
 All characters contained in the raw string body represent their ASCII encoding,
 the characters `U+0022` (double-quote) (except when followed by at least as
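In a raw byte string, the body is taken byte for byte with no escape processing. The sketch below uses illustrative names and checks that the value is pure ASCII using the standard `<[u8]>::is_ascii` method:

```rust
/// Raw byte string: no escapes are processed, `"` is allowed thanks to the
/// `#` delimiter, and the body is ASCII only.
fn raw_bytes() -> &'static [u8] {
    br#"a "quoted" \n"#
}

fn main() {
    let bytes = raw_bytes();
    // `\n` stays as two bytes: a backslash and `n`.
    assert_eq!(bytes, &b"a \"quoted\" \\n"[..]);
    assert_eq!(bytes.len(), 13);
    assert!(bytes.is_ascii());
    println!("{:?}", bytes);
}
```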
@@ -340,6 +344,11 @@ C strings are implicitly terminated by byte `0x00`, so the C string literal
 literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
 permitted within a C string.
 
+Line-breaks, represented by the character `U+000A` (LF), are allowed in C string literals.
+When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token.
+See [String continuation escapes] for details.
+The character `U+000D` (CR) may not appear in a C string literal other than as part of such a string continuation escape.
+
 Some additional _escapes_ are available in non-raw C string literals. An escape
 starts with a `U+005C` (`\`) and continues with one of the following forms:
 
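C string literals follow the same line-break rule, and the compiler appends the implicit `0x00` terminator. This sketch assumes a toolchain where `c"..."` literals are stable (Rust 1.77 or later). It uses the standard `CStr::to_bytes` and `CStr::to_bytes_with_nul` accessors, and the function names are illustrative.

```rust
use std::ffi::CStr;

/// Unescaped line break in a C string: the bytes contain one LF,
/// followed by the implicit NUL terminator.
fn c_with_line_break() -> &'static CStr {
    c"ab
cd"
}

/// String continuation escape in a C string literal.
fn c_with_continuation() -> &'static CStr {
    c"ab\
      cd"
}

fn main() {
    assert_eq!(c_with_line_break().to_bytes(), &b"ab\ncd"[..]);
    assert_eq!(c_with_line_break().to_bytes_with_nul(), &b"ab\ncd\0"[..]);
    assert_eq!(c_with_continuation().to_bytes(), &b"abcd"[..]);
    println!("{:?}", c_with_line_break());
}
```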
@@ -382,11 +391,10 @@ c"\xC3\xA6";
 
 Raw C string literals do not process any escapes. They start with the
 character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
-of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters (other than
-`U+0000`) and is terminated only by another `U+0022` (double-quote) character,
-followed by the same number of `U+0023` (`#`) characters that preceded the
-opening `U+0022` (double-quote) character.
+of the character `U+0023` (`#`), and a `U+0022` (double-quote) character.
+
+The _raw C string body_ can contain any sequence of Unicode characters other than `U+0000` (NUL) and `U+000D` (CR).
+It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character.
 
 All characters contained in the raw C string body represent themselves in UTF-8
 encoding. The characters `U+0022` (double-quote) (except when followed by at
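Here is a matching sketch for raw C strings, under the same toolchain assumption (Rust 1.77 or later) and with illustrative names. It shows that escapes are not processed. It also shows that non-ASCII characters are stored as UTF-8, so `cr"æ"` equals the non-raw `c"\xC3\xA6"` that appears in this hunk's header.

```rust
use std::ffi::CStr;

/// Raw C string: no escapes are processed, and `"` is allowed inside
/// because of the `#` delimiter.
fn raw_c_path() -> &'static CStr {
    cr#"path "C:\tmp""#
}

/// Non-ASCII characters are stored as their UTF-8 encoding.
fn raw_c_utf8() -> &'static CStr {
    cr"æ"
}

fn main() {
    // `\t` stays as a backslash followed by `t`.
    assert_eq!(raw_c_path().to_bytes(), &b"path \"C:\\tmp\""[..]);
    // U+00E6 encodes as 0xC3 0xA6 in UTF-8.
    assert_eq!(raw_c_utf8().to_bytes(), &b"\xC3\xA6"[..]);
    assert_eq!(raw_c_utf8(), c"\xC3\xA6");
    println!("{:?} {:?}", raw_c_path(), raw_c_utf8());
}
```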