Description
Currently, we handle `\r\n` explicitly in the lexer. We should do this at file read time instead.
Motivation:
- Line endings should not affect the semantics of the language. For example, Git on Windows checks out files with `\r\n` line endings by default, and it would be bad if compiling code on Windows led to observably different results. This could be handled on a case-by-case basis in the lexer (the current approach), but it's easy to miss something (for example, raw byte strings do not handle `\r`: #60604). Additionally, proc macros currently see the original tokens, and so can observe different line endings. The simplest way to make the compiler line-ending-invariant is to normalize line endings as early as possible.
- For IDE features, which work close to the source code, and especially for code generation during refactorings, the surface area where you need to handle different line endings is much larger. It would be easier for an IDE to assume `\n` line endings and convert in one place, at the boundary.
I propose that we convert `\r\n` to `\n` once, in the `SourceFile::new` constructor. Note that we already do BOM removal there, so, in theory, all code should already be prepared for a mismatch between the in-memory and on-disk file contents. The replacement only shortens the string, so it can be a fairly fast in-place transformation.
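To illustrate why the transformation can be done in place, here is a minimal sketch. It is not the actual `SourceFile::new` code: `normalize_crlf` is a hypothetical helper name, and the sketch simply leaves any `\r` that is not followed by `\n` untouched. Because the output is never longer than the input, a read cursor and a write cursor over the same buffer are enough.

```rust
/// Hypothetical helper (not an existing compiler API): rewrites every
/// `\r\n` pair in `src` to `\n`, reusing the string's own buffer.
/// A `\r` that is not followed by `\n` is copied through unchanged.
fn normalize_crlf(src: &mut String) {
    // Fast path: nothing to do for files that already use `\n` only.
    if !src.contains("\r\n") {
        return;
    }

    // Take ownership of the underlying bytes without copying them.
    let mut buf = std::mem::take(src).into_bytes();
    let mut write = 0;
    let mut read = 0;

    while read < buf.len() {
        if buf[read] == b'\r' && buf.get(read + 1) == Some(&b'\n') {
            // Skip the `\r`; the `\n` is copied on the next iteration.
            read += 1;
            continue;
        }
        buf[write] = buf[read];
        write += 1;
        read += 1;
    }

    buf.truncate(write);
    // Only ASCII `\r` bytes were removed, so the buffer is still valid UTF-8.
    *src = String::from_utf8(buf).expect("removing ASCII bytes keeps UTF-8 valid");
}
```

Calling it on `"a\r\nb\r\n"` leaves `"a\nb\n"` in the same allocation. The final `String::from_utf8` call re-validates the buffer, which costs one extra linear pass; that is a choice made for the sketch to stay in safe code.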
One technical question is what to do with an isolated `\r`. I see two options:
- Leave them as is. This makes the transform non-idempotent though (`\r\r\n` -> `\r\n`), and requires other code to explicitly not treat `\r\n` as a line ending.
- Error at file load time if there's a lone `\r` in the source text. That is, after normalization, `\r` cannot be found in the source code at all.
I propose to go with the second option: it's slightly more annoying to implement, but seems more robust.
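As a rough sketch of what the second option could look like (the names `normalize_strict` and `LoneCarriageReturn` are made up for illustration and are not existing compiler APIs), the normalization step can return an error as soon as it meets a `\r` that is not immediately followed by `\n`. For simplicity this version builds a new string rather than working in place.

```rust
/// Hypothetical error type: reports where a lone `\r` was found.
#[derive(Debug, PartialEq)]
struct LoneCarriageReturn {
    /// Byte offset of the offending `\r` in the original text.
    offset: usize,
}

/// Hypothetical strict normalization: converts `\r\n` to `\n` and rejects
/// any `\r` that is not part of a `\r\n` pair. On success the returned
/// string contains no `\r` at all.
fn normalize_strict(src: &str) -> Result<String, LoneCarriageReturn> {
    let bytes = src.as_bytes();
    let mut out = String::with_capacity(src.len());
    // Start of the chunk of `src` that has not been copied to `out` yet.
    let mut start = 0;

    for (i, &b) in bytes.iter().enumerate() {
        if b != b'\r' {
            continue;
        }
        if bytes.get(i + 1) != Some(&b'\n') {
            return Err(LoneCarriageReturn { offset: i });
        }
        // Copy everything before the `\r`, then resume at the `\n`.
        // `\r` is ASCII, so both `i` and `i + 1` are char boundaries.
        out.push_str(&src[start..i]);
        start = i + 1;
    }

    out.push_str(&src[start..]);
    Ok(out)
}
```

With this shape, `"a\r\nb"` normalizes to `"a\nb"`, while `"\r\r\n"` is rejected at offset 0 instead of silently turning into `"\r\n"`, so the non-idempotence problem of the first option does not arise.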