Description
Currently in PSScriptAnalyzer, script formatting occurs by writing a ScriptRule that emits corrections. Formatting is done by running rules one at a time, applying the first of any corrections on each diagnostic, re-assembling and re-parsing the script and running again until all rules have been run.
Looking at this, we should be able to improve the efficiency of formatting significantly with a good implementation, but ideally the design does not constrain the implementation beyond what is essential to the domain problem. Therefore a formatting design should:
- Not depend on any representation of the script beyond those natural to PowerShell; formatters should depend on the string representation, AST, etc and not think about buffers or string builders
- Be agnostic of concurrency; formatters should not need to consider what else is happening while they format scripts
- Not require allocation for each formatting edit; implementations may be able to amortise allocations for formats, so forcing an object to be allocated each time adds unnecessary overhead
- Track what class supplied what formatting edit, so that edits can be filtered or otherwise managed on a per-provider basis
- Ideally still allow rules to implement formatters
A simple proposal for this is:
public abstract class ScriptEditor
{
protected ScriptEditor(ScriptEditorInfo editorInfo)
{
EditorInfo = editorInfo;
}
public ScriptEditorInfo EditorInfo { get; }
// The formatting engine would call this to run the editor
public abstract void Run(string scriptContent, Ast ast, IReadOnlyList<Token> tokens, string scriptFilePath);
// This method is how implementers edit scripts
// We can easily add overloads that accept IScriptPositions, IScriptExtents, Asts, Tokens, even line/column combinations
protected void Replace(int startOffset, int endOffset, ReadOnlySpan<char> newValue)
{
...
}
}
This could then be encapsulated into an object that actually runs the editors as a formatter:
// This might also be presented to consumers as an IScriptFormatter implemented by an internal class created with a builder
// for added flexibility in the API
public class ScriptFormatter
{
// Run inside implementation
private readonly IReadOnlyList<ScriptEditor> _scriptEditors;
public string FormatInput(string scriptInput)
{
...
}
public string FormatFile(string scriptFilePath) => FormatFile(inputFile: scriptFilePath, outputFile: scriptFilePath);
public string FormatFile(string inputFile, string outputFile)
{
...
}
}
This design is nice because:
- It limits the number of types/concepts required for an implementer to one; implement the abstract method and call the protected method
- Things like the
ScriptEditorInfo
object can be passed to the formatter implicitly, without requiring the implementer to hook anything up or opening the possibility that they will hook things up wrong - The
ScriptFormatter
object andScriptEditor
base class are free to choose an implementation, whether that serially runs, applies and reparses rules
or concurrently runs them, reifies the edits into objects, applies them serially in order, and then reruns those that couldn't be applied (this kind of choice could also be captured by something like anIScriptFormattingStrategy
, to generalise the relationship betweenScriptFormatter
andScriptEditor
)
The main drawback is that it's less simple to implement a rule and a formatter in the same class. One way to accomplish this might be a compositional approach, where rules declare themselves to also be formatters with an attribute or by implementing an interface, and then we have a ScriptEditor
implementation that uses those rules internally:
internal class RuleScriptEditor : ScriptEditor
{
private readonly ScriptRule _rule;
public RuleScriptEditor(ScriptRule rule)
{
_rule = rule;
}
public override void Run(...)
{
_rule.Run(...);
...
// Collect diagnostics from rule and use them to call the edit API
}
}
Doing this might enable the traditional approach of rules that format, while encouraging the decoupling of the two functions into two more cohesive APIs.
A final alternative to this is to make formatting APIs purely interface-based:
public interface IScriptEditor
{
void Run(IFormatStream stream, ...);
}
public interface IFormatStream
{
// Possibly IScriptEditor in the first argument
void Replace(ScriptEditorInfo editorInfo, int startOffset, int endOffset, ReadOnlySpan<char> newValue);
}
public static class ScriptEditorExtensions
{
// Convenience APIs provided to make API use simpler and require less redundancy
}
The big problems here are:
- We now require implementers to know about three classes rather than one
- An implementer must write more boilerplate about the editor info object and could pass through a bad value of
editorInfo
in the first argument and cause issues for the engine - We have no simple way to inject a
ScriptEditorInfo
object into implementing classes so we can enforce nice uniqueness/referential equality properties