From af284e1ad223d63ace5768bb4db325d2c250ddd6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Facundo=20Dom=C3=ADnguez?=
Date: Tue, 11 Mar 2025 11:28:11 -0300
Subject: [PATCH 1/4] Tooling for maintaining a GHC API

---
 proposals/ghc-api-tooling.md | 197 +++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 proposals/ghc-api-tooling.md

diff --git a/proposals/ghc-api-tooling.md b/proposals/ghc-api-tooling.md
new file mode 100644
index 0000000..4f611d4
--- /dev/null
+++ b/proposals/ghc-api-tooling.md
@@ -0,0 +1,197 @@
+# Tooling for maintaining a GHC API
+
+## Abstract
+
+This proposal is to build tools to define and maintain a GHC API. Some
+automation is necessary to monitor the needs of projects using GHC as a library,
+and to make GHC developers aware when their changes affect these projects. With
+this knowledge, the involved parts of GHC can be better defined and documented.
+
+## Background
+
+The Haskell Foundation started the GHC API stability initiative last year. This
+is a project that aims to identify and mitigate how the GHC compiler affects the
+maintenance costs of Haskell tools which use the GHC compiler as a library.
+
+During an [outreach phase], the concern most cited by tooling authors was the lack
+of documentation that would effectively help them use and upgrade the
+[ghc library].
+
+[ghc library]: https://hackage.haskell.org/package/ghc
+[outreach phase]: https://discourse.haskell.org/t/ghc-api-stability-update-3/11407
+
+Documentation is important for using any library, and the compiler is documented
+both in the code and [beyond][ghc commentary]. However, the general sentiment
+is that documentation is still lacking. This could be due to documentation
+not being easy to navigate and discover, for instance if there are relevant
+cross references that are missing. Secondly, it could be due to
+documentation being written for an audience with a shared context about GHC
+internals, which does not always include the authors of Haskell tooling.
+
+[ghc commentary]: https://gitlab.haskell.org/ghc/ghc-wiki-mirror/-/blob/master/commentary.md
+
+When considering what to document better, GHC developers conjecture that not
+all of the GHC implementation is currently used by users of the `ghc` library,
+and so it would be necessary to identify which parts of the implementation need
+to be documented for external use.
+
+## Problem Statement
+
+The problem this proposal aims to address is identifying the parts of the GHC
+implementation that are used in other packages, and improving the documentation
+of these parts so that it is accessible to an audience not initially acquainted with
+the GHC implementation.
+
+If the project succeeds, good documentation will save tooling authors the cost
+of discovering what the GHC implementation does by trial and error. In practice,
+poor understanding of the GHC implementation translates into a long stream of
+bugs to fix in downstream projects until each project finally gets the
+understanding right. Additionally, the definition of a GHC API should reduce the
+number of changes necessary to Haskell tools during upgrades of the API.
+
+A solution should make it easy for GHC developers to know when they are about to
+change parts of the GHC implementation that are used in other packages, and it
+should offer tool authors the documentation they need to make effective use
+of the GHC implementation in their projects. This documentation must allow a
+newcomer to answer at least which features are offered by the GHC
+implementation, how they are used, and what is the meaning of the involved
+types and functions.
+
+## Prior Art and Related Efforts
+
+To the best of my knowledge, no project has tried before to improve the
+documentation of the GHC implementation, though there have been efforts
+to refactor the implementation itself to make it easier to maintain and
+reuse. This author thinks that accessible documentation amplifies the
+the benefits of any code changes.
+
+## Technical Content
+
+In order to identify which parts of the GHC implementation are used by other
+packages, GHC developers should have an index of the names from the `ghc`
+library that are used in a selected set of packages, called henceforth the
+indexing set. This index can be used to define the curated subset of the GHC
+implementation that will be exposed to tooling authors under some designated
+module hierarchy. In this document, we will refer to this curated subset as
+the GHC API.
+
+The size of the initial GHC API can be tuned by growing the indexing set
+progressively, starting with the projects that are considered most relevant to
+the community, and relaxing it as more resources become available.
+
+The GHC API will indicate the features that need to be documented for external
+use, and it will allow to flag the changes to GHC that affect it. GHC
+developers would then have the opportunity to decide whether to make the changes
+backward compatible or document the API changes for their users.
+
+The following phases emerge from these considerations.
+
+### Indexing Phase
+
+This phase should produce a tool that can build the index of names from the
+`ghc` library (and perhaps `ghc-lib-parser`) which are used in other packages.
+It should be possible to configure which packages or units to include in the
+indexed set.
+
+In addition, a library should be provided that allows us to query the index.
+The following queries should be possible to answer:
+
+* The list of names from the `ghc` library that are used by other packages. Note
+  that the index should provide enough information to allow importing the name
+  (e.g. whether it is a pattern synonym; or if it is the name of a data
+  constructor or a field, it should be accompanied by the name of the data type).
+* The modules from other units that are using a given name
+* The most commonly used names from the `ghc` library
+
+This phase could be based on the compiler plugin and the analysis script in
+[this repo][indexing repo], or it could be based on other indexing solutions.
+
+[indexing repo]: https://github.com/tweag/ghc-api-usage-stats
+
+### API generation phase
+
+This phase should produce a tool that generates or regenerates modules in the
+GHC API from the index. If a module does not exist yet, it should be
+created from some configurable template. If the module already exists, the tool
+should edit the export list and import declarations while trying to preserve
+the contents in the rest of the module file. Other generators sometimes
+implement special comments to designate lines that should not be modified by
+the generator.
+
+The tool should probably allow us to specify rules to indicate a few things:
+* which names should be exposed in which modules
+* which modules should be used to bring some names into scope
+* to exclude some names from being exported despite appearing in the index.
+  A file with a list of excluded names should be generated if using globbing
+  or similar in the rules, so new excluded names are made visible when
+  regenerating the API.
+
+### Documentation review phase
+
+In this phase, the code documentation of GHC needs to reviewed, and procedures
+need to be documented to keep it up to date.
+
+For the review part, a team of a newcomer and an experienced contributor should
+systematically review the documentation of each module in the exposed subset.
+Perhaps starting by the most commonly used definitions as indicated by the
+index queries.
+
+For the update procedures, it should be documented what the GHC API is, how to
+update it, and when to update it. Newcomers should be invited to request
+documentation improvements. Documentation improvements should be made fast and
+easy to merge. Maybe most continuous integration (CI) jobs could be skipped for
+documentation updates except for some linting.
+
+Additionally, the immutability of the GHC API needs to be checked in GHC's CI.
+Tooling to do this already exists for other parts of GHC, so this task should
+be mostly about configuration work.
+
+### Risks and Limitations
+
+The project could fail if the size of the GHC API exceeds the availability of
+the community to document it all. In such a case, the project should still be
+helpful to identify the areas of the GHC implementation that still need
+additional effort to better support their exposure.
+
+Not all changes to the GHC API will be possible to detect automatically, in
+particular, changes in behavior that don't modify types or the type signatures
+of functions. Alternatively, the proposal could be extended to try to detect
+changes to documentation of definitions that appear in the GHC API. But still
+there will be shades of behavior that will likely not be caught in documentation
+either.
+
+## Timeline
+
+There are no specific deadlines to this project.
+
+## Budget
+
+The cost of this project involves the engineering time needed to perform
+the identified phases. The following is a rough guess from the proposer,
+but it needs to be refined with whoever is appointed to execute the project.
+
+```
+Indexing phase --- 40 hours
+API generation phase --- 80 hours
+Documentation review phase --- depends on the chosen indexing set
+```
+
+The actual money required also needs to be negotiated with the appointed
+developers.
+
+## Stakeholders
+
+* GHC developers
+* Tooling authors from the [outreach phase]
+* Users of Haskell tools who need them to stay up to date
+
+## Success
+
+The project will be successful if the users of the `ghc` library have an
+accurate understanding of what it will take to upgrade their projects to use a
+newer version of the compiler by reading changelogs and the API documentation,
+thus eliminating the trial and error costs.
+
+The project will be successful too if accidental breakage of downstream tooling
+is avoided thanks to the definition of a GHC API whose modifications are
+flagged by GHC's CI.

From f8a06d46a7ac74102d255af27c7f064799056b00 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Facundo=20Dom=C3=ADnguez?=
Date: Wed, 19 Mar 2025 15:20:53 -0300
Subject: [PATCH 2/4] Refocus the proposal to documenting and defining a GHC API one tool at a time

---
 proposals/ghc-api-tooling.md | 211 ++++++++++++++++++++++-------------------------
 1 file changed, 97 insertions(+), 114 deletions(-)

diff --git a/proposals/ghc-api-tooling.md b/proposals/ghc-api-tooling.md
index 4f611d4..70ea999 100644
--- a/proposals/ghc-api-tooling.md
+++ b/proposals/ghc-api-tooling.md
@@ -1,11 +1,10 @@
-# Tooling for maintaining a GHC API
+# A process to incrementally document a GHC API

 ## Abstract

-This proposal is to build tools to define and maintain a GHC API. Some
-automation is necessary to monitor the needs of projects using GHC as a library,
-and to make GHC developers aware when their changes affect these projects. With
-this knowledge, the involved parts of GHC can be better defined and documented.
+This proposal describes a process to define and document a GHC API. The process
+involves tooling authors, GHC developers, and the Haskell Foundation in
+defining and validating pieces of a GHC API in an incremental fashion.

 ## Background

@@ -35,6 +34,11 @@ all of the GHC implementation is currently used by users of the `ghc` library,
 and so it would be necessary to identify which parts of the implementation need
 to be documented for external use.

+Another conjecture is that documenting various parts of the `ghc` library for
+external consumption will most likely require some amount of refactoring to
+separate implementation details from the essentials of the features that the
+library provides.
+
 ## Problem Statement

 The problem this proposal aims to address is identifying the parts of the GHC
@@ -59,125 +63,105 @@ types and functions.

 ## Prior Art and Related Efforts

-To the best of my knowledge, no project has tried before to improve the
-documentation of the GHC implementation, though there have been efforts
-to refactor the implementation itself to make it easier to maintain and
-reuse. This author thinks that accessible documentation amplifies the
-the benefits of any code changes.
+Improving documentation has been considered [before][ghc modularity] as a
+desirable target, but it was often found that the best documentation
+required code changes ([hierarchical modules proposal], [hierarchical modules ticket], [compiler modularity ticket]).
+
+Additionally, defining an API for GHC has been tried
+[before][overview of the current GHC API] too. All definitions
+to be exposed for external use were put in the [GHC module] of the `ghc`
+library, but at the moment tools still depend on definitions not exposed
+in this way.
+
+[ghc modularity]: https://hsyl20.fr/files/papers/2022-ghc-modularity.pdf
+[hierarchical modules proposal]: https://github.com/ghc-proposals/ghc-proposals/pull/57
+[hierarchical modules ticket]: https://gitlab.haskell.org/ghc/ghc/-/issues/13009
+[compiler modularity ticket]: https://gitlab.haskell.org/ghc/ghc/-/issues/17957
+[overview of the current GHC API]: https://github.com/ghc-proposals/ghc-proposals/pull/57#issuecomment-312111938
+[GHC module]: https://gitlab.haskell.org/ghc/ghc/-/commit/71ae8ec9651216330ac49e9eae60d195e65c7506
+
+This proposal aims to combine the documentation effort with the API definition.
+Where earlier documentation attempts found difficulties related to the structure
+of the code, this proposal aims to reduce the scope to the needs of specific
+tools in an incremental fashion. Where the earlier API definition didn't achieve
+the engagement necessary to keep up with the evolution of the tool ecosystem,
+this proposal aims to coordinate the stakeholders to support specific tools as
+well.

 ## Technical Content

-In order to identify which parts of the GHC implementation are used by other
-packages, GHC developers should have an index of the names from the `ghc`
-library that are used in a selected set of packages, called henceforth the
-indexing set. This index can be used to define the curated subset of the GHC
-implementation that will be exposed to tooling authors under some designated
-module hierarchy. In this document, we will refer to this curated subset as
-the GHC API.
-
-The size of the initial GHC API can be tuned by growing the indexing set
-progressively, starting with the projects that are considered most relevant to
-the community, and relaxing it as more resources become available.
-
-The GHC API will indicate the features that need to be documented for external
-use, and it will allow to flag the changes to GHC that affect it. GHC
-developers would then have the opportunity to decide whether to make the changes
-backward compatible or document the API changes for their users.
-
-The following phases emerge from these considerations.
-
-### Indexing Phase
-
-This phase should produce a tool that can build the index of names from the
-`ghc` library (and perhaps `ghc-lib-parser`) which are used in other packages.
-It should be possible to configure which packages or units to include in the
-indexed set.
-
-In addition, a library should be provided that allows us to query the index.
-The following queries should be possible to answer:
-
-* The list of names from the `ghc` library that are used by other packages. Note
-  that the index should provide enough information to allow importing the name
-  (e.g. whether it is a pattern synonym; or if it is the name of a data
-  constructor or a field, it should be accompanied by the name of the data type).
-* The modules from other units that are using a given name
-* The most commonly used names from the `ghc` library
-
-This phase could be based on the compiler plugin and the analysis script in
-[this repo][indexing repo], or it could be based on other indexing solutions.
-
-[indexing repo]: https://github.com/tweag/ghc-api-usage-stats
-
-### API generation phase
-
-This phase should produce a tool that generates or regenerates modules in the
-GHC API from the index. If a module does not exist yet, it should be
-created from some configurable template. If the module already exists, the tool
-should edit the export list and import declarations while trying to preserve
-the contents in the rest of the module file. Other generators sometimes
-implement special comments to designate lines that should not be modified by
-the generator.
-
-The tool should probably allow us to specify rules to indicate a few things:
-* which names should be exposed in which modules
-* which modules should be used to bring some names into scope
-* to exclude some names from being exported despite appearing in the index.
-  A file with a list of excluded names should be generated if using globbing
-  or similar in the rules, so new excluded names are made visible when
-  regenerating the API.
-
-### Documentation review phase
-
-In this phase, the code documentation of GHC needs to reviewed, and procedures
-need to be documented to keep it up to date.
-
-For the review part, a team of a newcomer and an experienced contributor should
-systematically review the documentation of each module in the exposed subset.
-Perhaps starting by the most commonly used definitions as indicated by the
-index queries.
-
-For the update procedures, it should be documented what the GHC API is, how to
-update it, and when to update it. Newcomers should be invited to request
-documentation improvements. Documentation improvements should be made fast and
-easy to merge. Maybe most continuous integration (CI) jobs could be skipped for
-documentation updates except for some linting.
-
-Additionally, the immutability of the GHC API needs to be checked in GHC's CI.
-Tooling to do this already exists for other parts of GHC, so this task should
-be mostly about configuration work.
+The core of the proposal is a project template to define and document a subset
+of the GHC implementation needed for one of several tools. This template can then
+be applied to more tools later on.
+
+In executing such a project, there need to be at least the following roles:
+* a project developer to write the documentation and analyse and implement
+  any necessary changes;
+* a GHC mentor to represent the GHC developers, who will support the project
+  developer by providing feedback on the work and providing technical insight
+  when needed; and
+* a tool maintainer to validate and provide feedback on the produced
+  documentation and API.
+
+The following steps are necessary to perform the project.
+
+1. The Haskell Foundation and GHC developers select some tool to serve first.
+   Availability of the tool maintainers needs to be checked at this stage, and
+   the roles of the project developer and the GHC mentor need to be assigned.
+   Also the project scope needs to be defined at this time.
+
+2. The project developer studies the functions and types that the tool uses
+   from the `ghc` library, and engages with the tool maintainer where the
+   reason for using them is unclear.
+
+3. The project developer makes a proposal for an API that suits the tool use
+   case, if any refactorings are necessary. Both the GHC mentor and the
+   tool maintainer need to agree on the proposal before proceeding with the
+   implementation.
+
+4. If there is agreement on an API proposal, the project developer implements
+   it and documents or provides links so people unacquainted with the GHC
+   implementation can understand it.
+
+5. The GHC mentor and the tool maintainer validate the accuracy and the
+   completeness of the documentation.
+
+6. If appropriate, the project developer might update GHC so it uses the new
+   API as well.
+
+After the project, GHC developers are responsible for maintaining the new API.
+No guarantees of backward compatibility are required, but guidance needs to be
+provided to clients of the API to accommodate changes to it.
+
+In principle, there are no constraints on which client of the `ghc` library
+should be chosen first, but the community already shows agreement on
+[splitting a parser library] that could benefit from the template structure,
+and there are perhaps smaller projects like [print-api] on which to test the template,
+which could allow for a smaller initial commitment.
+
+[splitting a parser library]: https://github.com/haskellfoundation/tech-proposals/pull/56
+[print-api]: https://github.com/Kleidukos/print-api
+

 ### Risks and Limitations

-The project could fail if the size of the GHC API exceeds the availability of
-the community to document it all. In such a case, the project should still be
-helpful to identify the areas of the GHC implementation that still need
-additional effort to better support their exposure.
+The project could have difficulties if the scope definition in the initial step
+ends up invalidated by new insights in later stages. This could require revising
+scope and budget midway through.

-Not all changes to the GHC API will be possible to detect automatically, in
-particular, changes in behavior that don't modify types or the type signatures
-of functions. Alternatively, the proposal could be extended to try to detect
-changes to documentation of definitions that appear in the GHC API. But still
-there will be shades of behavior that will likely not be caught in documentation
-either.
+Tools might evolve after the project, requiring APIs to be modified or redesigned.
+Ideally, the design of an API would allow using it together with unsupported GHC
+definitions, so users that miss a feature are not forced to choose between using
+the API and only using unsupported GHC definitions.

 ## Timeline

-There are no specific deadlines to this project.
+There are no specific deadlines for this proposal.

 ## Budget

-The cost of this project involves the engineering time needed to perform
-the identified phases. The following is a rough guess from the proposer,
-but it needs to be refined with whoever is appointed to execute the project.
-
-```
-Indexing phase --- 40 hours
-API generation phase --- 80 hours
-Documentation review phase --- depends on the chosen indexing set
-```
-
-The actual money required also needs to be negotiated with the appointed
-developers.
+The cost of this project involves the engineering time needed for each of
+the identified roles, and it will need to be negotiated in each case.

 ## Stakeholders

@@ -187,11 +171,10 @@ developers.

 ## Success

-The project will be successful if the users of the `ghc` library have an
+The project will be successful if the maintainer of the chosen Haskell tool has an
 accurate understanding of what it will take to upgrade their projects to use a
 newer version of the compiler by reading changelogs and the API documentation,
 thus eliminating the trial and error costs.

 The project will be successful too if accidental breakage of downstream tooling
-is avoided thanks to the definition of a GHC API whose modifications are
-flagged by GHC's CI.
+is avoided thanks to the definition of a GHC API.
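Step 2 of the process (studying which functions and types a tool uses from the `ghc` library) and the index queries listed in the first revision's Indexing Phase both rely on a record of which `ghc` names other packages use. The following Python sketch illustrates those three queries under stated assumptions: the `Use` record layout, the function names, and the usage data are hypothetical and are not taken from any existing indexer. The `ghc` names in the sample (`parseModule`, `GenLocated`/`L`, `SrcSpan`) exist in recent GHC versions, but which tools use them is made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Hypothetical index record: one use of a `ghc` name by a module of another
# unit. The field names are illustrative, not the format of any existing tool.
@dataclass(frozen=True)
class Use:
    user_unit: str                # unit (package) that uses the name
    user_module: str              # module of that unit containing the use
    ghc_module: str               # `ghc` module the name is imported from
    name: str                     # the used name
    kind: str                     # "value", "type", "datacon", "field" or "patsyn"
    parent: Optional[str] = None  # data type owning a data constructor or field


def import_item(use: Use) -> str:
    """Render the name as an item of a Haskell import list."""
    if use.kind in ("datacon", "field"):
        return f"{use.parent}({use.name})"
    if use.kind == "patsyn":
        return f"pattern {use.name}"
    return use.name


def names_used(index):
    """Distinct `ghc` names used by other units, paired with their module."""
    return sorted({(u.ghc_module, import_item(u)) for u in index})


def modules_using(index, ghc_module, name):
    """Modules from other units that use the given name."""
    return sorted({(u.user_unit, u.user_module)
                   for u in index if (u.ghc_module, u.name) == (ghc_module, name)})


def most_common(index, n):
    """The n names used by the most modules (each using module counted once)."""
    distinct = {(u.user_unit, u.user_module, u.ghc_module, u.name) for u in index}
    counts = Counter((mod, name) for _, _, mod, name in distinct)
    # Sort by descending count, then by name, so ties have a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up usage records for two hypothetical tools.
index = [
    Use("tool-a", "A.Parse", "GHC", "parseModule", "value"),
    Use("tool-a", "A.Parse", "GHC.Types.SrcLoc", "L", "datacon", "GenLocated"),
    Use("tool-a", "A.Print", "GHC.Types.SrcLoc", "L", "datacon", "GenLocated"),
    Use("tool-b", "B.Main", "GHC.Types.SrcLoc", "SrcSpan", "type"),
    Use("tool-b", "B.Main", "GHC.Types.SrcLoc", "L", "datacon", "GenLocated"),
]

print(names_used(index))
print(modules_using(index, "GHC.Types.SrcLoc", "L"))
print(most_common(index, 2))
```

`import_item` reflects the requirement that the index carry enough information to import each name: data constructors and fields are rendered with their parent type, and pattern synonyms with the `pattern` keyword, which needs the `PatternSynonyms` extension at the import site.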
From 389c1786ff32e535b932579528d685bb89701b28 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Facundo=20Dom=C3=ADnguez?=
Date: Wed, 19 Mar 2025 15:22:43 -0300
Subject: [PATCH 3/4] Rename ghc-api-tooling.md to ghc-api-definition-process.md

---
 proposals/{ghc-api-tooling.md => ghc-api-definition-process.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename proposals/{ghc-api-tooling.md => ghc-api-definition-process.md} (100%)

diff --git a/proposals/ghc-api-tooling.md b/proposals/ghc-api-definition-process.md
similarity index 100%
rename from proposals/ghc-api-tooling.md
rename to proposals/ghc-api-definition-process.md

From 319761a52660c352fc9f01567a1079ca682cb0ef Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Facundo=20Dom=C3=ADnguez?=
Date: Wed, 19 Mar 2025 19:15:48 -0300
Subject: [PATCH 4/4] Note the opportunity to ask feedback from the maintainers of other tools

---
 proposals/ghc-api-definition-process.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/proposals/ghc-api-definition-process.md b/proposals/ghc-api-definition-process.md
index 70ea999..ee74e16 100644
--- a/proposals/ghc-api-definition-process.md
+++ b/proposals/ghc-api-definition-process.md
@@ -117,7 +117,8 @@ The following steps are necessary to perform the project.
 3. The project developer makes a proposal for an API that suits the tool use
    case, if any refactorings are necessary. Both the GHC mentor and the
    tool maintainer need to agree on the proposal before proceeding with the
-   implementation.
+   implementation. Feedback from the maintainers of other tools with similar
+   needs could be invited at this time.

 4. If there is agreement on an API proposal, the project developer implements
    it and documents or provides links so people unacquainted with the GHC
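The last patch suggests inviting feedback from maintainers of other tools with similar needs when an API proposal is made in step 3. If an index of the `ghc` names used by each tool were available, one hypothetical way to shortlist such tools is to rank them by the names they share with the chosen tool. In this Python sketch, the function name `similar_tools`, its input shape (a dictionary from tool name to a set of `(module, name)` pairs), and the sample data are all assumptions made for illustration.

```python
def similar_tools(uses_by_tool, chosen):
    """Rank other tools by how many `ghc` names they share with `chosen`.

    `uses_by_tool` maps a (hypothetical) tool name to the set of
    (module, name) pairs it imports from the `ghc` library.
    Tools that share no names with `chosen` are left out.
    """
    target = uses_by_tool[chosen]
    ranking = []
    for tool, names in uses_by_tool.items():
        shared = target & names
        if tool != chosen and shared:
            ranking.append((tool, sorted(shared)))
    # Largest overlap first; ties broken by tool name for a stable order.
    ranking.sort(key=lambda entry: (-len(entry[1]), entry[0]))
    return ranking


# Made-up data for four hypothetical tools.
uses_by_tool = {
    "tool-a": {("GHC", "parseModule"), ("GHC.Types.SrcLoc", "L"),
               ("GHC.Types.SrcLoc", "SrcSpan")},
    "tool-b": {("GHC.Types.SrcLoc", "L"), ("GHC.Types.SrcLoc", "SrcSpan")},
    "tool-c": {("GHC", "parseModule")},
    "tool-d": {("GHC", "runGhc")},
}

for tool, shared in similar_tools(uses_by_tool, "tool-a"):
    print(tool, shared)
```

Returning the shared names, rather than only a count, also tells each invited maintainer which part of the proposed API concerns their tool.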