Description
Is your enhancement request related to a problem? Please describe.
CI is currently getting stuck a lot. This is terrible for contributors: even simple, small changes can take weeks to merge because CI takes multiple runs to finally succeed. The main reason for these CI issues is flaky tests, i.e. tests that sometimes fail and sometimes succeed. Contributors have no way of dealing with these failing tests themselves; they have to wait for someone with the permission to restart failing CI jobs.
I think this drags down morale and wears out potential contributors.
Describe the solution you'd like
As long as we cannot avoid having flaky tests, these flaky tests should not block CI.
Thus, I propose to keep running the tests in CI, but to no longer fail a CI job when a known-flaky test fails.
We introduce two new combinators:
```haskell
-- | Mark test as flaky for a specific target
knownFlakyFor :: BrokenTarget -> String -> TestTree -> TestTree
knownFlakyFor ...

-- | Mark test as flaky for all GHC versions
knownFlaky :: String -> TestTree -> TestTree
knownFlaky ...
```
which work like the existing knownBrokenFor and ignoreFor functions and tie into the same infrastructure, allowing us to specify on which platform a test is flaky. Note, however, that it stands to reason tests are usually flaky not because of the GHC version but because of LSP behaviour and parallelism.
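A minimal sketch of what these combinators could look like, assuming wrapTest from tasty-expected-failure and the BrokenTarget type that knownBrokenFor already uses; matchesCurrentPlatform is a hypothetical helper standing in for the platform check that knownBrokenFor performs:

```haskell
import Test.Hls.Util (BrokenTarget)  -- where knownBrokenFor lives today
import Test.Tasty (TestTree)
import Test.Tasty.ExpectedFailure (wrapTest)
import Test.Tasty.Runners (Outcome (Success), Result (..), resultSuccessful)

-- | Mark test as flaky for all GHC versions: the test still runs, but a
-- failure is rewritten into a passing result that records the reason,
-- so it no longer fails the CI job.
knownFlaky :: String -> TestTree -> TestTree
knownFlaky reason = wrapTest $ \runTest -> do
  result <- runTest
  pure $
    if resultSuccessful result
      then result
      else result
        { resultOutcome = Success
        , resultShortDescription = "FLAKY (" ++ reason ++ ")"
        }

-- | Mark test as flaky for a specific target only; everywhere else the
-- test behaves normally.
knownFlakyFor :: BrokenTarget -> String -> TestTree -> TestTree
knownFlakyFor target reason
  | matchesCurrentPlatform target = knownFlaky reason
  | otherwise = id

-- Hypothetical helper: whether the current OS/GHC matches the target.
-- The real implementation would reuse the check behind 'knownBrokenFor'.
matchesCurrentPlatform :: BrokenTarget -> Bool
matchesCurrentPlatform _ = True  -- placeholder
```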
I want to mark all tests that are known to be flaky with these combinators, together with the suspected reasons.
CI should no longer retry a whole test suite; instead, whenever a test is demonstrably flaky, it should be marked as such. This will also reduce the repetitive code in test.yml where we currently just blindly retry all test suites.
Unfortunately, experiments have shown that basically all lsp-test tests are prone to flakiness. However, their flakiness is usually transient: re-running the test suite once is normally enough to make the test succeed. Thus, we only mark tests as flaky that are too flaky, i.e. that fail twice in a row. With that approach, I hope we can get CI to succeed reliably again.
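As a rough way to encode the "fails twice in a row" criterion in the test tree itself (rather than in test.yml), one could wrap tests in a retry combinator. This is only a sketch, reusing the imports from above and assuming that invoking the action wrapped by wrapTest a second time actually re-runs the test:

```haskell
-- Hypothetical sketch: give every test one retry, so only tests that
-- fail twice in a row actually fail the job. Assumes invoking the
-- wrapped action again re-runs the test.
retryOnce :: TestTree -> TestTree
retryOnce = wrapTest $ \runTest -> do
  result <- runTest
  if resultSuccessful result
    then pure result
    else runTest  -- second attempt; report its result as-is
```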
Describe alternatives you've considered
Fix all the flaky tests, by either improving lsp-test to be easier to use correctly, migrating integration tests to unit tests, or fixing the test cases by introducing some kind of synchronisation points.
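To illustrate the sync-point idea: instead of sleeping for a fixed amount of time and hoping the server has caught up, a test can block on an explicit event. A hypothetical lsp-test snippet, assuming waitForDiagnostics from Language.LSP.Test:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Language.LSP.Test

-- Hypothetical test body: 'waitForDiagnostics' acts as a sync point,
-- blocking until the server has published diagnostics for the opened
-- document instead of racing against it.
openAndWait :: Session ()
openAndWait = do
  _doc <- openDoc "Foo.hs" "haskell"
  _diags <- waitForDiagnostics
  pure ()
```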
I also don't think we run the risk of marking tests as flaky and then forgetting about them instead of fixing them. Currently, if a test is flaky, we just restart CI, usually without even looking at which test failed in particular (correct me if I am wrong).
Test-suite maintenance and bitrot are an important topic anyway, and we need to review the tests that are marked as broken as well; so, in my opinion, this just documents the status quo in the code.