Description
We've been plagued by various intermittent test failures on CI in the `jasmine2` container lately, for example PR #2850 and, most recently, the August 17 master.
These failures can't be reproduced locally and are almost impossible to reproduce manually during an ssh session. We suspect that they are due to CI resources blowing up during our `@gl` tests.
Branch `try-fix-test-fail` attempted a few ways to "dilute" those expensive `@gl` tests across multiple test runs. The results were mixed, as the intermittent failures seemed to persist. Note, however, that this wasn't a particularly great attempt: running the `npm run test-jasmine` command on one half of the files that contain `@gl` and then on the second half resulted in 25 tests in the first run and 80+ in the second, as tests under `@gl` aren't evenly distributed from test file to test file.
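
To illustrate why splitting by file count gives lopsided runs, here is a minimal sketch that splits by test count instead, using a greedy "largest file first" assignment. The file names and per-file counts are made up for the example, and `balanceByTestCount` is a hypothetical helper, not something that exists in the repo.

```javascript
// Hypothetical input: number of @gl test cases found in each test file.
const exampleCounts = {
  'a_test.js': 60,
  'b_test.js': 25,
  'c_test.js': 15,
  'd_test.js': 5
};

// Greedy partitioning: visit files from largest to smallest and always
// add the next file to the run that currently has the fewest tests.
function balanceByTestCount(counts, numRuns) {
  const runs = Array.from({ length: numRuns }, () => ({ files: [], total: 0 }));
  const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);

  for (const [file, n] of sorted) {
    const lightest = runs.reduce((min, r) => (r.total < min.total ? r : min));
    lightest.files.push(file);
    lightest.total += n;
  }
  return runs;
}

const runs = balanceByTestCount(exampleCounts, 2);
console.log(runs.map(r => r.total)); // [ 60, 45 ]
```

This is only a heuristic; a single very large file still caps how even the runs can get, which is one more argument for chunking at the `it` level rather than at the file level.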
Alternatively, we could try to incorporate the `karma-parallel` module into our tests, but at first glance it looks incompatible with the `karma-spec-tags` plugin we're using to group tests by `@` tags.
So, in a private conversation, @alexcjohnson proposed a slightly more evolved solution:
I’m thinking (for performance as well) we should look for a way (that doesn’t require us to manage it all the time) to split the flaky and gl jasmine tests into like chunks of 10 or 20 tests, and run each of THOSE in a retry loop. Also because then if there’s a real failure in there, it will only have one small chunk to retry 5x.
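
A rough sketch of that idea, assuming we already have a flat list of tagged test names: `chunk` splits the list into groups of a fixed size, and `runWithRetry` re-runs each group until it passes or the attempt limit is reached. `runChunk` is a hypothetical callback standing in for whatever actually launches karma on a subset of tests; a real version would most likely be asynchronous.

```javascript
// Split a list into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run each chunk in its own retry loop.
// `runChunk(tests, attempt)` is hypothetical: it should return true on success.
function runWithRetry(chunks, runChunk, maxAttempts) {
  return chunks.map((tests, index) => {
    let attempts = 0;
    let passed = false;
    while (!passed && attempts < maxAttempts) {
      attempts += 1;
      passed = runChunk(tests, attempts);
    }
    return { index, attempts, passed };
  });
}

// Example with made-up test names and a fake runner in which the
// second chunk fails once before passing.
const testNames = Array.from({ length: 45 }, (_, i) => 'test ' + i);
const chunks = chunk(testNames, 20);
const fakeRunner = (tests, attempt) => !(tests[0] === 'test 20' && attempt === 1);

console.log(runWithRetry(chunks, fakeRunner, 5));
```

With chunks this small, a genuine failure only costs five runs of 10 to 20 tests rather than five runs of the whole `@gl` suite.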
But finding all `it` test cases in a `describe('@gl ...'` block isn't trivial, so first we'll need to move all those `@gl` tags in `describe` statements down to their child `it` blocks. We should then make sure, in a test-syntax test, that no tags are set in `describe` blocks.
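
As a starting point for that syntax check, here is a minimal sketch that scans a test file's source text for `describe(` calls whose title contains an `@` tag. `findTaggedDescribes` is a hypothetical helper and the sample test titles are invented; the check is a line-based regex heuristic (it ignores escaped quotes and titles built from variables), not a real parser.

```javascript
// Return every describe() call whose string title contains an @tag.
// Heuristic: assumes the title is a literal on the same line as `describe(`.
function findTaggedDescribes(source) {
  const offenders = [];
  source.split('\n').forEach((line, i) => {
    const m = line.match(/\bdescribe\(\s*(['"`])(.*?)\1/);
    if (m && /@\w+/.test(m[2])) {
      offenders.push({ line: i + 1, title: m[2] });
    }
  });
  return offenders;
}

// Made-up sample: the first block still carries its tag on describe,
// the second has already been migrated to tag the it block instead.
const sample = [
  "describe('@gl scatter3d', function() {",
  "    it('should render', function() {});",
  "});",
  "describe('legend', function() {",
  "    it('@gl should draw markers', function() {});",
  "});"
].join('\n');

console.log(findTaggedDescribes(sample));
// [ { line: 1, title: '@gl scatter3d' } ]
```

A syntax test could run this over every file in the jasmine test folder and fail if the returned list is non-empty.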