
Remove nightly builds of external extensions #16100


Closed
wants to merge 1 commit

Conversation

cmb69
Member

@cmb69 cmb69 commented Sep 28, 2024

While the idea of noticing inadvertent API breaks early by building some external extensions is nice in theory, it doesn't make much sense in practice:

  • we would need to check many more extensions (e.g. parallel and uopz come to mind)
  • we should also check external SAPIs for better coverage
  • we should also run these builds on platforms other than Linux for better coverage
  • if we introduce a deliberate API break, we need to disable the breaking extensions, and to re-enable the builds once the issue is fixed downstream (e.g. the redis build was disabled in January, but apparently it has since been fixed without us noticing)
  • all the extensions we currently test have their own GH actions set up; these jobs are likely better suited to notice API breaks, since they can run the respective tests
  • if an extension build breaks due to downstream issues, we're probably puzzled about what's going on, while downstream maintainers likely know what broke and how to fix it
  • we would need nightly runs which are at least occasionally green; otherwise it is not very likely that anyone even looks at individual issues there (the last green nightly was more than a month ago)

All in all, the costs of actually doing these builds seem to outweigh the benefits. As such, we drop the nightly PECL builds.


Note that I'm not sure which branch this needs to target. But maybe we want to keep these builds anyway, in which case it wouldn't matter.

@cmb69 cmb69 requested a review from TimWolla as a code owner September 28, 2024 14:13
@cmb69 cmb69 requested review from iluuu1994 and removed request for TimWolla September 28, 2024 14:14
@iluuu1994
Member

iluuu1994 commented Sep 29, 2024

I don't have a big preference either way.

we would need to check many more extensions (e.g. parallel and uopz come to mind)

Sure, but "there could be more tests" is not a great argument for dropping the existing tests.

all the extensions we currently test have their own GH actions set up; these jobs are likely better suited to notice API breaks, since they can run the respective tests

That's true, but do they test against master? They might not even test against RC.

we would need nightly runs which are at least occasionally green

It was until we branched 8.4. I already notified Derick of the failure when it happened (he's on holiday until next week), and asked if we can prevent it from occurring every year.

@cmb69
Member Author

cmb69 commented Sep 29, 2024

Sure, but "there could be more tests" is not a great argument for dropping the existing tests.

Fair enough.

That's true, but do they test against master? They might not even test against RC.

Ah, right, very good point regarding testing dev (RC should be done by extensions, though).

It was until we branched 8.4. I already notified Derick of the failure when it happened (he's on holiday until next week), and asked if we can prevent it from occurring every year.

Thank you! However, with this I was referring to the overall nightly workflow (not the PECL job in particular). The more jobs fail, the less likely it is that anyone looks into them, I'm afraid.

After reconsideration, I don't think this single job is a real problem, so I'm closing this PR.

@cmb69 cmb69 closed this Sep 29, 2024
@cmb69 cmb69 deleted the cmb/drop-pecl-builds branch September 29, 2024 09:47
@iluuu1994
Member

However, with this I was referring to the overall nightly workflow (not the PECL job in particular). The more jobs fail, the less likely it is that anyone looks into them, I'm afraid.

I completely agree. I've essentially been struggling with keeping nightly green since it was moved to GitHub. Apart from the obvious issues, the macOS build sometimes fails with this error:

The hosted runner: GitHub Actions 7 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

I haven't found a solution to this problem, and unfortunately it's something GitHub support couldn't help me with. I already freed one CPU core, so that shouldn't be the cause. It might be a particular test, or a set thereof, hogging memory or network. Unfortunately, the runner crashes without the output log being available, so it's not easy to identify which test is to blame.

Anyway, long story short, I look at nightly daily because of these intermittent errors. If we can find a solution for them, I would of course be very happy. It's also the reason I'm currently not logging this to some public channel on Slack. If the build were more reliable, that would be an option, so more people could look at it and I wouldn't have to ping them manually. 🙂

@cmb69
Member Author

cmb69 commented Sep 29, 2024

I haven't found a solution to this problem, and unfortunately it's something GitHub support couldn't help me with. I already freed one CPU core, so that shouldn't be the cause. It might be a particular test, or a set thereof, hogging memory or network. Unfortunately, the runner crashes without the output log being available, so it's not easy to identify which test is to blame.

Looks like this is a common problem with self-hosted runners, but maybe it is a GH issue even for hosted runners. On the other hand, does this also happen for our push workflow? If not, maybe we're running too many nightly jobs, or it's just a bad time.

Anyhow, I'm not too concerned about an occasionally failing job (of course, that's annoying, though). But when I looked at yesterday's nightlies, I was a bit shocked to see so many failing jobs. PR #16107 and PR #16097 should be a good step. Now I was looking at https://github.com/php/php-src/actions/runs/11088570760, and was surprised that only a few jobs had failed. On closer investigation I noticed that only branches with new commits are tested. In my opinion, that's not the best idea, since jobs may fail for other reasons (see e.g. PR #16107), and when a new release of a security branch is made, all of a sudden CI may be completely red. In my opinion, we need to keep CI green even for security branches.

And I'm completely aware that the two paragraphs are contradictory ("too many jobs" vs. "always run everything"). Maybe a solution is to have alternating nightlies (run one half of the jobs on one day, and the other half the next day); that might still be good enough to catch regressions.
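
For illustration, a rough sketch of how alternating nightlies could look, assuming a single nightly workflow with two cron schedules and each job guarded by which schedule fired (the job names and build commands are made up, not taken from the actual workflow):

```yaml
# hypothetical excerpt of a nightly workflow
on:
  schedule:
    - cron: '0 1 * * 1,3,5'   # Mon/Wed/Fri: first half of the jobs
    - cron: '0 1 * * 2,4,6'   # Tue/Thu/Sat: second half of the jobs

jobs:
  linux-debug:
    # github.event.schedule contains the cron expression that triggered the run
    if: github.event.schedule == '0 1 * * 1,3,5'
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - run: ./buildconf && ./configure --enable-debug && make -j"$(nproc)"

  macos-debug:
    if: github.event.schedule == '0 1 * * 2,4,6'
    runs-on: macos-13
    steps:
      - uses: actions/checkout@v4
      - run: ./buildconf && ./configure --enable-debug && make -j"$(sysctl -n hw.ncpu)"
```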

@iluuu1994
Member

On the other hand, does this also happen for our push workflow?

I've seen it on occasion, but it's more noticeable in nightly because it runs many more jobs.

In my opinion, that's not the best idea, since jobs may fail for other reasons (see e.g. PR #16107), and when a new release of a security branch is made, all of a sudden CI may be completely red

Nightly is already a ton of work. Keeping security branches green all the time (not just when we do a security release) essentially means 5 years of support per branch (1 year of development, 2 of active support, 2 of security support). Since CI branches often diverge, it's not just a matter of merging from one branch lower. This is especially problematic when actual code changes are required to keep the build green, as we're not supposed to do that in security branches. Even Ubuntu LTS support will run out for every other PHP release.

Also note that with security support now being extended, we would essentially be going from 3 branches to 5 tested in nightly, increasing the number of jobs by 66%.

@cmb69
Member Author

cmb69 commented Sep 30, 2024

That's true, but do they test against master? They might not even test against RC.

Ah, right, very good point regarding testing dev (RC should be done by extensions, though).

I've noticed that none of the extensions run CI for PHP 8.4 on Windows (some don't even for other OSs); I've submitted a PR about that.

Nightly is already a ton of work.

I know. Thanks for working on that! However, there were 16:0 votes on https://wiki.php.net/rfc/travis_ci, and 34:0 on https://wiki.php.net/rfc/release_cycle_update, and apparently almost none of the voters care about that. Sad.

@iluuu1994
Member

I wasn't aware of the Travis CI RFC/policy. It does say:

The policy would apply for all branches in active development or maintenance, starting with PHP 5.5.

Which doesn't seem to include security branches.

Don't get me wrong, I can see your point. Of course, it would be optimal to have all branches that still get releases passing. I'm just afraid this is going to be a significant pain point.

I would say that I'd object to running all branches nightly; I don't think that's necessary, and it would be wasteful. If we want to keep security branches green, it should be enough to run them weekly. That would reduce build times by a lot. After some time, I can check in my work logs how much time I invested in it.
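
For what it's worth, a weekly run for security branches could be a separate scheduled workflow on the default branch that checks out each security branch via a matrix; a rough sketch, where the branch list and build steps are assumptions rather than the actual setup:

```yaml
# hypothetical weekly workflow for security branches
on:
  schedule:
    - cron: '0 2 * * 6'        # Saturdays, 02:00 UTC

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        branch: ['PHP-8.1']    # whichever branches are currently in security support
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
        with:
          # scheduled workflows run on the default branch, so the branch
          # to test has to be selected explicitly here
          ref: ${{ matrix.branch }}
      - run: ./buildconf && ./configure --enable-debug && make -j"$(nproc)"
      - run: make TEST_PHP_ARGS="-q -j$(nproc)" test
```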

We might also try to get to the bottom of #16100 (comment) by logging to some external service, so that the logs persist. It's annoying, but I don't see any other way to track this down, especially since it occurs intermittently.
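
Not sure how viable this is, but one way to make the logs survive a runner crash would be to stream the test output to an external endpoint while the job runs, instead of uploading it afterwards; a rough sketch with a made-up endpoint:

```yaml
# hypothetical step in the nightly workflow; https://logs.example.invalid
# is a placeholder, not an actual service
- name: Run tests (output streamed to an external log store)
  run: |
    # tee the output to an external endpoint as it is produced, so it
    # persists even if the runner later loses contact with GitHub
    ./sapi/cli/php run-tests.php -P -q -j"$(nproc)" 2>&1 \
      | tee >(curl -sS -T - "https://logs.example.invalid/php-nightly/${GITHUB_RUN_ID}")
```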

I would appreciate help on these though.

@cmb69
Member Author

cmb69 commented Oct 1, 2024

If we want to keep security branches green, it should be enough to run them weekly.

After reconsideration, I think it would even be okay to have workflow_dispatch triggers for the push actions of all branches. These could be run from time to time for those branches which don't get any commits (in practice, only security branches), and that should be sufficient to detect the most serious issues with CI. See #16148.
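
Roughly, this amounts to adding a manual trigger next to the existing ones in the push workflow; a minimal sketch (the exact shape of #16148 may differ, and the branch filters shown are assumptions):

```yaml
# hypothetical excerpt of the push workflow's trigger block
on:
  push:
    branches:
      - master
      - 'PHP-*'
  # allow running the same jobs manually for any branch, e.g. for
  # security branches that rarely receive pushes
  workflow_dispatch: ~
```

A run can then be started from the Actions tab or with `gh workflow run <workflow> --ref PHP-8.1`, for example.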

I'll try to help with getting nightly green; last night's run doesn't look too bad. And I doubt that we can resolve all these intermittent test failures anyway, especially when nightly runs so many jobs on hosted runners.

@cmb69
Member Author

cmb69 commented Oct 1, 2024

@iluuu1994, the nightly MSAN issue should now be fixed via 4a16940.

@iluuu1994
Member

Thank you!
