
Support for exponential requeuing time and user supplied requeuing time #263


Merged


metalcycling
Collaborator

In this PR, the code changes are intended to prevent the requeuing mechanism from requeuing an AppWrapper at the beginning of its execution merely because its pods take a long time to reach the Running state. This has been observed when pulling images takes too long or when init containers take a long time to run. With the new exponential growth of the waiting time, slow-starting pods eventually get enough time to become ready. For cases where users want to set the initial waiting time themselves, we expose the requeuingTimeMinutes field in the schedulingSpec stanza. By default this value is 5 minutes, which in our experience is enough for most applications. The new updates also measure the requeuing time from the last condition, rather than from when the AppWrapper was Dispatched.

… to specify how long to wait for the first requeuing period after dispatching.
…ntially. This time is set by default to 5 minutes but can be modified from the AppWrapper schedulingSpec field.
…m with a user-supplied initial time. This test uses init containers to trick the requeuing mechanism into thinking the pods have failed and are not going to complete.
Member

asm582 left a comment


Nice work! Can you please address one comment?

asm582 previously approved these changes on Feb 14, 2023
metalcycling merged commit f52b39c into project-codeflare:main on Feb 14, 2023
