Possible Linux kernel lock contention when running multiple judgedaemons per machine #2277

Open
@taoky

Description of the problem

When rejudging a large contest, or receiving many submissions for problems with many testcases, some submissions may take much more wall time than CPU time. With a short timelimit overshoot, these submissions might be judged as TLE even if they are correct.

This is exactly what happened in a recent ICPC Asia Regional Contest (about 350 teams, and an easy problem with 50 testcases). After a lot of time spent bisecting the kernel and debugging, it turned out that lock contention on two global locks (shrinker_rwsem and cgroup_mutex) in kernels < 6.3 can, under heavy load, block kernel operations such as cgroup operations and page fault handling inside a memory cgroup for several seconds.

(This is fixed, or at least alleviated, as of kernel commit torvalds/linux@da27f79.)
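As a rough aid for admins, the version boundary above could be checked on a judgehost with something like the following sketch. It only compares the major and minor numbers of the running kernel against 6.3, the boundary stated in this issue. The function names are made up for illustration, and a distribution kernel may carry backported fixes (or lack them), so the result is a hint rather than proof.

```python
import platform
import re

# Per this issue, the contention was observed on kernels older than 6.3.
FIXED_IN = (6, 3)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_be_affected(release: str) -> bool:
    """True if the version number alone suggests the kernel predates the fix."""
    return parse_kernel_release(release) < FIXED_IN


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    if may_be_affected(release):
        print(f"{release}: older than 6.3, may hit the lock contention")
    else:
        print(f"{release}: 6.3 or newer")
```

Running it on each judgehost before a contest gives a quick first indication of whether a kernel upgrade is worth considering.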

Although judgedaemon (runguard) cannot "fix" this issue in code, mentioning the kernel issue in the documentation could be helpful for server admins.

Your environment

  • DOMjudge/Webserver: any compatible version
  • OS: Ubuntu 22.04 with kernel 5.15 (default) or 6.2 (latest generic kernel in jammy repo)
  • Tested in a KVM guest with 32 cores and 21 or 30 judgedaemons, and on a bare-metal dual-CPU (40 cores) server with 21 judgedaemons.

Steps to reproduce

Submit a correct solution many times at once, for example (fish shell syntax):

for i in $(seq 1 1000); ~/Downloads/domjudge-8.2.2/submit/submit --url http://localhost:12345/ --contest test -y G.cpp; end

Then wait for judging to finish.

Expected behaviour

Reasonable judgehost system load, and no submission takes a wall time much longer than its CPU time.

Actual behaviour

Judgehost system load >= 2 * the number of judgedaemons. With timelimit overshoot set to 1s|10%, some submissions are judged as TLE even though they use very little CPU time. Judging is very slow.
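To make the role of the overshoot concrete, here is a minimal sketch of how a setting such as 1s|10% translates into extra seconds. It assumes the semantics as I understand them from the DOMjudge configuration description: "Xs" is an absolute amount, "Y%" is relative to the timelimit, and the two may be joined by "+" (sum), "|" (maximum) or "&" (minimum). The function name is hypothetical and this is not DOMjudge's own code.

```python
def overshoot_seconds(timelimit: float, spec: str) -> float:
    """Seconds of overshoot for a timelimit, under the assumed spec semantics."""

    def term(text: str) -> float:
        text = text.strip()
        if text.endswith("s"):
            return float(text[:-1])  # absolute seconds
        if text.endswith("%"):
            return timelimit * float(text[:-1]) / 100.0  # share of the timelimit
        raise ValueError(f"unrecognised overshoot term: {text!r}")

    # Assumed operators: '+' sums both terms, '|' takes the larger, '&' the smaller.
    for separator, combine in (("+", lambda a, b: a + b), ("|", max), ("&", min)):
        if separator in spec:
            left, right = spec.split(separator, 1)
            return combine(term(left), term(right))
    return term(spec)


if __name__ == "__main__":
    for limit in (2.0, 20.0):
        extra = overshoot_seconds(limit, "1s|10%")
        print(f"timelimit {limit}s: overshoot {extra}s")
```

Under these assumptions a problem with a 2 s timelimit gets only 1 s of slack, so a kernel stall of several seconds is enough to push an otherwise fast run past the limit.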

Any other information that you want to share?

#2157 mentions that "the call cgroup_delete_cgroup_ext did sometimes hang for multiple seconds". I'm afraid a double check of that contest's rejudging might be necessary to ensure that no correct solutions were judged as TLE...
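One way such a double check could be approached is to look for runs whose wall time exceeded the timelimit while their CPU time stayed well inside it. The sketch below works on already-extracted timing data; the `Run` record, its field names, and the `ratio` threshold are all hypothetical and would need to be mapped onto whatever the judging logs actually provide.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record; field names are illustrative only.
    submission_id: str
    cpu_time: float   # seconds of CPU time used
    wall_time: float  # seconds of wall-clock time elapsed


def suspicious_runs(runs: list[Run], timelimit: float, ratio: float = 2.0) -> list[Run]:
    """Runs that fit the timelimit by CPU time but not by wall time.

    `ratio` (an arbitrary default) additionally requires wall time to be at
    least that many times the CPU time, to skip runs that are merely borderline.
    """
    return [
        run
        for run in runs
        if run.cpu_time <= timelimit
        and run.wall_time > timelimit
        and run.wall_time > ratio * run.cpu_time
    ]


if __name__ == "__main__":
    sample = [
        Run("s1", cpu_time=0.05, wall_time=4.20),  # stalled: tiny CPU, long wall
        Run("s2", cpu_time=0.05, wall_time=0.06),  # normal run
        Run("s3", cpu_time=2.50, wall_time=2.60),  # genuinely too slow
    ]
    for run in suspicious_runs(sample, timelimit=1.0):
        print(run.submission_id)
```

Runs flagged this way are only candidates for rejudging on an unloaded judgehost; the check cannot by itself tell a kernel stall from a program that sleeps or waits on I/O.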

If you are interested in this specific kernel issue, I have also written a blog post (in Simplified Chinese) explaining it for the contestants affected in this regional contest and for server admins of later contests.
