LeaderElectionManager#stopLeading may deadlock with Operator#installShutdownHook #1614

Closed
@xiaoma20082008

Description

Bug Report

What did you do?

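public static void main(String[] args) {
  // `client` (a KubernetesClient) and `zzzReconciler` are created elsewhere.
  Operator op = new Operator(client, o -> {
    // other configuration
    // Enable leader election with a lease name and an identity.
    o.withLeaderElectionConfiguration(new LeaderElectionConfiguration("xxx", "yyy"));
  });
  op.register(zzzReconciler);
  // The shutdown hook calls Operator#stop when the JVM begins shutting down.
  op.installShutdownHook();
  op.start();
}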

What did you expect to see?

I expected the process to exit cleanly when stopped. Instead, it hangs and never exits.

jstack output:

"Thread-3" #49 prio=5 os_prio=31 cpu=11.24ms elapsed=19.71s tid=0x00007fb1bb2cf000 nid=0xdd07 waiting for monitor entry  [0x000070000fa97000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.Shutdown.exit(java.base@17.0.3/Shutdown.java:172)
	- waiting to lock <0x00000007ffd020b8> (a java.lang.Class for java.lang.Shutdown)
	at java.lang.Runtime.exit(java.base@17.0.3/Runtime.java:115)
	at java.lang.System.exit(java.base@17.0.3/System.java:1860)
	at io.javaoperatorsdk.operator.LeaderElectionManager.stopLeading(LeaderElectionManager.java:79)
	at io.javaoperatorsdk.operator.LeaderElectionManager$$Lambda$663/0x0000000801098fa8.run(Unknown Source)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderCallbacks.onStopLeading(LeaderCallbacks.java:38)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:124)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:94)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector$$Lambda$878/0x00000008011cde08.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@17.0.3/CompletableFuture.java:863)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@17.0.3/CompletableFuture.java:841)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.3/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.cancel(java.base@17.0.3/CompletableFuture.java:2480)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$0(LeaderElector.java:92)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector$$Lambda$877/0x00000008011cdbd0.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@17.0.3/CompletableFuture.java:863)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@17.0.3/CompletableFuture.java:841)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.3/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.cancel(java.base@17.0.3/CompletableFuture.java:2480)
	at io.javaoperatorsdk.operator.LeaderElectionManager.stop(LeaderElectionManager.java:98)
	at io.javaoperatorsdk.operator.Operator.stop(Operator.java:118)
	at org.MyOperator.lambda$run$0(DmqOperator.java:75)
	at org.MyOperator$$Lambda$795/0x000000080110e6f8.run(Unknown Source)
	at java.lang.Thread.run(java.base@17.0.3/Thread.java:833)
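
Reading the trace: "Thread-3" is a shutdown hook thread. It calls Operator.stop, which cancels the leader election future, which triggers LeaderElectionManager.stopLeading, which calls System.exit. The JDK documents that Runtime.exit blocks indefinitely when invoked while shutdown hooks are running, so the hook can never finish. One way to avoid this might be to skip the exit call once shutdown has begun. A minimal sketch of that idea follows. ShutdownAwareExit and its methods are hypothetical names used for illustration, not part of java-operator-sdk or fabric8, and the exit action is injected only so the behavior can be exercised without terminating the JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

/** Hypothetical helper (an assumption for illustration, not java-operator-sdk API). */
public class ShutdownAwareExit {
  // Set to true once a shutdown hook has started running.
  private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
  // Normally System::exit; injectable so the guard can be tested safely.
  private final IntConsumer exitAction;

  public ShutdownAwareExit(IntConsumer exitAction) {
    this.exitAction = exitAction;
  }

  /** Call first thing inside the shutdown hook, before stopping the operator. */
  public void markShuttingDown() {
    shuttingDown.set(true);
  }

  /**
   * Stand-in for a stop-leading callback.
   * Returns true if the exit action was invoked, false if it was skipped
   * because the JVM is already shutting down.
   */
  public boolean onStopLeading() {
    if (shuttingDown.get()) {
      // Calling System.exit here would block forever inside the hook.
      return false;
    }
    exitAction.accept(1);
    return true;
  }
}
```

In real use the guard would be constructed with `System::exit`, and the shutdown hook would call `markShuttingDown()` before `Operator#stop`. Whether the SDK should expose such a flag or simply avoid exiting from the callback during stop is a design question for the maintainers.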

What did you see instead? Under which circumstances?

Environment

Kubernetes cluster type:

$ Mention java-operator-sdk version from pom.xml file

Reproduced with both 4.0.3 and 4.1.1.

<dependency>
    <groupId>io.javaoperatorsdk</groupId>
    <artifactId>operator-framework</artifactId>
    <version>4.1.1</version>
</dependency>

$ java -version

openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-aliyun.1", GitCommit:"2cbb16c", GitTreeState:"", BuildDate:"2021-01-27T02:20:04Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.25) and server (1.18) exceeds the supported minor version skew of +/-1

Possible Solution

Additional context
