Skip to content

[FLINK-28574] Bump the JOSDK version to 3.2.1 #362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 19, 2022

Conversation

SteNicholas
Copy link
Member

@SteNicholas SteNicholas commented Sep 5, 2022

What is the purpose of the change

Java Operator SDK has released 3.2.1 version. Bump the JOSDK version to 3.2.1.

Brief change log

  • Bump the Java Operator SDK to 3.2.1

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changes to the CustomResourceDescriptors: (yes / no)
  • Core observer or reconciler logic that is regularly executed: (yes / no)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@mbalassi
Copy link
Contributor

mbalassi commented Sep 6, 2022

Thanks @SteNicholas, it seems that bumping the fabric8 client has further implications - please double check the test failures:

[INFO] Running org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest
Error:  Tests run: 7, Failures: 0, Errors: 7, Skipped: 0, Time elapsed: 0.182 s <<< FAILURE! - in org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest
Error:  org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest.testDeploymentMetadata  Time elapsed: 0.101 s  <<< ERROR!
java.lang.NoSuchMethodError: 'io.fabric8.kubernetes.api.model.ServicePortFluent io.fabric8.kubernetes.api.model.ServiceSpecFluent$PortsNested.withNewTargetPort(java.lang.Integer)'
	at org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest.setup(StandaloneKubernetesJobManagerFactoryTest.java:75)

Error:  org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest.testTemplateMetadata  Time elapsed: 0.015 s  <<< ERROR!
java.lang.NoSuchMethodError: 'io.fabric8.kubernetes.api.model.ServicePortFluent io.fabric8.kubernetes.api.model.ServiceSpecFluent$PortsNested.withNewTargetPort(java.lang.Integer)'
	at org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest.setup(StandaloneKubernetesJobManagerFactoryTest.java:75)

Error:  org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest.testFlinkConfConfigMap  Time elapsed: 0.003 s  <<< ERROR!
java.lang.NoSuchMethodError: 'io.fabric8.kubernetes.api.model.ServicePortFluent io.fabric8.kubernetes.api.model.ServiceSpecFluent$PortsNested.withNewTargetPort(java.lang.Integer)'
	at org.apache.flink.kubernetes.operator.kubeclient.factory.StandaloneKubernetesJobManagerFactoryTest.setup(StandaloneKubernetesJobManagerFactoryTest.java:75)

@morhidi
Copy link
Contributor

morhidi commented Sep 6, 2022

Thanks @SteNicholas for the contrib! Few thoughts on this change:

  • Why do we need to bump the fabric8 client? I guess nor Flink neither JOSDK itself has made this step yet.
  • All .0 upgrades on JOSDK have brought some unexpected surprises so far :/ So, if there's no specific feature we are after from JOSDK I'd put this PR on hold at least till we release 1.2 from the Operator.

@SteNicholas
Copy link
Member Author

SteNicholas commented Sep 7, 2022

@morhidi

Why do we need to bump the fabric8 client? I guess nor Flink neither JOSDK itself has made this step yet.

I follow the bump step of JOSDK and bump the fabric8 client to 6.1.1.

All .0 upgrades on JOSDK have brought some unexpected surprises so far :/ So, if there's no specific feature we are after from JOSDK I'd put this PR on hold at least till we release 1.2 from the Operator.

Holding this PR at least till we release 1.2 makes sense to me.

@morhidi
Copy link
Contributor

morhidi commented Sep 13, 2022

Hi @SteNicholas

It'd be great if you could try to fix the CI errors, i.e. NoSuchMethod. There is also the KubernetesClientMetrics class which introduces an interceptor to be able to measure the Kubernetes API Server access that might break due to the version bump.

Thanks,
Matyas

@SteNicholas
Copy link
Member Author

SteNicholas commented Sep 15, 2022

@morhidi, @mbalassi, the java-operator-sdk released the 3.2.1 version which reverts the upgrade made in 3.2.0 to Fabric8 Client 6.1. Hence the version of the Fabric8 Client in v3.2.1 is 5.12.3. I will bump the version of the java-operator-sdk to 3.2.1 later. WDYT?

@SteNicholas SteNicholas changed the title [FLINK-28574] Bump the JOSDK version to 3.2.0 [FLINK-28574] Bump the JOSDK version to 3.2.1 Sep 15, 2022
@morhidi
Copy link
Contributor

morhidi commented Sep 15, 2022

@morhidi, @mbalassi, the java-operator-sdk released the 3.2.1 version which reverts the upgrade made in 3.2.0 to Fabric8 Client 6.1. Hence the version of the Fabric8 Client in v3.2.1 is 5.12.3. I will bump the version of the java-operator-sdk to 3.2.1 later. WDYT?

Somewhat expected :) Makes sense.

@SteNicholas SteNicholas force-pushed the FLINK-28574 branch 2 times, most recently from b09f25c to ba1a374 Compare September 16, 2022 10:31
@SteNicholas
Copy link
Member Author

SteNicholas commented Sep 17, 2022

@morhidi @mbalassi , the Java Operator SDK releases the 3.2.1 and 4.0.0 version, in which the Fabric Client version is 5.12.3 in v3.2.1 and the version is 6.1.1 in v4.0.0.

I have bumped the version of Java Operator SDK to v3.2.1 that separates executor service for workflow. PTAL.

BTW, Fabric8 Kubernetes Client 6.1.1 is most likely a breaking change, hence we could bump the version to 4.0.0 in Flink K8S operator 1.3 version. WDYT?

@@ -113,6 +113,9 @@ private void overrideOperatorConfigs(ConfigurationServiceOverrider overrider) {
} else {
LOG.info("Configuring operator with {} reconciliation threads.", parallelism);
overrider.withConcurrentReconciliationThreads(parallelism);
// feat: separate executor service for workflow
// https://github.com/java-operator-sdk/java-operator-sdk/pull/1371
overrider.withExecutorService(Executors.newFixedThreadPool(parallelism));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SteNicholas Could you please explain why we need this extra config here? Is it ok if we control it with the same config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @csviri

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally only overrider.withConcurrentReconciliationThreads(parallelism); if you don't want to replace the executor service with a different flavor (thus other than a fixed thread pool).

see: https://github.com/java-operator-sdk/java-operator-sdk/blob/d9e869c3bf5319163d8d884529f6b4a926bd2d85/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationService.java#L127-L129

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csviri @morhidi, this change comes from FlinkOperatorTest.java#L54, which get the ExecutorService with default thread number 10 and the concurrentReconciliationThreads() returns the default thread number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to @csviri offline, he'll clarify this part in a bit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry, this seems to be an issue with the overrider, created a PR to fix it
operator-framework/java-operator-sdk#1484

pls for now override the service creation like above: overrider.withExecutorService

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morhidi, like @csviri mentioned, the current way that overrides the service creation is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csviri, could the fix be released in 3.2.2? Otherwise I will remove this line when bumping the version to 4.0.0. cc @morhidi

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SteNicholas yes, will backport it to 3.x

@morhidi
Copy link
Contributor

morhidi commented Sep 19, 2022

Hi @SteNicholas, if we stick with the current fabric8 version I don't see any problems merging this PR with the upcoming FOv1.2 release.

@SteNicholas SteNicholas requested a review from morhidi September 19, 2022 12:41
@morhidi
Copy link
Contributor

morhidi commented Sep 19, 2022

+1 LGTM

@gyfora gyfora merged commit 1a5b066 into apache:main Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants