Description
The way that ES-Hadoop currently handles cross-compiling the Spark integration for various versions of Scala has reached a point where it needs to be reconsidered. Since code compiled against one major version of Scala is not binary compatible with a Scala runtime on a different major version (e.g., 2.11 vs. 2.12), most Scala-based projects must recompile and release separate artifacts for each supported version of Scala.
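For illustration, the Scala binary version is conventionally encoded in the artifact name, so consumers must pick the artifact that matches their Scala runtime. A consumer's dependency declarations might look like the following (the coordinates and version numbers here are illustrative):

```kotlin
dependencies {
    // A Scala 2.10 application resolves the _2.10 artifact...
    implementation("org.elasticsearch:elasticsearch-spark-20_2.10:6.8.0")
    // ...while a Scala 2.11 application needs the separately compiled _2.11 artifact.
    // implementation("org.elasticsearch:elasticsearch-spark-20_2.11:6.8.0")
}
```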
Cross compilation of this kind is supported natively by SBT, but since this project is first and foremost a Java project and makes extensive use of the Elasticsearch project's testing facilities by way of Gradle plugins, converting to SBT to fix the problems we are seeing is not an option.
The current process for cross-compiling the project's Scala libraries is to use an in-house Gradle plugin that recursively launches the Gradle build with a different version of Scala specified. The child build process performs the variant assembly, taking care to rename artifacts and the like as needed.
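Conceptually, the nested invocation looks something like the sketch below, which uses Gradle's built-in `GradleBuild` task type; the `scala.variant` property name is hypothetical and stands in for whatever the in-house plugin actually passes down:

```kotlin
// build.gradle.kts — a minimal sketch of the nested-build approach, not the
// in-house plugin itself. The "scala.variant" property name is hypothetical.
tasks.register<GradleBuild>("crossCompileScala211") {
    // Re-run this same build as a child build, overriding the Scala version.
    tasks = listOf("jar")
    startParameter.projectProperties = mapOf("scala.variant" to "2.11")
}
```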
We have run into a number of problems with this process, though:
- With a recursive build, Gradle cannot apply its normal build optimizations.
- As Gradle's core project-configuration logic has changed across releases, our cross-compile plugin has broken, caused maintenance issues, and delayed upgrades.
- Because Spark and Scala add and drop supported versions on very different release schedules, the version combinations that ES-Hadoop supports have begun to diverge. For example, Spark 2.x no longer supports Scala 2.10, and Spark 1.6 does not support Scala 2.12. The direction we want to take the project's structure would make our cross-compile process incompatible with supporting new versions of Spark whose Scala support diverges outside of major releases.
- Testing the different variants requires a complicated array of CI configurations.
A potential solution that we are actively investigating is to use Gradle's officially supported variant artifacts to organize the build logic (see the sketch after the list below). By using Gradle's variant system we solve the following problems:
- All variants are built with one execution of the Gradle command, which allows more build optimizations to be applied.
- Since the variant configuration is officially supported (unlike nested and recursive builds), we are better insulated from breakages when upgrading Gradle.
- We can model the project correctly, as it always should have been, while still supporting new Spark and Scala versions that diverge from the earlier supported combinations of both.
- We can test all the variants in one build.
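To make the variant approach concrete, here is a rough sketch of what it could look like using Gradle's feature-variant API (`registerFeature`). This is only one possible shape; the `scala211`/`scala212` source set names, coordinates, and versions are all assumptions for illustration, not the actual design:

```kotlin
// build.gradle.kts — an illustrative sketch, not the final ES-Hadoop build logic.
plugins {
    `java-library`
    scala
}

// Hypothetical per-variant source sets holding the Scala sources.
val scala211 by sourceSets.creating
val scala212 by sourceSets.creating

java {
    // Each feature becomes a proper Gradle variant with its own capability,
    // so a single Gradle invocation can build and test every variant.
    registerFeature("scala211") {
        usingSourceSet(scala211)
        capability("org.example", "es-spark_2.11", "1.0")
    }
    registerFeature("scala212") {
        usingSourceSet(scala212)
        capability("org.example", "es-spark_2.12", "1.0")
    }
}

dependencies {
    // Each variant compiles against its own Scala binary version.
    "scala211Implementation"("org.scala-lang:scala-library:2.11.12")
    "scala212Implementation"("org.scala-lang:scala-library:2.12.10")
}
```

A consumer would then select a variant by requiring its capability, rather than by depending on a differently named artifact:

```kotlin
dependencies {
    implementation("org.example:es-spark:1.0") {
        capabilities {
            // Pick the Scala 2.12 variant of the single logical module.
            requireCapability("org.example:es-spark_2.12")
        }
    }
}
```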