[AURON #1956] Add initial compatibility support for Spark 4.1 (UT/CI Pass) #1958
Pull request overview
Adds initial Spark 4.1 compatibility support across build tooling, shims, and execution components.
Changes:
- Introduces a Spark 4.1 build profile + CI coverage, plus related version gating via `@sparkver` annotations.
- Adapts shuffle write execution to Spark 4's `ShuffleWriteProcessor.write` API change (SPARK-44605).
- Updates Spark UI servlet integration and introduces stubs for new Spark 4.x internal row data accessors.
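
The shuffle-write adaptation is easier to follow with a small model. The PR description says that `compute()` yields an empty iterator for shuffle-writer plans, and that the `write` override checks the iterator is empty, recovers the RDD from the dependency, and delegates to shared write logic. The sketch below mimics that flow without Spark. `FakeRdd`, `FakeShuffleDependency`, and `DeferredWriteProcessor` are hypothetical stand-ins, not Spark or Auron classes, and the signatures are simplified assumptions.

```scala
// Hypothetical stand-ins: none of these are real Spark or Auron classes.
final case class FakeRdd(rows: Seq[Int]) {
  // Models the idea of returning Iterator.empty for shuffle-writer plans:
  // iterating the RDD does not run the shuffle writer.
  def compute(): Iterator[Int] = Iterator.empty
}

// Models a dependency that exposes its input RDD to the write path.
final case class FakeShuffleDependency(inputRdd: FakeRdd)

object DeferredWriteProcessor {
  // Spark 4.x-style entry point (simplified): the iterator comes first,
  // so the RDD has to be recovered from the dependency instead.
  def write(inputs: Iterator[_], dep: FakeShuffleDependency, mapId: Long): Long = {
    assert(!inputs.hasNext, "shuffle-writer plans are expected to yield an empty iterator")
    internalWrite(dep.inputRdd, mapId)
  }

  // Shared write logic; here it only reports how many rows would be written.
  def internalWrite(rdd: FakeRdd, mapId: Long): Long = rdd.rows.size.toLong
}

object DeferredShuffleDemo {
  def run(): Long = {
    val rdd = FakeRdd(Seq(1, 2, 3))
    val dep = FakeShuffleDependency(rdd)
    DeferredWriteProcessor.write(rdd.compute(), dep, mapId = 0L)
  }
}
```

In this model the actual work happens only inside `internalWrite`, so the same code path can be shared by an entry point that receives an RDD and one that receives an iterator.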
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronShuffleDependency.scala | Exposes original input RDD for Spark 4 shuffle-write path via version-gated accessor. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeBase.scala | Adjusts broadcast timeout/conf retrieval and Spark 4 session access in async broadcast execution. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/columnar/AuronColumnarStruct.scala | Adds Spark 4.1-only stubs for new Geography/Geometry/Variant getters. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/columnar/AuronColumnarBatchRow.scala | Adds Spark 4.1-only stubs for new Geography/Geometry/Variant getters. |
| spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/columnar/AuronColumnarArray.scala | Adds Spark 4.1-only stubs for new Geography/Geometry/Variant getters. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeRDD.scala | Defers shuffle-writer execution by returning empty iterator for shuffle-writer plans. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala | Removes unused pattern-bound variables in string pad conversions. |
| spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala | Expands Spark-version gates to include 4.1 for several shim helpers. |
| spark-extension-shims-spark/src/test/scala/org/apache/spark/sql/execution/AuronAdaptiveQueryExecSuite.scala | Enables suite for Spark 4.1 via @sparkverEnableMembers. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/BaseAuronSQLSuite.scala | Disables whole-stage codegen/factory codegen in tests to avoid code size overflow. |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala | Adjusts ORC positional evolution test logic (but introduces an unused expression). |
| spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronFunctionSuite.scala | Removes unused parsed date value and adjusts NVL2 typing via explicit casts. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeSortMergeJoinExecProvider.scala | Enables provider for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeShuffledHashJoinExecProvider.scala | Enables provider for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeBroadcastJoinExec.scala | Extends Spark-version gating for join exec behavior to include 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronShuffleWriter.scala | Enables getPartitionLengths() override for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronShuffleManager.scala | Extends Spark-version gating to include 4.1 for reader and shuffle merge finalized logic. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronRssShuffleManagerBase.scala | Enables RSS shuffle reader override for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronBlockStoreShuffleReader.scala | Extends fetch iterator gating to Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeUnionExec.scala | Enables Spark 4.1 compatibility for withNewChildrenInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffleExchangeExec.scala | Adds Spark 4.1 ShuffleWriteProcessor.write override and exposes shuffleId for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeRenameColumnsExecProvider.scala | Enables provider for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeProjectExecProvider.scala | Enables provider for Spark 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativePartialTakeOrderedExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetSinkExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableExec.scala | Adds Spark 4.1-specific InsertIntoHiveTable wrapper and adjusts SparkSession package usage. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeLocalLimitExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGlobalLimitExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGenerateExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeFilterExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeExpandExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeCollectLimitExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeAggExec.scala | Extends Spark-version gates for aggregate exec fields/methods to include 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeExec.scala | Enables Spark 4.1 compatibility for withNewChildInternal. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/ShimsImpl.scala | Adds shimVersion for Spark 4.1 and expands many shim methods to include 4.1. |
| spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/InterceptedValidateSparkPlan.scala | Enables Spark 4.1 in validate/InvalidAQEPlanException handling shims. |
| pom.xml | Adds Spark 4.1 build profile + enforcer checks; updates Scala 2.13 patch level; adjusts test forking settings; adds scala-xml version property. |
| dev/auron-it/pom.xml | Adds Spark 4.1 profile + enforcer checks; updates Scala 2.13 patch level. |
| auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronAllExecutionsPage.scala | Adds servlet API dual support via Spark-version-gated render overloads (javax vs jakarta). |
| auron-spark-ui/pom.xml | Adds provided scala-xml dependency and spark version annotation macros dependency for UI module. |
| auron-build.sh | Adds Spark 4.1 to supported Spark versions list. |
| .github/workflows/tpcds.yml | Adds Spark 4.1 TPC-DS workflow job. |
| .github/workflows/tpcds-reusable.yml | Adjusts Spark binary naming logic for Spark 4.1 + Scala 2.13. |
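
Several rows above mention Spark 4.1-only stubs for the new Geography/Geometry/Variant getters. As a rough illustration of that pattern, the sketch below implements an accessor interface whose new getters exist only to satisfy the compiler and throw when called. `RowAccessors`, `VariantLike`, `GeographyLike`, and `ColumnarRowSketch` are hypothetical names invented for this example; the real Spark interfaces and Auron classes differ.

```scala
object StubGetterDemo {
  // Hypothetical stand-ins for the new value types; not the real Spark classes.
  final class VariantLike
  final class GeographyLike

  // Hypothetical accessor interface, loosely modelled on an internal row getter API.
  trait RowAccessors {
    def getInt(ordinal: Int): Int
    def getVariant(ordinal: Int): VariantLike
    def getGeography(ordinal: Int): GeographyLike
  }

  // Supported getters read real data; the new getters are stubs that throw.
  final class ColumnarRowSketch(ints: Array[Int]) extends RowAccessors {
    override def getInt(ordinal: Int): Int = ints(ordinal)

    override def getVariant(ordinal: Int): VariantLike =
      throw new UnsupportedOperationException("getVariant is not supported")

    override def getGeography(ordinal: Int): GeographyLike =
      throw new UnsupportedOperationException("getGeography is not supported")
  }
}
```

A stub of this kind keeps the class compiling against the wider interface while making any accidental use of the unsupported types fail loudly at runtime.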
Comments suppressed due to low confidence (1)
spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala:217
- This `if` expression computes a `Seq[Row]` but the result is unused, leaving dead code and triggering unused-value warnings under `-Ywarn-unused`. Remove this block, or assign it to a value and actually assert/use it (e.g., use `checkAnswer` with the expected rows) so the test intent is clear.
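
To make the suggestion concrete, here is a minimal sketch of the two shapes: an `if` expression whose value is discarded, and the same expression bound to a name and asserted. `Row` here is a hypothetical stand-in case class rather than Spark's `Row`, and `check` is an illustrative helper standing in for something like `checkAnswer`; the row values are invented.

```scala
object UnusedExpressionDemo {
  // Hypothetical stand-in for Spark's Row, used only to keep the sketch self-contained.
  final case class Row(values: Any*)

  // The expected rows depend on a flag, as in a test with two evolution modes.
  def expectedRows(positional: Boolean): Seq[Row] =
    if (positional) Seq(Row(1, "a")) else Seq(Row(1, null))

  def check(actual: Seq[Row], positional: Boolean): Unit = {
    // Dead code would look like this: the value of the `if` is computed and dropped.
    //   if (positional) Seq(Row(1, "a")) else Seq(Row(1, null))

    // Binding the value and asserting on it makes the test intent explicit.
    val expected = expectedRows(positional)
    assert(actual == expected, s"expected $expected but got $actual")
  }
}
```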
@richox PTAL
Which issue does this PR close?
Closes #1956
Rationale for this change
This PR prioritizes Spark 4.1 as the first supported Spark 4.x version to accelerate the Spark 4 compatibility initiative. The implementation is designed for extensibility, enabling easy addition of Spark 4.0 support later if needed. This balances rapid adoption of the latest stable Spark 4.x release with flexibility for other 4.x versions without major rework.
What changes are included in this PR?
Spark 4 API Compatibility
Servlet API Migration
Updated `AuronAllExecutionsPage.scala` to support both `javax.servlet.http.HttpServletRequest` (Spark 3.x) and `jakarta.servlet.http.HttpServletRequest` (Spark 4.x) via version-specific `@sparkver` annotations, adapting to Spark 4's migration to the Jakarta EE Servlet API.
Shuffle API Changes
Adapted shuffle components to address Spark 4.x's `ShuffleWriteProcessor.write` API refinement (SPARK-44605), which triggers early execution of shuffle writers and breaks alignment with Spark 3.x execution logic:
- Extended `AuronShuffleDependency` with a version-specific `getInputRdd` method (returns `null` for Spark 3.x, returns `_rdd` for Spark 4.x). The exposed `inputRdd` field serializes the transient `_rdd`, allowing the `_rdd` to be retrieved on the executor in Spark 4.1's `ShuffleWriteProcessor.write` method.
- Returned `Iterator.empty` in `NativeRDD.compute()` for `NativeRDD.ShuffleWrite` to defer execution to the `ShuffleWriteProcessor.write()` method, aligning with Spark 3.x execution logic.
- Overrode `ShuffleWriteProcessor.write` (which now takes `Iterator[_]` as its first parameter in Spark 4.x) in `NativeShuffleExchangeExec`: it asserts the input iterator is empty (validating adaptation logic), retrieves the RDD via `AuronShuffleDependency.inputRdd`, and reuses core shuffle logic through `internalWrite` to maintain consistency across Spark 3.x/4.x.
SparkSession Package Path Change
Addressed Spark 4.x's SparkSession package restructure:
New Data Types
Added stubs for Spark 4.x's new `GeographyVal`/`GeometryVal`/`VariantVal` data types in columnar data structures (`AuronColumnarArray.scala`, `AuronColumnarStruct.scala`, `AuronColumnarBatchRow.scala`). These stubs throw `UnsupportedOperationException` to resolve compilation errors.
Are there any user-facing changes?
How was this patch tested?