

@yew1eb (Contributor) commented Jan 26, 2026

Which issue does this PR close?

Closes #1956

Rationale for this change

This PR makes Spark 4.1 the first supported Spark 4.x version, to accelerate the Spark 4 compatibility initiative. Version-specific code is isolated behind @sparkver gates, so Spark 4.0 support can be added later if needed. This approach adopts the latest stable Spark 4.x release quickly while leaving room to support other 4.x versions without major rework.

What changes are included in this PR?

Spark 4 API Compatibility

Servlet API Migration

Updated AuronAllExecutionsPage.scala to support both javax.servlet.http.HttpServletRequest (Spark 3.x) and jakarta.servlet.http.HttpServletRequest (Spark 4.x) through version-specific @sparkver annotations, following Spark 4's migration to the Jakarta EE Servlet API.

Shuffle API Changes

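Adapted the shuffle components to Spark 4.x's ShuffleWriteProcessor.write API refinement (SPARK-44605). In Spark 4.x, write() receives the partition's record iterator instead of the RDD. As a result, the RDD's compute() runs before write() is called, which triggers early execution of the shuffle writers and breaks alignment with the Spark 3.x execution flow: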

  • Enhanced AuronShuffleDependency with a version-specific getInputRdd method, which returns null for Spark 3.x and _rdd for Spark 4.x. The _rdd field is transient, so the exposed inputRdd field is what carries the RDD through serialization. This lets the RDD be retrieved on the executor in Spark 4.1's ShuffleWriteProcessor.write method.
  • Made NativeRDD.compute() return Iterator.empty for NativeRDD.ShuffleWrite. This defers execution to the ShuffleWriteProcessor.write() method and aligns with the Spark 3.x execution flow.
  • Added a Spark 4.1-specific override of ShuffleWriteProcessor.write (which now takes Iterator[_] as its first parameter in Spark 4.x) in NativeShuffleExchangeExec: it asserts the input iterator is empty (validating adaptation logic), retrieves the RDD via AuronShuffleDependency.inputRdd, and reuses core shuffle logic through internalWrite to maintain consistency across Spark 3.x/4.x.
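The following is a minimal, Spark-free Scala sketch of the shuffle-write flow described above. It is meant as an illustration under stated assumptions, not as Auron's implementation. FakeRdd, BaseDependency, InputRddDependency, computeForShuffleWrite, writeFromRdd and writeFromIterator are hypothetical stand-ins, and this internalWrite only borrows the name from the PR.

The sketch shows two behaviors:

  • A field marked @transient, as ShuffleDependency's _rdd is, comes back null after Java serialization. A second, non-transient field holding the same reference survives.
  • A Spark 4.x-style write() that receives an empty iterator can recover its input from the deserialized dependency and delegate to the same path the Spark 3.x-style entry point uses.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical stand-in for an RDD: a serializable source of partition rows.
final case class FakeRdd(rows: List[Int]) {
  def compute(partition: Int): Iterator[Int] = rows.iterator
}

// Mirrors the relevant shape of Spark's ShuffleDependency: the input RDD lives
// in a @transient field, so it is null once the dependency is deserialized.
class BaseDependency(@transient private val _rdd: FakeRdd) extends Serializable {
  def rdd: FakeRdd = _rdd
}

// Hypothetical analogue of AuronShuffleDependency: a second, non-transient
// field keeps the same reference, so it is serialized with the dependency.
class InputRddDependency(input: FakeRdd) extends BaseDependency(input) {
  val inputRdd: FakeRdd = input
}

object ShuffleWriteSketch {
  // Java-serialization round trip, standing in for shipping a task to an executor.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val loader = getClass.getClassLoader
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      // Resolve classes with this sketch's class loader (helps in REPL/script runners).
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, loader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Analogue of the NativeRDD.compute() change: a shuffle-write plan yields no
  // rows, so nothing is executed before write() is called.
  def computeForShuffleWrite(partition: Int): Iterator[Int] = Iterator.empty

  // Shared write path for both entry points; "writing" here just scales each
  // value so the result is observable.
  def internalWrite(rdd: FakeRdd, partition: Int): List[Int] =
    rdd.compute(partition).map(_ * 10).toList

  // Spark 3.x-style entry point: receives the RDD directly.
  def writeFromRdd(rdd: FakeRdd, partition: Int): List[Int] =
    internalWrite(rdd, partition)

  // Spark 4.x-style entry point: the iterator must be empty, and the real
  // input is recovered from the (deserialized) dependency.
  def writeFromIterator(records: Iterator[_], dep: InputRddDependency, partition: Int): List[Int] = {
    assert(records.isEmpty, "compute() should return Iterator.empty for shuffle writes")
    internalWrite(dep.inputRdd, partition)
  }
}
```

After ShuffleWriteSketch.roundTrip, the shipped dependency's rdd is null but inputRdd is intact. Both entry points therefore produce the same output, which is the consistency the internalWrite reuse aims for.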

SparkSession Package Path Change

Addressed Spark 4.x's SparkSession package restructure:

  • Spark 3.x: org.apache.spark.sql.SparkSession → Spark 4.x: org.apache.spark.sql.classic.SparkSession
  • Updated references in NativeParquetInsertIntoHiveTableExec.scala and NativeBroadcastExchangeBase.scala

New Data Types

Added stubs to the columnar data structures (AuronColumnarArray.scala, AuronColumnarStruct.scala, AuronColumnarBatchRow.scala) for the GeographyVal/GeometryVal/VariantVal data types introduced in Spark 4.x. The stubs throw UnsupportedOperationException. They exist only to implement the new getters so that these classes compile against Spark 4.1.
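The following is a hedged, Spark-free sketch of the stub pattern. RowGetters, VariantValue, GeographyValue, GeometryValue, StubbedColumnarRow and the getter names are hypothetical; they are not Spark's or Auron's actual types or signatures. The sketch only illustrates one point: when an interface gains new abstract getters, every implementing class must define them to compile. Throwing UnsupportedOperationException makes any unexpected runtime use fail loudly.

```scala
// Hypothetical stand-ins for the new value classes (not Spark's real types).
final class VariantValue
final class GeographyValue
final class GeometryValue

// Hypothetical accessor interface that gained new abstract getters in a newer
// release; any concrete row/array/struct class must implement them to compile.
trait RowGetters {
  def getInt(ordinal: Int): Int
  def getVariantValue(ordinal: Int): VariantValue
  def getGeographyValue(ordinal: Int): GeographyValue
  def getGeometryValue(ordinal: Int): GeometryValue
}

// A columnar row that implements only the types it supports; the new getters
// are stubs that throw if they are ever reached at runtime.
final class StubbedColumnarRow(values: Array[Int]) extends RowGetters {
  override def getInt(ordinal: Int): Int = values(ordinal)

  override def getVariantValue(ordinal: Int): VariantValue =
    throw new UnsupportedOperationException("variant values are not supported")

  override def getGeographyValue(ordinal: Int): GeographyValue =
    throw new UnsupportedOperationException("geography values are not supported")

  override def getGeometryValue(ordinal: Int): GeometryValue =
    throw new UnsupportedOperationException("geometry values are not supported")
}
```

Throwing, rather than returning null, means an unsupported type is reported at the point of access instead of surfacing later as an unrelated error.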

Are there any user-facing changes?

How was this patch tested?

  • Enabled Spark 4.1 in CI pipeline
  • Passed all existing Unit Tests (UT)
  • Passed all TPC-DS Integration Tests (IT)

Copilot AI left a comment

Pull request overview

Adds initial Spark 4.1 compatibility support across build tooling, shims, and execution components.

Changes:

  • Introduces a Spark 4.1 build profile + CI coverage, plus related version gating via @sparkver annotations.
  • Adapts shuffle write execution to Spark 4’s ShuffleWriteProcessor.write API change (SPARK-44605).
  • Updates Spark UI servlet integration and introduces stubs for new Spark 4.x internal row data accessors.

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 1 comment.

File Description
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronShuffleDependency.scala Exposes original input RDD for Spark 4 shuffle-write path via version-gated accessor.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeBase.scala Adjusts broadcast timeout/conf retrieval and Spark 4 session access in async broadcast execution.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/columnar/AuronColumnarStruct.scala Adds Spark 4.1-only stubs for new Geography/Geometry/Variant getters.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/columnar/AuronColumnarBatchRow.scala Adds Spark 4.1-only stubs for new Geography/Geometry/Variant getters.
spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/columnar/AuronColumnarArray.scala Adds Spark 4.1-only stubs for new Geography/Geometry/Variant getters.
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeRDD.scala Defers shuffle-writer execution by returning empty iterator for shuffle-writer plans.
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala Removes unused pattern-bound variables in string pad conversions.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala Expands Spark-version gates to include 4.1 for several shim helpers.
spark-extension-shims-spark/src/test/scala/org/apache/spark/sql/execution/AuronAdaptiveQueryExecSuite.scala Enables suite for Spark 4.1 via @sparkverEnableMembers.
spark-extension-shims-spark/src/test/scala/org/apache/auron/BaseAuronSQLSuite.scala Disables whole-stage codegen/factory codegen in tests to avoid code size overflow.
spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala Adjusts ORC positional evolution test logic (but introduces an unused expression).
spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronFunctionSuite.scala Removes unused parsed date value and adjusts NVL2 typing via explicit casts.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeSortMergeJoinExecProvider.scala Enables provider for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeShuffledHashJoinExecProvider.scala Enables provider for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/joins/auron/plan/NativeBroadcastJoinExec.scala Extends Spark-version gating for join exec behavior to include 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronShuffleWriter.scala Enables getPartitionLengths() override for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronShuffleManager.scala Extends Spark-version gating to include 4.1 for reader and shuffle merge finalized logic.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronRssShuffleManagerBase.scala Enables RSS shuffle reader override for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronBlockStoreShuffleReader.scala Extends fetch iterator gating to Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeWindowExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeUnionExec.scala Enables Spark 4.1 compatibility for withNewChildrenInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeTakeOrderedExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeSortExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeShuffleExchangeExec.scala Adds Spark 4.1 ShuffleWriteProcessor.write override and exposes shuffleId for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeRenameColumnsExecProvider.scala Enables provider for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeProjectExecProvider.scala Enables provider for Spark 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativePartialTakeOrderedExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetSinkExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeParquetInsertIntoHiveTableExec.scala Adds Spark 4.1-specific InsertIntoHiveTable wrapper and adjusts SparkSession package usage.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeLocalLimitExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGlobalLimitExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeGenerateExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeFilterExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeExpandExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeCollectLimitExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeBroadcastExchangeExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/NativeAggExec.scala Extends Spark-version gates for aggregate exec fields/methods to include 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/execution/auron/plan/ConvertToNativeExec.scala Enables Spark 4.1 compatibility for withNewChildInternal.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/ShimsImpl.scala Adds shimVersion for Spark 4.1 and expands many shim methods to include 4.1.
spark-extension-shims-spark/src/main/scala/org/apache/spark/sql/auron/InterceptedValidateSparkPlan.scala Enables Spark 4.1 in validate/InvalidAQEPlanException handling shims.
pom.xml Adds Spark 4.1 build profile + enforcer checks; updates Scala 2.13 patch level; adjusts test forking settings; adds scala-xml version property.
dev/auron-it/pom.xml Adds Spark 4.1 profile + enforcer checks; updates Scala 2.13 patch level.
auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronAllExecutionsPage.scala Adds servlet API dual support via Spark-version-gated render overloads (javax vs jakarta).
auron-spark-ui/pom.xml Adds provided scala-xml dependency and spark version annotation macros dependency for UI module.
auron-build.sh Adds Spark 4.1 to supported Spark versions list.
.github/workflows/tpcds.yml Adds Spark 4.1 TPC-DS workflow job.
.github/workflows/tpcds-reusable.yml Adjusts Spark binary naming logic for Spark 4.1 + Scala 2.13.
Comments suppressed due to low confidence (1)

spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala:217

  • This if expression computes a Seq[Row] but the result is unused, leaving dead code and triggering unused-value warnings under -Ywarn-unused. Remove this block, or assign it to a value and actually assert/use it (e.g., use checkAnswer with the expected rows) so the test intent is clear.

@yew1eb force-pushed the AURON_1956 branch 2 times, most recently from ebdd43d to 984e744 (January 26, 2026 04:10)
@yew1eb marked this pull request as draft January 26, 2026 06:06
@yew1eb force-pushed the AURON_1956 branch 2 times, most recently from 87b38c0 to e32e57d (January 26, 2026 08:35)
@yew1eb force-pushed the AURON_1956 branch 5 times, most recently from 2663cff to 7e9247a (January 26, 2026 12:46)
@yew1eb marked this pull request as ready for review January 26, 2026 13:40
@yew1eb (Contributor Author) commented Jan 26, 2026

@richox PTAL

@yew1eb force-pushed the AURON_1956 branch 5 times, most recently from f4ea0e6 to 1f4897d (January 27, 2026 04:17)
@github-actions bot removed the common label Jan 27, 2026
@yew1eb force-pushed the AURON_1956 branch 3 times, most recently from 70f3a5c to 3e56ef9 (January 27, 2026 04:41)

Development: successfully merging this pull request may close the issue "Initial Spark 4.1 Compatibility Support".
