IGNITE-27460 Add schema compatibility validation for full commands#7598

Open
rpuch wants to merge 11 commits into apache:main from gridgain:ignite-27460-3

@rpuch rpuch commented Feb 16, 2026

https://issues.apache.org/jira/browse/IGNITE-27460

  • Introduce a mechanism to validate a command together with its safeTime in Raft
      ◦ validation happens after a safeTime is assigned to the command, but before the command is saved to the log on the leader (and hence before it is replicated)
      ◦ the validator can either ask the Raft client to retry the same command (if the failure is temporary, such as insufficient schema information on the node) or instruct it to fail the command if it will never become valid
  • Introduce schema compatibility validation for full commands
      ◦ do the validation in the Raft extension point introduced above (as safeTime becomes commitTs for full updates)
      ◦ if the node lacks schema information, fail the validation and request a retry from the Raft client
      ◦ if the commitTs is invalid, return the result to the Replica listener
      ◦ in the Replica listener, handle the failure by retrying the replica request (using the updated schema)
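
The extension point described above can be pictured with a minimal Java sketch. It is not the PR's actual code: `ToyValidator`, `ToyResult`, `ToyFullUpdate` and the two timestamp thresholds are hypothetical names invented for illustration, and a real schema compatibility check is much richer than a timestamp comparison. The sketch only shows the three outcomes (valid, retry, rejected) being derived from a command and its safeTime.

```java
/** Possible outcomes of validating a command together with its safe time. */
enum ToyVerdict { VALID, RETRY, REJECTED }

/** Hypothetical result type: a verdict plus a reason for non-valid outcomes. */
final class ToyResult {
    final ToyVerdict verdict;
    final String reason; // null when the verdict is VALID

    private ToyResult(ToyVerdict verdict, String reason) {
        this.verdict = verdict;
        this.reason = reason;
    }

    static ToyResult valid() { return new ToyResult(ToyVerdict.VALID, null); }
    static ToyResult retry(String reason) { return new ToyResult(ToyVerdict.RETRY, reason); }
    static ToyResult rejected(String reason) { return new ToyResult(ToyVerdict.REJECTED, reason); }
}

/** Hypothetical full (1PC) update: remembers the schema version the transaction worked with. */
final class ToyFullUpdate {
    final int schemaVersion;

    ToyFullUpdate(int schemaVersion) { this.schemaVersion = schemaVersion; }
}

/** Hypothetical extension point: called after safeTime is assigned, before the log append. */
interface ToyValidator {
    ToyResult validate(ToyFullUpdate command, long safeTime);
}

/** Toy schema-aware validator. Assumes a single incompatible change from schema v1 to v2. */
final class ToySchemaValidator implements ToyValidator {
    private final long metadataKnownUpTo; // local schema metadata is complete up to this timestamp
    private final long v2ActivatedAt;     // timestamp at which schema version 2 became active

    ToySchemaValidator(long metadataKnownUpTo, long v2ActivatedAt) {
        this.metadataKnownUpTo = metadataKnownUpTo;
        this.v2ActivatedAt = v2ActivatedAt;
    }

    @Override
    public ToyResult validate(ToyFullUpdate command, long safeTime) {
        // For a full update, safeTime doubles as the commit timestamp.
        if (safeTime > metadataKnownUpTo) {
            // Temporary failure: the node may learn the missing schemas later.
            return ToyResult.retry("schema metadata not yet available for timestamp " + safeTime);
        }
        if (command.schemaVersion < 2 && safeTime >= v2ActivatedAt) {
            // Permanent failure: this command can never become valid at this timestamp.
            return ToyResult.rejected("commit timestamp " + safeTime + " is after an incompatible schema change");
        }
        return ToyResult.valid();
    }
}
```

Separating `RETRY` from `REJECTED` is what lets the caller distinguish "ask again later" from "fail the command".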

What I tried before

  1. In IGNITE-27460 Add schema compatibility validation for full commands #7500, I attempted to perform the validation directly in the Raft state machine. As the validation requires a schema sync and might incur waiting, I simply blocked on the future. But this creates a lot of problems when stopping the node, as JRaft seems to be designed around commands whose execution time is always bounded. When we block on a command that may wait for an indefinite amount of time, we are in trouble.
  2. In IGNITE-27460 Add schema compatibility validation for full commands #7584, the approach was different:
      ◦ Still validate in the Raft state machine while executing the command
      ◦ With external mechanisms, make sure that a command only gets executed when we have enough schema information (and hence the validation future completes immediately)
      ◦ But this makes the construct fragile. As the command applies the write conditionally, all executions of the same command, including reapplications on the same machine and applications on different replicas, must always yield the same result (either apply the write or not). This is difficult because the decision depends on a condition lying outside of the state machine.
      ◦ Also, this causes trouble on upgrade. Commands saved by the old version (where they were unconditional) might get reapplied by the new version (where they are conditional), potentially causing different results on different nodes.
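
The fragility of approach 2 can be illustrated with a toy state machine. This is a hypothetical sketch, not code from #7584: `ConditionalReplica` and its `schemaKnown` flag are invented for illustration and collapse the real validation into a single boolean. Two replicas apply the same command, but the write depends on node-local knowledge, so their states diverge.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy replica whose command application depends on state outside the state machine. */
final class ConditionalReplica {
    final List<String> rows = new ArrayList<>();
    private final boolean schemaKnown; // node-local condition, not part of the replicated log

    ConditionalReplica(boolean schemaKnown) {
        this.schemaKnown = schemaKnown;
    }

    /** Applies the write only if the local condition holds (the fragile part). */
    void apply(String row) {
        if (schemaKnown) {
            rows.add(row);
        }
    }
}

final class DivergenceDemo {
    public static void main(String[] args) {
        ConditionalReplica a = new ConditionalReplica(true);
        ConditionalReplica b = new ConditionalReplica(false);

        // Both replicas receive the same command from the log.
        a.apply("row-1");
        b.apply("row-1");

        System.out.println("replica A: " + a.rows);
        System.out.println("replica B: " + b.rows);
        System.out.println("states equal: " + a.rows.equals(b.rows));
    }
}
```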

In this PR (approach number 3), the commands are unconditional as before, and we make the decision 'to replicate the command or not' on just one node (on the leader) once.
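
A minimal sketch of this "decide once on the leader" idea, under stated assumptions: `ToyLeader`, `ToyReplica` and the `LongPredicate` standing in for the validator are hypothetical, and real Raft replication is reduced to copying a list. The point is only that a command that fails validation never reaches the log, while replicas apply whatever is in the log unconditionally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy leader: assigns a safe time, validates once, and only then appends to the log. */
final class ToyLeader {
    final List<String> log = new ArrayList<>();
    private long clock;

    /** Returns true if the command was accepted and appended, false if it was turned away. */
    boolean submit(String command, LongPredicate safeTimeIsValid) {
        long safeTime = ++clock;                 // 1. assign safe time
        if (!safeTimeIsValid.test(safeTime)) {   // 2. validate once, on the leader only
            return false;                        //    nothing reaches the log or the followers
        }
        log.add(command + "@" + safeTime);       // 3. append; replication starts from here
        return true;
    }
}

/** Toy replica: applies every log entry unconditionally, so all replicas agree. */
final class ToyReplica {
    final List<String> applied = new ArrayList<>();

    void applyAll(List<String> log) {
        applied.addAll(log);
    }
}
```

Because the decision is taken before the append, reapplying the log after a restart or on another node cannot produce a different outcome.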

@rpuch rpuch force-pushed the ignite-27460-3 branch 3 times, most recently from fdf9a30 to b65f713 February 16, 2026 11:07
@rpuch rpuch requested a review from Copilot February 16, 2026 12:33

Copilot AI left a comment

Pull request overview

This pull request implements schema compatibility validation for full (1PC) transaction commands in the Raft replication layer. The approach validates commands after safe time assignment but before they are added to the Raft log on the leader, preventing inconsistencies that could arise from forward-incompatible schema changes.

Changes:

  • Introduces a SafeTimeValidator extension point in Raft that validates commands before replication, with support for both temporary retry (EBUSY) and permanent rejection (EREJECTED_BY_USER_LOGIC)
  • Implements PartitionSafeTimeValidator to validate full update commands against schema compatibility rules, ensuring commit timestamps are valid for the schema versions used
  • Updates PartitionReplicaListener to handle validation failures by converting EREJECTED_BY_USER_LOGIC errors to IncompatibleSchemaVersionException
  • Adds comprehensive integration tests verifying that schema version consistency is maintained across nodes during concurrent schema changes and writes
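
How the two error codes could drive caller behavior is sketched below. Everything here is a stand-in: `ToyRaftStatus` mimics the codes named in the summary, `ToyIncompatibleSchemaException` stands in for `IncompatibleSchemaVersionException`, and the bounded retry loop is an assumption for illustration. In the PR the retry and the exception conversion live in different layers (Raft client and Replica listener); the sketch merges them for brevity.

```java
import java.util.function.Supplier;

/** Stand-in for the two Raft error codes named in the review, plus success. */
enum ToyRaftStatus { OK, EBUSY, EREJECTED_BY_USER_LOGIC }

/** Stand-in for IncompatibleSchemaVersionException; the real class lives in Ignite. */
final class ToyIncompatibleSchemaException extends RuntimeException {
    ToyIncompatibleSchemaException(String message) {
        super(message);
    }
}

final class ToyCommandSubmitter {
    /**
     * Runs the attempt until it succeeds, retrying on EBUSY up to maxAttempts times.
     * Returns the number of attempts made.
     */
    static int submit(Supplier<ToyRaftStatus> attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            ToyRaftStatus status = attempt.get();
            if (status == ToyRaftStatus.OK) {
                return i;
            }
            if (status == ToyRaftStatus.EREJECTED_BY_USER_LOGIC) {
                // Permanent: retrying the same command can never succeed.
                throw new ToyIncompatibleSchemaException("command rejected by schema validation");
            }
            // EBUSY: temporary, e.g. schema metadata not yet available; try again.
        }
        throw new IllegalStateException("still busy after " + maxAttempts + " attempts");
    }
}
```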

Reviewed changes

Copilot reviewed 44 out of 44 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| modules/raft/src/main/java/org/apache/ignite/raft/jraft/option/SafeTimeValidator.java | New interface for safe time validation in Raft |
| modules/raft/src/main/java/org/apache/ignite/raft/jraft/option/SafeTimeValidationResult.java | New class representing validation results (valid, retry, or rejected) |
| modules/raft/src/main/java/org/apache/ignite/raft/jraft/option/PermissiveSafeTimeValidator.java | Default no-op validator implementation |
| modules/raft/src/main/java/org/apache/ignite/raft/jraft/core/NodeImpl.java | Integration of validation in Raft leader's executeApplyingTasks |
| modules/raft/src/main/java/org/apache/ignite/raft/jraft/error/RaftError.java | New EREJECTED_BY_USER_LOGIC error code for permanent rejections |
| modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/PartitionSafeTimeValidator.java | Partition-specific validator implementation |
| modules/table/src/main/java/org/apache/ignite/internal/table/distributed/replicator/PartitionReplicaListener.java | Error handling for validation failures |
| modules/partition-replicator/src/main/java/org/apache/ignite/internal/partition/replicator/network/command/UpdateCommandBase.java | New base interface for update commands |
| modules/partition-replicator/src/main/java/org/apache/ignite/internal/partition/replicator/schemacompat/CompatValidationResult.java | Added validationFailedMessage() method |
| modules/runner/src/integrationTest/java/org/apache/ignite/internal/schemasync/ItSchemaForwardCompatibilityConsistencyTest.java | New integration test verifying schema consistency |
| modules/runner/src/integrationTest/java/org/apache/ignite/internal/schemasync/ItBlockedSchemaSyncAndRaftCommandExecutionTest.java | Test verifying node stop behavior with blocked schema sync |

Comments suppressed due to low confidence (1)

modules/table/src/main/java/org/apache/ignite/internal/table/distributed/schema/MetadataSufficiency.java:44

  • The JavaDoc for isMetadataAvailableForTimestamp incorrectly says "Determines whether the local Catalog version is sufficient" when it should say something like "Determines whether the local schema metadata is sufficient for the given timestamp".

* Returns error message corresponding to validation failure. Should only be called for a failed validation result, otherwise an
* assertion error may be thrown.
*/
public String validationFailedMessage() {
Contributor

Should we provide the exception itself instead?
public Exception validationException()
This would give better encapsulation.

Contributor Author

Different exceptions can be constructed from this error message. Also, the Raft safe time validator doesn't need the exception itself, just the error message to construct a response message. So I think an error message is more suitable here.
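
The trade-off in this thread can be made concrete with a small sketch. It is hypothetical: `ToyCompatResult` only mirrors the shape of the accessor under discussion, and the two consumers are invented to show one caller that needs plain text and another that wraps the same text in an exception type of its own choosing.

```java
/** Toy validation result mirroring the accessor discussed above. */
final class ToyCompatResult {
    private final boolean successful;
    private final String failureMessage;

    private ToyCompatResult(boolean successful, String failureMessage) {
        this.successful = successful;
        this.failureMessage = failureMessage;
    }

    static ToyCompatResult success() { return new ToyCompatResult(true, null); }
    static ToyCompatResult failure(String message) { return new ToyCompatResult(false, message); }

    boolean isSuccessful() { return successful; }

    /** Only meaningful for a failed result; checked when assertions are enabled. */
    String validationFailedMessage() {
        assert !successful : "validationFailedMessage() called on a successful result";
        return failureMessage;
    }
}

final class ToyConsumers {
    /** Hypothetical caller 1: a Raft-side validator only needs text for its response. */
    static String toRaftResponse(ToyCompatResult result) {
        return "REJECTED: " + result.validationFailedMessage();
    }

    /** Hypothetical caller 2: another layer wraps the same text in an exception it picks. */
    static RuntimeException toException(ToyCompatResult result) {
        return new IllegalStateException(result.validationFailedMessage());
    }
}
```

Returning the message keeps the result type independent of any particular exception class, at the cost of each caller constructing its own exception.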
