-
Notifications
You must be signed in to change notification settings - Fork 490
[FLUSS] Add Producer Offset Snapshot for Exactly-Once semantics #2434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
efb2ffe to
d34c50f
Compare
d34c50f to
864bbe6
Compare
This commit introduces the Producer Offset Snapshot feature to support exactly-once semantics in Fluss. The feature allows producers (e.g., Flink jobs) to register their offset snapshots for recovery purposes. Main changes: - Add ProducerSnapshotManager for lifecycle management of producer snapshots - Add ProducerSnapshotStore for low-level storage operations (ZK + remote FS) - Add Admin APIs: registerProducerOffsets, getProducerOffsets, deleteProducerOffsets - Add RetryUtils for IO operations with exponential backoff - Add configuration options for snapshot TTL and cleanup interval Code quality improvements: - Remove ProducerSnapshotResultCodes class, use RegisterResult enum directly - Fix RetryUtils interrupt handling to preserve thread interrupt status - Add comprehensive tests for ProducerSnapshotManager including concurrency tests - Add interrupt handling tests for RetryUtils
42bc134 to
478e8ec
Compare
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @platinumhamburg , I left some comments.
fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java
Outdated
Show resolved
Hide resolved
fluss-client/src/main/java/org/apache/fluss/client/admin/FlussAdmin.java
Outdated
Show resolved
Hide resolved
fluss-client/src/main/java/org/apache/fluss/client/utils/ClientRpcMessageUtils.java
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorService.java
Outdated
Show resolved
Hide resolved
.../src/test/java/org/apache/fluss/server/coordinator/producer/ProducerSnapshotManagerTest.java
Outdated
Show resolved
Hide resolved
...er/src/test/java/org/apache/fluss/server/coordinator/producer/ProducerSnapshotStoreTest.java
Outdated
Show resolved
Hide resolved
...er/src/test/java/org/apache/fluss/server/coordinator/producer/ProducerSnapshotStoreTest.java
Outdated
Show resolved
Hide resolved
...rver/src/main/java/org/apache/fluss/server/coordinator/producer/ProducerSnapshotManager.java
Show resolved
Hide resolved
- Rename 'producer-snapshot' config options to 'producer-offsets' for clarity - Refactor RegisterProducerOffsetsRequest to use PbProducerTableOffsets instead of flat PbTableBucketOffset for better organization by table - Move RegisterResult enum from fluss-rpc to fluss-client as public API - Add producer ID validation using TablePath.detectInvalidName() - Improve authorization: check per-table permissions instead of cluster-level - Add version-based delete in cleanup to avoid race conditions - Use FlussPaths for consistent remote storage path generation - Add FileNotFoundException handling for non-retryable file errors - Extract RPC message conversion utilities to ClientRpcMessageUtils and ServerRpcMessageUtils - Add comprehensive unit tests for producer ID validation
Rename classes and corresponding variable declarations: - ProducerSnapshot -> ProducerOffsets - ProducerSnapshotJsonSerde -> ProducerOffsetsJsonSerde - ProducerSnapshotStore -> ProducerOffsetsStore - ProducerSnapshotManager -> ProducerOffsetsManager Update variable names in: - CoordinatorService: producerSnapshotManager -> producerOffsetsManager - ZooKeeperClient: class type references updated - ZkData: class type references updated Test classes renamed accordingly.
7a44935 to
41d103d
Compare
|
Thanks @wuchong for the detailed review. I’ve addressed all the comments above, please take another look when you have time. |
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@platinumhamburg the updated changes looks good to me in general. I left some final comments. Besides, I appended a commit to improve renaming and comments a bit.
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorService.java
Show resolved
Hide resolved
...erver/src/main/java/org/apache/fluss/server/coordinator/producer/ProducerOffsetsManager.java
Outdated
Show resolved
Hide resolved
fluss-common/src/main/java/org/apache/fluss/rpc/messages/RegisterResult.java
Outdated
Show resolved
Hide resolved
fluss-common/src/test/java/org/apache/fluss/utils/RetryUtilsTest.java
Outdated
Show resolved
Hide resolved
- Remove duplicated RegisterResult class from fluss-common, use magic values (0=CREATED, 1=ALREADY_EXISTS) with comments aligning to client RegisterResult enum - Introduce InvalidProducerIdException (ApiException) for producer ID validation - Register InvalidProducerIdException in Errors enum (code 63) - Re-throw ApiExceptions as-is in producer offsets APIs to preserve exception types - Wrap non-ApiExceptions with UnknownServerException instead of RuntimeException - Add authorization tests for producer offsets operations in FlussAuthorizationITCase
284b6fc to
7c294fc
Compare
|
@wuchong Thanks for your detailed review and suggestions. I’ve addressed all the comments above—please take a look when you have time. |
The deleteProducerOffsets and getProducerOffsets methods were calling authorizeTable() inside CompletableFuture.supplyAsync(), which runs on a different thread (ioExecutor). Since currentSession() relies on thread-local storage, it fails with 'No session set' error when called from the async thread. Fix: - Capture session before entering async block - Add authorizeTableWithSession() method that accepts explicit session - Refactor authorizeTable(OperationType, long) to delegate to the new method, eliminating code duplication
Core changes:
Design highlights:
Tests:
Purpose
Linked issue: close #2433
Brief change log
Tests
API and Format
Documentation