-
Notifications
You must be signed in to change notification settings - Fork 113
feat: add centralized LakeFS error handling for multipart upload and dataset version operations #4177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| val commit = withLakeFSErrorHandling { | ||
| LakeFSStorageClient.createCommit( | ||
| repoName = repositoryName, | ||
| branch = "main", | ||
| commitMessage = s"Created dataset version: $newVersionName" | ||
| ) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xuang7 There is another approach with scala implicit, it will look like the next code snippet, in your preference, is this cleaner?
val commit =
LakeFSStorageClient.safeCall(_.createCommit(
repoName = repositoryName,
branch = "main",
commitMessage = s"Created dataset version: $newVersionName"
))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I currently centralized the LakeFS exception→HTTP mapping in the file-service (LakeFSExceptionHandler). The safeCall approach also looks clean and would centralize handling at the storage layer near LakeFSStorageClient. Would you recommend switching to LakeFSStorageClient.safeCall(...) so all callers share the same handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep this approach for now.
It would be good to open a discussion, and would be good if @Ma77Ball can give his input since he is working in logging. Here is the things to discuss later:
I think this PR is really about how we want error handling to work across the services calls (LakeFS now, others later): do we force callers to deal with failures, or do we centralize + make it easy/consistent.
Option 1: change defs and force handling (typed)
Return Either[StorageError, A] (or F[Either[...]]). Caller can’t ignore it.
storage.commit(repo, branch, msg): Either[StorageError, CommitId]
storage.uploadPart(key, part, bytes): Either[StorageError, Unit]
// usage
storage.commit(repo, branch, msg).map(Ok(_)).leftMap(mapToHttp)Pros: type-safe + enforced. Cons: signature churn + lots of callsite updates + need Error mapping.
Option 2: implicit + Dynamic .safe wrapper (ergonomic, but risky)
This option adds automatic logs
Nice callsite but reflection + weak typing (Any) + possible runtime method lookup issues. Notice this method does not require any changes to the def like commit, or uploadPart but it is optional to the caller to use .safe and
Changing the return type (2.1)
lakefs.safe.commit(repo, branch, msg) // Either[Throwable, Any]
datasetResource.safe.uploadPart(key, part, bs) // Either[Throwable, Any]Keeping the return type and throwing (2.2)
lakefs.safe.commit(repo, branch, msg)
datasetResource.safe.uploadPart(key, part, bs)
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala
Outdated
Show resolved
Hide resolved
| case 416 => new WebApplicationException(errorResponse(416)) | ||
| case 420 => new WebApplicationException(errorResponse(420)) | ||
| case _ => new InternalServerErrorException(errorResponse(500)) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the codes be extracted dynamically from code (Maybe from LakeFs) so for future changes from their side there is no need to maintain this piece of code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. For now we're using a small fallback map for the common LakeFS status codes, which are fairly stable. Please let me know if you’d recommend dynamically extracting codes from the ApiException instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be good if possible, if not this approach is good enough. Thanks.
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala
Outdated
Show resolved
Hide resolved
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala
Outdated
Show resolved
Hide resolved
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala
Outdated
Show resolved
Hide resolved
|
LGTM. |
What changes were proposed in this PR?
This PR adds centralized error handling for LakeFS API calls in
DatasetResource, covering multipart upload and dataset version operations. This improves error visibility by returning meaningful HTTP status codes and error messages to the frontend. Currently, the wrapper is applied to these specific calls only, and has not yet been applied to all LakeFS calls.Changes
ApiExceptionto HTTP exception (400, 401, 403, 404, 409, 410, 412, 416, 420, 500)createDatasetVersion,getDatasetDiff,initMultipartUpload,finishMultipartUpload,abortMultipartUploadAny related issues, documentation, discussions?
Related to #4176
How was this PR tested?
DatasetResourceSpecfor error message response and 500 internal error handlingWas this PR authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic)