-
Notifications
You must be signed in to change notification settings - Fork 89
refactor(clp): Adopt two-phase construction for clp::ReadOnlyMemoryMappedFile; Remove boost::iostreams from clp-s.
#1963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
clp::ReadOnlyMemoryMappedFile:
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaces ReadOnlyMemoryMappedFile's throwing constructor with a static create(std::string_view) factory that returns ystdlib::error_handling::Result; updates callers to handle result-based mapping, switches internal ownership to optional, and removes Boost::iostreams linkage where mapping wrapper is used. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/core/src/clp_s/indexer/CMakeLists.txt (1)
115-116:⚠️ Potential issue | 🔴 CriticalMissing compilation units:
ReadOnlyMemoryMappedFile.cppandFileDescriptor.cppnot in indexer build.The indexer compiles
../ZstdDecompressor.cpp(line 115) and includes../../clp/streaming_compression/zstd/Decompressor.cpp(line 69), both of which depend onclp::ReadOnlyMemoryMappedFile. SinceReadOnlyMemoryMappedFile.cppcontains real implementations (mmap/munmap calls) and depends onFileDescriptor.cpp, both source files must be compiled for the indexer target.Currently, neither
../../clp/ReadOnlyMemoryMappedFile.cppnor../../clp/FileDescriptor.cppare inINDEXER_SOURCES, and the indexer does not linkclp_s::clp_dependencies(which contains these files). This will cause linker errors.Add both source files to
INDEXER_SOURCESor link the indexer againstclp_s::clp_dependencies.components/core/src/clp_s/ZstdDecompressor.cpp (1)
244-271:⚠️ Potential issue | 🟠 MajorErrno-based error handling on mmap failure looks correct.
The new
open(std::string const&)properly:
- Constructs the
ReadOnlyMemoryMappedFileviamake_unique- Checks
is_open()using thefalse ==pattern per coding guidelines- Reports the errno with
strerroron failure- Configures the input stream from
get_view()One concern: on line 248,
m_input_typeis set toMemoryMappedCompressedFilebefore the open check on line 252. Ifis_open()returnsfalse, the method returnsErrorCodeFailurewithm_input_typestill set toMemoryMappedCompressedFile, leaving the object in a partially-initialized state. A subsequent call to any otheropen()overload would throwErrorCodeNotReadyinstead of succeeding. The caller must explicitly callclose()to recover. Consider resettingm_input_typeback toNotInitializedon the error path.Proposed fix
m_memory_mapped_file = std::make_unique<clp::ReadOnlyMemoryMappedFile>(compressed_file_path); if (false == m_memory_mapped_file->is_open()) { auto const error_number{m_memory_mapped_file->get_errno()}; SPDLOG_ERROR( "ZstdDecompressor: Unable to memory map the compressed file with path: {}. errno = " "{} ({}).", compressed_file_path.c_str(), error_number, std::strerror(error_number) ); + m_memory_mapped_file.reset(); + m_input_type = InputType::NotInitialized; return ErrorCodeFailure; }
🤖 Fix all issues with AI agents
In `@components/core/src/clp_s/ZstdDecompressor.cpp`:
- Line 8: Remove the stale Boost header include
"<boost/iostreams/device/mapped_file.hpp>" from ZstdDecompressor.cpp; this
include is no longer used because memory-mapping is handled via
clp::ReadOnlyMemoryMappedFile, so delete the include line and ensure any
lingering references to boost::iostreams::mapped_file are removed or replaced
with clp::ReadOnlyMemoryMappedFile usage (check code in ZstdDecompressor class
and related functions to confirm compilation).
In `@components/core/src/clp/ReadOnlyMemoryMappedFile.cpp`:
- Around line 34-38: The log call in ReadOnlyMemoryMappedFile uses path.data()
(a std::string_view) which may not be null-terminated; change the SPDLOG_ERROR
invocation to pass the std::string_view itself (i.e., path) or explicitly
construct a std::string (e.g., std::string(path)) so the formatting treats it
safely, keeping the same message and ex.what() usage.
- Around line 14-40: The constructor currently returns early for empty files
leaving m_data==nullptr and m_errno==0 which makes is_open() report failure;
change the ReadOnlyMemoryMappedFile constructor to treat an empty file as a
successful open by explicitly marking the object open: set m_buf_size = 0,
m_data = nullptr (as now) and set a success flag (e.g. m_is_open = true) or
adjust is_open() to check an explicit "opened successfully" state instead of
m_data; ensure get_errno() remains 0 for this case and update any code paths
that rely on m_data to handle zero-length buffers (references:
ReadOnlyMemoryMappedFile::ReadOnlyMemoryMappedFile, is_open(), m_errno, m_data,
m_buf_size, get_errno()).
In `@components/core/src/clp/ReadOnlyMemoryMappedFile.hpp`:
- Around line 46-51: The current is_open() implementation (checking m_data,
m_buf_size, and m_errno) treats a successfully opened zero-byte file as not
open; decide and make semantics explicit: either update the doc comment above
is_open() to state "Returns false for empty files since mmap cannot map
zero-length regions" so callers (which reference m_buf_size) don't log
strerror(0) on empty files, or change is_open() to treat
zero-length-but-successful opens as open by using: return m_errno == 0 &&
(m_data != nullptr || m_buf_size == 0); also update callers that log errors to
check m_buf_size==0 and avoid emitting strerror(0) for empty-file cases. Ensure
references to is_open(), m_data, m_buf_size, and m_errno are adjusted
consistently.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're here, I'd strongly prefer to rewrite this API with result-style error handling for better maintainability. What you will need to do:
- Enable move semantics:
- Add a move constructor which swap the underlying pointer with
nullptr. - In the destructor, if the data is
nullptrthen directly return.
- Add a move constructor which swap the underlying pointer with
- Make the constructor private, and use factory function design pattern that returns a result object indicating the failure.
- For errno, you should be able to directly convert it into std::errc which is
std::error_codecompatible.
- For errno, you should be able to directly convert it into std::errc which is
- Drop
is_openmethod. - Use move semantics without introducing unique pointers.
…ole warning with dev server (fixes y-scope#1892). (y-scope#1949)
…ter startup feedback (fixes y-scope#1495). (y-scope#1940)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@components/core/src/clp/ReadOnlyMemoryMappedFile.cpp`:
- Around line 32-34: The catch block in ReadOnlyMemoryMappedFile.cpp currently
reads errno during stack unwinding (catching TraceableException) which may be
stale; change the flow so FileDescriptor::OperationFailed captures errno at
throw-time (e.g., store the errno value inside the exception object or
ErrorCode_errno payload) and then, in the catch handler for TraceableException
in ReadOnlyMemoryMappedFile (and any similar handlers), use the stored errno
value from the exception (accessor on TraceableException or OperationFailed) to
construct the std::error_code instead of reading the global errno; update the
OperationFailed constructor to accept/capture ::errno and expose it via a method
or member that the catch site can call.
In `@components/core/src/clp/ReadOnlyMemoryMappedFile.hpp`:
- Around line 40-46: The move-assignment operator for ReadOnlyMemoryMappedFile
currently overwrites m_data and m_buf_size without unmapping the existing
mapping, leaking the mmap; update operator=(ReadOnlyMemoryMappedFile&&) to first
check if m_data != nullptr and call munmap(m_data, m_buf_size) (or the class's
unmap helper) on the current object before taking ownership from rhs, then move
m_data and m_buf_size using std::exchange as before; ensure munmap is available
by adding `#include` <sys/mman.h> to the header or move the operator
implementation into the .cpp where munmap is already included.
In `@components/core/src/clp/streaming_compression/zstd/Decompressor.cpp`:
- Around line 178-192: The SPDLOG_ERROR call inside Decompressor::create (the
block that handles ReadOnlyMemoryMappedFile::create failure where
m_memory_mapped_file is set and file_view is obtained) uses the prefix
"ZstdDecompressor"; update that log message to use the consistent prefix
"streaming_compression::zstd::Decompressor" to match other logs in this file
(e.g., the messages near lines with SPDLOG_ERROR/SPDLOG_WARN at the top of
Decompressor) so all log entries reference
streaming_compression::zstd::Decompressor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@components/core/src/clp/ReadOnlyMemoryMappedFile.hpp`:
- Around line 34-37: The [[nodiscard]] attribute on the deleted copy-assignment
operator is redundant; remove the attribute so the declaration reads simply
"auto operator=(ReadOnlyMemoryMappedFile const&) -> ReadOnlyMemoryMappedFile& =
delete;" (or even "ReadOnlyMemoryMappedFile& operator=(ReadOnlyMemoryMappedFile
const&) = delete;") next to the already-deleted copy constructor in the
ReadOnlyMemoryMappedFile class to clean up dead code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@components/core/src/clp/ReadOnlyMemoryMappedFile.hpp`:
- Around line 43-53: The move-assignment operator in ReadOnlyMemoryMappedFile
uses munmap but the header lacks `#include` <sys/mman.h>, causing compilation when
included transitively; either add `#include` <sys/mman.h> to this header or move
the operator=(ReadOnlyMemoryMappedFile&&) implementation into the .cpp where
sys/mman.h is already included, and while updating, make the null check style
consistent by using nullptr != m_data (instead of m_data != nullptr) and keep
the uses of m_data and m_buf_size intact when calling munmap and exchanging
values from rhs.
clp::ReadOnlyMemoryMappedFile:clp::ReadOnlyMemoryMappedFile; Remove boost::iostreams from clp-s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp_s/ZstdDecompressor.cpp (1)
243-269:⚠️ Potential issue | 🟡 Minor
m_input_typeis set beforecreate()— failed open leaves the object in a partially-initialized state.If
ReadOnlyMemoryMappedFile::createfails (line 251),m_input_typeis already set toMemoryMappedCompressedFile(line 247), but the function returnsErrorCodeFailurewithout resetting it. A subsequent call toopen()will throwErrorCodeNotReadybecausem_input_type != NotInitialized. The caller must explicitlyclose()before retrying.This same pattern exists in
clp::streaming_compression::zstd::Decompressor::open(line 175 of that file), so it's a pre-existing convention. However, resettingm_input_typeon the error path would make the API more forgiving:Proposed fix (applies to both decompressors)
if (result.has_error()) { SPDLOG_ERROR( "ZstdDecompressor: Unable to memory map the compressed file with path: {}. Error: " "{}.", compressed_file_path.c_str(), result.error().message() ); + m_input_type = InputType::NotInitialized; return ErrorCodeFailure; }
| // TODO: Rewrite `FileDescriptor` constructor and `get_size()` | ||
| // with `ystdlib::error_handling::result` for capturing errno at the point of failure and | ||
| // better error propagation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai
Create an issue for FileDescriptor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
@Bill-hbrhbr, I'll create an issue for the FileDescriptor refactoring.
🏁 Script executed:
#!/bin/bash
# Create GitHub issue for FileDescriptor refactoring
gh issue create \
--repo y-scope/clp \
--title "Refactor FileDescriptor to use error_handling::Result instead of exceptions" \
--body "## Description
Refactor \`FileDescriptor\` constructor and \`get_size()\` method to use \`ystdlib::error_handling::Result\` instead of throwing exceptions. This will enable capturing errno at the point of failure and provide better error propagation.
## Context
This refactoring was identified during the removal of Boost dependencies from clp-s (PR `#1963`). The current exception-based approach requires catching and converting to error codes, losing errno context.
## Requested by
`@Bill-hbrhbr`
## References
- PR: https://github.com/y-scope/clp/pull/1963
- Comment: https://github.com/y-scope/clp/pull/1963#discussion_r2790740039" \
--assignee Bill-hbrhbr
Length of output: 42
@Bill-hbrhbr, I've created issue #1971 for the FileDescriptor refactoring. The issue describes the need to replace exception-based error handling with ystdlib::error_handling::Result for better errno capture and error propagation, and includes references back to this PR and comment.
🧬 Code Graph Analysis Results
### File: `components/core/src/clp/ReadOnlyMemoryMappedFile.hpp`<possibly_relevant_code_snippets>
Snippet 1 (lines 30-31)
create(std::string_view path)
-> ystdlib::error_handling::Result<ReadOnlyMemoryMappedFile>
</possibly_relevant_code_snippets>
<possibly_relevant_code_snippets>
Snippet 2 (lines 34-34)
~ReadOnlyMemoryMappedFile()
</possibly_relevant_code_snippets>
<possibly_relevant_code_snippets>
Snippet 3 (lines 37-37)
ReadOnlyMemoryMappedFile(ReadOnlyMemoryMappedFile const&) = delete;
</possibly_relevant_code_snippets>
<possibly_relevant_code_snippets>
Snippet 4 (lines 41-43)
ReadOnlyMemoryMappedFile(ReadOnlyMemoryMappedFile&& rhs) noexcept
: m_data{std::exchange(rhs.m_data, nullptr)},
m_buf_size{std::exchange(rhs.m_buf_size, 0)} {}
</possibly_relevant_code_snippets>
<possibly_relevant_code_snippets>
Snippet 5 (lines 69-69)
ReadOnlyMemoryMappedFile(void* data, size_t buf_size) : m_data{data}, m_buf_size{buf_size} {}
</possibly_relevant_code_snippets>
🧠 Learnings used
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR `#1405`) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
| [[nodiscard]] auto operator=(ReadOnlyMemoryMappedFile&& rhs) noexcept | ||
| -> ReadOnlyMemoryMappedFile& { | ||
| if (this != &rhs) { | ||
| if (m_data != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (m_data != nullptr) { | |
| if (nullptr != m_data) { |
| return ReadOnlyMemoryMappedFile{nullptr, file_size}; | ||
| } | ||
|
|
||
| // TODO: optionally provide a hint for the mmap location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't add any random TODO without an issue to track it.
Is this important? I think we should either drop it and add it when we need, or create an issue for it.
|
|
||
| return ReadOnlyMemoryMappedFile{mmap_ptr, file_size}; | ||
| } catch (TraceableException const& ex) { | ||
| // TODO: Rewrite `FileDescriptor` constructor and `get_size()` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, need an issue to track this.
| // TODO: optionally provide a hint for the mmap location | ||
| auto* mmap_ptr{mmap(nullptr, file_size, PROT_READ, MAP_PRIVATE, fd.get_raw_fd(), 0)}; | ||
| if (MAP_FAILED == mmap_ptr) { | ||
| return std::error_code(errno, std::system_category()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As indicated in the previous comment, there's no need to construct std::error_code explicitly this way. Instead, we should use std::errc.
| return std::error_code(errno, std::system_category()); | |
| return static_cast<std::errc>(errno); |
| // TODO: Rewrite `FileDescriptor` constructor and `get_size()` | ||
| // with `ystdlib::error_handling::result` for capturing errno at the point of failure and | ||
| // better error propagation. | ||
| return std::error_code(errno, std::system_category()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly.
| return std::error_code(errno, std::system_category()); | |
| return static_cast<std::errc>(errno); |
| : m_data{std::exchange(rhs.m_data, nullptr)}, | ||
| m_buf_size{std::exchange(rhs.m_buf_size, 0)} {} | ||
|
|
||
| [[nodiscard]] auto operator=(ReadOnlyMemoryMappedFile&& rhs) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to let u know: u don't need to implement this method. To enable the move semantics we need, the move constructor is probably already sufficient.
| // Create read-only memory mapping for compressed_file_path | ||
| m_memory_mapped_file = std::make_unique<ReadOnlyMemoryMappedFile>(compressed_file_path); | ||
| auto const file_view{m_memory_mapped_file->get_view()}; | ||
| auto result{clp::ReadOnlyMemoryMappedFile::create(compressed_file_path)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto result{clp::ReadOnlyMemoryMappedFile::create(compressed_file_path)}; | |
| auto result{ReadOnlyMemoryMappedFile::create(compressed_file_path)}; |
| SPDLOG_ERROR( | ||
| "streaming_compression::zstd::Decompressor: Unable to memory map the compressed " | ||
| "file with path: {}. Error: {}", | ||
| compressed_file_path.c_str(), | ||
| result.error().message() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SPDLOG_ERROR( | |
| "streaming_compression::zstd::Decompressor: Unable to memory map the compressed " | |
| "file with path: {}. Error: {}", | |
| compressed_file_path.c_str(), | |
| result.error().message() | |
| auto const error{result.error()}; | |
| SPDLOG_ERROR( | |
| "streaming_compression::zstd::Decompressor: Unable to memory map the compressed " | |
| "file with path: {}. Error: {} - {}", | |
| compressed_file_path.c_str(), | |
| error.category().name(), | |
| error.message() |
The convention is to print both the error category name and the error message.
| ); | ||
| return ErrorCode_Failure; | ||
| } | ||
| m_memory_mapped_file = std::move(result.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| m_memory_mapped_file = std::move(result.value()); | |
| m_memory_mapped_file.emplace(std::move(result.value())); |
Prefer to use emplace to enforce move construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mirror changes in clp side.
Description
This PR follows #445 and #450.
The
clp-ffi-jsproject will need to supportclp-scode in the near future. Removingboost::iostreamsfromclp-swill help simplify the dependency resolution on the ffi side.However,
clp::ReadOnlyMemoryMappedFileusedclp::TraceableExceptionandclp::ErrorCodeinstead of theclp-sversions, resulting in unnecessary complexities when implementing error handling. Hence, we switch to the preferred rust-style error handling withystdlib::error_handling::Resultforclp::ReadOnlyMemoryMappedFile.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Refactor
API
Tests