Fix thread safety in SerializationProxy #8
base: claude/dev-ergonomics-improvements-011CUVRN18PCYh3xd7J7c1td
Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comprehensive thread-safety improvements to prevent race conditions:

1. Global schema cache protection:
   - Added RLock for _wrapped_schema_cache
   - Store (original_schema, wrapped_schema) tuples to prevent id() reuse bugs
   - Verify schema identity on cache hits

2. Class-level proxy type cache protection:
   - Added RLock for _proxy_type_cache
   - Guard against id() reuse with identity checks
   - Double-checked locking pattern to prevent duplicate proxy type creation

3. Per-instance attribute cache protection:
   - Added RLock for _attr_cache in each proxy instance
   - Protected __getattr__ and __getitem__ cache access
   - Use re-entrant locks to prevent deadlocks from __repr__

4. Serialized data immutability:
   - Wrap dict results in MappingProxyType in build()
   - Prevents accidental mutation from multiple threads

All OrderedDict operations (move_to_end, popitem) are now protected by appropriate locks. Critical sections are minimized by building schemas and proxy types outside the locks where possible.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
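A rough sketch of the guarded global-cache read described in item 1. The cache size value and the helper name `_get_cached_wrapped_schema` are assumptions; `_wrapped_schema_cache`, `_WRAPPED_SCHEMA_CACHE_LOCK`, and the `(original_schema, wrapped_schema)` tuple layout come from the diff hunks below:

```python
from collections import OrderedDict
from threading import RLock

_WRAPPED_SCHEMA_CACHE_SIZE = 128  # assumed value; the real constant lives in the module
_wrapped_schema_cache = OrderedDict()
_WRAPPED_SCHEMA_CACHE_LOCK = RLock()

def _get_cached_wrapped_schema(schema):
    """Hypothetical read helper: return the cached wrapped schema, or None on a miss."""
    schema_id = id(schema)
    with _WRAPPED_SCHEMA_CACHE_LOCK:
        tup = _wrapped_schema_cache.get(schema_id)
        # id() values can be recycled once the original object is garbage
        # collected, so a key hit only counts if the stored original schema
        # is the very same object.
        if tup is not None and tup[0] is schema:
            _wrapped_schema_cache.move_to_end(schema_id)  # refresh LRU recency
            return tup[1]
        return None
```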
Summary of Changes

Hello @srnnkls, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request significantly enhances the thread safety of the SerializationProxy.
Code Review
This pull request does a great job of adding thread safety to the SerializationProxy by introducing locks for shared caches. The changes correctly address potential race conditions during cache access and even include a guard against id() reuse, which is excellent. The pattern of minimizing the critical section by performing expensive operations outside the lock is also well applied. My review includes suggestions to further refine the double-checked locking pattern in a few places; they will prevent redundant work in highly concurrent scenarios and keep the implementation consistent with the robust pattern already used in the _build method.
The hunk under review:

```python
        with _WRAPPED_SCHEMA_CACHE_LOCK:
            _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
            _wrapped_schema_cache.move_to_end(schema_id)

            # Evict oldest entry if cache is too large
            if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
                _wrapped_schema_cache.popitem(last=False)

        return wrapped_schema
```
While adding a lock here prevents race conditions, this implementation of double-checked locking can be made more efficient. If two threads call this function with the same uncached schema, both will perform the expensive schema wrapping. The second thread to acquire the lock will simply overwrite the result of the first.
To avoid this redundant work, you can re-check the cache after acquiring the lock for writing. If another thread has populated the cache in the meantime, you can use its result and discard your own. This pattern is already correctly implemented in the _build method. Applying it here would improve efficiency and consistency.
Suggested change:

```diff
         with _WRAPPED_SCHEMA_CACHE_LOCK:
-            _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
-            _wrapped_schema_cache.move_to_end(schema_id)
-            # Evict oldest entry if cache is too large
-            if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
-                _wrapped_schema_cache.popitem(last=False)
-        return wrapped_schema
+            # Re-check if another thread has already wrapped this schema while we were working
+            tup = _wrapped_schema_cache.get(schema_id)
+            if tup is None or tup[0] is not schema:
+                _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
+                _wrapped_schema_cache.move_to_end(schema_id)
+                # Evict oldest entry if cache is too large
+                if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
+                    _wrapped_schema_cache.popitem(last=False)
+                return wrapped_schema
+            else:
+                # Another thread won the race, use its result and update LRU
+                _wrapped_schema_cache.move_to_end(schema_id)
+                return tup[1]
```
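For context on why the suggestion re-checks `tup[0] is not schema` rather than trusting the key alone, here is a small self-contained demonstration of CPython recycling id() values (independent of the PR's code):

```python
class Schema:
    pass

a = Schema()
key = id(a)
del a            # the original schema becomes garbage...
b = Schema()     # ...and CPython may place the new object at the same address
# This frequently prints True on CPython: the id was reused, so a cache keyed
# only by id(schema) could mistake b for the schema whose wrapper was cached.
print(id(b) == key)
```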
The hunk under review:

```diff
             # Cache the built proxy
-            self._attr_cache[name] = proxy
+            with self._attr_cache_lock:
+                self._attr_cache[name] = proxy
```
This can be made more efficient and concise. If two threads access the same uncached attribute, both will build a proxy. By using setdefault, you can atomically (within the lock) check for and set the value, ensuring that only the first thread's proxy is cached and subsequent threads use the cached version. This avoids redundant work in highly concurrent scenarios.
Suggested change:

```diff
-                self._attr_cache[name] = proxy
+                proxy = self._attr_cache.setdefault(name, proxy)
```
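Under the suggested pattern, the write path might look like this minimal sketch; the class and method names here are hypothetical, and only `_attr_cache`, `_attr_cache_lock`, and the setdefault call come from the review:

```python
from threading import Lock

class AttrCacheSketch:
    """Hypothetical stand-in for the proxy's per-instance attribute cache."""

    def __init__(self):
        self._attr_cache = {}
        self._attr_cache_lock = Lock()

    def _cache_proxy(self, name, proxy):
        with self._attr_cache_lock:
            # setdefault stores `proxy` only if `name` is absent; if another
            # thread cached one first, its proxy is returned and ours is
            # discarded, so every caller ends up sharing a single proxy.
            return self._attr_cache.setdefault(name, proxy)
```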
The hunk under review:

```diff
             # Cache the built proxy
-            self._attr_cache[cache_key] = proxy
+            with self._attr_cache_lock:
+                self._attr_cache[cache_key] = proxy
```
Similar to my other comments, you can make this caching more efficient in a concurrent environment by using setdefault. This will prevent redundant work if multiple threads try to access the same uncached item simultaneously, ensuring only one proxy is created and cached.
Suggested change:

```diff
-                self._attr_cache[cache_key] = proxy
+                proxy = self._attr_cache.setdefault(cache_key, proxy)
```
immutability

- Add lock-free cache fast path; guard OrderedDict.get and fall back to the locked path on a race (sketched below)
- Replace RLock with Lock for cache guards
- Broaden the serialized type to Mapping | Iterable | object
- Wrap dicts in MappingProxyType to prevent mutation leaks, including child values
- Prefer the underlying child object when available in __getitem__
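One plausible shape for the lock-free fast path named in the first bullet. The helper name and the exact exception guarded on the unlocked read are assumptions; the rest follows the names used elsewhere on this page:

```python
def _lookup_wrapped_schema(schema):
    """Hypothetical lookup helper showing the fast-path/slow-path split."""
    schema_id = id(schema)
    try:
        # Fast path: read the cache without taking the lock.
        tup = _wrapped_schema_cache.get(schema_id)
    except RuntimeError:
        # Assumed guard: the unlocked read raced with a concurrent writer,
        # so give up on the fast path.
        tup = None
    if tup is not None and tup[0] is schema:
        return tup[1]
    # Slow path: take the lock and re-check before reporting a miss.
    with _WRAPPED_SCHEMA_CACHE_LOCK:
        tup = _wrapped_schema_cache.get(schema_id)
        if tup is not None and tup[0] is schema:
            return tup[1]
    return None  # caller builds the wrapped schema and caches it under the lock
```

Note that the fast path skips the LRU move_to_end bookkeeping: that call mutates the OrderedDict and therefore belongs under the lock.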
proxies

- Return non-dict/list/tuple children directly to preserve field serializers (e.g., PlainSerializer) in __getattr__ and __getitem__ (see the sketch below)
- Accept Mapping and MappingProxyType when checking serialized mappings
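A minimal sketch of the child-handling rule from the first bullet, with a hypothetical helper name standing in for the logic that lives in __getattr__ and __getitem__:

```python
from types import MappingProxyType

def _expose_child(value):
    """Hypothetical helper for how the proxy hands back a child value."""
    if isinstance(value, dict):
        # Mappings are wrapped read-only so mutations cannot leak across threads.
        return MappingProxyType(value)
    if isinstance(value, (list, tuple)):
        # Sequence children keep flowing through the proxy machinery
        # (not shown in this sketch).
        return value
    # Everything else is returned directly, preserving the output of field
    # serializers such as PlainSerializer instead of re-wrapping it in a proxy.
    return value
```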