
Conversation

@srnnkls srnnkls commented Oct 26, 2025

No description provided.

srnnkls and others added 2 commits October 26, 2025 09:15
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comprehensive thread-safety improvements to prevent race conditions:

1. Global schema cache protection:
   - Added RLock for _wrapped_schema_cache
   - Store (original_schema, wrapped_schema) tuples to prevent id() reuse bugs
   - Verify schema identity on cache hits

2. Class-level proxy type cache protection:
   - Added RLock for _proxy_type_cache
   - Guard against id() reuse with identity checks
   - Double-checked locking pattern to prevent duplicate proxy type creation

3. Per-instance attribute cache protection:
   - Added RLock for _attr_cache in each proxy instance
   - Protected __getattr__ and __getitem__ cache access
   - Use re-entrant locks to prevent deadlocks from __repr__

4. Serialized data immutability:
   - Wrap dict results in MappingProxyType in build()
   - Prevents accidental mutation from multiple threads

All OrderedDict operations (move_to_end, popitem) are now protected
by appropriate locks. Critical sections are kept minimal by building schemas
and proxy types outside the locks whenever possible; a minimal sketch of the
overall pattern follows below.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
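A minimal sketch of the locking pattern described above, assuming an id()-keyed OrderedDict LRU cache guarded by an RLock; the function name _get_wrapped_schema and the _wrap_schema helper are illustrative placeholders, not the project's actual API:

import threading
from collections import OrderedDict

_WRAPPED_SCHEMA_CACHE_SIZE = 128
_WRAPPED_SCHEMA_CACHE_LOCK = threading.RLock()
# Keyed by id(schema); storing (original_schema, wrapped_schema) lets a cache hit
# verify the key still refers to the same object, guarding against id() reuse.
_wrapped_schema_cache: OrderedDict[int, tuple[object, object]] = OrderedDict()


def _get_wrapped_schema(schema: object) -> object:
    schema_id = id(schema)

    with _WRAPPED_SCHEMA_CACHE_LOCK:
        entry = _wrapped_schema_cache.get(schema_id)
        if entry is not None and entry[0] is schema:
            _wrapped_schema_cache.move_to_end(schema_id)  # refresh LRU position
            return entry[1]

    # Build outside the lock so the critical section stays small.
    wrapped = _wrap_schema(schema)

    with _WRAPPED_SCHEMA_CACHE_LOCK:
        # Double-check: another thread may have cached this schema in the meantime.
        entry = _wrapped_schema_cache.get(schema_id)
        if entry is not None and entry[0] is schema:
            return entry[1]
        _wrapped_schema_cache[schema_id] = (schema, wrapped)
        _wrapped_schema_cache.move_to_end(schema_id)
        if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
            _wrapped_schema_cache.popitem(last=False)  # evict least recently used
    return wrapped


def _wrap_schema(schema: object) -> object:
    # Placeholder for the actual schema-wrapping step.
    return {"original": schema}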
@gemini-code-assist

Summary of Changes

Hello @srnnkls, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the thread safety of the SerializationProxy class by introducing reentrant locks around all shared mutable state and cache operations. It addresses potential race conditions in multi-threaded environments by carefully managing access to internal caches and guarding against Python's id() reuse behavior. Additionally, it promotes data integrity by ensuring that serialized dictionary objects are immutable, preventing unintended modifications across threads.

Highlights

  • Thread Safety: Introduced RLocks to protect shared caches (_wrapped_schema_cache, _proxy_type_cache, _attr_cache) from concurrent access, preventing race conditions.
  • Cache Robustness: Implemented checks to guard against id() reuse in Python's object identity system when accessing _wrapped_schema_cache and _proxy_type_cache, ensuring correct cache hits.
  • Immutability: Wrapped serialized dictionaries in MappingProxyType to make them read-only, preventing accidental mutation by other threads after serialization (a short illustration follows after this list).
  • Optimized Locking: Refactored cache logic to minimize the critical sections under locks, allowing expensive operations like schema building to occur outside the lock.
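As a standalone illustration of the read-only behaviour MappingProxyType provides (standard library only, not code from this pull request):

from types import MappingProxyType

data = {"id": 1, "tags": ["a", "b"]}
frozen = MappingProxyType(data)

print(frozen["id"])        # reads behave like a normal dict -> 1

try:
    frozen["id"] = 2       # any write raises TypeError: the proxy is read-only
except TypeError as exc:
    print(exc)             # 'mappingproxy' object does not support item assignment

# The proxy is a live, read-only view: mutations of the underlying dict remain
# visible through it, so full protection also requires not exposing that dict.
data["id"] = 3
print(frozen["id"])        # -> 3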


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request does a great job of adding thread safety to the SerializationProxy by introducing locks for shared caches. The changes correctly address potential race conditions during cache access and even include a guard against id() reuse, which is excellent. The pattern of minimizing the critical section by performing expensive operations outside the lock is also well applied. My review includes a few suggestions to further refine the double-checked locking pattern; they will prevent redundant work in highly concurrent scenarios and keep the implementation both correct and efficient, consistent with the robust pattern already used in the _build method.

Comment on lines +115 to 123
with _WRAPPED_SCHEMA_CACHE_LOCK:
    _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
    _wrapped_schema_cache.move_to_end(schema_id)

    # Evict oldest entry if cache is too large
    if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
        _wrapped_schema_cache.popitem(last=False)
# Evict oldest entry if cache is too large
if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
    _wrapped_schema_cache.popitem(last=False)

return wrapped_schema


medium

While adding a lock here prevents race conditions, this implementation of double-checked locking can be made more efficient. If two threads call this function with the same uncached schema, both will perform the expensive schema wrapping. The second thread to acquire the lock will simply overwrite the result of the first.

To avoid this redundant work, you can re-check the cache after acquiring the lock for writing. If another thread has populated the cache in the meantime, you can use its result and discard your own. This pattern is already correctly implemented in the _build method. Applying it here would improve efficiency and consistency.

Suggested change
with _WRAPPED_SCHEMA_CACHE_LOCK:
    _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
    _wrapped_schema_cache.move_to_end(schema_id)
    # Evict oldest entry if cache is too large
    if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
        _wrapped_schema_cache.popitem(last=False)
# Evict oldest entry if cache is too large
if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
    _wrapped_schema_cache.popitem(last=False)
return wrapped_schema
with _WRAPPED_SCHEMA_CACHE_LOCK:
    # Re-check if another thread has already wrapped this schema while we were working
    tup = _wrapped_schema_cache.get(schema_id)
    if tup is None or tup[0] is not schema:
        _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
        _wrapped_schema_cache.move_to_end(schema_id)
        # Evict oldest entry if cache is too large
        if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
            _wrapped_schema_cache.popitem(last=False)
        return wrapped_schema
    else:
        # Another thread won the race, use its result and update LRU
        _wrapped_schema_cache.move_to_end(schema_id)
        return tup[1]

# Cache the built proxy
self._attr_cache[name] = proxy
with self._attr_cache_lock:
    self._attr_cache[name] = proxy


medium

This can be made more efficient and concise. If two threads access the same uncached attribute, both will build a proxy. By using setdefault, you can atomically (within the lock) check for and set the value, ensuring that only the first thread's proxy is cached and subsequent threads use the cached version. This avoids redundant work in highly concurrent scenarios.

Suggested change
self._attr_cache[name] = proxy
proxy = self._attr_cache.setdefault(name, proxy)
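As a standalone illustration of the setdefault semantics this suggestion relies on (not code from this pull request): only the first value stored for a key survives, and later callers get that value back instead of overwriting it.

cache: dict[str, object] = {}
first, second = object(), object()

assert cache.setdefault("name", first) is first    # miss: stores and returns first
assert cache.setdefault("name", second) is first   # hit: ignores second, returns first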

# Cache the built proxy
self._attr_cache[cache_key] = proxy
with self._attr_cache_lock:
    self._attr_cache[cache_key] = proxy


medium

Similar to my other comments, you can make this caching more efficient in a concurrent environment by using setdefault. This will prevent redundant work if multiple threads try to access the same uncached item simultaneously, ensuring only one proxy is created and cached.

Suggested change
self._attr_cache[cache_key] = proxy
proxy = self._attr_cache.setdefault(cache_key, proxy)

immutability

- Add lock-free cache fast path; guard OrderedDict.get and fall back to the locked path on a race (see the sketch after this list)
- Replace RLock with Lock for cache guards
- Broaden serialized type to Mapping | Iterable | object
- Wrap dicts in MappingProxyType to prevent mutation leaks, including child values
- Prefer underlying child object when available in __getitem__
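A minimal sketch of that lock-free fast path, assuming an OrderedDict LRU cache guarded by a plain Lock; the names and the exact race guard are illustrative assumptions, not the project's actual code:

import threading
from collections import OrderedDict

_CACHE_SIZE = 128
_CACHE_LOCK = threading.Lock()  # plain Lock rather than RLock, per the commit above
_cache: OrderedDict[str, object] = OrderedDict()


def cached(key: str, build):
    # Lock-free fast path: read without taking the lock. The commit describes
    # guarding this .get() and falling back to the locked path on a race; the
    # except clause below is only a stand-in for that guard. Hits on this path
    # skip LRU reordering so that no lock is needed.
    try:
        value = _cache.get(key)
    except RuntimeError:
        value = None
    if value is not None:
        return value

    value = build(key)  # expensive work stays outside the lock
    with _CACHE_LOCK:
        value = _cache.setdefault(key, value)  # keep the first writer's result
        _cache.move_to_end(key)                # LRU bookkeeping under the lock
        if len(_cache) > _CACHE_SIZE:
            _cache.popitem(last=False)         # evict the least recently used entry
    return value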
proxies

- Return non-dict/list/tuple children directly to preserve field serializers (e.g., PlainSerializer) in __getattr__ and __getitem__
- Accept Mapping and MappingProxyType when checking serialized mappings