Fix thread safety in SerializationProxy #8
base: claude/dev-ergonomics-improvements-011CUVRN18PCYh3xd7J7c1td
Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comprehensive thread-safety improvements to prevent race conditions:

1. Global schema cache protection:
   - Added RLock for _wrapped_schema_cache
   - Store (original_schema, wrapped_schema) tuples to prevent id() reuse bugs
   - Verify schema identity on cache hits

2. Class-level proxy type cache protection:
   - Added RLock for _proxy_type_cache
   - Guard against id() reuse with identity checks
   - Double-checked locking pattern to prevent duplicate proxy type creation

3. Per-instance attribute cache protection:
   - Added RLock for _attr_cache in each proxy instance
   - Protected __getattr__ and __getitem__ cache access
   - Use re-entrant locks to prevent deadlocks from __repr__

4. Serialized data immutability:
   - Wrap dict results in MappingProxyType in build()
   - Prevents accidental mutation from multiple threads

All OrderedDict operations (move_to_end, popitem) are now protected by appropriate locks. Critical sections are minimized by building schemas and proxy types outside the locks where possible.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
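A rough sketch of the guarded global-cache read described in item 1. The cache size value and the helper name `_get_cached_wrapped_schema` are assumptions; `_wrapped_schema_cache`, `_WRAPPED_SCHEMA_CACHE_LOCK`, and the `(original_schema, wrapped_schema)` tuple layout come from the diff hunks below:

```python
from collections import OrderedDict
from threading import RLock

_WRAPPED_SCHEMA_CACHE_SIZE = 128  # assumed value; the real constant lives in the module
_wrapped_schema_cache = OrderedDict()
_WRAPPED_SCHEMA_CACHE_LOCK = RLock()

def _get_cached_wrapped_schema(schema):
    """Hypothetical read helper: return the cached wrapped schema, or None on a miss."""
    schema_id = id(schema)
    with _WRAPPED_SCHEMA_CACHE_LOCK:
        tup = _wrapped_schema_cache.get(schema_id)
        # id() values can be recycled once the original object is garbage
        # collected, so a key hit only counts if the stored original schema
        # is the very same object.
        if tup is not None and tup[0] is schema:
            _wrapped_schema_cache.move_to_end(schema_id)  # refresh LRU recency
            return tup[1]
        return None
```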
Summary of Changes

Hello @srnnkls, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request significantly enhances the thread safety of the SerializationProxy.
Code Review
This pull request does a great job of adding thread safety to the SerializationProxy by introducing locks for shared caches. The changes correctly address potential race conditions during cache access and even include a guard against id() reuse, which is excellent. The pattern of minimizing the critical section by performing expensive operations outside the lock is also well applied. My review includes suggestions to further refine the double-checked locking pattern in a few places; they will prevent redundant work in highly concurrent scenarios and keep the implementation consistent with the robust pattern already used in the _build method.
The hunk under review:

```python
        with _WRAPPED_SCHEMA_CACHE_LOCK:
            _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
            _wrapped_schema_cache.move_to_end(schema_id)

            # Evict oldest entry if cache is too large
            if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
                _wrapped_schema_cache.popitem(last=False)

        return wrapped_schema
```
While adding a lock here prevents race conditions, this implementation of double-checked locking can be made more efficient. If two threads call this function with the same uncached schema, both will perform the expensive schema wrapping. The second thread to acquire the lock will simply overwrite the result of the first.
To avoid this redundant work, you can re-check the cache after acquiring the lock for writing. If another thread has populated the cache in the meantime, you can use its result and discard your own. This pattern is already correctly implemented in the _build method. Applying it here would improve efficiency and consistency.
Suggested change:

```diff
         with _WRAPPED_SCHEMA_CACHE_LOCK:
-            _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
-            _wrapped_schema_cache.move_to_end(schema_id)
-            # Evict oldest entry if cache is too large
-            if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
-                _wrapped_schema_cache.popitem(last=False)
-        return wrapped_schema
+            # Re-check if another thread has already wrapped this schema while we were working
+            tup = _wrapped_schema_cache.get(schema_id)
+            if tup is None or tup[0] is not schema:
+                _wrapped_schema_cache[schema_id] = (schema, wrapped_schema)
+                _wrapped_schema_cache.move_to_end(schema_id)
+                # Evict oldest entry if cache is too large
+                if len(_wrapped_schema_cache) > _WRAPPED_SCHEMA_CACHE_SIZE:
+                    _wrapped_schema_cache.popitem(last=False)
+                return wrapped_schema
+            else:
+                # Another thread won the race, use its result and update LRU
+                _wrapped_schema_cache.move_to_end(schema_id)
+                return tup[1]
```
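For context on why the suggestion re-checks `tup[0] is not schema` rather than trusting the key alone, here is a small self-contained demonstration of CPython recycling id() values (independent of the PR's code):

```python
class Schema:
    pass

a = Schema()
key = id(a)
del a            # the original schema becomes garbage...
b = Schema()     # ...and CPython may place the new object at the same address
# This frequently prints True on CPython: the id was reused, so a cache keyed
# only by id(schema) could mistake b for the schema whose wrapper was cached.
print(id(b) == key)
```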
The hunk under review:

```diff
             # Cache the built proxy
-            self._attr_cache[name] = proxy
+            with self._attr_cache_lock:
+                self._attr_cache[name] = proxy
```
This can be made more efficient and concise. If two threads access the same uncached attribute, both will build a proxy. By using setdefault, you can atomically (within the lock) check for and set the value, ensuring that only the first thread's proxy is cached and subsequent threads use the cached version. This avoids redundant work in highly concurrent scenarios.
Suggested change:

```diff
-                self._attr_cache[name] = proxy
+                proxy = self._attr_cache.setdefault(name, proxy)
```
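Under the suggested pattern, the write path might look like this minimal sketch; the class and method names here are hypothetical, and only `_attr_cache`, `_attr_cache_lock`, and the setdefault call come from the review:

```python
from threading import Lock

class AttrCacheSketch:
    """Hypothetical stand-in for the proxy's per-instance attribute cache."""

    def __init__(self):
        self._attr_cache = {}
        self._attr_cache_lock = Lock()

    def _cache_proxy(self, name, proxy):
        with self._attr_cache_lock:
            # setdefault stores `proxy` only if `name` is absent; if another
            # thread cached one first, its proxy is returned and ours is
            # discarded, so every caller ends up sharing a single proxy.
            return self._attr_cache.setdefault(name, proxy)
```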
The hunk under review:

```diff
             # Cache the built proxy
-            self._attr_cache[cache_key] = proxy
+            with self._attr_cache_lock:
+                self._attr_cache[cache_key] = proxy
```
Similar to my other comments, you can make this caching more efficient in a concurrent environment by using setdefault. This will prevent redundant work if multiple threads try to access the same uncached item simultaneously, ensuring only one proxy is created and cached.
Suggested change:

```diff
-                self._attr_cache[cache_key] = proxy
+                proxy = self._attr_cache.setdefault(cache_key, proxy)
```
immutability

- Add lock-free cache fast path; guard OrderedDict.get and fall back to the locked path on a race (sketched below)
- Replace RLock with Lock for cache guards
- Broaden the serialized type to Mapping | Iterable | object
- Wrap dicts in MappingProxyType to prevent mutation leaks, including child values
- Prefer the underlying child object when available in __getitem__
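One plausible shape for the lock-free fast path named in the first bullet. The helper name and the exact exception guarded on the unlocked read are assumptions; the rest follows the names used elsewhere on this page:

```python
def _lookup_wrapped_schema(schema):
    """Hypothetical lookup helper showing the fast-path/slow-path split."""
    schema_id = id(schema)
    try:
        # Fast path: read the cache without taking the lock.
        tup = _wrapped_schema_cache.get(schema_id)
    except RuntimeError:
        # Assumed guard: the unlocked read raced with a concurrent writer,
        # so give up on the fast path.
        tup = None
    if tup is not None and tup[0] is schema:
        return tup[1]
    # Slow path: take the lock and re-check before reporting a miss.
    with _WRAPPED_SCHEMA_CACHE_LOCK:
        tup = _wrapped_schema_cache.get(schema_id)
        if tup is not None and tup[0] is schema:
            return tup[1]
    return None  # caller builds the wrapped schema and caches it under the lock
```

Note that the fast path skips the LRU move_to_end bookkeeping: that call mutates the OrderedDict and therefore belongs under the lock.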
proxies

- Return non-dict/list/tuple children directly to preserve field serializers (e.g., PlainSerializer) in __getattr__ and __getitem__ (see the sketch below)
- Accept Mapping and MappingProxyType when checking serialized mappings
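A minimal sketch of the child-handling rule from the first bullet, with a hypothetical helper name standing in for the logic that lives in __getattr__ and __getitem__:

```python
from types import MappingProxyType

def _expose_child(value):
    """Hypothetical helper for how the proxy hands back a child value."""
    if isinstance(value, dict):
        # Mappings are wrapped read-only so mutations cannot leak across threads.
        return MappingProxyType(value)
    if isinstance(value, (list, tuple)):
        # Sequence children keep flowing through the proxy machinery
        # (not shown in this sketch).
        return value
    # Everything else is returned directly, preserving the output of field
    # serializers such as PlainSerializer instead of re-wrapping it in a proxy.
    return value
```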