Add benches for SerializationProxy overhead #4
base: main
Conversation
This commit adds extensive benchmark tests using pytest-benchmark to measure the performance overhead introduced by SerializationProxy across various operations:

- Proxy creation overhead for BaseModel, dataclass, and nested structures
- Attribute access overhead (single and nested)
- Iteration overhead for collections
- Serialization operations via proxy's built-in serializer
- Custom field serializer overhead
- Memory access patterns (repeated vs different attributes)
- String representation (__str__ and __repr__)
- End-to-end workflow scenarios

Each benchmark includes a corresponding baseline test (direct operations without proxy) to measure the actual overhead introduced by the proxy layer.

The benchmarks cover:

- Simple and nested BaseModel instances
- Simple and nested dataclass instances
- Models with custom field serializers
- Dictionary and list comparisons

Dependencies:

- Added pytest-benchmark>=5.1.0 to dev dependencies

All 33 benchmark tests pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Code Review
This pull request introduces a comprehensive suite of benchmark tests for the SerializationProxy to measure its overhead. The overall structure is well-organized, with clear baseline comparisons for various operations. I've identified a couple of critical issues in the benchmark setup where TypeAdapter instantiation was included in the measurement, which would skew the results. I've provided suggestions to correct this for more accurate benchmarking. Additionally, I've pointed out a couple of minor code quality improvements.
```python
def test_benchmark_build_vs_typeadapter_dump(self, benchmark, simple_model):
    """Compare proxy build time with direct TypeAdapter.dump_python."""

    def direct_serialize():
        adapter = TypeAdapter(type(simple_model))
        return adapter.dump_python(simple_model)

    # This benchmark measures the baseline serialization time
    benchmark(direct_serialize)
```
Creating the TypeAdapter inside the direct_serialize function includes its instantiation overhead in the benchmark, which can skew the results. To get a more accurate baseline for serialization, the adapter should be created once outside the benchmarked function. The test can also be simplified by passing the adapter.dump_python method directly to the benchmark.
Suggested change:

```python
# Before
def test_benchmark_build_vs_typeadapter_dump(self, benchmark, simple_model):
    """Compare proxy build time with direct TypeAdapter.dump_python."""

    def direct_serialize():
        adapter = TypeAdapter(type(simple_model))
        return adapter.dump_python(simple_model)

    # This benchmark measures the baseline serialization time
    benchmark(direct_serialize)
```

```python
# After
def test_benchmark_build_vs_typeadapter_dump(self, benchmark, simple_model):
    """Compare proxy build time with direct TypeAdapter.dump_python."""
    adapter = TypeAdapter(type(simple_model))
    # This benchmark measures the baseline serialization time
    benchmark(adapter.dump_python, simple_model)
```
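The skew the reviewer describes can be demonstrated outside pytest-benchmark as well: rebuilding the adapter inside the timed callable adds its construction cost to every measured iteration. A minimal sketch using stdlib `timeit`, assuming pydantic v2 is installed (the `SimpleModel` class is illustrative, not the PR's fixture):

```python
from timeit import timeit

from pydantic import BaseModel, TypeAdapter


class SimpleModel(BaseModel):
    # Illustrative model, not taken from the PR.
    id: int
    name: str


model = SimpleModel(id=1, name="example")


def with_setup_in_loop():
    # Adapter rebuilt on every call: its construction cost is measured too.
    adapter = TypeAdapter(SimpleModel)
    return adapter.dump_python(model)


adapter = TypeAdapter(SimpleModel)  # built once, as the review suggests


def with_setup_hoisted():
    return adapter.dump_python(model)


# Both variants produce identical output; only the measured time differs.
assert with_setup_in_loop() == with_setup_hoisted() == {"id": 1, "name": "example"}

print(f"adapter in loop:  {timeit(with_setup_in_loop, number=1000):.4f}s")
print(f"adapter hoisted:  {timeit(with_setup_hoisted, number=1000):.4f}s")
```

Hoisting the setup is exactly what the suggested change does; pytest-benchmark additionally supports passing the callable and its arguments separately, as in `benchmark(adapter.dump_python, simple_model)`.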
```python
def test_benchmark_direct_complete_workflow(self, benchmark, nested_model):
    """Benchmark complete workflow without proxy (baseline)."""

    def complete_workflow():
        # Direct access
        _ = nested_model.id
        _ = nested_model.data.name
        items_len = len(nested_model.items)
        # Iterate
        count = 0
        for item in nested_model.items:
            count += 1
            if count >= 5:  # Just sample a few to test iteration
                break
        # Serialize
        adapter = TypeAdapter(NestedModel)
        return adapter.dump_python(nested_model)

    benchmark(complete_workflow)
```
The TypeAdapter is created inside the complete_workflow function, which is benchmarked. This includes the adapter's creation overhead in the measurement, which is not ideal for a baseline comparison. The adapter should be created once, outside the function passed to benchmark.
Suggested change:

```python
# Before
def test_benchmark_direct_complete_workflow(self, benchmark, nested_model):
    """Benchmark complete workflow without proxy (baseline)."""

    def complete_workflow():
        # Direct access
        _ = nested_model.id
        _ = nested_model.data.name
        items_len = len(nested_model.items)
        # Iterate
        count = 0
        for item in nested_model.items:
            count += 1
            if count >= 5:  # Just sample a few to test iteration
                break
        # Serialize
        adapter = TypeAdapter(NestedModel)
        return adapter.dump_python(nested_model)

    benchmark(complete_workflow)
```

```python
# After
def test_benchmark_direct_complete_workflow(self, benchmark, nested_model):
    """Benchmark complete workflow without proxy (baseline)."""
    adapter = TypeAdapter(NestedModel)

    def complete_workflow():
        # Direct access
        _ = nested_model.id
        _ = nested_model.data.name
        items_len = len(nested_model.items)
        # Iterate
        count = 0
        for item in nested_model.items:
            count += 1
            if count >= 5:  # Just sample a few to test iteration
                break
        # Serialize
        return adapter.dump_python(nested_model)

    benchmark(complete_workflow)
```
| """ | ||
|
|
||
| from dataclasses import dataclass | ||
| from typing import Any |
```python
proxy = SerializationProxy.build(model_with_serializer)

def access_and_serialize():
    _ = proxy.name  # Should apply the serializer
```
The comment # Should apply the serializer is likely incorrect. Attribute access on the proxy does not trigger Pydantic's @field_serializer. The serializer is applied during the actual serialization process, which is correctly benchmarked by the proxy.__pydantic_serializer__.to_python(proxy) call. This line is unnecessary for the benchmark and the comment is misleading, so it can be removed.
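The reviewer's point is easy to verify with a small pydantic v2 example (the `User` model and its serializer below are illustrative): `@field_serializer` only runs during serialization, never on plain attribute access.

```python
from pydantic import BaseModel, field_serializer


class User(BaseModel):
    # Illustrative model, not taken from the PR.
    name: str

    @field_serializer("name")
    def upper_name(self, value: str) -> str:
        return value.upper()


user = User(name="alice")

# Attribute access returns the stored value; the serializer does not run.
assert user.name == "alice"

# Serialization applies the field serializer.
assert user.model_dump() == {"name": "ALICE"}
```

This matches the review: the meaningful measurement is the `to_python` serialization call, and the attribute-access line only times proxy forwarding, not the serializer.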