Add partial failure mode support to vMCP aggregator #3533
base: issue-3036-v1
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@              Coverage Diff               @@
##           issue-3036-v1    #3533    +/-  ##
=================================================
- Coverage          65.40%   65.35%   -0.05%
=================================================
  Files                401      401
  Lines              39288    39317     +29
=================================================
  Hits               25695    25695
- Misses             11610    11639     +29
  Partials            1983     1983
Pull request overview
This PR implements configurable failure handling for the vMCP aggregator when backend MCP servers become unavailable. It introduces two operational modes: "fail" (default) which returns an error only when all backends are unavailable, and "best_effort" which continues with available backends regardless of failures. This addresses a key acceptance criterion from issue #3036.
Changes:
- Added `PartialFailureModeFail` and `PartialFailureModeBestEffort` constants to define backend failure handling behavior
- Extended `NewDefaultAggregator` to accept a `failureMode` parameter and implemented failure mode enforcement logic in `MergeCapabilities`
- Updated all aggregator instantiations across tests and production code to explicitly specify the failure mode
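For readers who have not opened the diff, the two constants could look roughly like the sketch below. The constant names and the `fail` / `best_effort` values come from this PR; the `PartialFailureMode` type and the `ParseFailureMode` helper are hypothetical names used only for illustration and may not match what is in `pkg/vmcp/config/config.go`.

```go
package main

import "fmt"

// PartialFailureMode is a hypothetical string type for this sketch;
// the PR summary only names the two constants below.
type PartialFailureMode string

const (
	// PartialFailureModeFail errors out only when every backend is unavailable.
	PartialFailureModeFail PartialFailureMode = "fail"
	// PartialFailureModeBestEffort always continues with whatever is available.
	PartialFailureModeBestEffort PartialFailureMode = "best_effort"
)

// ParseFailureMode is a hypothetical helper: it maps a raw config string to a
// mode, treating the empty string as the default ("fail").
func ParseFailureMode(s string) (PartialFailureMode, error) {
	switch PartialFailureMode(s) {
	case "":
		return PartialFailureModeFail, nil
	case PartialFailureModeFail, PartialFailureModeBestEffort:
		return PartialFailureMode(s), nil
	default:
		return "", fmt.Errorf("unknown partial failure mode %q", s)
	}
}

func main() {
	for _, raw := range []string{"", "fail", "best_effort", "strict"} {
		mode, err := ParseFailureMode(raw)
		fmt.Printf("%q -> mode=%q err=%v\n", raw, mode, err)
	}
}
```

Rejecting unknown values at parse time is one way to keep the aggregator itself free of validation logic; whether the PR does this is not stated here.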
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/vmcp/config/config.go | Adds partial failure mode constants (fail and best_effort) with documentation |
| pkg/vmcp/aggregator/default_aggregator.go | Implements failure mode support in aggregator with health tracking and enforcement logic |
| pkg/vmcp/aggregator/default_aggregator_test.go | Adds comprehensive test coverage for both failure modes with various backend health scenarios |
| cmd/vmcp/app/commands.go | Integrates failure mode configuration from operational config into aggregator initialization |
| pkg/vmcp/server/integration_test.go | Updates all test aggregator instantiations to use explicit fail mode |
| test/integration/vmcp/helpers/vmcp_server.go | Updates test helper to use fail mode by default for test consistency |
Implements configurable failure handling when backend MCP servers
become unavailable. The aggregator now supports two modes:
- fail (default): Returns error if all backends unavailable; logs
warning if some backends are down but continues with healthy ones
- best_effort: Continues with available backends regardless of
failures; logs info about unavailable backends
Related-to: #3036
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
    // Track backend health
    if backend.HealthStatus.IsHealthyForRouting() {
        healthyBackends[backend.ID] = true
    } else {
        unhealthyBackends[backend.ID] = string(backend.HealthStatus)
    }
Suggestion: The factoring of the defaultAggregator leaves a bit to be desired and I think this change motivates improving it.
Background
The defaultAggregator implements the Aggregator interface, which has 4 unused public methods: QueryCapabilities, QueryAllCapabilities, ResolveConflicts, and MergeCapabilities.
Suggestion
Rather than inlining more logic into defaultAggregator, let's:
- Remove the unused methods from the interface.
- Implement a `circuitBreakerAggregator` for your new logic. This implements the new and improved `Aggregator` interface and decorates any arbitrary aggregator:

    type circuitBreakerAggregator struct {
        // this would really just be the defaultAggregator
        inner Aggregator
    }

    func (c circuitBreakerAggregator) AggregateCapabilities(...) {
        // check all the backends for health
        err := c.enforceFailureMode(...)
        if err != nil {
            return nil, err
        }
        return c.inner.AggregateCapabilities(...)
    }

I think this makes your change cleaner, because it doesn't add any complexity to the existing defaultAggregator's construction or runtime behavior. It would also enable pretty straightforward unit testing of the circuitBreakerAggregator, because you don't need the whole defaultAggregator to test it.
What do you think?