Add partial failure mode support to vMCP aggregator #3533

yrobla · 2026-01-30T15:11:30Z

Implements configurable failure handling when backend MCP servers become unavailable. The aggregator now supports two modes:

fail (default): Returns error if all backends unavailable; logs warning if some backends are down but continues with healthy ones
best_effort: Continues with available backends regardless of failures; logs info about unavailable backends

Related-to: #3036

codecov · 2026-01-30T15:20:09Z

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.35%. Comparing base (d5c6c36) to head (1602ebe).

Files with missing lines	Patch %	Lines
cmd/vmcp/app/commands.go	0.00%	4 Missing ⚠️

Additional details and impacted files

@@                Coverage Diff                @@
##           issue-3036-v1    #3533      +/-   ##
=================================================
- Coverage          65.40%   65.35%   -0.05%     
=================================================
  Files                401      401              
  Lines              39288    39317      +29     
=================================================
  Hits               25695    25695              
- Misses             11610    11639      +29     
  Partials            1983     1983

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

This PR implements configurable failure handling for the vMCP aggregator when backend MCP servers become unavailable. It introduces two operational modes: "fail" (default) which returns an error only when all backends are unavailable, and "best_effort" which continues with available backends regardless of failures. This addresses a key acceptance criterion from issue #3036.

Changes:

Added PartialFailureModeFail and PartialFailureModeBestEffort constants to define backend failure handling behavior
Extended NewDefaultAggregator to accept a failureMode parameter and implemented failure mode enforcement logic in MergeCapabilities
Updated all aggregator instantiations across tests and production code to explicitly specify the failure mode

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
pkg/vmcp/config/config.go	Adds partial failure mode constants (fail and best_effort) with documentation
pkg/vmcp/aggregator/default_aggregator.go	Implements failure mode support in aggregator with health tracking and enforcement logic
pkg/vmcp/aggregator/default_aggregator_test.go	Adds comprehensive test coverage for both failure modes with various backend health scenarios
cmd/vmcp/app/commands.go	Integrates failure mode configuration from operational config into aggregator initialization
pkg/vmcp/server/integration_test.go	Updates all test aggregator instantiations to use explicit fail mode
test/integration/vmcp/helpers/vmcp_server.go	Updates test helper to use fail mode by default for test consistency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/vmcp/config/config.go

pkg/vmcp/aggregator/default_aggregator.go

Implements configurable failure handling when backend MCP servers become unavailable. The aggregator now supports two modes: - fail (default): Returns error if all backends unavailable; logs warning if some backends are down but continues with healthy ones - best_effort: Continues with available backends regardless of failures; logs info about unavailable backends Related-to: #3036

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jerm-dro · 2026-01-30T21:59:33Z

pkg/vmcp/aggregator/default_aggregator.go

+		// Track backend health
+		if backend.HealthStatus.IsHealthyForRouting() {
+			healthyBackends[backend.ID] = true
+		} else {
+			unhealthyBackends[backend.ID] = string(backend.HealthStatus)
+		}


Suggestion: The factoring of the defaultAggregator leaves a bit to be desired and I think this change motivates improving it.

Background

The defaultAggregator implements the Aggregator interface, which has 4 unused public methods: QueryCapabilities, QueryAllCapabilities, ResolveConflicts, and MergeCapabilities.

Suggestion

Rather than inlining more logic into defaultAggregator, lets:

Remove the unused methods from the interface.

Implement a circuitBreakerAggregator for your new logic. This implements the new and improved Aggregator interface and decorates any arbitrary aggregator:

type circuitBreakerAggregator struct { // this would really just be the defaultAggregator inner Aggregator } func (c circuitBreakerAggregator) AggregateCapabilities(...) { // check all the backends for health err := c.enforceFailureMode(...) if err != nil { return nil, err } return inner.AggregateCapabilities(...) }

I think this makes your change cleaner, because it doesn't add any complexity to the existing defaultAggregator's construction or runtime behavior. It would also enable pretty straightforward unit testing of the circuitBreakerAggregator, because you don't need the whole defaultAggregator to test it.

What do you think?

yrobla requested review from JAORMX, amirejaz, jerm-dro and jhrozek as code owners January 30, 2026 15:11

github-actions bot added the size/S Small PR: 100-299 lines changed label Jan 30, 2026

yrobla requested a review from Copilot January 30, 2026 15:28

Copilot started reviewing on behalf of yrobla January 30, 2026 15:29 View session

Copilot AI reviewed Jan 30, 2026

View reviewed changes

pkg/vmcp/config/config.go Show resolved Hide resolved

pkg/vmcp/config/config.go Outdated Show resolved Hide resolved

pkg/vmcp/aggregator/default_aggregator.go Outdated Show resolved Hide resolved

yrobla force-pushed the issue-3036-v2 branch from 3748109 to 1602ebe Compare January 30, 2026 15:58

yrobla requested a review from ChrisJBurns as a code owner January 30, 2026 15:58

github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/S Small PR: 100-299 lines changed labels Jan 30, 2026

yrobla self-assigned this Jan 30, 2026

yrobla requested a review from Copilot January 30, 2026 16:01

Copilot started reviewing on behalf of yrobla January 30, 2026 16:01 View session

Copilot AI reviewed Jan 30, 2026

View reviewed changes

jerm-dro reviewed Jan 30, 2026

View reviewed changes

jerm-dro mentioned this pull request Jan 30, 2026

Integrate circuit breaker with vMCP routing and aggregation #3530

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add partial failure mode support to vMCP aggregator #3533

Add partial failure mode support to vMCP aggregator #3533

yrobla commented Jan 30, 2026

Uh oh!

codecov bot commented Jan 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

jerm-dro Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add partial failure mode support to vMCP aggregator #3533

Are you sure you want to change the base?

Add partial failure mode support to vMCP aggregator #3533

Conversation

yrobla commented Jan 30, 2026

Uh oh!

codecov bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

jerm-dro Jan 30, 2026

Choose a reason for hiding this comment

Background

Suggestion

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Jan 30, 2026 •

edited

Loading