Skip to content

Conversation

@yrobla
Copy link
Contributor

@yrobla yrobla commented Jan 30, 2026

Implements configurable failure handling when backend MCP servers become unavailable. The aggregator now supports two modes:

  • fail (default): Returns error if all backends unavailable; logs warning if some backends are down but continues with healthy ones
  • best_effort: Continues with available backends regardless of failures; logs info about unavailable backends

Related-to: #3036

@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Jan 30, 2026
@codecov
Copy link

codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 87.87879% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.35%. Comparing base (d5c6c36) to head (1602ebe).

Files with missing lines Patch % Lines
cmd/vmcp/app/commands.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           issue-3036-v1    #3533      +/-   ##
=================================================
- Coverage          65.40%   65.35%   -0.05%     
=================================================
  Files                401      401              
  Lines              39288    39317      +29     
=================================================
  Hits               25695    25695              
- Misses             11610    11639      +29     
  Partials            1983     1983              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements configurable failure handling for the vMCP aggregator when backend MCP servers become unavailable. It introduces two operational modes: "fail" (default) which returns an error only when all backends are unavailable, and "best_effort" which continues with available backends regardless of failures. This addresses a key acceptance criterion from issue #3036.

Changes:

  • Added PartialFailureModeFail and PartialFailureModeBestEffort constants to define backend failure handling behavior
  • Extended NewDefaultAggregator to accept a failureMode parameter and implemented failure mode enforcement logic in MergeCapabilities
  • Updated all aggregator instantiations across tests and production code to explicitly specify the failure mode

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pkg/vmcp/config/config.go Adds partial failure mode constants (fail and best_effort) with documentation
pkg/vmcp/aggregator/default_aggregator.go Implements failure mode support in aggregator with health tracking and enforcement logic
pkg/vmcp/aggregator/default_aggregator_test.go Adds comprehensive test coverage for both failure modes with various backend health scenarios
cmd/vmcp/app/commands.go Integrates failure mode configuration from operational config into aggregator initialization
pkg/vmcp/server/integration_test.go Updates all test aggregator instantiations to use explicit fail mode
test/integration/vmcp/helpers/vmcp_server.go Updates test helper to use fail mode by default for test consistency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Implements configurable failure handling when backend MCP servers
become unavailable. The aggregator now supports two modes:

  - fail (default): Returns error if all backends unavailable; logs
    warning if some backends are down but continues with healthy ones
  - best_effort: Continues with available backends regardless of
    failures; logs info about unavailable backends

Related-to: #3036
@yrobla yrobla requested a review from ChrisJBurns as a code owner January 30, 2026 15:58
@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/S Small PR: 100-299 lines changed labels Jan 30, 2026
@yrobla yrobla self-assigned this Jan 30, 2026
@yrobla yrobla requested a review from Copilot January 30, 2026 16:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +326 to +331
// Track backend health
if backend.HealthStatus.IsHealthyForRouting() {
healthyBackends[backend.ID] = true
} else {
unhealthyBackends[backend.ID] = string(backend.HealthStatus)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The factoring of the defaultAggregator leaves a bit to be desired and I think this change motivates improving it.

Background

The defaultAggregator implements the Aggregator interface, which has 4 unused public methods: QueryCapabilities, QueryAllCapabilities, ResolveConflicts, and MergeCapabilities.

Suggestion

Rather than inlining more logic into defaultAggregator, lets:

  1. Remove the unused methods from the interface.
  2. Implement a circuitBreakerAggregator for your new logic. This implements the new and improved Aggregator interface and decorates any arbitrary aggregator:
type circuitBreakerAggregator struct {
   // this would really just be the defaultAggregator
    inner Aggregator
}


func (c circuitBreakerAggregator) AggregateCapabilities(...) {
    // check all the backends for health
    err := c.enforceFailureMode(...)
    if err != nil {
        return nil, err
    }
    return inner.AggregateCapabilities(...)
}

I think this makes your change cleaner, because it doesn't add any complexity to the existing defaultAggregator's construction or runtime behavior. It would also enable pretty straightforward unit testing of the circuitBreakerAggregator, because you don't need the whole defaultAggregator to test it.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants