
feat(core): introduce a legacy validator registry system #622

@AlessandroPomponio

Legacy Validator Registry System - Design Plan

Problem Statement

There's a critical issue with ado's upgrade mechanism: if a user hasn't run
ado upgrade in a while, their resources may depend on an upgrade path that
has since been removed, because ado has moved past the version that dropped
it. Such resources cannot be restored in any way:

  1. The current version doesn't allow the upgrade
  2. The user doesn't know which previous version provided an upgrade path for it
  3. Going back to that previous version likely won't work either, because
    resources saved by newer versions would fail to load on it

Current State Analysis

Existing Upgrade Mechanism

  • handle_ado_upgrade() simply loads and re-saves resources, triggering
    pydantic validators
  • Migration logic lives in @pydantic.model_validator decorators (e.g.,
    CSVSampleStoreDescription.migrate_old_format())
  • When validators are removed, upgrade paths disappear permanently
  • No mechanism to track or preserve historical migration functions

Key Issues Identified

  1. Validator Removal = Lost Upgrade Path: Once a validator is removed from
    the codebase, users with old resources are stuck
  2. Version Skipping Problem: Users who skip versions can't upgrade through
    intermediate steps
  3. No Discoverability: Users don't know which legacy validators exist or how
    to use them
  4. Backward Incompatibility: Loading newer resources in older ado versions
    fails

Proposed Solution: Legacy Validator Registry

Architecture Overview

graph TB
    A[User runs ado command] --> B{Validation Error?}
    B -->|No| C[Success]
    B -->|Yes| D[Error Analysis]
    D --> E{Deprecated Field?}
    E -->|Yes| F[Query Legacy Registry]
    E -->|No| G[Standard Error]
    F --> H[List Available Validators]
    H --> I[User selects validator]
    I --> J[Run ado upgrade --apply-legacy-validator validator_name]
    J --> K[Apply Legacy Validator]
    K --> L[Re-save Resource]
    L --> C

File Structure

orchestrator/
├── core/
│   └── legacy/                      # Part of core package
│       ├── __init__.py
│       ├── registry.py              # Legacy validator registry
│       ├── metadata.py              # Validator metadata models
│       └── validators/
│           ├── __init__.py
│           ├── resource/
│           │   ├── __init__.py
│           │   └── entitysource_to_samplestore.py
│           ├── samplestore/
│           │   ├── __init__.py
│           │   └── v1_to_v2_csv_migration.py
│           ├── discoveryspace/
│           │   └── __init__.py
│           └── operation/
│               └── __init__.py

Installation:

Legacy validators ship with the core ado package (see Design Decisions), so a
standard ado installation includes them and no optional extra is needed.

Core Components

1. Validator Metadata Model

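# orchestrator/core/legacy/metadata.py
from typing import Annotated, Callable

import pydantic

# Reuse the existing enum (SAMPLESTORE = "samplestore", DISCOVERYSPACE,
# OPERATION, ...) instead of redefining it here: a second enum with the same
# values would never compare equal to the one the validators are registered with
from orchestrator.core.resources import CoreResourceKinds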

class LegacyValidatorMetadata(pydantic.BaseModel):
    """Metadata for a legacy validator function"""

    identifier: Annotated[str, pydantic.Field(
        description="Unique identifier for this validator (e.g., 'csv_constitutive_columns_migration')"
    )]

    resource_type: Annotated[CoreResourceKinds, pydantic.Field(
        description="Resource type this validator applies to"
    )]

    deprecated_fields: Annotated[list[str], pydantic.Field(
        description="Fields that this validator handles"
    )]

    deprecated_from_version: Annotated[str, pydantic.Field(
        description="ADO version when these fields were deprecated"
    )]

    removed_from_version: Annotated[str, pydantic.Field(
        description="ADO version when automatic upgrade was removed"
    )]

    description: Annotated[str, pydantic.Field(
        description="Human-readable description of what this validator does"
    )]

    validator_function: Annotated[Callable[[dict], dict], pydantic.Field(
        description="The actual migration function",
        exclude=True  # Don't serialize the function
    )]
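The metadata model accepts any strings for the two version fields. A check along the following lines could reject metadata whose deprecation version is not earlier than its removal version, for example by calling it from a pydantic `model_validator(mode="after")` on LegacyValidatorMetadata. This is a sketch that assumes plain `MAJOR.MINOR.PATCH` versions (optionally prefixed with `v`) and compares parsed integer tuples, because plain string comparison would put "1.10.0" before "1.9.0". The helper names are hypothetical.

```python
# Sketch only: assumes versions are plain "MAJOR.MINOR.PATCH" strings,
# optionally prefixed with "v" (no pre-release or build suffixes).
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.3.5' or 'v1.3.5' into (1, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def check_version_order(deprecated_from: str, removed_from: str) -> None:
    """Raise ValueError unless the deprecation version precedes the removal version."""
    if parse_version(deprecated_from) >= parse_version(removed_from):
        raise ValueError(
            f"deprecated_from_version ({deprecated_from}) must be earlier than "
            f"removed_from_version ({removed_from})"
        )


check_version_order("1.3.5", "1.6.0")  # the CSV migration's metadata passes
try:
    check_version_order("1.6.0", "1.3.5")
except ValueError as err:
    print(err)
```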

2. Legacy Validator Registry

# orchestrator/core/legacy/registry.py
from typing import Dict, List, Optional
from orchestrator.core.legacy.metadata import LegacyValidatorMetadata, CoreResourceKinds

class LegacyValidatorRegistry:
    """Registry for legacy validators that have been removed from active code"""

    _validators: Dict[str, LegacyValidatorMetadata] = {}

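    @classmethod
    def register(cls, metadata: LegacyValidatorMetadata) -> None:
        """Register a legacy validator (identifiers are unique; validators are immutable)"""
        if metadata.identifier in cls._validators:
            raise ValueError(
                f"Legacy validator '{metadata.identifier}' is already registered"
            )
        cls._validators[metadata.identifier] = metadata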

    @classmethod
    def get_validator(cls, identifier: str) -> Optional[LegacyValidatorMetadata]:
        """Get a specific validator by identifier"""
        return cls._validators.get(identifier)

    @classmethod
    def get_validators_for_resource(
        cls, resource_type: CoreResourceKinds
    ) -> List[LegacyValidatorMetadata]:
        """Get all validators for a specific resource type"""
        return [
            v for v in cls._validators.values()
            if v.resource_type == resource_type
        ]

    @classmethod
    def find_validators_for_fields(
        cls,
        resource_type: CoreResourceKinds,
        field_names: List[str]
    ) -> List[LegacyValidatorMetadata]:
        """Find validators that handle specific deprecated fields"""
        return [
            v for v in cls.get_validators_for_resource(resource_type)
            if any(field in v.deprecated_fields for field in field_names)
        ]

    @classmethod
    def list_all(cls) -> List[LegacyValidatorMetadata]:
        """List all registered validators"""
        return list(cls._validators.values())

3. Decorator for Easy Registration

# orchestrator/core/legacy/registry.py (continued)
from typing import Callable

def legacy_validator(
    identifier: str,
    resource_type: CoreResourceKinds,
    deprecated_fields: List[str],
    deprecated_from_version: str,
    removed_from_version: str,
    description: str
):
    """Decorator that registers a legacy validator function when its module is imported"""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        metadata = LegacyValidatorMetadata(
            identifier=identifier,
            resource_type=resource_type,
            deprecated_fields=deprecated_fields,
            deprecated_from_version=deprecated_from_version,
            removed_from_version=removed_from_version,
            description=description,
            validator_function=func
        )
        LegacyValidatorRegistry.register(metadata)

        # Registration is the only side effect, so the function is returned
        # unchanged instead of being wrapped
        return func
    return decorator
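The essential behavior of the registry and decorator is that registration is a side effect of defining (that is, importing) the decorated function. The self-contained sketch below shows that flow, using a plain dataclass and string resource kinds as stand-ins for the pydantic metadata model and CoreResourceKinds. All names here are hypothetical, and the duplicate-identifier check is an assumption based on identifiers being unique and validators immutable.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ValidatorInfo:
    """Stand-in for LegacyValidatorMetadata (a plain dataclass instead of pydantic)."""
    identifier: str
    resource_type: str
    deprecated_fields: tuple[str, ...]
    function: Callable[[dict], dict] = field(compare=False)


_REGISTRY: dict[str, ValidatorInfo] = {}


def legacy_validator(identifier: str, resource_type: str, deprecated_fields: list[str]):
    """Register the decorated function at definition time and return it unchanged."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if identifier in _REGISTRY:
            raise ValueError(f"legacy validator {identifier!r} is already registered")
        _REGISTRY[identifier] = ValidatorInfo(
            identifier, resource_type, tuple(deprecated_fields), func
        )
        return func
    return decorator


def find_validators_for_fields(resource_type: str, field_names: list[str]) -> list[ValidatorInfo]:
    """Validators for resource_type that declare at least one of field_names as deprecated."""
    return [
        v for v in _REGISTRY.values()
        if v.resource_type == resource_type
        and any(name in v.deprecated_fields for name in field_names)
    ]


@legacy_validator("entitysource_to_samplestore", "samplestore", ["kind"])
def migrate_kind(data: dict) -> dict:
    if data.get("kind") == "entitysource":
        data["kind"] = "samplestore"
    return data


# Defining migrate_kind above was enough to register it
matches = find_validators_for_fields("samplestore", ["kind"])
print([v.identifier for v in matches])  # ['entitysource_to_samplestore']
print(matches[0].function({"kind": "entitysource"}))  # {'kind': 'samplestore'}
```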

CLI Enhancements

Enhanced ado upgrade Command

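# orchestrator/cli/commands/upgrade.py (enhanced)
from typing import Annotated, List, Optional

import typer

# AdoUpgradeSupportedResourceTypes, handle_ado_upgrade() and the new
# list_legacy_validators() helper are assumed to be imported from the
# existing ado CLI modules
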
def upgrade_resource(
    ctx: typer.Context,
    resource_type: AdoUpgradeSupportedResourceTypes,
    apply_legacy_validator: Annotated[
        Optional[List[str]],
        typer.Option(
            "--apply-legacy-validator",
            help="Apply legacy validators by identifier (e.g., 'csv_constitutive_columns_migration')"
        )
    ] = None,
    list_legacy: Annotated[
        bool,
        typer.Option(
            "--list-legacy",
            help="List available legacy validators for this resource type"
        )
    ] = False,
) -> None:
    """Upgrade resources with optional legacy validator support"""

    if list_legacy:
        # Show available legacy validators for this resource type
        # Display includes: identifier, description, deprecated fields,
        # version information, and usage example
        list_legacy_validators(resource_type)
        return

    # Normal upgrade with optional legacy validators; `parameters` is built
    # from ctx as in the current upgrade command (construction omitted here)
    handle_ado_upgrade(
        parameters=parameters,
        resource_type=resource_type,
        legacy_validators=apply_legacy_validator
    )

Note: The --list-legacy flag provides all necessary information about
available validators, including their identifiers, descriptions, deprecated
fields, and version information. A separate ado legacy command is not needed
as this single flag covers all discovery needs.
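Because legacy validators have the signature `Callable[[dict], dict]` and exist for data the current models reject, they presumably need to run on the raw stored data before it is validated into the current pydantic model. A minimal sketch of that step inside handle_ado_upgrade() might look as follows; `apply_legacy_validators` and the plain-dict registry are hypothetical stand-ins, and reading the raw data from the metastore is left out.

```python
import copy
from typing import Callable

LegacyFn = Callable[[dict], dict]


def apply_legacy_validators(
    raw: dict,
    resource_kind: str,
    identifiers: list[str],
    registry: dict[str, tuple[str, LegacyFn]],
) -> dict:
    """Run the requested legacy validators, in the order given, on raw resource data."""
    # Check every requested identifier before running any of them
    for identifier in identifiers:
        if identifier not in registry:
            raise KeyError(f"Unknown legacy validator: {identifier}")
        kind, _ = registry[identifier]
        if kind != resource_kind:
            raise ValueError(f"{identifier} applies to {kind}, not {resource_kind}")

    # Deep copy so validators that mutate in place leave the caller's data untouched
    data = copy.deepcopy(raw)
    for identifier in identifiers:
        _, migrate = registry[identifier]
        data = migrate(data)
    return data


def to_samplestore(data: dict) -> dict:
    if data.get("kind") == "entitysource":
        data["kind"] = "samplestore"
    return data


registry = {"entitysource_to_samplestore": ("samplestore", to_samplestore)}
raw = {"kind": "entitysource"}
upgraded = apply_legacy_validators(raw, "samplestore", ["entitysource_to_samplestore"], registry)
print(upgraded)  # {'kind': 'samplestore'}
print(raw)       # {'kind': 'entitysource'}
```

The returned dict would then be passed to the current model (for example via pydantic v2's `model_validate()`) and re-saved. Checking every identifier before running any validator means a typo on the command line fails without touching the data.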

Example --list-legacy Output

$ ado upgrade sample_stores --list-legacy

Available legacy validators for sample_stores:

┌─────────────────────────────────────────────────────────────────────────────┐
│ csv_constitutive_columns_migration                                          │
├─────────────────────────────────────────────────────────────────────────────┤
│ Description:                                                                │
│   Migrates CSV sample stores from v1 format (constitutivePropertyColumns    │
│   at top level) to v2 format (per-experiment constitutivePropertyMap)       │
│                                                                             │
│ Handles deprecated fields:                                                  │
│   • constitutivePropertyColumns                                             │
│   • propertyMap                                                             │
│                                                                             │
│ Version info:                                                               │
│   Deprecated from: v1.3.5                                                   │
│   Removed from: v1.6.0                                                      │
│                                                                             │
│ Usage:                                                                      │
│   ado upgrade sample_stores --apply-legacy-validator csv_constitutive_columns_migration │
└─────────────────────────────────────────────────────────────────────────────┘

Found 1 legacy validator(s) for sample_stores
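The panel content above can be produced directly from the metadata fields. The sketch below renders one entry as plain text, standing in for the rich panels planned for list_legacy_validators(). The function name is hypothetical, and the CLI resource name is passed in explicitly because `sample_stores` cannot be derived from the enum value `samplestore`.

```python
def format_legacy_entry(
    identifier: str,
    description: str,
    deprecated_fields: list[str],
    deprecated_from: str,
    removed_from: str,
    cli_resource_name: str,
) -> str:
    """Render one legacy validator entry as plain text, in the same order as the panel."""
    lines = [
        identifier,
        "Description:",
        f"  {description}",
        "Handles deprecated fields:",
        *(f"  • {name}" for name in deprecated_fields),
        "Version info:",
        f"  Deprecated from: v{deprecated_from}",
        f"  Removed from: v{removed_from}",
        "Usage:",
        f"  ado upgrade {cli_resource_name} --apply-legacy-validator {identifier}",
    ]
    return "\n".join(lines)


print(format_legacy_entry(
    "csv_constitutive_columns_migration",
    "Migrates CSV sample stores from v1 format to v2 format",
    ["constitutivePropertyColumns", "propertyMap"],
    "1.3.5",
    "1.6.0",
    "sample_stores",
))
```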

Error Detection and Suggestion Mechanism

Smart Error Handler

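# orchestrator/cli/exceptions/handlers.py (enhanced)

import typer
from pydantic import ValidationError
from rich.console import Console

from orchestrator.core.legacy.metadata import CoreResourceKinds
from orchestrator.core.legacy.registry import LegacyValidatorRegistry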

def handle_validation_error_with_legacy_suggestions(
    error: ValidationError,
    resource_type: CoreResourceKinds,
    resource_identifier: str
) -> None:
    """
    Analyze validation errors and suggest legacy validators if applicable
    """

    # Extract deprecated field names from the validation error (helper to be
    # implemented in Phase 3)
    deprecated_fields = extract_deprecated_fields_from_error(error)

    if not deprecated_fields:
        # Standard error handling
        raise error

    # Find applicable legacy validators
    validators = LegacyValidatorRegistry.find_validators_for_fields(
        resource_type=resource_type,
        field_names=deprecated_fields
    )

    if not validators:
        # No legacy validators available
        raise error

    # Display helpful error message with suggestions
    console = Console()
    console.print(f"\n[bold red]Validation Error[/bold red] in {resource_type.value} '{resource_identifier}'")
    console.print(f"\nDeprecated fields detected: [yellow]{', '.join(deprecated_fields)}[/yellow]")
    console.print("\n[bold cyan]Available legacy validators:[/bold cyan]")

    for validator in validators:
        console.print(f"  • [green]{validator.identifier}[/green]")
        console.print(f"    {validator.description}")
        console.print(f"    Handles: {', '.join(validator.deprecated_fields)}")
        console.print(f"    Deprecated: v{validator.deprecated_from_version}")
        console.print()

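    # CoreResourceKinds values ("samplestore") are not the CLI resource names
    # ("sample_stores"), so map them explicitly; this mapping is illustrative
    # and would come from AdoUpgradeSupportedResourceTypes in practice
    cli_name = {CoreResourceKinds.SAMPLESTORE: "sample_stores"}.get(
        resource_type, f"{resource_type.value}s"
    )
    console.print("[bold magenta]To upgrade using a legacy validator:[/bold magenta]")
    console.print(f"  ado upgrade {cli_name} --apply-legacy-validator {validators[0].identifier}")
    console.print()
    console.print("[bold magenta]To list all legacy validators:[/bold magenta]")
    console.print(f"  ado upgrade {cli_name} --list-legacy")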

    raise typer.Exit(1)
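The handler relies on extract_deprecated_fields_from_error(), which the plan leaves to Phase 3. One possible approach, sketched below under stated assumptions, scans the `loc` tuples that pydantic v2's `ValidationError.errors()` returns and keeps the field names that some registered legacy validator declares as deprecated. It takes the errors list and the set of known deprecated names (for example, the union of `deprecated_fields` for the resource type) so it runs without pydantic; the function name is hypothetical. It can only fire if the current models reject unknown fields (`extra="forbid"`): with pydantic's default `extra="ignore"`, a leftover `constitutivePropertyColumns` would be silently dropped rather than raising an error.

```python
def extract_deprecated_fields(errors: list[dict], known_deprecated: set[str]) -> list[str]:
    """Known deprecated field names that appear in pydantic error locations.

    `errors` has the shape of pydantic v2's ValidationError.errors(): a list of
    dicts whose "loc" is a tuple of field names (str) and list indices (int).
    """
    found: list[str] = []
    for err in errors:
        for part in err.get("loc", ()):
            # Keep each matching name once, in first-seen order
            if isinstance(part, str) and part in known_deprecated and part not in found:
                found.append(part)
    return found


# Illustrative errors for an old CSV sample store loaded by a model with extra="forbid"
errors = [
    {"type": "extra_forbidden", "loc": ("constitutivePropertyColumns",)},
    {"type": "extra_forbidden", "loc": ("experiments", 0, "propertyMap")},
    {"type": "missing", "loc": ("experiments", 0, "observedPropertyMap")},
]
known = {"kind", "constitutivePropertyColumns", "propertyMap"}
print(extract_deprecated_fields(errors, known))
# ['constitutivePropertyColumns', 'propertyMap']
```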

Example Legacy Validators

Example 1: Entity Source to Sample Store Migration

This validator handles the renaming of kind=entitysource to kind=samplestore
in ADO resources.

# orchestrator/core/legacy/validators/resource/entitysource_to_samplestore.py

from orchestrator.core.legacy.registry import legacy_validator
from orchestrator.core.resources import CoreResourceKinds

@legacy_validator(
    identifier="entitysource_to_samplestore",
    resource_type=CoreResourceKinds.SAMPLESTORE,
    deprecated_fields=["kind"],
    deprecated_from_version="1.2.0",
    removed_from_version="1.5.0",
    description="Migrates resources with kind='entitysource' to kind='samplestore'"
)
def migrate_entitysource_to_samplestore(data: dict) -> dict:
    """
    Migrate old entitysource kind to samplestore

    Old format:
    - kind: "entitysource"

    New format:
    - kind: "samplestore"
    """

    if not isinstance(data, dict):
        return data

    # Check if this is an entitysource that needs migration
    if data.get("kind") == "entitysource":
        data["kind"] = "samplestore"

    return data

Example 2: CSV Sample Store Migration

This validator handles the CSV sample store format changes.

# orchestrator/core/legacy/validators/samplestore/v1_to_v2_csv_migration.py

from orchestrator.core.legacy.registry import legacy_validator
from orchestrator.core.resources import CoreResourceKinds

@legacy_validator(
    identifier="csv_constitutive_columns_migration",
    resource_type=CoreResourceKinds.SAMPLESTORE,
    deprecated_fields=["constitutivePropertyColumns", "propertyMap"],
    deprecated_from_version="1.3.5",
    removed_from_version="1.6.0",
    description="Migrates CSV sample stores from v1 format (constitutivePropertyColumns at top level) to v2 format (per-experiment constitutivePropertyMap)"
)
def migrate_csv_v1_to_v2(data: dict) -> dict:
    """
    Migrate old CSVSampleStoreDescription format to new format

    Old format:
    - constitutivePropertyColumns at top level (list)
    - experiments list with propertyMap (not observedPropertyMap)

    New format:
    - No constitutivePropertyColumns at top level
    - experiments with observedPropertyMap and constitutivePropertyMap
    """

    if not isinstance(data, dict):
        return data

    # Handle both deprecated fields independently, so a resource that carries
    # only one of them is still migrated
    constitutive_columns = data.pop("constitutivePropertyColumns", None)

    for exp_desc in data.get("experiments", []):
        # Add constitutivePropertyMap (a fresh list per experiment)
        if constitutive_columns is not None:
            exp_desc["constitutivePropertyMap"] = list(constitutive_columns)

        # Rename propertyMap to observedPropertyMap
        if "propertyMap" in exp_desc:
            exp_desc["observedPropertyMap"] = exp_desc.pop("propertyMap")

    return data
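A quick before/after check of this migration, using an undecorated copy of the function so it runs without the registry. The property names in the sample data are made up for illustration. Running the migration a second time leaves the data unchanged, which matters if a user reruns `ado upgrade` with the same validator.

```python
def migrate_csv_v1_to_v2(data: dict) -> dict:
    """Undecorated copy of the CSV migration, for a standalone check."""
    if not isinstance(data, dict):
        return data
    constitutive_columns = data.pop("constitutivePropertyColumns", None)
    for exp_desc in data.get("experiments", []):
        if constitutive_columns is not None:
            exp_desc["constitutivePropertyMap"] = list(constitutive_columns)
        if "propertyMap" in exp_desc:
            exp_desc["observedPropertyMap"] = exp_desc.pop("propertyMap")
    return data


# Illustrative v1 data: top-level constitutive columns, propertyMap per experiment
old = {
    "constitutivePropertyColumns": ["temperature"],
    "experiments": [{"propertyMap": ["yield"]}],
}
new = migrate_csv_v1_to_v2(old)
print(new)
# {'experiments': [{'constitutivePropertyMap': ['temperature'], 'observedPropertyMap': ['yield']}]}
```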

User Experience Flow

Scenario: User with old resources tries to upgrade

# User runs upgrade
$ ado upgrade sample_stores

# Error occurs with helpful message
Validation Error in samplestore 'store-abc123'

Deprecated fields detected: constitutivePropertyColumns, propertyMap

Available legacy validators:
  • csv_constitutive_columns_migration
    Migrates CSV sample stores from v1 format to v2 format
    Handles: constitutivePropertyColumns, propertyMap
    Deprecated: v1.3.5

To upgrade using a legacy validator:
  ado upgrade sample_stores --apply-legacy-validator csv_constitutive_columns_migration

To list all legacy validators:
  ado upgrade sample_stores --list-legacy

# User can first inspect available validators
$ ado upgrade sample_stores --list-legacy
[Shows detailed information about all available validators]

# User applies the fix
$ ado upgrade sample_stores --apply-legacy-validator csv_constitutive_columns_migration
✓ Upgraded 3 sample stores successfully

Implementation Strategy

Phase 1: Foundation (Weeks 1-2)

  1. Create legacy package structure

    • Create orchestrator/core/legacy/ directory
    • Implement LegacyValidatorMetadata model
    • Implement LegacyValidatorRegistry class
    • Add @legacy_validator decorator
  2. Migrate existing validator

    • Extract CSV migration from CSVSampleStoreDescription.migrate_old_format()
    • Register as first legacy validator
    • Keep original validator in place (backward compatibility)

Phase 2: CLI Integration (Week 3)

  1. Enhance upgrade command

    • Add --apply-legacy-validator option to upgrade_resource()
    • Add --list-legacy option with comprehensive output
    • Modify handle_ado_upgrade() to accept legacy validators
  2. Implement list functionality

    • Create list_legacy_validators() function
    • Format output with rich tables/panels
    • Include all metadata and usage examples

Phase 3: Error Handling (Week 4)

  1. Smart error detection

    • Enhance validation error handler
    • Add field extraction from pydantic errors
    • Implement suggestion mechanism
  2. Testing

    • Unit tests for registry
    • Integration tests for CLI
    • End-to-end upgrade scenarios

Phase 4: Migration Path (Weeks 5-6)

  1. Deprecation workflow

    • When removing a validator from active code:
      1. Move to legacy package
      2. Register with metadata
      3. Update documentation
      4. Add to changelog
  2. Documentation

    • User guide for legacy validators
    • Developer guide for creating validators
    • Migration checklist

Benefits of This Approach

  1. Permanent Upgrade Paths: Validators never truly disappear
  2. Discoverability: Users can find and apply validators easily via
    --list-legacy
  3. Version Skipping: Users can upgrade across multiple versions
  4. Backward Compatibility: Old resources remain accessible
  5. Clear Migration: Structured process for deprecating validators
  6. Self-Documenting: Metadata provides context and history
  7. Testable: Each validator is isolated and testable
  8. Extensible: Easy to add new validators
  9. Simple CLI: Single command with two flags covers all use cases

Additional Considerations

Validator Chaining

For complex migrations spanning multiple versions:

# orchestrator/core/legacy/registry.py (enhanced)

class LegacyValidatorRegistry:
    @classmethod
    def get_upgrade_chain(
        cls,
        resource_type: CoreResourceKinds,
        from_version: str,
        to_version: str
    ) -> List[LegacyValidatorMetadata]:
        """Get ordered list of validators to upgrade from one version to another"""
        # Implementation to find and order validators
        raise NotImplementedError
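The plan leaves the chaining rule open. One plausible rule, stated here as an assumption: a resource last saved with version F needs a validator when it predates the deprecation (F < deprecated_from_version) and the target version T no longer migrates it automatically (removed_from_version <= T); the selected validators then run in order of deprecation. The sketch below implements that rule with a hypothetical dataclass stand-in for the metadata and assumes the entries were already filtered by resource type.

```python
from dataclasses import dataclass
from typing import Callable


def parse_version(version: str) -> tuple[int, ...]:
    """'1.3.5' or 'v1.3.5' -> (1, 3, 5); assumes plain MAJOR.MINOR.PATCH versions."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


@dataclass
class ChainEntry:
    """Stand-in for LegacyValidatorMetadata with only the fields chaining needs."""
    identifier: str
    deprecated_from_version: str
    removed_from_version: str
    function: Callable[[dict], dict]


def get_upgrade_chain(
    entries: list[ChainEntry], from_version: str, to_version: str
) -> list[ChainEntry]:
    """Validators needed to bring data saved by from_version up to to_version."""
    start, end = parse_version(from_version), parse_version(to_version)
    needed = [
        e for e in entries
        # The data predates the deprecation...
        if start < parse_version(e.deprecated_from_version)
        # ...and the target version no longer migrates it automatically
        and parse_version(e.removed_from_version) <= end
    ]
    return sorted(needed, key=lambda e: parse_version(e.deprecated_from_version))


def apply_chain(data: dict, chain: list[ChainEntry]) -> dict:
    """Run each validator in the chain on the output of the previous one."""
    for entry in chain:
        data = entry.function(data)
    return data


entries = [
    ChainEntry("csv_constitutive_columns_migration", "1.3.5", "1.6.0", lambda d: d),
    ChainEntry("entitysource_to_samplestore", "1.2.0", "1.5.0", lambda d: d),
]
print([e.identifier for e in get_upgrade_chain(entries, "1.1.0", "1.6.0")])
# ['entitysource_to_samplestore', 'csv_constitutive_columns_migration']
```

With the two example validators, upgrading from 1.1.0 to 1.6.0 selects both (kind first, then CSV), while upgrading from 1.3.0 to 1.6.0 selects only the CSV migration.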

Excluded Features

Dry-Run Support (Not Implemented)

Rationale: We cannot predict which pydantic validators will be applied
during resource loading, making it impossible to accurately preview changes
before they occur. The upgrade process involves:

  1. Reading the stored resource data
  2. Applying legacy validators (if specified) to that raw data, before
    loading, since they exist for data the current models reject
  3. Loading resources (triggers active pydantic validators)
  4. Re-saving resources

Since step 3 may itself modify data through active validators, a true dry-run
is not feasible.

Recommendation: Users should test upgrades in a non-production environment
first.

Rollback Capability (Not Implemented)

Rationale: Implementing reliable backup/restore for the metastore is
complex:

  • Requires transaction management across multiple resource types
  • Potential for partial failures during restore
  • Storage overhead for maintaining backups
  • Complexity in handling concurrent operations

Recommendation: Users should:

  1. Back up their metastore database before major upgrades
  2. Test upgrade procedures in development environments
  3. Use version control for YAML resource definitions

Implementation Priority

High Priority (MVP)

  • Legacy validator registry and metadata models
  • --apply-legacy-validator flag for ado upgrade
  • --list-legacy flag with comprehensive output
  • Smart error detection with suggestions
  • Create two example validators:
    • entitysource_to_samplestore: Migrate kind field
    • csv_constitutive_columns_migration: Migrate CSV format

Medium Priority

  • Validator chaining for multi-version upgrades
  • Comprehensive documentation

Low Priority (Nice-to-have)

  • Automated validator discovery from error patterns
  • Version-based validator recommendations

Design Decisions

  1. Storage: Validator metadata is code-only (not stored in metastore)

    • Simpler implementation
    • Version controlled with the codebase
    • No database schema changes needed
  2. Versioning: Validators are immutable once created

    • If a validator needs changes, create a new validator with a new identifier
    • Original validator remains for historical compatibility
    • Clear audit trail of migration logic evolution
  3. Distribution: Legacy validators included in core package

    • Part of standard ado installation (no optional extra needed)
    • Always available when needed
    • Minimal overhead (just additional Python modules)
    • Simplifies user experience (no extra installation step)
  4. Automation: Manual validator selection required

    • Automatic detection could apply wrong validators
    • Explicit user choice ensures correct migration path
    • Error messages guide users to appropriate validators
  5. Dry-run and Rollback: Not implemented

    • Dry-run: The changes made by active pydantic validators cannot be
      predicted, so an accurate preview is impossible
    • Rollback: Backup/restore complexity outweighs benefits
    • Users should test in non-production environments and maintain database
      backups

Package Configuration

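Legacy validators are part of the core package and need no additional
configuration in pyproject.toml. The orchestrator/core/legacy/ module is
imported on demand when users invoke the --apply-legacy-validator or
--list-legacy flags, and when the upgrade error handler looks up suggestions.
Because validators register themselves through the @legacy_validator
decorator, importing the package must also import every module under
validators/; otherwise the registry stays empty.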
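Since the @legacy_validator decorator registers validators only when their modules are imported, orchestrator/core/legacy/__init__.py needs some way to import everything under validators/. One standard-library approach, sketched here, walks the package with `pkgutil.walk_packages()` and imports each module with `importlib.import_module()`; the helper name is hypothetical.

```python
import importlib
import pkgutil


def import_all_submodules(package_name: str) -> list[str]:
    """Import every module and subpackage under package_name.

    Importing is what triggers decorator-based registration, so after this
    call every @legacy_validator in the package has been registered.
    """
    package = importlib.import_module(package_name)
    imported: list[str] = []
    # walk_packages recurses into subpackages; the prefix yields dotted names
    for info in pkgutil.walk_packages(package.__path__, prefix=package.__name__ + "."):
        importlib.import_module(info.name)
        imported.append(info.name)
    return imported
```

Called as `import_all_submodules("orchestrator.core.legacy.validators")`, this would pick up new validator files without anyone having to maintain an import list.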

Next Steps

  1. Validate Design: Review this plan with the team to ensure it meets
    requirements
  2. Prototype: Build MVP with CSV validator migration
  3. Test: Verify with real-world upgrade scenarios
  4. Document: Create user and developer guides
  5. Iterate: Refine based on feedback

Document Version: 1.4
Date: 2026-02-26
Status: Draft for Review
Changes:

  • v1.1: Removed separate ado legacy command, renamed --extend-with to
    --apply-legacy-validator
  • v1.2: Added design decisions (code-only metadata, immutable validators,
    optional [legacy] extra)
  • v1.3: Removed dry-run and rollback features with rationale; updated
    implementation priorities
  • v1.4: Changed from optional [legacy] extra to core package inclusion

Originally posted by @AlessandroPomponio in #620
