Legacy Validator Registry System - Design Plan
Problem Statement
There is a critical issue with ado's upgrade mechanism: if a user hasn't run
ado upgrade in a while, they may need upgrade paths that have since been
removed, because ado has progressed past the version in which those paths were
dropped. The affected resource then cannot be restored in any way:
- The current version doesn't allow the upgrade
- The user doesn't know which previous version provided an upgrade path for it
- Going back to the previous version likely won't work either, because newer
resources would fail to load on the previous version
Current State Analysis
Existing Upgrade Mechanism
- handle_ado_upgrade() simply loads and re-saves resources, triggering pydantic validators
- Migration logic lives in @pydantic.model_validator decorators (e.g.,
CSVSampleStoreDescription.migrate_old_format())
- When validators are removed, upgrade paths disappear permanently
- No mechanism to track or preserve historical migration functions
Key Issues Identified
- Validator Removal = Lost Upgrade Path: Once a validator is removed from
the codebase, users with old resources are stuck
- Version Skipping Problem: Users who skip versions can't upgrade through
intermediate steps
- No Discoverability: Users don't know which legacy validators exist or how
to use them
- Backward Incompatibility: Loading newer resources in older ado versions
fails
Proposed Solution: Legacy Validator Registry
Architecture Overview

    graph TB
        A[User runs ado command] --> B{Validation Error?}
        B -->|No| C[Success]
        B -->|Yes| D[Error Analysis]
        D --> E{Deprecated Field?}
        E -->|Yes| F[Query Legacy Registry]
        E -->|No| G[Standard Error]
        F --> H[List Available Validators]
        H --> I[User selects validator]
        I --> J[Run ado upgrade --apply-legacy-validator validator_name]
        J --> K[Apply Legacy Validator]
        K --> L[Re-save Resource]
        L --> C

File Structure

    orchestrator/
    ├── core/
    │   └── legacy/                  # Part of core package
    │       ├── __init__.py
    │       ├── registry.py          # Legacy validator registry
    │       ├── metadata.py          # Validator metadata models
    │       └── validators/
    │           ├── __init__.py
    │           ├── resource/
    │           │   ├── __init__.py
    │           │   └── entitysource_to_samplestore.py
    │           ├── samplestore/
    │           │   ├── __init__.py
    │           │   └── v1_to_v2_csv_migration.py
    │           ├── discoveryspace/
    │           │   └── __init__.py
    │           └── operation/
    │               └── __init__.py

Installation:

    # Legacy validators ship with the core package as of v1.4 of this plan
    # (see Design Decisions), so a standard install is sufficient
    uv pip install ado
    # or
    pip install ado

Core Components
1. Validator Metadata Model

    # orchestrator/core/legacy/metadata.py
    from enum import Enum
    from typing import Annotated, Callable

    import pydantic


    class CoreResourceKinds(Enum):
        SAMPLESTORE = "samplestore"
        DISCOVERYSPACE = "discoveryspace"
        OPERATION = "operation"
        # ... others


    class LegacyValidatorMetadata(pydantic.BaseModel):
        """Metadata for a legacy validator function"""

        identifier: Annotated[str, pydantic.Field(
            description="Unique identifier for this validator (e.g., 'csv_constitutive_columns_migration')"
        )]
        resource_type: Annotated[CoreResourceKinds, pydantic.Field(
            description="Resource type this validator applies to"
        )]
        deprecated_fields: Annotated[list[str], pydantic.Field(
            description="Fields that this validator handles"
        )]
        deprecated_from_version: Annotated[str, pydantic.Field(
            description="ADO version when these fields were deprecated"
        )]
        removed_from_version: Annotated[str, pydantic.Field(
            description="ADO version when automatic upgrade was removed"
        )]
        description: Annotated[str, pydantic.Field(
            description="Human-readable description of what this validator does"
        )]
        validator_function: Annotated[Callable[[dict], dict], pydantic.Field(
            description="The actual migration function",
            exclude=True  # Don't serialize the function
        )]

2. Legacy Validator Registry

    # orchestrator/core/legacy/registry.py
    from typing import Dict, List, Optional

    from orchestrator.core.legacy.metadata import LegacyValidatorMetadata, CoreResourceKinds


    class LegacyValidatorRegistry:
        """Registry for legacy validators that have been removed from active code"""

        _validators: Dict[str, LegacyValidatorMetadata] = {}

        @classmethod
        def register(cls, metadata: LegacyValidatorMetadata) -> None:
            """Register a legacy validator"""
            cls._validators[metadata.identifier] = metadata

        @classmethod
        def get_validator(cls, identifier: str) -> Optional[LegacyValidatorMetadata]:
            """Get a specific validator by identifier"""
            return cls._validators.get(identifier)

        @classmethod
        def get_validators_for_resource(
            cls, resource_type: CoreResourceKinds
        ) -> List[LegacyValidatorMetadata]:
            """Get all validators for a specific resource type"""
            return [
                v for v in cls._validators.values()
                if v.resource_type == resource_type
            ]

        @classmethod
        def find_validators_for_fields(
            cls, resource_type: CoreResourceKinds, field_names: List[str]
        ) -> List[LegacyValidatorMetadata]:
            """Find validators that handle specific deprecated fields"""
            return [
                v for v in cls.get_validators_for_resource(resource_type)
                if any(field in v.deprecated_fields for field in field_names)
            ]

        @classmethod
        def list_all(cls) -> List[LegacyValidatorMetadata]:
            """List all registered validators"""
            return list(cls._validators.values())

3. Decorator for Easy Registration

    # orchestrator/core/legacy/registry.py (continued)
    from functools import wraps
    from typing import Callable  # needed in addition to the imports above


    def legacy_validator(
        identifier: str,
        resource_type: CoreResourceKinds,
        deprecated_fields: List[str],
        deprecated_from_version: str,
        removed_from_version: str,
        description: str,
    ):
        """Decorator to register a legacy validator function"""

        def decorator(func: Callable[[dict], dict]):
            # Registration happens once, at import time of the validator module
            metadata = LegacyValidatorMetadata(
                identifier=identifier,
                resource_type=resource_type,
                deprecated_fields=deprecated_fields,
                deprecated_from_version=deprecated_from_version,
                removed_from_version=removed_from_version,
                description=description,
                validator_function=func,
            )
            LegacyValidatorRegistry.register(metadata)

            @wraps(func)
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)

            return wrapper

        return decorator

CLI Enhancements
Enhanced ado upgrade Command

    # orchestrator/cli/commands/upgrade.py (enhanced)
    from typing import Annotated, List, Optional

    import typer


    def upgrade_resource(
        ctx: typer.Context,
        resource_type: AdoUpgradeSupportedResourceTypes,
        apply_legacy_validator: Annotated[
            Optional[List[str]],
            typer.Option(
                "--apply-legacy-validator",
                help="Apply legacy validators by identifier (e.g., 'csv_constitutive_columns_migration')"
            )
        ] = None,
        list_legacy: Annotated[
            bool,
            typer.Option(
                "--list-legacy",
                help="List available legacy validators for this resource type"
            )
        ] = False,
    ) -> None:
        """Upgrade resources with optional legacy validator support"""
        if list_legacy:
            # Show available legacy validators for this resource type
            # Display includes: identifier, description, deprecated fields,
            # version information, and usage example
            list_legacy_validators(resource_type)
            return

        # Normal upgrade with optional legacy validators
        # NOTE: `parameters` is not defined in this excerpt
        handle_ado_upgrade(
            parameters=parameters,
            resource_type=resource_type,
            legacy_validators=apply_legacy_validator
        )

Note: The --list-legacy flag provides all necessary information about
available validators, including their identifiers, descriptions, deprecated
fields, and version information. A separate ado legacy command is not needed,
as this single flag covers all discovery needs.
Example --list-legacy Output

    $ ado upgrade sample_stores --list-legacy
    Available legacy validators for sample_stores:
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ csv_constitutive_columns_migration                                          │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │ Description:                                                                │
    │   Migrates CSV sample stores from v1 format (constitutivePropertyColumns    │
    │   at top level) to v2 format (per-experiment constitutivePropertyMap)       │
    │                                                                             │
    │ Handles deprecated fields:                                                  │
    │   • constitutivePropertyColumns                                             │
    │   • propertyMap                                                             │
    │                                                                             │
    │ Version info:                                                               │
    │   Deprecated from: v1.3.5                                                   │
    │   Removed from: v1.6.0                                                      │
    │                                                                             │
    │ Usage:                                                                      │
    │   ado upgrade sample_stores --apply-legacy-validator csv_constitutive_columns_migration │
    └─────────────────────────────────────────────────────────────────────────────┘
    Found 1 legacy validator(s) for sample_stores

Error Detection and Suggestion Mechanism
Smart Error Handler

    # orchestrator/cli/exceptions/handlers.py (enhanced)
    import typer
    from pydantic import ValidationError
    from rich.console import Console

    from orchestrator.core.legacy.metadata import CoreResourceKinds
    from orchestrator.core.legacy.registry import LegacyValidatorRegistry


    def handle_validation_error_with_legacy_suggestions(
        error: ValidationError,
        resource_type: CoreResourceKinds,
        resource_identifier: str
    ) -> None:
        """
        Analyze validation errors and suggest legacy validators if applicable
        """
        # Extract field names from validation error
        # (extract_deprecated_fields_from_error is the helper planned for Phase 3)
        deprecated_fields = extract_deprecated_fields_from_error(error)

        if not deprecated_fields:
            # Standard error handling
            raise error

        # Find applicable legacy validators
        validators = LegacyValidatorRegistry.find_validators_for_fields(
            resource_type=resource_type,
            field_names=deprecated_fields
        )

        if not validators:
            # No legacy validators available
            raise error

        # Display helpful error message with suggestions
        console = Console()
        console.print(f"\n[bold red]Validation Error[/bold red] in {resource_type.value} '{resource_identifier}'")
        console.print(f"\nDeprecated fields detected: [yellow]{', '.join(deprecated_fields)}[/yellow]")
        console.print("\n[bold cyan]Available legacy validators:[/bold cyan]")

        for validator in validators:
            console.print(f"  • [green]{validator.identifier}[/green]")
            console.print(f"    {validator.description}")
            console.print(f"    Handles: {', '.join(validator.deprecated_fields)}")
            console.print(f"    Deprecated: v{validator.deprecated_from_version}")
            console.print()

        # NOTE: "{resource_type.value}s" yields e.g. "samplestores", while the CLI
        # name used elsewhere in this plan is "sample_stores"; the final
        # implementation needs a mapping from resource kind to CLI resource name.
        console.print("[bold magenta]To upgrade using a legacy validator:[/bold magenta]")
        console.print(f"  ado upgrade {resource_type.value}s --apply-legacy-validator {validators[0].identifier}")
        console.print()
        console.print("[bold magenta]To list all legacy validators:[/bold magenta]")
        console.print(f"  ado upgrade {resource_type.value}s --list-legacy")

        raise typer.Exit(1)

Example Legacy Validators
Example 1: Entity Source to Sample Store Migration
This validator handles the renaming of kind=entitysource to kind=samplestore
in ADO resources.

    # orchestrator/core/legacy/validators/resource/entitysource_to_samplestore.py
    from orchestrator.core.legacy.registry import legacy_validator
    from orchestrator.core.resources import CoreResourceKinds


    @legacy_validator(
        identifier="entitysource_to_samplestore",
        resource_type=CoreResourceKinds.SAMPLESTORE,
        deprecated_fields=["kind"],
        deprecated_from_version="1.2.0",
        removed_from_version="1.5.0",
        description="Migrates resources with kind='entitysource' to kind='samplestore'"
    )
    def migrate_entitysource_to_samplestore(data: dict) -> dict:
        """
        Migrate old entitysource kind to samplestore

        Old format:
        - kind: "entitysource"

        New format:
        - kind: "samplestore"
        """
        if not isinstance(data, dict):
            return data

        # Check if this is an entitysource that needs migration
        if data.get("kind") == "entitysource":
            data["kind"] = "samplestore"

        return data

Example 2: CSV Sample Store Migration
This validator handles the CSV sample store format changes.

    # orchestrator/core/legacy/validators/samplestore/v1_to_v2_csv_migration.py
    from orchestrator.core.legacy.registry import legacy_validator
    from orchestrator.core.resources import CoreResourceKinds


    @legacy_validator(
        identifier="csv_constitutive_columns_migration",
        resource_type=CoreResourceKinds.SAMPLESTORE,
        deprecated_fields=["constitutivePropertyColumns", "propertyMap"],
        deprecated_from_version="1.3.5",
        removed_from_version="1.6.0",
        description="Migrates CSV sample stores from v1 format (constitutivePropertyColumns at top level) to v2 format (per-experiment constitutivePropertyMap)"
    )
    def migrate_csv_v1_to_v2(data: dict) -> dict:
        """
        Migrate old CSVSampleStoreDescription format to new format

        Old format:
        - constitutivePropertyColumns at top level (list)
        - experiments list with propertyMap (not observedPropertyMap)

        New format:
        - No constitutivePropertyColumns at top level
        - experiments with observedPropertyMap and constitutivePropertyMap
        """
        if not isinstance(data, dict):
            return data

        if "constitutivePropertyColumns" not in data:
            return data

        constitutive_columns = data.pop("constitutivePropertyColumns")

        if "experiments" in data:
            for exp_desc in data["experiments"]:
                # Add constitutivePropertyMap
                exp_desc["constitutivePropertyMap"] = constitutive_columns
                # Rename propertyMap to observedPropertyMap
                if "propertyMap" in exp_desc:
                    exp_desc["observedPropertyMap"] = exp_desc.pop("propertyMap")

        return data

User Experience Flow
Scenario: User with old resources tries to upgrade

    # User runs upgrade
    $ ado upgrade sample_stores

    # Error occurs with helpful message
    Validation Error in samplestore 'store-abc123'

    Deprecated fields detected: constitutivePropertyColumns, propertyMap

    Available legacy validators:
      • csv_constitutive_columns_migration
        Migrates CSV sample stores from v1 format to v2 format
        Handles: constitutivePropertyColumns, propertyMap
        Deprecated: v1.3.5

    To upgrade using a legacy validator:
      ado upgrade sample_stores --apply-legacy-validator csv_constitutive_columns_migration

    To list all legacy validators:
      ado upgrade sample_stores --list-legacy

    # User can first inspect available validators
    $ ado upgrade sample_stores --list-legacy
    [Shows detailed information about all available validators]

    # User applies the fix
    $ ado upgrade sample_stores --apply-legacy-validator csv_constitutive_columns_migration
    ✓ Upgraded 3 sample stores successfully

Implementation Strategy
Phase 1: Foundation (Week 1-2)
Create legacy package structure
- Create orchestrator/core/legacy/ directory
- Implement LegacyValidatorMetadata model
- Implement LegacyValidatorRegistry class
- Add @legacy_validator decorator
Migrate existing validator
- Extract CSV migration from CSVSampleStoreDescription.migrate_old_format()
- Register as first legacy validator
- Keep original validator in place (backward compatibility)
Phase 2: CLI Integration (Week 3)
Enhance upgrade command
- Add --apply-legacy-validator option to upgrade_resource()
- Add --list-legacy option with comprehensive output
- Modify handle_ado_upgrade() to accept legacy validators
Implement list functionality
- Create list_legacy_validators() function
- Format output with rich tables/panels
- Include all metadata and usage examples
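As a rough illustration of the listing step, the sketch below renders the same fields as the example --list-legacy output, using plain text instead of rich panels so that it stays dependency-free. ValidatorInfo and format_legacy_validators are hypothetical names introduced only for this sketch; the real list_legacy_validators() would read LegacyValidatorMetadata entries from the registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatorInfo:
    """Hypothetical stand-in for LegacyValidatorMetadata (display fields only)."""

    identifier: str
    description: str
    deprecated_fields: list[str]
    deprecated_from_version: str
    removed_from_version: str


def format_legacy_validators(cli_name: str, validators: list[ValidatorInfo]) -> str:
    """Render the --list-legacy report for one resource type as plain text."""
    if not validators:
        return f"No legacy validators available for {cli_name}"

    lines = [f"Available legacy validators for {cli_name}:", ""]
    for v in validators:
        lines.append(v.identifier)
        lines.append(f"  Description: {v.description}")
        lines.append("  Handles deprecated fields:")
        lines.extend(f"    - {name}" for name in v.deprecated_fields)
        lines.append(f"  Deprecated from: v{v.deprecated_from_version}")
        lines.append(f"  Removed from: v{v.removed_from_version}")
        # The usage line lets users copy the exact command to run
        lines.append(
            f"  Usage: ado upgrade {cli_name} --apply-legacy-validator {v.identifier}"
        )
        lines.append("")
    lines.append(f"Found {len(validators)} legacy validator(s) for {cli_name}")
    return "\n".join(lines)
```

Returning a string rather than printing directly keeps the formatter easy to unit test; the CLI layer can decide how to display it.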
Phase 3: Error Handling (Week 4)
Smart error detection
- Enhance validation error handler
- Add field extraction from pydantic errors
- Implement suggestion mechanism
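One possible shape for the field-extraction step is sketched below. It works on the list of error dictionaries that pydantic's ValidationError.errors() returns, so a real handler would pass error.errors() to it. Two things here are assumptions rather than part of the plan: that removed fields surface as pydantic v2 "extra_forbidden" errors (which only happens when the model forbids extra fields), and the names extract_deprecated_fields and DEPRECATION_ERROR_TYPES.

```python
from typing import Any

# Assumption: removed fields show up as pydantic v2 "extra_forbidden" errors,
# which requires the resource models to be configured with extra="forbid".
DEPRECATION_ERROR_TYPES = frozenset({"extra_forbidden"})


def extract_deprecated_fields(errors: list[dict[str, Any]]) -> list[str]:
    """Return unique field names from errors that look like removed fields.

    `errors` has the shape returned by pydantic's ValidationError.errors().
    """
    fields: list[str] = []
    for err in errors:
        if err.get("type") not in DEPRECATION_ERROR_TYPES:
            continue
        # "loc" is a path such as ("experiments", 0, "propertyMap");
        # the last string element is the offending field name.
        names = [part for part in err.get("loc", ()) if isinstance(part, str)]
        if names and names[-1] not in fields:
            fields.append(names[-1])
    return fields
```

Matching on the leaf field name keeps the result compatible with the deprecated_fields lists in the validator metadata, which name nested fields such as propertyMap without their full path.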
Testing
- Unit tests for registry
- Integration tests for CLI
- End-to-end upgrade scenarios
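Because each legacy validator is a plain dict-to-dict function, unit tests need no metastore or CLI. The sketch below copies the CSV migration from Example 2 without its registration decorator, so that it runs standalone, and checks both the migration and the no-op case. The sample column names and values are invented for the test.

```python
import copy


def migrate_csv_v1_to_v2(data: dict) -> dict:
    """Copy of the CSV migration from Example 2, minus the registration decorator."""
    if not isinstance(data, dict) or "constitutivePropertyColumns" not in data:
        return data
    constitutive_columns = data.pop("constitutivePropertyColumns")
    for exp_desc in data.get("experiments", []):
        exp_desc["constitutivePropertyMap"] = constitutive_columns
        if "propertyMap" in exp_desc:
            exp_desc["observedPropertyMap"] = exp_desc.pop("propertyMap")
    return data


def test_migrates_v1_document() -> None:
    old = {
        "constitutivePropertyColumns": ["temperature", "pressure"],
        "experiments": [{"propertyMap": {"yield": "col_yield"}}],
    }
    new = migrate_csv_v1_to_v2(copy.deepcopy(old))
    experiment = new["experiments"][0]
    assert "constitutivePropertyColumns" not in new
    assert "propertyMap" not in experiment
    assert experiment["constitutivePropertyMap"] == ["temperature", "pressure"]
    assert experiment["observedPropertyMap"] == {"yield": "col_yield"}


def test_leaves_v2_document_unchanged() -> None:
    v2 = {"experiments": [{"observedPropertyMap": {}, "constitutivePropertyMap": []}]}
    assert migrate_csv_v1_to_v2(copy.deepcopy(v2)) == v2
```

The second test matters for upgrades over mixed stores: applying a legacy validator to a resource that is already in the new format should be a no-op.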
Phase 4: Migration Path (Week 5-6)
Deprecation workflow
- When removing a validator from active code:
  - Move to legacy package
  - Register with metadata
  - Update documentation
  - Add to changelog
Documentation
- User guide for legacy validators
- Developer guide for creating validators
- Migration checklist
Benefits of This Approach
- Permanent Upgrade Paths: Validators never truly disappear
- Discoverability: Users can find and apply validators easily via
--list-legacy
- Version Skipping: Users can upgrade across multiple versions
- Backward Compatibility: Old resources remain accessible
- Clear Migration: Structured process for deprecating validators
- Self-Documenting: Metadata provides context and history
- Testable: Each validator is isolated and testable
- Extensible: Easy to add new validators
- Simple CLI: Single command with two flags covers all use cases
Additional Considerations
Validator Chaining
For complex migrations spanning multiple versions:

    # orchestrator/core/legacy/registry.py (enhanced)
    class LegacyValidatorRegistry:

        @classmethod
        def get_upgrade_chain(
            cls,
            resource_type: CoreResourceKinds,
            from_version: str,
            to_version: str
        ) -> List[LegacyValidatorMetadata]:
            """Get ordered list of validators to upgrade from one version to another"""
            # Implementation to find and order validators

Excluded Features
Dry-Run Support (Not Implemented)
Rationale: We cannot predict which pydantic validators will be applied
during resource loading, making it impossible to accurately preview changes
before they occur. The upgrade process involves:
1. Loading resources (triggers active pydantic validators)
2. Applying legacy validators (if specified)
3. Re-saving resources
Since step 1 may already modify data through active validators, a true dry-run
is not feasible.
Recommendation: Users should test upgrades in a non-production environment
first.
Rollback Capability (Not Implemented)
Rationale: Implementing reliable backup/restore for the metastore is
complex:
- Requires transaction management across multiple resource types
- Potential for partial failures during restore
- Storage overhead for maintaining backups
- Complexity in handling concurrent operations
Recommendation: Users should:
- Back up their metastore database before major upgrades
- Test upgrade procedures in development environments
- Use version control for YAML resource definitions
Implementation Priority
High Priority (MVP)
- Legacy validator registry and metadata models
- --apply-legacy-validator flag for ado upgrade
- --list-legacy flag with comprehensive output
- Smart error detection with suggestions
- Create two example validators:
  - entitysource_to_samplestore: Migrate kind field
  - csv_constitutive_columns_migration: Migrate CSV format
Medium Priority
- Validator chaining for multi-version upgrades
- Comprehensive documentation
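For the validator chaining item, one way get_upgrade_chain could select and order validators is sketched below. The selection rule is an assumption, not something this plan specifies: a validator is included when its automatic upgrade was removed after from_version and at or before to_version, and the chain is ordered by deprecation version. ChainEntry and parse_version are hypothetical helpers, and the third entry in the usage example is invented to show numeric version comparison.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.10.0' or 'v1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


@dataclass(frozen=True)
class ChainEntry:
    """Hypothetical stand-in for LegacyValidatorMetadata (version fields only)."""

    identifier: str
    deprecated_from_version: str
    removed_from_version: str


def get_upgrade_chain(
    entries: list[ChainEntry], from_version: str, to_version: str
) -> list[ChainEntry]:
    """Select validators removed in (from_version, to_version], oldest first."""
    start, end = parse_version(from_version), parse_version(to_version)
    needed = [
        e for e in entries
        if start < parse_version(e.removed_from_version) <= end
    ]
    # Apply migrations in the order the formats were deprecated
    return sorted(
        needed,
        key=lambda e: (
            parse_version(e.deprecated_from_version),
            parse_version(e.removed_from_version),
        ),
    )


entries = [
    ChainEntry("csv_constitutive_columns_migration", "1.3.5", "1.6.0"),
    ChainEntry("entitysource_to_samplestore", "1.2.0", "1.5.0"),
    ChainEntry("hypothetical_future_migration", "1.9.0", "1.10.0"),
]
chain = get_upgrade_chain(entries, from_version="1.4.0", to_version="1.6.0")
```

Comparing parsed tuples rather than raw strings avoids ordering "1.10.0" before "1.6.0".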
Low Priority (Nice-to-have)
- Automated validator discovery from error patterns
- Version-based validator recommendations
Design Decisions
Storage: Validator metadata is code-only (not stored in metastore)
- Simpler implementation
- Version controlled with the codebase
- No database schema changes needed
Versioning: Validators are immutable once created
- If a validator needs changes, create a new validator with a new identifier
- Original validator remains for historical compatibility
- Clear audit trail of migration logic evolution
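The immutability rule could be enforced at registration time. The minimal sketch below refuses to overwrite an existing identifier, which forces changed logic to be registered under a new one. ImmutableRegistry and DuplicateValidatorError are hypothetical names; the LegacyValidatorRegistry.register() shown earlier in this plan overwrites silently.

```python
class DuplicateValidatorError(ValueError):
    """Raised when an identifier is registered twice (hypothetical name)."""


class ImmutableRegistry:
    """Minimal registry that never overwrites an existing entry."""

    def __init__(self) -> None:
        self._validators: dict[str, object] = {}

    def register(self, identifier: str, metadata: object) -> None:
        # Refuse to replace an entry so historical migrations stay intact
        if identifier in self._validators:
            raise DuplicateValidatorError(
                f"Legacy validator '{identifier}' already exists; "
                "register the changed logic under a new identifier"
            )
        self._validators[identifier] = metadata

    def get(self, identifier: str):
        """Return the stored metadata, or None if the identifier is unknown."""
        return self._validators.get(identifier)
```

Failing loudly on a duplicate also catches accidental identifier collisions between validator modules at import time.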
Distribution: Legacy validators included in core package
- Part of standard ado installation (no optional extra needed)
- Always available when needed
- Minimal overhead (just additional Python modules)
- Simplifies user experience (no extra installation step)
Automation: Manual validator selection required
- Automatic detection could apply wrong validators
- Explicit user choice ensures correct migration path
- Error messages guide users to appropriate validators
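A sketch of how the explicit selection might be honored inside the upgrade path: only the identifiers the user passed via --apply-legacy-validator are run, in the order given, on the raw resource dictionary before it is validated and re-saved. The function name, the plain-dict registry, and UnknownValidatorError are assumptions made for this illustration.

```python
import copy
from typing import Callable

Migration = Callable[[dict], dict]


class UnknownValidatorError(KeyError):
    """Raised for an identifier that is not registered (hypothetical name)."""


def apply_selected_validators(
    raw: dict, selected: list[str], registry: dict[str, Migration]
) -> dict:
    """Run the user-selected migrations, in the order given, on a raw resource dict."""
    # Resolve every identifier first so a typo fails before any data is touched
    unknown = [name for name in selected if name not in registry]
    if unknown:
        raise UnknownValidatorError(f"Unknown legacy validator(s): {', '.join(unknown)}")

    migrated = copy.deepcopy(raw)  # leave the caller's copy untouched
    for name in selected:
        migrated = registry[name](migrated)
    return migrated


def rename_kind(data: dict) -> dict:
    """Same logic as Example 1, without the registration decorator."""
    if data.get("kind") == "entitysource":
        data["kind"] = "samplestore"
    return data


registry = {"entitysource_to_samplestore": rename_kind}
original = {"kind": "entitysource"}
upgraded = apply_selected_validators(original, ["entitysource_to_samplestore"], registry)
```

Working on a deep copy means a failure partway through a chain never leaves the in-memory resource half-migrated.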
Dry-run and Rollback: Not implemented
- Dry-run: Cannot predict pydantic validator behavior for accurate
dry-run
- Rollback: Backup/restore complexity outweighs benefits
- Users should test in non-production environments and maintain database
backups
Package Configuration
Legacy validators are part of the core package with no additional configuration
needed in pyproject.toml. The orchestrator/core/legacy/ module is imported
on demand when users invoke the --apply-legacy-validator or --list-legacy flags.
Next Steps
- Validate Design: Review this plan with the team to ensure it meets
requirements
- Prototype: Build MVP with CSV validator migration
- Test: Verify with real-world upgrade scenarios
- Document: Create user and developer guides
- Iterate: Refine based on feedback
Document Version: 1.4
Date: 2026-02-26
Status: Draft for Review
Changes:
- v1.1: Removed separate ado legacy command, renamed --extend-with to
--apply-legacy-validator
- v1.2: Added design decisions (code-only metadata, immutable validators,
optional [legacy] extra)
- v1.3: Removed dry-run and rollback features with rationale; updated
implementation priorities
- v1.4: Changed from optional [legacy] extra to core package inclusion
Originally posted by @AlessandroPomponio in #620