🐟iTuna


Tune machine learning models for empirical identifiability and consistency

Why 🐟iTuna?

Applying machine learning to scientific data analysis often suffers from an identifiability gap: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.

Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need empirical validation to confirm they behave consistently in practice.

🐟iTuna closes this gap by providing a lightweight, model-agnostic framework to:

  1. Train multiple instances of a model with different random seeds
  2. Align their embeddings under the appropriate indeterminacy class
  3. Measure how consistent the learned representations are

Think of it as a unit test for the reproducibility of learned embeddings.
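
To make the align-then-score idea concrete, here is a minimal, self-contained sketch of steps 2 and 3 for the permutation indeterminacy class. This is plain NumPy/SciPy written for illustration, not 🐟iTuna's actual implementation:

# Conceptual sketch only, NOT iTuna internals: align two embeddings that
# should agree up to component reordering and sign flips, then score their
# agreement as the mean absolute correlation of matched components.
import numpy as np
from scipy.optimize import linear_sum_assignment

def permutation_consistency(emb_a, emb_b):
    d = emb_a.shape[1]
    # Cross-correlation between the components of the two runs.
    corr = np.corrcoef(emb_a.T, emb_b.T)[:d, d:]
    # Optimal one-to-one matching of components; taking the absolute value
    # makes the matching invariant to sign flips.
    rows, cols = linear_sum_assignment(-np.abs(corr))
    return np.abs(corr[rows, cols]).mean()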

Features

  • sklearn-compatible: Works with any transformer implementing fit, transform, and standard sklearn conventions (see the interface sketch after this list)
  • Built-in indeterminacy classes:
    • Identity - no transformation needed (model is already fully identifiable)
    • Permutation - handles sign flips and component reordering (e.g., FastICA)
    • Linear - linear transformation alignment (e.g., PCA)
    • Affine - linear transformation with intercept (e.g., CEBRA)
  • Consistency scoring: Quantifies how stable embeddings are across runs
  • Embedding alignment: Returns aligned embeddings for downstream analysis
  • Flexible backends: In-memory, disk caching, distributed execution, and DataJoint support
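
Because only fit and transform are required, a custom estimator can be plugged in directly. The sketch below is illustrative, not part of the library: RandomProjector is a hypothetical toy transformer, and metrics.Linear() is assumed to exist by analogy with metrics.Permutation() in the quickstart. Random projections from different seeds are not mutually identifiable, so the ensemble should report low consistency, which is exactly the failure mode 🐟iTuna is meant to surface:

# Illustrative sketch: a toy sklearn-style transformer plugged into the
# ensemble. RandomProjector is hypothetical; metrics.Linear() is an assumed
# name mirroring metrics.Permutation() from the quickstart.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

from ituna import ConsistencyEnsemble, metrics

class RandomProjector(BaseEstimator, TransformerMixin):
    """Toy transformer: projects data onto a random linear subspace."""

    def __init__(self, n_components=8, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y=None):
        rng = np.random.default_rng(self.random_state)
        self.components_ = rng.standard_normal((X.shape[1], self.n_components))
        return self

    def transform(self, X):
        return X @ self.components_

ensemble = ConsistencyEnsemble(
    estimator=RandomProjector(n_components=8),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Linear(),
    ),
    random_states=3,  # different seeds give unrelated subspaces: expect a low score
)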

Installation

pip install ituna

or, alternatively, install from source:

pip install git+https://github.com/dynamical-inference/ituna.git

Optional extras:

pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]"  # DataJoint backend for database-backed caching
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]"        # Development dependencies (pytest, etc.)

Quickstart

import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, metrics

# Generate sample data
X = np.random.randn(1000, 64)

# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation
        symmetric=False,
        include_diagonal=True,
    ),
    random_states=5,  # Train 5 instances with different seeds
)

# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))

# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)

Documentation

Full documentation is available at dynamical-inference.github.io/ituna.

Backends

🐟iTuna supports different backends for caching and distributed computation:

from ituna import ConsistencyEnsemble, config, metrics
from sklearn.decomposition import FastICA

ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=10,
)

# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)

# Distributed execution with multiple workers
with config.config_context(
    DEFAULT_BACKEND="disk_cache_distributed",
    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
    ensemble.fit(X)

CLI Commands

For large-scale experiments, use the command-line tools:

# Local distributed backend
ituna-fit-distributed --sweep-name <sweep-uuid> --cache-dir ./cache

# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name <sweep-uuid> --schema-name myschema

Development

# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e .[dev]

# Run tests
pytest tests -v

# Setup pre-commit hooks
pre-commit install

For the full development guide (branching conventions, code style, building docs, and the release process), see CONTRIBUTING.md.

Citation

If you use 🐟iTuna in your research, please cite:

@software{ituna,
  author = {Schmidt, Tobias and Schneider, Steffen},
  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
  url = {https://github.com/dynamical-inference/ituna},
  version = {0.1.0},
}

License

🐟iTuna is released under the MIT License. If you re-use parts of the iTuna code in your own package, please copy the contents of the LICENSE file into a NOTICE file in your repository.
