# πiTuna: Tune machine learning models for empirical identifiability and consistency
Applying machine learning to scientific data analysis often suffers from an identifiability gap: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.
Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need empirical validation to confirm they behave consistently in practice.
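As a concrete illustration (plain scikit-learn, not the ituna API): FastICA is identifiable only up to permutation and sign of its components, so two runs with different seeds produce embeddings that disagree elementwise, yet every component in one run has a near-perfectly (anti-)correlated partner in the other. A minimal check, assuming an easy synthetic mixing problem:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Mix three independent (Laplace) sources into six observed channels
S = rng.laplace(size=(2000, 3))
X = S @ rng.normal(size=(3, 6))

# Two runs of the same algorithm, different seeds
emb_a = FastICA(n_components=3, random_state=0, max_iter=1000).fit_transform(X)
emb_b = FastICA(n_components=3, random_state=1, max_iter=1000).fit_transform(X)

# Raw embeddings differ, but the cross-run correlation matrix pairs
# components up: each row has one entry with |corr| close to 1.
corr = np.corrcoef(emb_a.T, emb_b.T)[:3, 3:]
print(np.round(np.abs(corr).max(axis=1), 2))
```

Empirically validating this pairing across many seeds is exactly what a consistency check automates.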
πiTuna closes this gap by providing a lightweight, model-agnostic framework to:
- Train multiple instances of a model with different random seeds
- Align their embeddings under the appropriate indeterminacy class
- Measure how consistent the learned representations are
Think of it as a unit test for reproducibility of learned embeddings.
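The alignment step can be pictured with a small sketch: for the permutation class, components from two runs are matched by maximal absolute correlation and sign flips are undone. The helper below is hypothetical and only illustrates the idea; the actual ituna alignment API may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_permutation(ref, emb):
    """Align `emb` to `ref` up to column permutation and sign flips.

    Hypothetical helper for illustration, not part of the ituna API.
    """
    d = ref.shape[1]
    # Cross-run correlation between components; cost = -|corr|
    corr = np.corrcoef(ref.T, emb.T)[:d, d:]
    rows, cols = linear_sum_assignment(-np.abs(corr))
    signs = np.sign(corr[rows, cols])
    return emb[:, cols] * signs

rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 4))
# Simulate a second run: same components, shuffled and sign-flipped
emb = ref[:, [2, 0, 3, 1]] * np.array([1.0, -1.0, 1.0, -1.0])

aligned = align_permutation(ref, emb)
print(np.allclose(aligned, ref))  # → True
```

The `Linear` and `Affine` classes replace the permutation matching with a least-squares (plus intercept) fit.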
## Features

- sklearn-compatible: Works with any transformer implementing `fit`, `transform`, and standard sklearn conventions
- Built-in indeterminacy classes:
  - `Identity` - no transformation needed (model is already fully identifiable)
  - `Permutation` - handles sign flips and component reordering (e.g., FastICA)
  - `Linear` - linear transformation alignment (e.g., PCA)
  - `Affine` - linear transformation with intercept (e.g., CEBRA)
- Consistency scoring: Quantifies how stable embeddings are across runs
- Embedding alignment: Returns aligned embeddings for downstream analysis
- Flexible backends: In-memory, disk caching, distributed execution, and DataJoint support
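To make the scoring idea concrete: under the `Linear` class, consistency between two runs can be quantified as, for example, the R² of the best linear map from one embedding to the other. The sketch below is one simple scoring rule under that assumption, not necessarily how ituna computes its score:

```python
import numpy as np

def linear_consistency(emb_a, emb_b):
    """R^2 of the best linear map emb_a -> emb_b (illustrative metric)."""
    W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)
    resid = emb_b - emb_a @ W
    return 1.0 - resid.var() / emb_b.var()

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 5))
M = rng.normal(size=(5, 5))  # an invertible linear indeterminacy

# Two "runs" that differ only by a linear transform are fully consistent.
print(round(linear_consistency(Z, Z @ M), 3))  # → 1.0
```

For the `Affine` class one would additionally fit an intercept before computing the residual.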
## Installation

Install from PyPI:

```bash
pip install ituna
```

or alternatively install from source:

```bash
pip install git+https://github.com/dynamical-inference/ituna.git
```

Optional extras:

```bash
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]"  # DataJoint backend for database-backed caching
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]"        # Development dependencies (pytest, etc.)
```

## Quickstart

```python
import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, metrics

# Generate sample data
X = np.random.randn(1000, 64)

# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation
        symmetric=False,
        include_diagonal=True,
    ),
    random_states=5,  # Train 5 instances with different seeds
)

# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))

# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)
```

## Documentation

Full documentation is available at dynamical-inference.github.io/ituna.
- Quickstart notebook: `docs/tutorials/quickstart.ipynb` - minimal working example
- Core concepts: `docs/tutorials/core.ipynb` - in-depth walkthrough
- Backends: `docs/tutorials/backends.ipynb` - caching and distributed execution
## Backends

πiTuna supports different backends for caching and distributed computation:

```python
import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, config, metrics

X = np.random.randn(1000, 64)  # sample data, as in the quickstart

ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=10,
)

# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)

# Distributed execution with multiple workers
with config.config_context(
    DEFAULT_BACKEND="disk_cache_distributed",
    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
    ensemble.fit(X)
```

For large-scale experiments, use the command-line tools:
```bash
# Local distributed backend
ituna-fit-distributed --sweep-name <sweep-uuid> --cache-dir ./cache

# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name <sweep-uuid> --schema-name myschema
```

## Development

```bash
# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e .[dev]

# Run tests
pytest tests -v

# Setup pre-commit hooks
pre-commit install
```

For the full development guide (branching conventions, code style, building docs, and the release process), see CONTRIBUTING.md.
## Citation

If you use πiTuna in your research, please cite:

```bibtex
@software{ituna,
  author = {Schmidt, Tobias and Schneider, Steffen},
  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
  url = {https://github.com/dynamical-inference/ituna},
  version = {0.1.0},
}
```

## License

πiTuna is released under the MIT License. If you reuse parts of the iTuna code in your own package, please copy the contents of the LICENSE file into a NOTICE file in your repository.