# πiTuna: Tune machine learning models for empirical identifiability and consistency
Applying machine learning to scientific data analysis often suffers from an identifiability gap: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.
Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need empirical validation to confirm they behave consistently in practice.
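As a concrete illustration (plain scikit-learn, not the ituna API): FastICA is identifiable only up to permutation and sign of its components, so two runs with different seeds produce embeddings that disagree elementwise, yet every component in one run has a near-perfectly (anti-)correlated partner in the other. A minimal check, assuming an easy synthetic mixing problem:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Mix three independent (Laplace) sources into six observed channels
S = rng.laplace(size=(2000, 3))
X = S @ rng.normal(size=(3, 6))

# Two runs of the same algorithm, different seeds
emb_a = FastICA(n_components=3, random_state=0, max_iter=1000).fit_transform(X)
emb_b = FastICA(n_components=3, random_state=1, max_iter=1000).fit_transform(X)

# Raw embeddings differ, but the cross-run correlation matrix pairs
# components up: each row has one entry with |corr| close to 1.
corr = np.corrcoef(emb_a.T, emb_b.T)[:3, 3:]
print(np.round(np.abs(corr).max(axis=1), 2))
```

Empirically validating this pairing across many seeds is exactly what a consistency check automates.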
πiTuna closes this gap by providing a lightweight, model-agnostic framework to:
- Train multiple instances of a model with different random seeds
- Align their embeddings under the appropriate indeterminacy class
- Measure how consistent the learned representations are
Think of it as a unit test for reproducibility of learned embeddings.
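The alignment step can be pictured with a small sketch: for the permutation class, components from two runs are matched by maximal absolute correlation and sign flips are undone. The helper below is hypothetical and only illustrates the idea; the actual ituna alignment API may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_permutation(ref, emb):
    """Align `emb` to `ref` up to column permutation and sign flips.

    Hypothetical helper for illustration, not part of the ituna API.
    """
    d = ref.shape[1]
    # Cross-run correlation between components; cost = -|corr|
    corr = np.corrcoef(ref.T, emb.T)[:d, d:]
    rows, cols = linear_sum_assignment(-np.abs(corr))
    signs = np.sign(corr[rows, cols])
    return emb[:, cols] * signs

rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 4))
# Simulate a second run: same components, shuffled and sign-flipped
emb = ref[:, [2, 0, 3, 1]] * np.array([1.0, -1.0, 1.0, -1.0])

aligned = align_permutation(ref, emb)
print(np.allclose(aligned, ref))  # → True
```

The `Linear` and `Affine` classes replace the permutation matching with a least-squares (plus intercept) fit.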
## Features

- sklearn-compatible: Works with any transformer implementing `fit`, `transform`, and standard sklearn conventions
- Built-in indeterminacy classes:
  - `Identity` - no transformation needed (model is already fully identifiable)
  - `Permutation` - handles sign flips and component reordering (e.g., FastICA)
  - `Linear` - linear transformation alignment (e.g., PCA)
  - `Affine` - linear transformation with intercept (e.g., CEBRA)
- Consistency scoring: Quantifies how stable embeddings are across runs
- Embedding alignment: Returns aligned embeddings for downstream analysis
- Flexible backends: In-memory, disk caching, distributed execution, and DataJoint support
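To make the scoring idea concrete: under the `Linear` class, consistency between two runs can be quantified as, for example, the R² of the best linear map from one embedding to the other. The sketch below is one simple scoring rule under that assumption, not necessarily how ituna computes its score:

```python
import numpy as np

def linear_consistency(emb_a, emb_b):
    """R^2 of the best linear map emb_a -> emb_b (illustrative metric)."""
    W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)
    resid = emb_b - emb_a @ W
    return 1.0 - resid.var() / emb_b.var()

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 5))
M = rng.normal(size=(5, 5))  # an invertible linear indeterminacy

# Two "runs" that differ only by a linear transform are fully consistent.
print(round(linear_consistency(Z, Z @ M), 3))  # → 1.0
```

For the `Affine` class one would additionally fit an intercept before computing the residual.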
## Installation

Install from PyPI:

```bash
pip install ituna
```

or alternatively install from source:

```bash
pip install git+https://github.com/dynamical-inference/ituna.git
```

Optional extras:

```bash
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]"  # DataJoint backend for database-backed caching
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]"        # Development dependencies (pytest, etc.)
```

## Quickstart

```python
import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, metrics

# Generate sample data
X = np.random.randn(1000, 64)

# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation
        symmetric=False,
        include_diagonal=True,
    ),
    random_states=5,  # Train 5 instances with different seeds
)

# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))

# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)
```

## Documentation

Full documentation is available at dynamical-inference.github.io/ituna.
- Quickstart notebook: `docs/tutorials/quickstart.ipynb` - minimal working example
- Core concepts: `docs/tutorials/core.ipynb` - in-depth walkthrough
- Backends: `docs/tutorials/backends.ipynb` - caching and distributed execution
## Backends

πiTuna supports different backends for caching and distributed computation:

```python
import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, config, metrics

X = np.random.randn(1000, 64)  # sample data, as in the quickstart

ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=10,
)

# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)

# Distributed execution with multiple workers
with config.config_context(
    DEFAULT_BACKEND="disk_cache_distributed",
    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
    ensemble.fit(X)
```

For large-scale experiments, use the command-line tools:
```bash
# Local distributed backend
ituna-fit-distributed --sweep-name <sweep-uuid> --cache-dir ./cache

# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name <sweep-uuid> --schema-name myschema
```

## Development

```bash
# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e .[dev]

# Run tests
pytest tests -v

# Setup pre-commit hooks
pre-commit install
```

For the full development guide (branching conventions, code style, building docs, and the release process), see CONTRIBUTING.md.
## Citation

If you use πiTuna in your research, please cite:

```bibtex
@software{ituna,
  author = {Schmidt, Tobias and Schneider, Steffen},
  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
  url = {https://github.com/dynamical-inference/ituna},
  version = {0.1.0},
}
```

## License

πiTuna is released under the MIT License. If you reuse parts of the iTuna code in your own package, please copy the contents of the LICENSE file into a NOTICE file in your repository.