Give any local LLM unlimited memory. Today.
Turn your 8K context Gemma or 16K Phi into a model with 11 million+ token memory — with 100% retrieval accuracy and sub-millisecond latency.
# Docker (one command, works everywhere)
docker run -it --rm --network host andrewmang/infinite-context
# Or with docker-compose for full stack
curl -O https://raw.githubusercontent.com/Lumi-node/infinite-context/main/docker-compose.yml
docker-compose up -d

Try it on Hugging Face Spaces - see HAT in action right in your browser!
# Linux/macOS - installs everything automatically
curl -sSL https://raw.githubusercontent.com/Lumi-node/infinite-context/main/install.sh | bash

# Clone the repo
git clone https://github.com/Lumi-node/infinite-context
cd infinite-context
# Install Python package (recommended - full HAT support)
pip install maturin sentence-transformers
maturin develop --release
# Or build Rust CLI (benchmarks only)
cargo build --release

| Model | Native Context | With HAT | Extension |
|---|---|---|---|
| gemma3:1b | 8K | 11.3M+ | 1,413x |
| phi4 | 16K | 11.3M+ | 706x |
| llama3.2 | 8K | 11.3M+ | 1,413x |
Local models like Gemma 3 (8K) and Phi 4 (16K) are powerful — but they forget everything outside their tiny context window. RAG systems try to help but deliver ~70% accuracy at best, losing critical information.
The Hierarchical Attention Tree (HAT) exploits the natural hierarchy of conversations: sessions contain documents, and documents contain chunks.
Instead of searching all chunks in O(n), HAT runs an O(log n) beam search down this hierarchy, giving sub-millisecond queries over millions of tokens with 100% accuracy.
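To make the idea concrete, here is a toy sketch of a hierarchical beam search over normalized embeddings. It is illustrative only: the node layout, scoring, and beam width are assumptions for the example, not the library's internals.

```python
import numpy as np

# Toy hierarchy node: a normalized summary vector plus children (leaves carry text).
# This is a sketch of the idea, not the actual HAT data structure.
class Node:
    def __init__(self, vector, children=None, text=None):
        self.vector = np.asarray(vector, dtype=float)
        self.vector /= np.linalg.norm(self.vector)
        self.children = children or []
        self.text = text

def beam_search(root, query, beam_width=4, k=3):
    """Descend level by level, keeping only the best `beam_width` nodes at each
    level instead of scoring every chunk: O(log n) scored nodes rather than O(n)."""
    query = np.asarray(query, dtype=float)
    query /= np.linalg.norm(query)
    frontier, leaves = [root], []
    while frontier:
        candidates = []
        for node in frontier:
            if node.children:
                candidates.extend(node.children)
            else:
                leaves.append(node)          # reached a chunk
        candidates.sort(key=lambda n: float(n.vector @ query), reverse=True)
        frontier = candidates[:beam_width]   # prune before descending further
    leaves.sort(key=lambda n: float(n.vector @ query), reverse=True)
    return [leaf.text for leaf in leaves[:k]]

# Tiny demo: two documents of two chunks each, parents summarized by the mean vector.
rng = np.random.default_rng(0)
chunks = [Node(rng.normal(size=8), text=f"chunk {i}") for i in range(4)]
docs = [Node(np.mean([c.vector for c in grp], axis=0), children=list(grp))
        for grp in (chunks[:2], chunks[2:])]
root = Node(np.mean([d.vector for d in docs], axis=0), children=docs)
print(beam_search(root, rng.normal(size=8), k=2))
```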
# Pull and run immediately
docker run -it --rm --network host andrewmang/infinite-context
# Run benchmark
docker run -it --rm andrewmang/infinite-context infinite-context bench --chunks 100000
# Full stack with Ollama
docker-compose up -d
docker-compose exec infinite-context infinite-context chat --model gemma3:1b

The Python API is the production-ready interface: it combines real embeddings, HAT retrieval, and Ollama.
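Under the hood, a chat turn is a retrieve-inject-query loop: embed the question, pull the most relevant stored chunks, prepend them to the prompt, and send it to Ollama. The real API shown below does this for you; this standalone sketch only shows the shape of that loop, with a plain in-memory list standing in for the HAT index. The prompt wording and helper names are assumptions, not the library's internals.

```python
import ollama                                    # official Ollama Python client
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
memory = []                                      # (text, embedding) pairs; a stand-in for the HAT index

def remember(text):
    memory.append((text, embedder.encode(text, normalize_embeddings=True)))

def chat(question, model="gemma3:1b", k=3):
    q = embedder.encode(question, normalize_embeddings=True)
    # Retrieve the k most similar chunks (cosine similarity on normalized vectors).
    top = sorted(memory, key=lambda item: float(item[1] @ q), reverse=True)[:k]
    context = "\n".join(text for text, _ in top)
    prompt = f"Use this context if it is relevant:\n{context}\n\nQuestion: {question}"
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]

remember("The latest experiment showed 47% improvement in coherence.")
print(chat("What were the quantum experiment results?"))
```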
# From the repo (after cloning)
pip install maturin sentence-transformers
maturin develop --release

from infinite_context import InfiniteContext
# Initialize - connects to Ollama
ctx = InfiniteContext(model="gemma3:1b")
# Add information (automatically embedded with sentence-transformers and indexed in HAT)
ctx.add("My name is Alex and I work on quantum computing.")
ctx.add("The latest experiment showed 47% improvement in coherence.")
# Chat - HAT retrieves relevant context, injects it into prompt, queries Ollama
response = ctx.chat("What were the quantum experiment results?")
print(response) # References the 47% improvement
# Save memory to disk
ctx.save("my_memory.hat")
# Load later
ctx = InfiniteContext.load("my_memory.hat", model="gemma3:1b")

from infinite_context import HatIndex
from sentence_transformers import SentenceTransformer
# Setup
embedder = SentenceTransformer('all-MiniLM-L6-v2')
index = HatIndex.cosine(384)
# Add embeddings
embedding = embedder.encode("Important info", normalize_embeddings=True)
index.add(embedding.tolist())
# Query
query_emb = embedder.encode("What's important?", normalize_embeddings=True)
results = index.near(query_emb.tolist(), k=10)
# Persist
index.save("index.hat")

The Rust CLI is useful for benchmarking HAT performance and testing Ollama connectivity.
Note: For actual chat with HAT memory retrieval, use the Python API above.
# Build the CLI
cargo build --release
# Run HAT performance benchmark
./target/release/infinite-context bench --chunks 100000
# Test Ollama connection
./target/release/infinite-context test --model gemma3:1b
# List available models
./target/release/infinite-context models

- Rust: 1.70+ (for CLI)
- Python: 3.9+ (for Python API)
- Ollama: Any version
- RAM: 4GB minimum
git clone https://github.com/Lumi-node/infinite-context
cd infinite-context
# Rust CLI
cargo build --release
./target/release/infinite-context --help
# Python wheel
pip install maturin
maturin develop --release

Big AI companies charge $20+/month for extended context. Cloud APIs cost money per token. Your data goes to their servers.
Infinite Context is different:
- Free: Runs on your hardware
- Private: Nothing leaves your machine
- Unlimited: No token limits
- Accurate: 100% retrieval (not 70% RAG)
- Fast: Sub-millisecond queries
Democratized AI memory. Everyone deserves infinite context.
Based on the Hierarchical Attention Tree (HAT) algorithm. Key insight: conversations naturally form hierarchies (sessions → documents → chunks). HAT exploits this structure for O(log n) retrieval with perfect accuracy.
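A back-of-the-envelope comparison shows why the hierarchy matters at scale: a pruned, level-by-level descent scores a few hundred candidates regardless of how many chunks are stored, while a flat scan scores all of them. The branching factor and beam width below are made-up illustration numbers, not infinite-context internals.

```python
import math

branching, beam = 32, 4          # illustrative fan-out and beam width, not the library's actual values
for n_chunks in (10_000, 1_000_000, 100_000_000):
    levels = math.ceil(math.log(n_chunks, branching))
    scored = beam * branching * levels       # candidates scored during a pruned descent
    print(f"{n_chunks:>11,} chunks: flat scan scores {n_chunks:,}, beam search scores ~{scored:,}")
```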
MIT
| Method | Command | Notes |
|---|---|---|
| Docker | docker run -it --rm --network host andrewmang/infinite-context | Full setup |
| Browser | Hugging Face Spaces | Try HAT live |
| Source | git clone ... && maturin develop --release | Python API (recommended) |
Stop forgetting. Start remembering.






