UPDATE: THIS REPO WILL BE ARCHIVED. THE UPDATED REPO CAN BE FOUND AT https://github.com/RegularJoe-CEO/Geodesic-Attention-Engine-GAE-
The Waller Operator eliminates O(N²) memory bottlenecks in transformer attention, achieving O(N log N) memory and constant ~14ms latency across 4K-2.6M token sequences on NVIDIA H100.
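To put those complexities in perspective, here is a back-of-the-envelope sketch of attention-score memory per head at the quoted sequence lengths. The fp16 storage (2 bytes per element), log base 2, and unit constant factor are assumptions made for illustration only, not measurements of the Waller Operator.

```python
# Illustrative memory estimates for attention-score storage per head.
# Assumptions (not from this repo): fp16 (2 bytes/element), log base 2,
# and a unit constant factor on the O(N log N) term.
import math

BYTES_PER_ELEM = 2  # fp16 assumption


def gib(num_bytes: float) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


for n in (4_096, 131_072, 2_621_440):           # 4K, 128K, ~2.6M tokens
    quadratic = n * n * BYTES_PER_ELEM           # O(N^2): full N x N score matrix
    nlogn = n * math.log2(n) * BYTES_PER_ELEM    # O(N log N) with unit constant
    print(f"N={n:>9,}:  O(N^2) ~ {gib(quadratic):10,.2f} GiB   "
          f"O(N log N) ~ {gib(nlogn):6.3f} GiB")
```

Under those assumptions, the full score matrix at ~2.6M tokens alone would be on the order of 12 TiB per head, which is why the quadratic term, not compute, becomes the binding constraint at long context.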
Download waller_eval_x86 and run:
```bash
chmod +x waller_eval_x86
./waller_eval_x86 --help
```

Adjust CLI parameters for custom sequence lengths, batch sizes, head dimensions, and more:

```bash
./waller_eval_x86 --seq_len 131072 --batch_size 1 --num_heads 32 --head_dim 128
```
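To spot-check the constant-latency claim across sequence lengths, the binary can be swept with the flags shown above. The sketch below is a minimal Python wrapper; the chosen sequence lengths and the assumption that the binary prints its own timing report to stdout are illustrative, not taken from the repo.

```python
# Minimal sweep over sequence lengths using the flags documented above.
# Assumption: the binary prints its own latency/memory report to stdout;
# this script only launches it and echoes that output.
import subprocess

SEQ_LENS = [4_096, 65_536, 131_072, 1_048_576, 2_621_440]  # 4K up to ~2.6M tokens

for seq_len in SEQ_LENS:
    cmd = [
        "./waller_eval_x86",
        "--seq_len", str(seq_len),
        "--batch_size", "1",
        "--num_heads", "32",
        "--head_dim", "128",
    ]
    print(f"=== seq_len={seq_len} ===")
    subprocess.run(cmd, check=True)
```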
Requirements:
- NVIDIA H100 GPU (sm_90) or A100 (sm_80)
- CUDA 12.8
- x86_64 Linux
See full results at: waller-eval/RESULTS.md
Key Performance:
- Memory: O(N log N) vs O(N²) for standard attention
- Latency: ~14ms constant across 4K-2.6M tokens
- Equivalence: Mathematically identical to standard softmax attention (see the illustrative sketch after this list)
- No retraining required
- Built: Feb 2026
- Platform: x86_64 Linux
- CUDA Compute: sm_80 (A100), sm_90 (H100)
- Size: 0.97 MB
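To make the equivalence bullet concrete, the sketch below shows in NumPy that exact softmax attention can be computed without ever materializing the full N x N score matrix, using the well-known streaming (online-softmax) formulation. This is not the Waller Operator and it does not reproduce the O(N log N) memory claim; it only illustrates what "mathematically identical to standard softmax attention" means for a block-wise computation.

```python
# Illustration of the equivalence claim only: exact softmax attention computed
# block by block with a running max/sum, never allocating the N x N matrix.
# NOT the Waller Operator; just the standard streaming (online-softmax) form.
import numpy as np


def standard_attention(q, k, v):
    """Reference: materializes the full N x N score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def streaming_attention(q, k, v, block=256):
    """Same result, processing keys/values in blocks with running statistics."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    n = q.shape[0]
    out = np.zeros((n, v.shape[-1]))
    running_max = np.full((n, 1), -np.inf)
    running_sum = np.zeros((n, 1))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                                  # scores for this block only
        new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(running_max - new_max)            # rescale earlier blocks
        p = np.exp(s - new_max)
        running_sum = running_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ vb
        running_max = new_max
    return out / running_sum


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((1024, 64)) for _ in range(3))
assert np.allclose(standard_attention(q, k, v), streaming_attention(q, k, v))
print("streaming attention matches the reference up to floating-point error")
```

Because the streaming form keeps only a running max and running sum per query, the two functions agree to floating-point tolerance while the second never holds more than one block of scores at a time.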
Patent Pending: US Provisional Application filed Feb 2026
Title: Memory-Efficient Attention Computation System and Method
Licensing inquiries: e@ewaller.com
Breakthrough IP in AI inference, energy efficiency, and materials science.
Contact: e@ewaller.com
Web: luxiedge.com