# VisionQuery

VisionQuery is a high-performance visual discovery tool for querying massive image datasets with natural language. By leveraging Modal for serverless GPU orchestration and LanceDB for multi-modal vector indexing, it aims to enable sub-second retrieval across millions of assets.
**Status: Work in Progress.** The current build supports high-performance indexing for locally saved videos and CIFAR-10 images. Support for direct S3 video streaming is under development.
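The core retrieval idea — embed frames and text queries into a shared vector space, then rank frames by similarity to the query — can be sketched in plain Python. This is a toy stand-in for the real pipeline: the 3-dimensional vectors and frame names below are made up, and in practice the embeddings come from a vision-language model and the ranking is done by LanceDB's index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, top_k=3):
    """Rank indexed frames by similarity to the query embedding."""
    scored = [(frame_id, cosine_similarity(query_vec, vec))
              for frame_id, vec in index.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

# Hypothetical index: frame id -> embedding (toy 3-d vectors).
index = {
    "frame_001.jpg": [0.90, 0.10, 0.00],
    "frame_002.jpg": [0.10, 0.80, 0.30],
    "frame_003.jpg": [0.85, 0.20, 0.10],
}
query = [1.0, 0.0, 0.0]  # pretend embedding of a text query
print(search(query, index, top_k=2))  # frame_001 and frame_003 rank highest
```

A real vector database replaces the linear scan with an approximate-nearest-neighbor index, which is what makes sub-second search over millions of vectors feasible.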
## Getting Started

Ensure you have Python 3.10+ and the Modal CLI installed, then install the project dependencies:

```bash
pip install -r requirements.txt
```
Authenticate the Modal CLI:

```bash
modal auth
```

## Deployment

Deploy the Moondream2 model to Modal's cloud GPU infrastructure:
```bash
modal deploy src/vlm_worker.py
```

## Indexing

Build the local index:

```bash
python main.py
```

## Search

Run the search script to query your indexed frames via natural language:
```bash
python search.py --query "a blue truck"
```

## Roadmap

- [ ] S3 Integration: Direct ingestion from AWS S3 buckets.
- [ ] Visual Interface: A web-based UI to browse and filter indexed frames visually.
- [ ] LoRA Fine-tuning: Implementing Low-Rank Adaptation to enforce specific captioning styles for niche domains.
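For reference, the low-rank update at the heart of LoRA can be sketched with plain Python matrices. This is purely illustrative: the matrix sizes are toy values, and a real adapter would modify specific weight matrices inside the VLM rather than an identity matrix.

```python
def matmul(A, B):
    """Naive matrix multiply, adequate for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_update(W, B, A, alpha=1.0):
    """Return W + alpha * (B @ A): frozen weight plus a low-rank delta."""
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 4x4 frozen weight with a rank-1 adapter:
# training touches 8 numbers (B and A) instead of all 16 in W.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
B = [[0.5], [0.0], [0.0], [0.0]]   # shape d x r (r = 1)
A = [[0.0, 1.0, 0.0, 0.0]]         # shape r x k
W_new = lora_update(W, B, A)       # only entry [0][1] changes, to 0.5
```

Because only `B` and `A` are trained, the same frozen base model can serve many captioning styles by swapping small adapters.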