A comprehensive collection of the best tools, frameworks, models, and resources for Large Language Model Operations (LLMOps)
- What's New
- What is LLMOps?
- LLMOps vs MLOps
- Models
- Inference & Serving
- Orchestration
- Training & Fine-Tuning
- Prompt Engineering
- Vector Search & RAG
- Observability & Monitoring
- Security & Safety
- Data Management
- Optimization & Performance
- Development Tools
- LLMOps Platforms
- Resources & Learning
- Contributing
Infrastructure & Deployment:
Evaluation & Testing:
Agent Frameworks:
- Phidata - Build AI assistants with memory and knowledge
- Composio - Integration platform for AI agents
Monitoring & Observability:
- vLLM continues to dominate high-throughput inference
- LangGraph gaining traction for stateful agent workflows
- Ollama becoming the go-to for local LLM deployment
- DeepSeek models showing impressive cost-performance ratios
LLMOps (Large Language Model Operations) is a set of practices, tools, and workflows designed to deploy, monitor, and maintain large language models in production environments. It encompasses the entire lifecycle of LLM applications, from development and training to deployment, monitoring, and continuous improvement.
- Model Development: Training, fine-tuning, and optimizing LLMs
- Deployment: Serving models efficiently at scale
- Monitoring: Tracking performance, costs, and quality
- Prompt Management: Version control and optimization of prompts
- Security: Ensuring safe and responsible AI usage
- Evaluation: Testing and validating model outputs
- Data Management: Handling training data and embeddings
| Aspect | MLOps | LLMOps |
|---|---|---|
| Model Size | Typically smaller models | Very large models (billions of parameters) |
| Training | Full model training common | Fine-tuning and prompt engineering preferred |
| Deployment | Standard serving infrastructure | Specialized inference optimization required |
| Monitoring | Metrics-focused | Quality, safety, and cost-focused |
| Versioning | Model versions | Model + prompt + configuration versions |
| Cost | Moderate compute costs | High compute and inference costs |
| Latency | Milliseconds | Seconds (streaming helps) |
| Data | Structured/tabular data | Unstructured text, multimodal data |
| Model | Description | Stars | License |
|---|---|---|---|
| LLaMA | Meta's foundational large language models | Research | |
| Mistral | High-performance open models from Mistral AI | Apache 2.0 | |
| Gemma | Google's lightweight open models | N/A | Gemma License |
| Qwen | Alibaba's multilingual LLM series | Apache 2.0 | |
| DeepSeek | Cost-effective open-source LLMs | MIT | |
| Phi | Microsoft's small language models | N/A | MIT |
| ChatGLM | Bilingual conversational language model | Apache 2.0 | |
| Alpaca | Stanford's instruction-following model | Apache 2.0 | |
| Vicuna | Open chatbot trained by fine-tuning LLaMA | Apache 2.0 | |
| BELLE | Chinese language model based on LLaMA | Apache 2.0 | |
| Falcon | TII's high-performance open models | N/A | Apache 2.0 |
| Bloom | Multilingual LLM from BigScience | RAIL |
| Model | Description | Stars |
|---|---|---|
| LLaVA | Large Language and Vision Assistant | |
| MiniCPM-V | Efficient multimodal model | |
| Qwen-VL | Vision-language model from Alibaba |
| Model | Description | Stars |
|---|---|---|
| Whisper | OpenAI's speech recognition model | |
| Faster Whisper | Fast inference engine for Whisper |
| Tool | Description | Stars |
|---|---|---|
| vLLM | High-throughput and memory-efficient inference engine | |
| llama.cpp | LLM inference in C/C++ | |
| TensorRT-LLM | NVIDIA's optimized inference library | |
| LMDeploy | Toolkit for compressing and deploying LLMs | |
| DeepSpeed-MII | Low-latency inference powered by DeepSpeed | |
| CTranslate2 | Fast inference engine for Transformer models | |
| Cortex.cpp | Local AI API Platform | |
| LoRAX | Multi-LoRA inference server | |
| MInference | Speed up long-context LLM inference | |
| ipex-llm | Accelerate LLM inference on Intel hardware |
| Platform | Description | Stars |
|---|---|---|
| Ollama | Run LLMs locally with ease | |
| LocalAI | OpenAI-compatible API for local models | |
| LM Studio | Desktop app for running LLMs locally | N/A |
| GPUStack | Manage GPU clusters for LLM inference | |
| OpenLLM | Operating LLMs in production | |
| Ray Serve | Scalable model serving with Ray |
| Framework | Description | Stars |
|---|---|---|
| BentoML | Unified model serving framework | |
| Triton Inference Server | NVIDIA's optimized inference solution | |
| TorchServe | Serve PyTorch models in production | |
| TensorFlow Serving | Flexible ML serving system | |
| Jina | Build multimodal AI services | |
| Mosec | Model serving with dynamic batching | |
| Infinity | REST API for text embeddings |
| Framework | Description | Stars |
|---|---|---|
| LangChain | Framework for developing LLM applications | |
| LlamaIndex | Data framework for LLM applications | |
| Haystack | End-to-end NLP framework | |
| Semantic Kernel | Microsoft's SDK for AI orchestration | |
| Langfuse | Open-source LLM engineering platform | |
| Neurolink | Universal AI development platform |
| Framework | Description | Stars |
|---|---|---|
| AutoGPT | Autonomous AI agent framework | |
| CrewAI | Framework for orchestrating AI agents | |
| AutoGen | Multi-agent conversation framework | |
| LangGraph | Build stateful multi-actor applications | |
| AgentMark | Type-safe Markdown-based agents |
| Tool | Description | Stars |
|---|---|---|
| Prefect | Modern workflow orchestration | |
| Airflow | Platform to programmatically author workflows | |
| Flyte | Kubernetes-native workflow automation | |
| Flowise | Drag & drop UI for LLM flows |
| Framework | Description | Stars |
|---|---|---|
| DeepSpeed | Deep learning optimization library | |
| Megatron-LM | Large-scale transformer training | |
| PyTorch FSDP | Fully Sharded Data Parallel | N/A |
| Colossal-AI | Unified deep learning system | |
| Accelerate | Simple way to train on distributed setups |
| Tool | Description | Stars |
|---|---|---|
| Axolotl | Streamlined LLM fine-tuning | |
| LLaMA-Factory | Unified fine-tuning framework | |
| PEFT | Parameter-Efficient Fine-Tuning | |
| Unsloth | 2x faster LLM fine-tuning | |
| TRL | Transformer Reinforcement Learning | |
| LitGPT | Pretrain, fine-tune, deploy LLMs |
| Tool | Description | Stars |
|---|---|---|
| Weights & Biases | ML experiment tracking | |
| MLflow | Open-source ML lifecycle platform | |
| TensorBoard | TensorFlow's visualization toolkit | |
| Aim | Easy-to-use experiment tracker |
| Tool | Description | Link |
|---|---|---|
| PromptBase | Marketplace for prompt engineering | π |
| PromptHero | Prompt engineering resources | π |
| Prompt Perfect | Auto prompt optimizer | π |
| Learn Prompting | Prompt engineering tutorials | π |
| LangSmith | Debug and test LLM applications | π |
| PromptLayer | Prompt engineering platform | π |
| Tool | Description | Stars |
|---|---|---|
| Chroma | AI-native embedding database | |
| Weaviate | Vector search engine | |
| Qdrant | Vector similarity search engine | |
| Milvus | Cloud-native vector database | |
| Pinecone | Managed vector database | N/A |
| FAISS | Efficient similarity search library | |
| pgvector | Vector similarity search for Postgres | |
| LanceDB | Developer-friendly vector database |
| Tool | Description | Stars |
|---|---|---|
| Langfuse | Open-source LLM observability | |
| Phoenix | AI observability & evaluation | |
| Helicone | Open-source LLM observability | |
| Lunary | Production toolkit for LLMs | N/A |
| OpenLIT | OpenTelemetry-native LLM observability | |
| Evidently | ML and LLM observability framework | |
| DeepEval | LLM evaluation framework | |
| PostHog | Product analytics and feature flags |
| Tool | Description | Stars |
|---|---|---|
| NeMo Guardrails | Programmable guardrails for LLM apps | |
| Guardrails AI | Add guardrails to LLM applications | |
| LLM Guard | Security toolkit for LLM interactions | |
| Rebuff | Prompt injection detection | |
| LangKit | LLM monitoring toolkit |
| Tool | Description | Stars |
|---|---|---|
| DVC | Data version control | |
| LakeFS | Git for data lakes | |
| Pachyderm | Data versioning and pipelines | |
| Delta Lake | Storage framework for data lakes |
| Tool | Description | Stars |
|---|---|---|
| ONNX Runtime | Cross-platform ML accelerator | |
| TVM | ML compiler framework | |
| BitsAndBytes | 8-bit optimizers and quantization | |
| AutoGPTQ | Easy-to-use LLM quantization | |
| GPTQ-for-LLaMa | 4-bit quantization for LLaMA |
| Tool | Description | Stars |
|---|---|---|
| GitHub Copilot | AI pair programmer | N/A |
| Cursor | AI-first code editor | N/A |
| Continue | Open-source AI code assistant | |
| Cody | AI coding assistant | N/A |
| Tabby | Self-hosted AI coding assistant |
| Tool | Description | Stars |
|---|---|---|
| Jupyter | Interactive computing environment | |
| Google Colab | Free cloud notebooks | N/A |
| Gradient | Managed notebooks and workflows | N/A |
| Platform | Description | Stars |
|---|---|---|
| Agenta | LLMOps platform for building robust apps | |
| Dify | LLM app development platform | |
| Pezzo | Open-source LLMOps platform | |
| Humanloop | Prompt management and evaluation | N/A |
| PromptLayer | Prompt engineering platform | N/A |
| Weights & Biases | ML platform with LLM support | N/A |
- OpenAI Cookbook - Examples and guides for OpenAI API
- LLM University - Cohere's LLM learning resources
- Hugging Face Course - NLP with Transformers
- Full Stack LLM Bootcamp - Comprehensive LLM course
- Awesome LLM - Curated list of LLM resources
- Awesome ChatGPT Prompts - Prompt examples
- Awesome AI Agents - AI agent resources
- Awesome LangChain - LangChain resources
- Attention Is All You Need - Original Transformer paper
- BERT: Pre-training of Deep Bidirectional Transformers
- GPT-3: Language Models are Few-Shot Learners
- LLaMA: Open and Efficient Foundation Language Models
- LoRA: Low-Rank Adaptation of Large Language Models
We welcome contributions from the community! Here's how you can help:
- Fork the repository
- Create a new branch (
git checkout -b feature/amazing-tool) - Add your contribution following our guidelines
- Commit your changes (
git commit -m 'Add amazing tool') - Push to the branch (
git push origin feature/amazing-tool) - Open a Pull Request
- Quality over quantity: Only add tools/resources you've personally used or thoroughly researched
- Keep descriptions concise: 1-2 sentences maximum
- Include GitHub stars badge: Use the format shown in existing entries
- Maintain alphabetical order: Within each category
- Check for duplicates: Search before adding
- Update the Table of Contents: If adding new sections
- Follow the existing format: Match the style of current entries
- β New tools, frameworks, or platforms
- β Useful resources, tutorials, or guides
- β Bug fixes or improvements to existing entries
- β Better descriptions or categorizations
- β Promotional content or spam
- β Outdated or unmaintained projects (unless historically significant)
See CONTRIBUTING.md for detailed guidelines.
This project is licensed under CC0 1.0 Universal. See LICENSE for details.
This repository is inspired by and builds upon several excellent awesome lists:
Special thanks to all contributors who help maintain and improve this resource!
If you find this repository helpful, please consider giving it a βοΈ
Made with β€οΈ by the community
