AI Infrastructure

The open source tools and platforms that power AI development. From training to deployment, inference to monitoring.

Building blocks of open source AI

AI infrastructure is the plumbing that makes everything else possible. It’s the tools for training, fine-tuning, serving, monitoring, and deploying AI systems — and the open source ecosystem here is remarkably strong.

Key categories

Inference engines

  • vLLM — High-throughput LLM serving
  • TGI (Hugging Face) — Text Generation Inference
  • SGLang — Fast serving with structured generation
  • Triton — NVIDIA’s inference server
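
As a concrete example of the first item above, here is a minimal sketch of offline batch inference with vLLM’s Python API. The model name and sampling settings are placeholders, and exact arguments can shift between vLLM releases.

    # Minimal offline-inference sketch with vLLM (assumes a GPU and `pip install vllm`;
    # the model id and sampling settings below are placeholders).
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")      # any HF model id you can load
    params = SamplingParams(temperature=0.2, max_tokens=256)

    outputs = llm.generate(["Summarize why inference engines matter."], params)
    for out in outputs:
        print(out.outputs[0].text)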

Training & fine-tuning

  • Axolotl — Easy fine-tuning with multiple methods
  • Unsloth — Fast, memory-efficient fine-tuning
  • TRL — Transformer Reinforcement Learning
  • torchtune — PyTorch-native fine-tuning
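
To make the fine-tuning path concrete, here is a rough supervised fine-tuning sketch using TRL’s SFTTrainer. Argument names have shifted across TRL releases, and the model id and dataset are placeholders, so treat this as the shape of the workflow rather than a copy-paste recipe.

    # Rough SFT sketch with TRL (argument names vary by TRL version;
    # the model id and dataset are placeholders).
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train")   # any chat-style dataset

    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",                      # small model for illustration
        train_dataset=dataset,
        args=SFTConfig(output_dir="sft-output", max_steps=100),
    )
    trainer.train()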

Orchestration & pipelines

  • LangChain / LangGraph — LLM application framework
  • LlamaIndex — Data framework for LLM applications
  • Haystack — End-to-end NLP/LLM framework
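
For a sense of what the data-framework side of this list looks like, a small LlamaIndex sketch follows. Import paths assume a recent llama-index release, the "./docs" directory is a placeholder, and an embedding/LLM backend still has to be configured before the query will actually run.

    # Index a folder of documents and query it (recent llama-index versions;
    # "./docs" is a placeholder path, and a model backend must be configured).
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    documents = SimpleDirectoryReader("./docs").load_data()
    index = VectorStoreIndex.from_documents(documents)

    response = index.as_query_engine().query("What does the runbook say about rollbacks?")
    print(response)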

Vector stores & retrieval

  • Qdrant — High-performance vector database
  • Chroma — Open source embedding database
  • Weaviate — Vector search engine
  • Milvus — Scalable vector database
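
A quick sketch of the dedicated-vector-database workflow, using Qdrant’s in-memory client. The collection name and vectors are toy placeholders standing in for real embeddings, and the query method name has shifted across client releases.

    # Toy Qdrant example with the in-memory client; the vectors are tiny
    # placeholders standing in for real embeddings.
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(":memory:")
    client.create_collection(
        collection_name="notes",
        vectors_config=VectorParams(size=4, distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="notes",
        points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "hello"})],
    )
    hits = client.search(collection_name="notes", query_vector=[0.1, 0.2, 0.3, 0.4], limit=3)
    print(hits)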

Evaluation & monitoring

  • LangSmith — LLM observability (partially open)
  • Phoenix (Arize) — ML observability
  • Promptfoo — LLM output testing

Infrastructure decisions for agent apps

Most AI infrastructure choices are trade-offs between control, latency, cost, and operational burden. For agent apps, the most important question is not “what is the best tool?” It is “which boundary needs to stay inspectable and replaceable?”

Serving boundary

Choose local inference when privacy, offline work, or cost control matters most. Choose hosted open-weight serving when throughput and operational simplicity matter more. Choose a provider fallback when a task needs stronger reasoning or long-context quality than your open stack can reliably provide.
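
One way to keep that serving boundary replaceable is to route every call through a single thin function, as in this hedged sketch. It assumes a local OpenAI-compatible endpoint (vLLM or similar) on localhost:8000 and a hosted provider as the fallback; the URLs, model names, and environment variable are placeholders.

    # Route chat calls to a local OpenAI-compatible server first, then fall back
    # to a hosted provider. Endpoints, model names, and env vars are placeholders.
    import os
    from openai import OpenAI

    local = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    hosted = OpenAI(api_key=os.environ["HOSTED_API_KEY"])

    def chat(messages: list[dict]) -> str:
        try:
            resp = local.chat.completions.create(model="local-model", messages=messages)
        except Exception:
            # Fallback route; log this so the failure case in your routing table stays visible.
            resp = hosted.chat.completions.create(model="provider-model", messages=messages)
        return resp.choices[0].message.content

    print(chat([{"role": "user", "content": "Draft a status update."}]))

Because both clients speak the same API shape, swapping the local endpoint for a hosted open-weight service is a one-line change rather than a rewrite.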

Retrieval boundary

Start with the data system you already operate. Postgres plus pgvector is enough for many products. Move to Qdrant, Weaviate, or Milvus when retrieval quality, scale, filtering, or operations become real constraints.
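
A minimal sketch of that starting point with Postgres plus pgvector, using psycopg and assuming the vector extension is available on the server. The table name, embedding size, and query vector are placeholders.

    # Nearest-neighbour lookup with Postgres + pgvector via psycopg.
    # Table name, vector size, and the query embedding are placeholders.
    import psycopg

    query_embedding = [0.1, 0.2, 0.3]   # in practice, the output of an embedding model

    with psycopg.connect("dbname=app") as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")   # needs the extension installed
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (id bigserial PRIMARY KEY, "
            "body text, embedding vector(3))"
        )
        rows = conn.execute(
            "SELECT body FROM chunks ORDER BY embedding <-> %s::vector LIMIT 5",
            (str(query_embedding),),
        ).fetchall()
        print(rows)

When filtered search, hybrid retrieval, or index tuning becomes the bottleneck, the same read path can be re-pointed at a dedicated store like Qdrant without changing the rest of the application.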

Orchestration boundary

Use direct model calls for simple flows. Use LangGraph or similar state-machine tooling when the agent needs explicit state, retries, approvals, and tool sequencing. Use multi-agent frameworks only when separate roles create real leverage.
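
The pattern those frameworks encode can be shown without any of them. Below is a framework-free sketch of an agent step with explicit state, a bounded retry, and an approval gate in front of write tools; every tool and helper in it is a hypothetical placeholder.

    # Framework-free sketch of an explicit agent loop: plan, gated tool calls,
    # bounded retries. Every tool and helper here is a hypothetical placeholder.
    from dataclasses import dataclass, field

    WRITE_TOOLS = {"send_email", "delete_file"}        # tools that need approval

    def run_tool(name: str) -> str:
        return f"ran {name}"                           # stand-in for a real tool call

    def human_approves(name: str) -> bool:
        return input(f"Run write tool '{name}'? [y/N] ").strip().lower() == "y"

    @dataclass
    class AgentState:
        task: str
        plan: list[str] = field(default_factory=list)
        results: list[str] = field(default_factory=list)
        retries_left: int = 1

    def step(state: AgentState) -> AgentState:
        state.plan = ["search_notes", "send_email"]    # stand-in for a planning model call
        for tool in state.plan:
            if tool in WRITE_TOOLS and not human_approves(tool):
                state.results.append(f"skipped (not approved): {tool}")
                continue
            try:
                state.results.append(run_tool(tool))
            except Exception:
                if state.retries_left > 0:             # one bounded retry, then give up
                    state.retries_left -= 1
                    state.results.append(run_tool(tool))
                else:
                    state.results.append(f"failed: {tool}")
        return state

    print(step(AgentState(task="draft the weekly update")).results)

When a loop like this starts accumulating branches, interrupts, and persistence requirements, that is the point where LangGraph-style state machines pay for themselves.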

Evaluation boundary

Do not wait for a perfect benchmark. Save real tasks, expected behaviors, traces, and failure cases from the beginning. For agents, regression tests should cover tool use, approval behavior, citation quality, and recovery from tool errors.
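
A hedged sketch of what that looks like in practice: saved cases replayed as plain pytest-style assertions over the agent’s recorded behavior. The cases.jsonl file, its fields, and the run_agent function are hypothetical placeholders for your own agent and saved tasks.

    # Replay saved cases against the agent and assert on tool use and citations.
    # cases.jsonl, its fields, and run_agent() are hypothetical placeholders.
    import json
    from pathlib import Path

    def run_agent(prompt: str) -> dict:
        # Stand-in for the real agent; expected to report tool calls and citations.
        return {"tools_used": ["search_notes"], "citations": ["notes/example.md"]}

    def test_saved_cases():
        for line in Path("cases.jsonl").read_text().splitlines():
            case = json.loads(line)
            result = run_agent(case["prompt"])
            # Tool-use regression: the agent must still reach for the expected tool.
            assert case["expected_tool"] in result["tools_used"]
            # Citation regression: every answer should cite at least one source.
            assert result["citations"], f"no citations for: {case['prompt']}"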

Minimum production checklist

Before an AI app or agent becomes daily infrastructure, the stack should have:

  • a known model route for default, fallback, and failure cases
  • logs for prompts, model responses, tool calls, and errors
  • permission boundaries between read tools and write tools (see the sketch after this list)
  • a retrieval plan that explains what is indexed, refreshed, and excluded
  • an evaluation set built from real user tasks
  • deployment rollback or model rollback
  • license and data-policy review for models, tools, and hosted services
  • a human approval path for irreversible actions
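
To ground the logging and permission items above, here is a hedged sketch of a single wrapper that records every tool call and refuses write tools unless they are explicitly allowed. The tool registry, log path, and tool functions are all hypothetical placeholders.

    # One choke point for tool calls: log everything, block write tools by default.
    # All tool names, functions, and the log path are hypothetical placeholders.
    import json
    import time

    READ_TOOLS = {"search_notes": lambda q: f"results for {q}"}
    WRITE_TOOLS = {"send_email": lambda body: "sent"}

    def call_tool(name: str, arg: str, allow_writes: bool = False) -> str:
        if name in WRITE_TOOLS and not allow_writes:
            raise PermissionError(f"write tool '{name}' blocked by default")
        tool = {**READ_TOOLS, **WRITE_TOOLS}[name]
        result = tool(arg)
        # Append-only log: enough to reconstruct what the agent actually did.
        with open("tool_calls.log", "a") as f:
            f.write(json.dumps({"ts": time.time(), "tool": name,
                                "arg": arg, "result": result}) + "\n")
        return result

    print(call_tool("search_notes", "quarterly report"))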

If this checklist feels too heavy, the product is probably not ready for autonomy. Narrow the job before adding more infrastructure.

What to watch

  • Inference optimization making larger models practical on smaller hardware
  • Unified serving platforms that handle multiple model types
  • Evaluation becoming a first-class concern (not an afterthought)
  • Cost of self-hosting continuing to drop