Turn spare GPU capacity into a shared inference mesh. Serve many models across machines, run models larger than any single device, and scale capacity to meet demand. OpenAI-compatible API on every node.
No coordinator, no cloud, no API keys. Machines pool their VRAM over QUIC.
Pick a model. mesh-llm downloads it, starts serving, prints an invite token.
Paste the token or use --auto to discover via Nostr. The mesh auto-assigns models and splits layers by VRAM.
Every node gets localhost:9337/v1 — standard OpenAI API. Works with any tool.
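Any HTTP client works; here is a minimal stdlib-only sketch (the model name is a placeholder, use one your mesh serves):

```python
# Call the mesh's OpenAI-compatible endpoint with only the standard library.
import json
import urllib.request

BASE_URL = "http://localhost:9337/v1"

def build_chat_request(model, messages, base_url=BASE_URL):
    """Standard OpenAI chat-completions request; no API key needed."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def chat(model, prompt):
    req = build_chat_request(model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# print(chat("your-model-name", "Say hello"))  # requires a running node
```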
Model doesn't fit? Layers split across nodes by VRAM. Peers selected by lowest RTT first — 80ms hard cap keeps splits fast. Solo mode when a model fits on one machine.
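A rough sketch of that selection policy (illustrative only; mesh-llm's actual placement logic is more involved):

```python
# Prefer lowest-RTT peers, enforce the 80 ms cap, fall back to solo mode
# when the model fits locally. Sizes in GB, latency in ms.
from dataclasses import dataclass

RTT_CAP_MS = 80

@dataclass
class Peer:
    name: str
    vram_gb: float
    rtt_ms: float

def plan_split(model_gb, local_vram_gb, peers):
    if model_gb <= local_vram_gb:
        return []  # solo mode: no split needed
    eligible = sorted((p for p in peers if p.rtt_ms <= RTT_CAP_MS),
                      key=lambda p: p.rtt_ms)
    chosen, have = [], local_vram_gb
    for p in eligible:
        if have >= model_gb:
            break
        chosen.append(p)
        have += p.vram_gb
    if have < model_gb:
        raise RuntimeError("not enough VRAM in mesh")
    return chosen
```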
Different nodes serve different models. API proxy routes by model field. Nodes auto-assigned based on what's needed and what's on disk.
Request rates tracked per model, shared via gossip. Standby nodes promote to serve hot models. Dead hosts replaced within 60 seconds.
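The idea behind promotion and replacement, sketched with assumed thresholds (the 60-second timeout is from above; the hot-model threshold is a placeholder, and the real gossip protocol differs):

```python
import time

DEAD_AFTER_S = 60        # dead hosts replaced within 60 seconds
HOT_THRESHOLD_RPS = 2.0  # assumed promotion threshold, for illustration

def dead_hosts(last_seen, now=None):
    """Hosts whose last gossip heartbeat is older than the cutoff."""
    now = time.time() if now is None else now
    return [h for h, t in last_seen.items() if now - t > DEAD_AFTER_S]

def promotion_target(rates_rps, served):
    """Hottest model this standby node isn't serving, if hot enough."""
    candidates = {m: r for m, r in rates_rps.items()
                  if m not in served and r >= HOT_THRESHOLD_RPS}
    return max(candidates, key=candidates.get, default=None)
```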
Publish your mesh to Nostr relays. Others find it with --auto. Smart scoring: region match, VRAM, health probe before joining.
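A sketch of what that scoring could look like (the weights and field names here are assumptions, not mesh-llm's actual formula):

```python
def score_mesh(mesh, my_region):
    if not mesh["healthy"]:              # failed health probe: never join
        return float("-inf")
    score = float(mesh["free_vram_gb"])  # spare capacity counts
    if mesh["region"] == my_region:
        score += 100.0                   # region match dominates
    return score

def pick_mesh(meshes, my_region):
    return max(meshes, key=lambda m: score_mesh(m, my_region), default=None)
```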
Weights read from local GGUF files, not sent over the network. Model load: 111s → 5s. Per-token RPC round-trips: 558 → 8.
GPU nodes gossip. Clients use lightweight routing tables — zero per-client server state. Event-driven: cost proportional to topology changes, not node count.
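A client routing table in this style can be as small as a map from model to serving nodes, mutated only when a topology event arrives (illustrative; the event shapes are assumptions):

```python
from collections import defaultdict

class RoutingTable:
    def __init__(self):
        self.routes = defaultdict(set)  # model -> set of node addresses

    def apply_event(self, event):
        """Cost is per event, not per node: untouched entries never change."""
        model, node = event["model"], event["node"]
        if event["type"] == "serving":
            self.routes[model].add(node)
        elif event["type"] == "stopped":
            self.routes[model].discard(node)

    def hosts_for(self, model):
        return sorted(self.routes[model])
```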
Draft model runs locally, proposing tokens the main model verifies in one batched pass. +38% throughput on code. Auto-detected from catalog.
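The core speculative-decoding loop, as a toy greedy sketch (real implementations check all draft positions in a single batched forward pass and accept or reject probabilistically; the models here are stand-in callables):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round: draft proposes k tokens, target keeps the agreeing prefix."""
    # Draft proposes k tokens autoregressively (cheap, local).
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Target verifies the proposals; accept until first disagreement,
    # then emit the target's own token and stop.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target_next(ctx))
            break
    return accepted
```

Every round thus yields at least one token, and up to k+1 when the draft agrees.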
Live topology, VRAM bars, model picker, built-in chat. API-driven — everything the console shows comes from JSON endpoints.
OpenAI-compatible API on localhost:9337. Use with goose, pi, opencode, or any tool that supports custom OpenAI endpoints.
macOS Apple Silicon. One command to install, one to run.
Standard OpenAI API on localhost:9337. Works with anything.
Add to ~/.pi/agent/models.json:
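Something along these lines; the field names are placeholders (check pi's documentation for the exact schema), while the base URL and the absence of an API key come from mesh-llm's defaults above:

```json
{
  "mesh-llm": {
    "baseUrl": "http://localhost:9337/v1",
    "api": "openai-completions",
    "apiKey": "unused",
    "models": ["your-model-name"]
  }
}
```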
Claude Code uses Anthropic's API format. Use claude-code-proxy or litellm to translate.
One binary. macOS Apple Silicon and Linux. MIT licensed.