Open Source Models
AgentCrew includes built-in support for running open source models locally via Ollama. When you select Ollama as your model provider, AgentCrew automatically manages the entire lifecycle: starting the Ollama container, pulling models, warming them up, and stopping the container when no teams need it anymore.
This means you can run AI agent teams entirely on your own hardware, with no external API keys required and full data privacy.
How It Works
Shared Infrastructure
Unlike team containers (which are isolated per team), Ollama runs as shared infrastructure. A single agentcrew-ollama container serves all teams that use the Ollama provider. This avoids duplicating large model files and reduces resource usage.
- Reference counting: AgentCrew tracks how many teams are using Ollama. The container starts when the first Ollama team deploys and stops when the last one is removed.
- Persistent storage: Downloaded models are stored in a Docker volume (agentcrew-ollama-models) that persists even when the container stops. Models only need to be downloaded once.
- Multi-network: The Ollama container connects to each team's Docker network, so agent containers can reach it via DNS (agentcrew-ollama:11434).
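The reference-counting behavior described above can be sketched roughly as follows. This is a minimal illustration of the idea, not AgentCrew's actual code; the class and method names are hypothetical:

```python
class OllamaLifecycle:
    """Hypothetical sketch of shared-container reference counting."""

    def __init__(self):
        self.ref_count = 0
        self.running = False

    def team_deployed(self):
        # The first Ollama team starts the shared container.
        if self.ref_count == 0:
            self.running = True  # stand-in for starting agentcrew-ollama
        self.ref_count += 1

    def team_removed(self):
        self.ref_count = max(0, self.ref_count - 1)
        # The last team stops the container; the models volume is preserved.
        if self.ref_count == 0:
            self.running = False


lifecycle = OllamaLifecycle()
lifecycle.team_deployed()   # container starts
lifecycle.team_deployed()   # container reused
lifecycle.team_removed()    # one team still active, keeps running
print(lifecycle.running, lifecycle.ref_count)  # → True 1
```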
Automatic Lifecycle
When you deploy a team with the Ollama provider, AgentCrew automatically:
- Starts the agentcrew-ollama container (or reuses it if already running).
- Connects it to the team's Docker network.
- Pulls the selected model (if not already downloaded).
- Warms up the model by loading weights into RAM, avoiding cold-start delays on the first message.
- Deploys the team's agent containers with OLLAMA_BASE_URL pre-configured.
When you stop a team, AgentCrew disconnects Ollama from that team's network and decrements the reference count. If no other teams are using Ollama, the container is stopped (but the volume with downloaded models is preserved).
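Because the shared container joins each team's network, agents can reach it by its container name. The helper below is an illustrative sketch of what the pre-configured environment might look like; the MODEL variable name is an assumption, not AgentCrew's actual variable:

```python
def agent_env(model: str) -> dict:
    """Illustrative: environment an agent container might receive."""
    return {
        # The shared container is reachable by DNS name on the team network.
        "OLLAMA_BASE_URL": "http://agentcrew-ollama:11434",
        # Hypothetical variable name for the agent's selected model.
        "MODEL": model,
    }


env = agent_env("ollama/qwen3:4b")
print(env["OLLAMA_BASE_URL"])  # → http://agentcrew-ollama:11434
```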
GPU Support
AgentCrew automatically detects NVIDIA GPUs on the host machine. If
nvidia-smi is found in the system PATH, GPU passthrough is
enabled for the Ollama container, giving models access to all available
GPUs for dramatically faster inference.
No manual configuration is needed. If a GPU is available, it will be used automatically. You can verify GPU status via the status endpoint.
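The detection step amounts to checking the system PATH for nvidia-smi. In Python, an equivalent check could be sketched like this:

```python
import shutil


def gpu_available() -> bool:
    """Mirror of the documented check: is nvidia-smi on the PATH?"""
    return shutil.which("nvidia-smi") is not None


print(gpu_available())
```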
Using Ollama in AgentCrew
Creating a Team
- In the team creation wizard, select OpenCode as the provider.
- Choose Ollama as the model provider.
- Select a model for your agents. The default model is qwen3:4b, but you can use any model available in the Ollama model library.
- Configure your agents as usual. All agents in the team will use the selected Ollama model provider.
Model Format
When specifying agent models, use the ollama/ prefix followed
by the model name and optional tag:
- ollama/qwen3:4b
- ollama/llama3.3:8b
- ollama/codellama:13b
- ollama/mistral:7b
- ollama/devstral
You can also use inherit to let the agent use the team's
default model.
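Splitting such a model string into its parts can be sketched as follows. The helper is illustrative only; it assumes a missing tag defaults to latest, as in ollama/devstral above:

```python
def parse_model(spec: str) -> tuple:
    """Split e.g. 'ollama/qwen3:4b' into (provider, name, tag).

    Assumes the tag defaults to 'latest' when omitted.
    """
    provider, _, rest = spec.partition("/")
    name, _, tag = rest.partition(":")
    return provider, name, tag or "latest"


print(parse_model("ollama/qwen3:4b"))  # → ('ollama', 'qwen3', '4b')
print(parse_model("ollama/devstral"))  # → ('ollama', 'devstral', 'latest')
```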
Model Provider Constraint
When a team's model provider is set to Ollama, all agents in that team must use Ollama models. You cannot mix providers within a single OpenCode team (e.g., one agent using Ollama and another using OpenAI). This constraint ensures consistent runtime behavior since all agents share the same container environment.
If you change the model provider on an existing team, all agent model
selections are automatically reset to inherit.
Status Endpoint
You can check the current state of the Ollama infrastructure via the API:
GET /api/ollama/status

Response example:

{
  "running": true,
  "container_id": "abc123...",
  "models_pulled": ["qwen3:4b", "codellama:13b"],
  "ref_count": 2,
  "gpu_available": true
}

| Field | Description |
|---|---|
| running | Whether the Ollama container is currently running. |
| container_id | Docker container ID (empty if not running). |
| models_pulled | List of models already downloaded and available. |
| ref_count | Number of active teams using Ollama. |
| gpu_available | Whether NVIDIA GPU passthrough is available. |
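A client can inspect this response with the standard library. The sketch below parses a sample payload that mirrors the documented response shape (no live server is assumed):

```python
import json

# Sample payload mirroring the documented response shape.
payload = """{
  "running": true,
  "container_id": "abc123...",
  "models_pulled": ["qwen3:4b", "codellama:13b"],
  "ref_count": 2,
  "gpu_available": true
}"""

status = json.loads(payload)
if status["running"] and status["ref_count"] > 0:
    print(f"Ollama serving {status['ref_count']} team(s), "
          f"models: {', '.join(status['models_pulled'])}")
```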
Requirements
- Docker: Ollama runs as a Docker container, so Docker must be available on the host.
- Disk space: Models range from ~2 GB (small 4B parameter models) to ~10+ GB (larger 13B+ models). The persistent volume stores all downloaded models.
- RAM: Models are loaded into RAM (or VRAM if GPU is available). Ensure your host has enough memory for the selected model size.
- GPU (optional): NVIDIA GPU with nvidia-smi and the NVIDIA Container Toolkit installed for GPU acceleration.
Next Steps
- Providers: Learn about all supported providers and how they compare.
- Configuration: Review environment variables and application settings.
- Architecture: Understand how containers, sidecars, and networking work together.