Open Source Models

AgentCrew includes built-in support for running open source models locally via Ollama. When you select Ollama as your model provider, AgentCrew automatically manages the entire lifecycle: starting the Ollama container, pulling models, warming them up, and stopping the container when no teams need it anymore.

This means you can run AI agent teams entirely on your own hardware, with no external API keys required and full data privacy.

How It Works

Shared Infrastructure

Unlike team containers (which are isolated per team), Ollama runs as shared infrastructure. A single agentcrew-ollama container serves all teams that use the Ollama provider. This avoids duplicating large model files and reduces resource usage.

  • Reference counting: AgentCrew tracks how many teams are using Ollama. The container starts when the first Ollama team deploys and stops when the last one is removed.
  • Persistent storage: Downloaded models are stored in a Docker volume (agentcrew-ollama-models) that persists even when the container stops. Models only need to be downloaded once.
  • Multi-network: The Ollama container connects to each team's Docker network, so agent containers can reach it via DNS (agentcrew-ollama:11434).
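The reference-counting behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not AgentCrew's actual internals; the class and method names are hypothetical, and the Docker operations are stubbed out:

```python
class OllamaManager:
    """Illustrative sketch of shared-infrastructure reference counting."""

    def __init__(self) -> None:
        self.ref_count = 0
        self.running = False

    def acquire(self, team_network: str) -> None:
        # The first Ollama team to deploy starts the shared container.
        if self.ref_count == 0:
            self._start_container()
        self._connect_network(team_network)
        self.ref_count += 1

    def release(self, team_network: str) -> None:
        self._disconnect_network(team_network)
        self.ref_count -= 1
        # Last team out stops the container; the model volume persists.
        if self.ref_count == 0:
            self._stop_container()

    # Stubs standing in for the real Docker operations.
    def _start_container(self) -> None:
        self.running = True

    def _stop_container(self) -> None:
        self.running = False

    def _connect_network(self, network: str) -> None:
        pass

    def _disconnect_network(self, network: str) -> None:
        pass
```

The key property is that start/stop decisions depend only on the count crossing zero, so teams can deploy and stop in any order.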

Automatic Lifecycle

When you deploy a team with the Ollama provider, AgentCrew automatically:

  1. Starts the agentcrew-ollama container (or reuses it if already running).
  2. Connects it to the team's Docker network.
  3. Pulls the selected model (if not already downloaded).
  4. Warms up the model by loading weights into RAM, avoiding cold-start delays on the first message.
  5. Deploys the team's agent containers with OLLAMA_BASE_URL pre-configured.

When you stop a team, AgentCrew disconnects Ollama from that team's network and decrements the reference count. If no other teams are using Ollama, the container is stopped (but the volume with downloaded models is preserved).
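The warm-up in step 4 can be done with a plain HTTP request against Ollama's API: a generate call with an empty prompt asks Ollama to load the model without producing text. The sketch below shows the idea; AgentCrew's actual implementation may differ, and the `warmup_payload`/`warm_up` helpers are illustrative:

```python
import json
import urllib.request

# The DNS name agent containers use on the team network.
OLLAMA_URL = "http://agentcrew-ollama:11434"

def warmup_payload(model: str, keep_alive: str = "30m") -> bytes:
    # An empty prompt makes Ollama load the weights without generating text;
    # keep_alive controls how long the model stays resident afterwards.
    return json.dumps(
        {"model": model, "prompt": "", "keep_alive": keep_alive}
    ).encode()

def warm_up(model: str, base_url: str = OLLAMA_URL) -> None:
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=warmup_payload(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # the call returns once the model is loaded
```

Because the weights are already resident when the team's agents come up, the first message avoids the multi-second cold-start of loading a multi-gigabyte model.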

GPU Support

AgentCrew automatically detects NVIDIA GPUs on the host machine. If nvidia-smi is found in the system PATH, GPU passthrough is enabled for the Ollama container, giving models access to all available GPUs for dramatically faster inference.

No manual configuration is needed. If a GPU is available, it will be used automatically. You can verify GPU status via the status endpoint.
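The detection heuristic is simple enough to sketch. The helpers below are hypothetical, but they mirror the behavior described above: passthrough is enabled exactly when nvidia-smi is found on the PATH, which corresponds to Docker's `--gpus all` flag:

```python
import shutil

def gpu_available() -> bool:
    # Same heuristic the docs describe: GPU passthrough is enabled
    # if and only if nvidia-smi is found on the system PATH.
    return shutil.which("nvidia-smi") is not None

def docker_gpu_args() -> list[str]:
    # Hypothetical helper: when a GPU is detected, the container is
    # started with passthrough for all GPUs (Docker's `--gpus all`).
    return ["--gpus", "all"] if gpu_available() else []
```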

Using Ollama in AgentCrew

Creating a Team

  1. In the team creation wizard, select OpenCode as the provider.
  2. Choose Ollama as the model provider.
  3. Select a model for your agents. The default model is qwen3:4b, but you can use any model available in the Ollama model library.
  4. Configure your agents as usual. All agents in the team will use the selected Ollama model provider.

Model Format

When specifying agent models, use the ollama/ prefix followed by the model name and optional tag:

  • ollama/qwen3:4b
  • ollama/llama3.1:8b
  • ollama/codellama:13b
  • ollama/mistral:7b
  • ollama/devstral

You can also use inherit to let the agent use the team's default model.
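Resolving an agent's model spec amounts to splitting on the `/` prefix and falling back to the team default for `inherit`. A minimal sketch (the `resolve_model` helper and its default are illustrative, not AgentCrew's API):

```python
def resolve_model(spec: str, team_default: str = "qwen3:4b") -> tuple[str, str]:
    # Hypothetical helper: split "ollama/<name>[:tag]" into (provider, model),
    # falling back to the team's default model when the agent inherits.
    if spec == "inherit":
        return ("ollama", team_default)
    provider, _, model = spec.partition("/")
    return (provider, model)
```

Note that the tag is optional: `ollama/devstral` resolves to the model name `devstral`, and Ollama applies its own default tag when pulling.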

Model Provider Constraint

When a team's model provider is set to Ollama, all agents in that team must use Ollama models. You cannot mix providers within a single OpenCode team (e.g., one agent using Ollama and another using OpenAI). This constraint ensures consistent runtime behavior since all agents share the same container environment.

If you change the model provider on an existing team, all agent model selections are automatically reset to inherit.

Status Endpoint

You can check the current state of the Ollama infrastructure via the API:

GET /api/ollama/status

Response example:

{
  "running": true,
  "container_id": "abc123...",
  "models_pulled": ["qwen3:4b", "codellama:13b"],
  "ref_count": 2,
  "gpu_available": true
}
  • running: Whether the Ollama container is currently running.
  • container_id: Docker container ID (empty if not running).
  • models_pulled: List of models already downloaded and available.
  • ref_count: Number of active teams using Ollama.
  • gpu_available: Whether NVIDIA GPU passthrough is available.
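A client can use this endpoint to decide, for example, whether a model still needs to be pulled before deploying. A hedged sketch (the API base URL is a placeholder for wherever your AgentCrew instance is served, and `is_model_ready` is an illustrative helper):

```python
import json
import urllib.request

def ollama_status(api_base: str) -> dict:
    # api_base is a placeholder, e.g. "http://localhost:8000";
    # adjust it for your deployment.
    with urllib.request.urlopen(f"{api_base}/api/ollama/status") as resp:
        return json.load(resp)

def is_model_ready(status: dict, model: str) -> bool:
    # True when the container is up and the model has already been pulled.
    return bool(status.get("running")) and model in status.get("models_pulled", [])
```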

Requirements

  • Docker: Ollama runs as a Docker container, so Docker must be available on the host.
  • Disk space: Models range from ~2 GB for small 4B-parameter models to 10 GB or more for larger 13B+ models. The persistent volume stores all downloaded models.
  • RAM: Models are loaded into RAM (or VRAM if GPU is available). Ensure your host has enough memory for the selected model size.
  • GPU (optional): NVIDIA GPU with nvidia-smi and the NVIDIA Container Toolkit installed for GPU acceleration.

Next Steps

  • Providers: Learn about all supported providers and how they compare.
  • Configuration: Review environment variables and application settings.
  • Architecture: Understand how containers, sidecars, and networking work together.