Particle: Build a Fast, Private Local AI Coding Setup With Ollama

Overview

Ollama is presented as the preferred local runner, exposing an OpenAI‑compatible API at localhost:11434 so most IDE plugins can connect with no code changes.
DeepSeek‑Coder‑V2 16B is the default pick for balanced speed and skill on machines with 16–32GB memory or a mid‑range NVIDIA GPU, with Apple Silicon laptops running even larger models via unified memory.
A step‑by‑step checklist covers installing Ollama, pulling models like qwen2.5‑coder:7b or deepseek‑coder‑v2:16b, and running a quick local prompt to confirm everything works.
Continue.dev in VS Code is the main interface for chat and inline autocomplete, with optional Open WebUI for longer architecture sessions and Aider for commit‑aware, multi‑file edits.
Speed tips include using quantized weights for smaller loads, keeping the context window tight, pre‑warming the model to avoid first‑response delays, and enabling GPU offloading or flash attention where supported.