Civilization Node — Offline-First RAG System
By Emin Can Başkaya
2026-04-21
Project
A self-hosted, fully offline knowledge retrieval and LLM reasoning system. Local inference engine grounded in compressed static knowledge bases (Wikipedia, StackOverflow, iFixit) via RAG, served through a web interface. Designed for air-gapped environments, digital preservation, and infrastructure that still works when the wider internet doesn’t.
Why build this
Cloud LLMs fail the moment connectivity does, and the knowledge encoded inside them is frozen at their training cutoff and inaccessible without a round-trip to someone else’s servers. Civilization Node is the inverse: models, data, and interface all running on local hardware, with the entire stack reproducible from a Docker Compose file and a handful of ZIM archives on disk. Useful for remote work, resilience planning, regulated environments where data cannot leave the machine, and as a personal reference system that stays functional regardless of what happens upstream.
Architecture
Three components, each deliberately chosen for independence:
Inference engine. Ollama running dolphin-llama3, customized through a Modelfile into a domain-specific “Librarian” system prompt tuned for technical reasoning over the offline corpus. Embedding model nomic-embed-text available for retrieval-side work. Ollama configured with OLLAMA_HOST=0.0.0.0 via systemd override so the containerized frontend can reach it on the Docker network.
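A Modelfile along these lines produces the “Librarian” variant. The model name and prompt wording below are illustrative, not the project’s actual prompt:

```
FROM dolphin-llama3

# Assumed system prompt — the real Librarian prompt is project-specific
SYSTEM """You are the Librarian, a technical assistant grounded in an offline
library of Wikipedia, StackOverflow, and iFixit archives. Prefer retrieved
passages over recall, and say plainly when the corpus lacks an answer."""

PARAMETER temperature 0.3
```

Built and registered with `ollama create librarian -f Modelfile`, after which it appears alongside the base models in Open WebUI’s selector.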
Knowledge base. Kiwix toolchain reading compressed ZIM archives — wikipedia_en_all_nopic, ifixit_en_all, stackoverflow.com_en_all as the recommended base load. ZIM is the right format for this: single-file, compressed, random-access, designed for offline library use. The full English Wikipedia without images fits inside a manageable archive and queries at local-disk speed.
Interface and orchestration. Open WebUI as the frontend, with a custom Kiwix tool definition wired into the LLM’s workflow so retrieval happens inline during generation. The tool is registered through Open WebUI’s workspace and activated per-session, which keeps the RAG layer opt-in rather than forced.
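Open WebUI tools are Python files defining a `Tools` class whose methods the model can call during generation. A minimal sketch of such a tool, assuming a `kiwix` service name on the Docker network and the kiwix-serve full-text search endpoint shape (both assumptions, not the project’s actual tool file):

```python
from urllib.parse import urlencode

# Hypothetical service hostname on the Docker network
KIWIX_BASE = "http://kiwix:8080"


def build_search_url(base: str, book: str, pattern: str, page_length: int = 5) -> str:
    """Build a kiwix-serve full-text search URL (endpoint shape assumed)."""
    query = urlencode({"books.name": book, "pattern": pattern, "pageLength": page_length})
    return f"{base}/search?{query}"


class Tools:
    def search_offline_library(self, query: str) -> str:
        """Search the local ZIM archives and return raw result HTML for the LLM."""
        import urllib.request

        url = build_search_url(KIWIX_BASE, "wikipedia_en_all_nopic", query)
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
```

Registering the file in Open WebUI’s workspace and toggling it per-session is what keeps retrieval opt-in.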
Engineering choices worth flagging
Docker compose over bare installs
The entire stack ships as docker compose up -d plus an environment setup script. Reproducibility matters more than runtime efficiency for a system whose whole point is being restorable from first principles.
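The compose file reduces to roughly the following. Service names, image tags, and volume paths here are assumptions for illustration, not the project’s actual file:

```yaml
services:
  kiwix:
    image: ghcr.io/kiwix/kiwix-serve:latest
    command: "*.zim"          # serve every archive mounted below
    volumes:
      - ./zim:/data
    ports:
      - "8080:8080"
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"  # reach host-level Ollama
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"
volumes:
  open-webui:
```

Ollama itself stays on the host as a systemd service rather than in the compose file, which is why the network-binding override below matters.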
Systemd override for Ollama network binding
Ollama binds to localhost by default, which leaves it unreachable from containers on the Docker network. The drop-in at /etc/systemd/system/ollama.service.d/override.conf is the correct, upgrade-safe way to change that — direct edits to the main service file get clobbered on package updates.
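The drop-in itself is two lines:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```

Applied with `systemctl daemon-reload` followed by `systemctl restart ollama`; systemd merges the drop-in over the packaged unit on every start.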
Resource envelope
12GB RAM minimum, 16GB recommended. OOM behavior on large ZIM reads plus an active LLM context was the main tuning constraint. GPU optional but helps; CPU-only inference works at usable latency for a single user.
Storage planning
~100GB+ depending on archive selection. The Wikipedia-all archive alone is substantial; layering on iFixit and StackOverflow pushes the library into real disk territory. A maintenance script lists available content so archive selection is deliberate rather than blind.
Failure modes handled
Corrupt ZIM downloads crash the reader — checksum verification and a defined recovery path (delete the latest file, re-download) are built into the troubleshooting flow. Connection-refused issues on Ollama have a documented diagnostic path via netstat and systemctl status.
What this demonstrates
Systems thinking about dependencies and failure modes. Comfort at the boundary between Docker, systemd, and host-level services — the layer where most self-hosted projects break and where the fix requires understanding the whole stack rather than one piece of it. Willingness to build for resilience rather than convenience, which is a different design discipline than most cloud-native work encourages.
Stack
Docker, Docker Compose, Ollama, Kiwix, Open WebUI, systemd service management, Linux (Ubuntu 22.04+ / WSL2), bash setup scripts, ZIM archive format.