DocRAG — System Architecture

Architecture Pattern

Self-Hosted RAG with Direct Service Access

FastAPI handles all API requests. Next.js calls the backend directly on port 8000 — no reverse proxy. All data stays local: Qdrant stores vectors, PostgreSQL stores chat history.

Note: DocRAG exposes each service on its own host port; there is no nginx or other reverse proxy in front. The frontend calls the backend directly at http://localhost:8000.


Services

| Service  | Image                 | Port        | Purpose                                         |
|----------|-----------------------|-------------|-------------------------------------------------|
| frontend | custom                | 3000        | Next.js App Router — chat UI, settings, sidebar |
| backend  | custom                | 8000        | FastAPI REST API + SSE streaming                |
| qdrant   | qdrant/qdrant:v1.17.0 | 6333 / 6334 | Vector store for document embeddings            |
| postgres | postgres:16-alpine    | 5432        | Chat session + message persistence              |
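
A minimal docker-compose.yml skeleton matching this table might look like the sketch below. Volumes, environment variables, and healthchecks are omitted (see Health Checks & Startup Order), and the build contexts are assumptions based on the monorepo layout.

```yaml
services:
  frontend:
    build: ./frontend          # custom image
    ports:
      - "3000:3000"
  backend:
    build: ./backend           # custom image
    ports:
      - "8000:8000"
  qdrant:
    image: qdrant/qdrant:v1.17.0
    ports:
      - "6333:6333"            # HTTP API
      - "6334:6334"            # gRPC
  postgres:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
```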

Data Flow: Document Ingestion
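
A high-level sketch, inferred from the pipeline components described later in this document; the exact route path is an assumption based on the api/v1 router list.

```
upload (frontend)
  → POST /api/v1/ingest (backend)
  → parse by file type (Docling DocumentConverter / pandas / native UTF-8 read)
  → HybridChunker (max 512 tokens per chunk)
  → embed with nomic-ai/nomic-embed-text-v1.5 (768-dim)
  → upsert into Qdrant
```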


Data Flow: Chat (Streaming)
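
Likewise a sketch; the route path and the top-k retrieval step are assumptions consistent with the services listed in the monorepo layout.

```
user message (frontend)
  → POST /api/v1/chat (backend)
  → embed query → retrieve top-k chunks from Qdrant
  → prompt the active LLM provider (Ollama / OpenAI / Anthropic / Gemini)
  → SSE token stream → use-chat-stream.ts renders incrementally
  → persist session + messages in PostgreSQL
```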


LLM Setup

Tip: Run Ollama natively on your machine for full privacy. Pull a model with `ollama pull llama3.2`, set `LLM__OLLAMA_BASE_URL=http://host.docker.internal:11434` in `.env`, then run `docker compose up`.
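
Concretely, the steps from the tip, noting which side of the container boundary each one runs on:

```bash
# On the host machine (not inside Docker):
ollama pull llama3.2

# In .env:
#   LLM__OLLAMA_BASE_URL=http://host.docker.internal:11434

# Then start the stack:
docker compose up
```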

Info: Configure API keys in the Settings page; no `.env` edits are needed. Keys are stored in the database and returned masked.
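
For example, reading the stored settings back over the API; the route path and response shape are assumptions based on the settings router listed under api/v1 below:

```python
import httpx

# Hypothetical route; DocRAG's actual path may differ.
resp = httpx.get("http://localhost:8000/api/v1/settings")
resp.raise_for_status()
print(resp.json())  # API keys come back masked, never in plaintext
```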

Warning: Changing `EMBED_MODEL` after documents have been indexed requires clearing the vector store (Settings → Storage → Clear Vector Store). The default model `nomic-ai/nomic-embed-text-v1.5` produces 768-dim vectors, which are incompatible with the 384-dim vectors of the previous default `all-MiniLM-L6-v2`.
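
The Settings page action does this for you; by hand it amounts to dropping and recreating the Qdrant collection with the new vector size. A sketch with qdrant-client, assuming the collection is named "documents":

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)

# "documents" is an assumed collection name; DocRAG's actual name may differ.
client.delete_collection("documents")
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),  # nomic-embed-text-v1.5
)
```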


Configuration: Two-Tier

| Tier           | Location                | What lives here                                            |
|----------------|-------------------------|------------------------------------------------------------|
| Infrastructure | `.env`                  | DB password, Qdrant host, embed model, Ollama URL          |
| User config    | DB (`appsetting` table) | API keys (masked), active LLM provider + model, RAG params |
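
The double underscore in names like `LLM__OLLAMA_BASE_URL` suggests pydantic-settings' nested-env delimiter, which `core/` uses per the monorepo layout below. A sketch of how the infrastructure tier could be loaded; field names other than the Ollama URL are illustrative:

```python
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class LLMSettings(BaseModel):
    ollama_base_url: str = "http://localhost:11434"

class Settings(BaseSettings):
    # LLM__OLLAMA_BASE_URL in .env maps to settings.llm.ollama_base_url
    model_config = SettingsConfigDict(env_file=".env", env_nested_delimiter="__")

    llm: LLMSettings = LLMSettings()

settings = Settings()
```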

Monorepo Layout

```
DocRAG/
├── frontend/
│   ├── Dockerfile
│   └── src/
│       ├── app/                   # Next.js App Router pages
│       │   ├── page.tsx           # Main chat page
│       │   ├── chat/[session_id]/page.tsx
│       │   └── settings/page.tsx
│       ├── components/            # shadcn/ui + custom components
│       ├── hooks/
│       │   ├── use-chat-store.ts  # Zustand — sessions, currentSessionId
│       │   └── use-chat-stream.ts # SSE streaming handler
│       └── lib/
│           └── api.ts             # Typed API client (apiRequest, apiStream, apiUpload)
├── backend/
│   ├── Dockerfile
│   └── app/
│       ├── main.py                # FastAPI app, CORS, router mounting
│       ├── api/v1/                # Route handlers (chat, ingest, documents, query, models, settings, data)
│       ├── models/                # SQLModel models (ChatSession, ChatMessage, AppSetting)
│       ├── services/              # Business logic (llm, retrieval, vector, chunking, file)
│       └── core/                  # Config (pydantic-settings), DB engine, exceptions
├── docker-compose.yml
└── .env.example
```

Health Checks & Startup Order

```
qdrant   (healthy) ─┐
                    ├─→ backend ─→ frontend
postgres (healthy) ─┘
```
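
In docker-compose.yml this ordering is typically expressed with `depends_on` conditions; a sketch (the healthcheck commands themselves are assumptions):

```yaml
backend:
  depends_on:
    qdrant:
      condition: service_healthy
    postgres:
      condition: service_healthy
frontend:
  depends_on:
    - backend
```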

The backend exposes `GET /health`, which reports the Qdrant connection status, the active LLM provider, and the app version.
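
A quick smoke test from the host; the response fields follow the description above, though the exact JSON keys are assumptions:

```bash
curl -s http://localhost:8000/health
# e.g. {"status": "ok", "qdrant": "connected", "llm_provider": "ollama", "version": "0.1.0"}
```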


Supported Document Types

| Format                           | Processing Engine           |
|----------------------------------|-----------------------------|
| PDF, DOCX, PNG, JPG              | Docling `DocumentConverter` |
| CSV, XLSX                        | pandas                      |
| JSON, TXT, MD, PUML, source code | Native UTF-8 read           |

All documents are chunked via `HybridChunker` (512 tokens max). Embedding model: `nomic-ai/nomic-embed-text-v1.5` (768 dimensions).
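
A sketch of the parse-and-chunk stage using Docling's public API; this mirrors the pipeline described here rather than DocRAG's actual service code, and the input file is a placeholder:

```python
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker

# Parse a supported file (PDF, DOCX, PNG, JPG, ...) into a structured document.
doc = DocumentConverter().convert("report.pdf").document  # placeholder input

# Token-aware hybrid chunking, capped at 512 tokens per chunk.
chunker = HybridChunker(
    tokenizer="nomic-ai/nomic-embed-text-v1.5",  # match the embedding model
    max_tokens=512,
)
chunks = list(chunker.chunk(doc))
print(len(chunks), chunks[0].text[:100])
```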


LLM Provider Support

| Provider  | Mode  | Notes                                                              |
|-----------|-------|--------------------------------------------------------------------|
| Ollama    | Local | Default; Ollama must run on the host (`host.docker.internal:11434`) |
| OpenAI    | Cloud | API key stored in DB via Settings page                             |
| Anthropic | Cloud | API key stored in DB via Settings page                             |
| Gemini    | Cloud | API key stored in DB via Settings page                             |