# DocRAG: Multimodal Self-Hosted RAG
An open-source RAG engine that lets you chat with PDFs, spreadsheets, diagrams, and source code using local or cloud LLMs — fully self-hosted with Qdrant and PostgreSQL.
## The Problem
Most RAG tools force you to choose between privacy and capability — cloud-based tools send your documents to third-party servers, while local alternatives are painful to set up and lack multimodal support.
DocRAG is a self-hosted RAG engine that handles PDFs, Excel sheets, PlantUML diagrams, images, and source code in one place. It runs entirely via Docker — your documents never leave your machine.
## Features
- Multimodal ingestion — PDF, DOCX, images (via Docling layout-aware parsing), CSV, XLSX, JSON, Markdown, PUML, and source code
- Streaming chat — SSE-based streaming responses with intent classification (Document Analyst, Code Architect, Summarizer, and more)
- Source citations — every assistant response shows the exact document chunks used as context
- Multi-provider LLM — switch between Ollama (local), OpenAI, Anthropic, and Gemini from the Settings page; no `.env` edits required
- Self-hosted vector store — Qdrant v1.17.0 with gRPC support for fast, high-throughput ingestion
- Session history — full chat history persisted in PostgreSQL with export/import support
- Runtime RAG tuning — adjust top-k and score threshold from the Settings page without restarting
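The SSE-based streaming mentioned above can be sketched as a plain generator that frames each LLM token as a Server-Sent Events message. This is a minimal illustration, not DocRAG's actual code: the helper names (`format_sse`, `stream_tokens`) and the event/payload shapes are assumptions.

```python
import json
from typing import Iterator, Optional

def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Frame a JSON payload as one SSE message (hypothetical helper)."""
    msg = ""
    if event:
        msg += f"event: {event}\n"
    msg += f"data: {json.dumps(data)}\n\n"  # blank line terminates the event
    return msg

def stream_tokens(tokens: Iterator[str]) -> Iterator[str]:
    """Yield each LLM token as a 'token' event, then a closing 'done' event."""
    for tok in tokens:
        yield format_sse({"token": tok}, event="token")
    yield format_sse({"finished": True}, event="done")
```

In a FastAPI app, a generator like this would typically be wrapped in `StreamingResponse(stream_tokens(...), media_type="text/event-stream")`, which is the pattern the Key Technical Decisions table refers to.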
## System Architecture
## Key Technical Decisions
| Decision | Chosen | Rejected | Why |
|---|---|---|---|
| Document parsing | Docling | PyMuPDF / pdfplumber | Layout-aware parsing preserves reading order and handles multi-column PDFs |
| Vector store | Qdrant | ChromaDB / FAISS | gRPC support, built-in dashboard, production-grade persistence |
| Embedding model | nomic-embed-text-v1.5 (768-dim) | all-MiniLM-L6-v2 (384-dim) | 8192-token context window — better for long document chunks |
| LLM key storage | PostgreSQL (appsetting table) | .env file | Runtime key management via Settings UI without container restarts |
| Streaming | SSE via FastAPI StreamingResponse | WebSocket | Simpler client-side with ReadableStream; no persistent connection needed |
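The "LLM key storage" row above is the decision that makes runtime key management possible: keys are read from the `appsetting` table per request rather than baked into the environment at startup. A minimal sketch of that pattern, using stdlib `sqlite3` as a stand-in for PostgreSQL (the table layout and `get_setting` helper are assumptions, not DocRAG's actual schema):

```python
import sqlite3

# Stand-in for the PostgreSQL appsetting table; the same pattern
# applies with psycopg against a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appsetting (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO appsetting VALUES ('openai_api_key', 'sk-placeholder')")
conn.commit()

def get_setting(conn, key: str, default: str = "") -> str:
    """Read a setting at request time, so edits in the Settings UI
    take effect without restarting the container."""
    row = conn.execute(
        "SELECT value FROM appsetting WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

Because the lookup happens per request, updating a row from the Settings UI changes provider behavior immediately, which a `.env` file cannot do without a restart.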
## What I'd Do Differently
Add a document deduplication check at ingestion time. Currently, uploading the same file twice creates duplicate vector points: the document list shows two entries and retrieval quality degrades. A content-hash check before upsert would prevent this with minimal overhead.
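The content-hash check could look something like the following sketch. The `IngestIndex` class and its in-memory map are hypothetical; in DocRAG the hash would more plausibly live in a PostgreSQL column checked before the Qdrant upsert.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 of the raw file bytes; byte-identical uploads collide."""
    return hashlib.sha256(data).hexdigest()

class IngestIndex:
    """Hypothetical dedup gate consulted before chunking and upserting."""

    def __init__(self):
        self._seen = {}  # hash -> first document name seen with that content

    def should_ingest(self, name: str, data: bytes) -> bool:
        h = content_hash(data)
        if h in self._seen:
            return False  # duplicate content: skip chunking and vector upsert
        self._seen[h] = name
        return True
```

Hashing the raw bytes is cheap relative to parsing and embedding, so the check adds negligible latency to the ingestion path.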
I'd also enrich chunk-level metadata beyond `file_name` and `page_number`. Section titles and element types are already extracted by Docling but could be used more aggressively, as metadata filters, to improve retrieval precision.
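To make the metadata-filter idea concrete, here is a stdlib-only sketch of filtering retrieved chunks by Docling-derived payload fields. The field names (`section_title`, `element_type`) and the `filter_chunks` helper are assumptions; in practice this would be pushed down to Qdrant as a payload `Filter` on the search call rather than applied client-side.

```python
from typing import List, Optional

def filter_chunks(
    chunks: List[dict],
    section_title: Optional[str] = None,
    element_type: Optional[str] = None,
) -> List[dict]:
    """Keep only chunks whose payload matches the given metadata filters."""
    out = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        if section_title and meta.get("section_title") != section_title:
            continue
        if element_type and meta.get("element_type") != element_type:
            continue
        out.append(chunk)
    return out
```

Pushing the same conditions into the vector search itself would narrow the candidate set before scoring, which is where the precision gain comes from.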