DocRAG: Multimodal Self-Hosted RAG

An open-source RAG engine that lets you chat with PDFs, spreadsheets, diagrams, and source code using local or cloud LLMs — fully self-hosted with Qdrant and PostgreSQL.

Solo Developer · Ongoing · In Progress
nextjs · python · fastapi · postgresql · qdrant · docker · ollama · docling
Document Formats: 10+
LLM Providers: 4
Embedding Dims: 768
Max File Size: 20 MB


The Problem

Most RAG tools force you to choose between privacy and capability — cloud-based tools send your documents to third-party servers, while local alternatives are painful to set up and lack multimodal support.

DocRAG is a self-hosted RAG engine that handles PDFs, Excel sheets, PlantUML diagrams, images, and source code in one place. It runs entirely via Docker — your documents never leave your machine.

Features

  • Multimodal ingestion — PDF, DOCX, images (via Docling layout-aware parsing), CSV, XLSX, JSON, Markdown, PUML, and source code
  • Streaming chat — SSE-based streaming responses with intent classification (Document Analyst, Code Architect, Summarizer, and more)
  • Source citations — every assistant response shows the exact document chunks used as context
  • Multi-provider LLM — switch between Ollama (local), OpenAI, Anthropic, and Gemini from the Settings page — no .env edits required
  • Self-hosted vector store — Qdrant v1.17.0 with gRPC support for fast high-throughput ingestion
  • Session history — full chat history persisted in PostgreSQL with export/import support
  • Runtime RAG tuning — adjust top-k and score threshold from the Settings page without restarting
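
To make the top-k and score threshold settings concrete, here is a minimal sketch of how the two knobs interact when picking context chunks. The `ScoredChunk` type and `select_context` function are hypothetical names for illustration and are not taken from the DocRAG codebase; in practice a vector store such as Qdrant can apply both limits on the server side.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    """A retrieved chunk with its similarity score (hypothetical shape)."""
    text: str
    score: float


def select_context(
    hits: list[ScoredChunk], top_k: int, score_threshold: float
) -> list[ScoredChunk]:
    """Drop hits below the threshold, then keep the top_k highest-scoring ones."""
    kept = [h for h in hits if h.score >= score_threshold]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:top_k]


hits = [
    ScoredChunk("intro", 0.82),
    ScoredChunk("appendix", 0.31),
    ScoredChunk("methods", 0.67),
    ScoredChunk("results", 0.74),
]
selected = select_context(hits, top_k=2, score_threshold=0.5)
print([c.text for c in selected])  # ['intro', 'results']
```

A higher threshold trades recall for precision, while top-k caps how much context reaches the LLM prompt. Because both are plain values read per request, they can change at runtime without a restart.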

System Architecture

Key Technical Decisions

| Decision | Chosen | Rejected | Why |
| --- | --- | --- | --- |
| Document parsing | Docling | PyMuPDF / pdfplumber | Layout-aware parsing preserves reading order and handles multi-column PDFs |
| Vector store | Qdrant | ChromaDB / FAISS | gRPC support, built-in dashboard, production-grade persistence |
| Embedding model | nomic-embed-text-v1.5 (768-dim) | all-MiniLM-L6-v2 (384-dim) | 8192-token context window, better for long document chunks |
| LLM key storage | PostgreSQL (appsetting table) | .env file | Runtime key management via Settings UI without container restarts |
| Streaming | SSE via FastAPI StreamingResponse | WebSocket | Simpler client-side with ReadableStream; no persistent connection needed |
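
The SSE choice is easier to see with the wire format in hand. The sketch below frames tokens as server-sent events using only the standard library. The `sse_events` name, the `token` JSON key, and the `done` event are assumptions made for illustration, not DocRAG's actual protocol.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE frame: a 'data:' line followed by a blank line."""
    for token in tokens:
        # JSON-encode the payload so newlines inside a token cannot break framing.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop reading.
    yield "event: done\ndata: {}\n\n"


frames = list(sse_events(["Hello", " world"]))
print("".join(frames), end="")
```

In a FastAPI route, a generator like this could be handed to `StreamingResponse(..., media_type="text/event-stream")`, and the client can read it with `fetch` and a `ReadableStream` over a single ordinary HTTP response.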

What I'd Do Differently

I'd add a document deduplication check at ingestion time. Currently, uploading the same file twice creates duplicate vector points: the document list shows two entries and retrieval quality degrades. A content hash check before upsert would prevent this with minimal overhead.
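
A minimal sketch of such a content hash check, assuming SHA-256 over the raw file bytes. The `IngestionIndex` class is a hypothetical in-memory stand-in; a real version would more likely store the digest in a PostgreSQL column with a unique constraint.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


class IngestionIndex:
    """In-memory record of already-ingested content hashes (hypothetical)."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # digest -> first file name seen

    def register(self, file_name: str, data: bytes) -> bool:
        """Return True if the content is new, False if it is a duplicate."""
        digest = content_hash(data)
        if digest in self._seen:
            return False  # skip parsing, embedding, and upsert entirely
        self._seen[digest] = file_name
        return True


index = IngestionIndex()
first = index.register("report.pdf", b"same bytes")
second = index.register("report-copy.pdf", b"same bytes")
print(first, second)  # True False
```

Hashing the bytes rather than the file name catches renamed copies. It will not catch near-duplicates such as a re-exported PDF with different metadata, which would need a hash over the extracted text instead.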

I'd also store richer chunk-level metadata than just file_name and page_number. Docling already extracts section titles and element types, and these could be used as metadata filters to improve retrieval precision.
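
As a rough illustration of what richer metadata could enable, the sketch below filters chunks by element type and section title in plain Python. The `section_title` and `element_type` field names and the `filter_chunks` helper are assumptions made for this example; with a vector store, the same conditions would normally be expressed as a payload filter on the search request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    file_name: str
    page_number: int
    section_title: Optional[str] = None  # hypothetical field
    element_type: Optional[str] = None   # hypothetical field, e.g. "table"


def filter_chunks(
    chunks: list[Chunk],
    element_type: Optional[str] = None,
    section_contains: Optional[str] = None,
) -> list[Chunk]:
    """Keep chunks that match every filter that was provided."""
    result = []
    for chunk in chunks:
        if element_type is not None and chunk.element_type != element_type:
            continue
        if section_contains is not None:
            title = (chunk.section_title or "").lower()
            if section_contains.lower() not in title:
                continue
        result.append(chunk)
    return result


chunks = [
    Chunk("Revenue grew 12%.", "report.pdf", 3, "Financial Results", "paragraph"),
    Chunk("Q1 | Q2 | Q3", "report.pdf", 4, "Financial Results", "table"),
    Chunk("We thank our partners.", "report.pdf", 9, "Acknowledgements", "paragraph"),
]
tables = filter_chunks(chunks, element_type="table")
financial = filter_chunks(chunks, section_contains="financial")
print([c.page_number for c in tables])     # [4]
print([c.page_number for c in financial])  # [3, 4]
```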