Replace Cloud AI with Your Own Knowledge Fortress
You’ve got a local Ollama instance. You’re running Mistral or Llama 3 locally. But every time you want an AI to actually understand your PDFs, your docs, your proprietary stuff—you ship it to OpenAI, Claude, or Gemini. The irony is brutal: you’re self-hosting the model but exiling your knowledge to the cloud.
Here’s the thing: AnythingLLM is the closest thing to a real, self-hosted NotebookLM you can build right now. Not a chat wrapper. Not a UI for OpenAI. A full-stack RAG system with workspaces, agents, document chunking, vector storage, and all the privacy guarantees that come with running it in your own server room.
This article walks you through what AnythingLLM is, how it differs from Open WebUI and LibreChat, and how to get it running with local Ollama in about 20 minutes. By the end, you’ll have a private knowledge base that actually talks back—and keeps your documents locked down tight.
What Is AnythingLLM?
AnythingLLM (built by Mintplex Labs, MIT-licensed open source) is a document-first AI platform, not a chat UI that happens to support documents. The distinction matters.
Open WebUI is excellent at what it does: a web interface for LLMs. You can upload files and chat about them, and it does have its own retrieval pipeline. But it’s fundamentally a chat tool with document support added on: documents are a feature, not the organizing principle.
LibreChat is similar: a multi-model chat hub. Powerful, flexible, but not built around understanding documents.
AnythingLLM starts with the assumption that your actual value is in documents: PDFs, markdown files, text docs, even web pages you import. Everything else—the LLM, the chat, the agents—flows from that. It chunks documents, embeds them, stores vectors, and retrieves relevant snippets when you ask questions. That’s RAG-first thinking.
The licensing is generous: the repository is MIT-licensed, and the self-hosted Docker version gives you everything you need for a private knowledge base. The hosted cloud offering is a paid service, but the Docker self-hosted version is effectively open.
Workspaces: Knowledge Silos That Actually Work
Here’s what sold me on AnythingLLM: workspaces.
In Open WebUI, you upload a file and chat. But if your company’s employee handbook, your personal gaming reference docs, and your medical records all go into one undifferentiated pile, they can leak into each other’s context. You’re polluting your model’s answers with unrelated stuff.
AnythingLLM workspaces solve this. Each workspace is a completely isolated knowledge silo:
- Its own document set, stored in its own namespace in the vector DB
- Separate chat history
- Its own LLM choice (the embedding model and vector DB are instance-wide settings shared by all workspaces)
- No cross-contamination between projects
You can have a DigitalGarden workspace for your personal notes, a WorkDocs workspace for company stuff, a RecipeBooks workspace for your food experimentation. Each one is totally separate. Each one pulls only from its own documents when building context.
This is how you actually use RAG in the real world—not one giant blob of everything, but organized silos that respect boundaries.
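To make the isolation concrete, here’s a toy in-memory sketch in Python. It is not AnythingLLM’s code, which hands this job to the configured vector database; the NamespacedStore class and the hand-made three-dimensional vectors are illustrative assumptions (real embeddings have hundreds of dimensions). The point is simply that a search only ever scores chunks in the requested workspace.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedStore:
    """Toy vector store (hypothetical): one namespace per workspace."""

    def __init__(self):
        # workspace name -> list of (chunk_text, vector)
        self._spaces = defaultdict(list)

    def add(self, workspace, text, vector):
        self._spaces[workspace].append((text, vector))

    def search(self, workspace, query_vec, k=2):
        # Only the requested workspace's chunks are scored; other
        # workspaces are never consulted, so nothing leaks across.
        scored = [(cosine(query_vec, vec), text) for text, vec in self._spaces[workspace]]
        scored.sort(reverse=True)
        return [text for _, text in scored[:k]]


store = NamespacedStore()
store.add("WorkDocs", "expense policy chunk", [0.9, 0.1, 0.0])
store.add("RecipeBooks", "sourdough recipe chunk", [0.1, 0.9, 0.0])
print(store.search("WorkDocs", [0.2, 0.8, 0.0], k=1))
```

Even though the query vector looks more like the recipe chunk, searching WorkDocs can only return WorkDocs material.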
Document Types and the Ingest Pipeline
AnythingLLM supports a wide range of document types out of the box:
- PDFs (the main event)
- Markdown, plaintext, JSON, CSV
- Word docs (.docx)
- Spreadsheets (.xlsx, .ods)
- Web pages (via URL scraping)
- YouTube transcripts (paste a URL, it fetches the captions)
- Even scrape websites (limited depth, configurable)
When you upload a document, AnythingLLM doesn’t just dump it raw into the vector DB. It:
- Splits the document into chunks with some overlap between neighbors (chunk size and overlap are configurable in the text-splitter settings)
- Embeds each chunk with a vector model (we’ll talk about choices in a second)
- Stores the vectors in a vector database
- Indexes metadata so you can search and filter later
Chunk size and overlap are set instance-wide (in the text-splitter settings) rather than per upload. The workspace view lists exactly which documents are embedded, and you can remove specific files without nuking everything.
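The chunking step can be sketched as a fixed-size sliding window. This is a simplified stand-in, not AnythingLLM’s splitter: real splitters usually prefer to break at paragraph or sentence boundaries, and whether your version measures chunks in characters or tokens is worth checking in its settings. The function name chunk_text is hypothetical, and sizes here are in characters.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows.

    Consecutive windows share `overlap` characters, so content that sits
    near a boundary appears in both neighboring chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

For example, chunk_text("abcdefghij", size=4, overlap=1) returns ["abcd", "defg", "ghij"]: each chunk repeats the last character of the one before it.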
Embeddings: Local vs. Cloud, Fast vs. Accurate
The embedding model is the translator between “human question” and “relevant documents”. Pick wrong and your RAG retrieval stinks.
AnythingLLM gives you multiple choices:
Built-in native embedder (bundled with AnythingLLM, no external dependency):
- Fast, zero setup, works offline
- Good enough for basic retrieval
- Not as accurate as larger dedicated embedding models
Ollama embeddings (if you’re running Ollama locally):
- nomic-embed-text or mxbai-embed-large, both solid
- Zero cost, runs on your GPU
- Slower than the built-in embedder but more accurate
- Recommended if you’ve got GPU headroom
OpenAI embeddings:
- Overkill and costs money, but undeniably accurate
- Defeats the purpose if your goal is privacy
- Skip this
Other options: Mistral, Cohere, HuggingFace, LM Studio — all pluggable if you’re running those services.
Honest take: Run nomic-embed-text on Ollama if you’ve got the VRAM. It’s about 275MB, fast on a GPU, and a meaningful improvement over the default. If you’re on CPU only, the built-in embedder is fine: it’s not fancy, but it works.
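If you want to see what an Ollama embedder does with each chunk, here’s a minimal Python sketch against Ollama’s /api/embed endpoint (older Ollama releases expose /api/embeddings with a prompt field instead). It assumes Ollama on its default port 11434 with nomic-embed-text already pulled; the helper names are hypothetical, and AnythingLLM makes its own calls internally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama on its default port


def build_embed_request(texts: list[str], model: str = "nomic-embed-text",
                        base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST to Ollama's /api/embed endpoint for a batch of texts."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body: bytes) -> list[list[float]]:
    """Pull the vectors out of an /api/embed JSON response (one per input text)."""
    return json.loads(body)["embeddings"]


def embed(texts: list[str], model: str = "nomic-embed-text",
          base_url: str = OLLAMA_URL) -> list[list[float]]:
    """Send the request and return one embedding vector per input text."""
    request = build_embed_request(texts, model, base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return parse_embed_response(resp.read())
```

Calling embed(["What is our disaster recovery policy?"]) should return one vector; with nomic-embed-text it has 768 dimensions.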
Vector Databases: Beyond the Default
By default, AnythingLLM uses LanceDB, an embedded vector database. It’s fast, requires zero extra setup, and works great for knowledge bases up to a few hundred thousand chunks.
But AnythingLLM also supports plugging in other vector stores:
- Qdrant (open-source, standalone server, excellent query performance)
- Weaviate (more complex, good for enterprise scale)
- Pinecone (cloud-hosted, requires API key)
- Milvus (self-hosted, heavyweight)
Unless you’re building something that scales to millions of documents, stick with LanceDB. It’s the right default. If you do want the flexibility of a separate vector DB (for easier backups or multi-instance access), Qdrant is the self-hosted move.
Agents: When Your Knowledge Base Gets Proactive
AnythingLLM has an agent system that lets your knowledge base do more than answer questions. Agents can:
- Browse the web and fetch real-time information (useful when your local docs are stale)
- Run custom tools (hooks into external APIs, databases, scripts)
- Chain reasoning across multiple retrieval steps
This is where AnythingLLM starts looking like NotebookLM: you ask a complex question that requires synthesis, and the agent breaks it down, retrieves from your docs, fetches external context, and builds a coherent answer.
Example: “Summarize our Q3 sales report and compare it to industry benchmarks.” An agent could pull your Q3 report from the workspace, fetch some industry data, and write a comparison.
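Under the hood, that kind of agent is a loop that picks a tool, runs it, and feeds the result back to the model. Here’s a deliberately tiny Python sketch of just the dispatch part, with a fixed plan; the tool names and stub results are hypothetical, and this is not how AnythingLLM’s agent framework is wired internally. In a real agent the LLM chooses each step and writes the final comparison from the observations.

```python
from typing import Callable


# Hypothetical stand-in tools: a real agent would call the workspace's
# vector search and a web-search tool instead of returning placeholders.
def search_workspace(query: str) -> str:
    return f"[workspace chunks for: {query}]"


def browse_web(query: str) -> str:
    return f"[web results for: {query}]"


TOOLS: dict[str, Callable[[str], str]] = {
    "search_workspace": search_workspace,
    "browse_web": browse_web,
}


def run_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Execute (tool_name, query) steps in order and collect the observations."""
    observations = []
    for tool_name, query in plan:
        if tool_name not in TOOLS:
            raise KeyError(f"unknown tool: {tool_name}")
        observations.append(TOOLS[tool_name](query))
    return observations


print(run_plan([("search_workspace", "Q3 sales report"),
                ("browse_web", "industry sales benchmarks")]))
```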
For most self-hosters, basic RAG (document chat) is enough. But the agent plumbing is there if you want to get fancy later.
Docker Deployment: Up in 20 Minutes
Here’s the docker-compose setup for a fully working AnythingLLM instance pointing at Ollama:
version: '3.8'

services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN  # recommended in AnythingLLM's Docker docs (Chromium-based web scraping)
    extra_hosts:
      - "host.docker.internal:host-gateway"  # makes host.docker.internal resolve on Linux too
    environment:
      STORAGE_DIR: /app/server/storage
      LLM_PROVIDER: ollama
      OLLAMA_BASE_PATH: http://host.docker.internal:11434
      EMBEDDING_ENGINE: ollama
      EMBEDDING_MODEL_PREF: nomic-embed-text
      EMBEDDING_BASE_PATH: http://host.docker.internal:11434
      JWT_SECRET: your-random-secret-here-change-this
      DISABLE_TELEMETRY: "true"
      # Optional: use the Qdrant service below instead of the default LanceDB
      # VECTOR_DB: qdrant
      # QDRANT_ENDPOINT: http://qdrant:6333
      # QDRANT_API_KEY: your-qdrant-key
    volumes:
      - anythingllm_storage:/app/server/storage
    networks:
      - llm-network
    restart: unless-stopped

  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
    networks:
      - llm-network
    restart: unless-stopped
    environment:
      QDRANT__SERVICE__API_KEY: "your-qdrant-key"  # the variable Qdrant reads its API key from

volumes:
  anythingllm_storage:
  qdrant_storage:

networks:
  llm-network:
    driver: bridge

Key environment variables:
- OLLAMA_BASE_PATH: Points to your Ollama instance. If Ollama is on another machine, use its address (e.g. http://192.168.1.100:11434) instead of host.docker.internal.
- EMBEDDING_MODEL_PREF: The model to use for embeddings. Make sure it’s pulled in Ollama first: ollama pull nomic-embed-text
- JWT_SECRET: Change this to something random. It signs session tokens.
- DISABLE_TELEMETRY: Turns off AnythingLLM’s anonymous usage telemetry. Set it to true for privacy.
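For JWT_SECRET, any strong random string works. One way to generate one is Python’s standard secrets module; the 32-byte length is an arbitrary but comfortable choice, not an AnythingLLM requirement.

```python
import secrets


def make_jwt_secret(num_bytes: int = 32) -> str:
    """Return a cryptographically random hex string (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


print(make_jwt_secret())
```

Paste the printed value into the compose file in place of your-random-secret-here-change-this.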
To start:
docker-compose up -d
Wait about 30 seconds for startup, then visit http://localhost:3001. You’ll land on a setup wizard. Create an admin account, and you’re in.
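Instead of guessing at the 30 seconds, you can poll the web UI until it answers. A minimal sketch in Python, assuming the default http://localhost:3001 address; the wait_until_ready name is hypothetical, and it only checks for an HTTP 200 from the root page, not that every backend service is ready.

```python
import time
import urllib.request


def wait_until_ready(url="http://localhost:3001", attempts=30, delay=2.0, fetch=None):
    """Poll `url` until it returns HTTP 200; return the attempt number that worked.

    `fetch` takes a URL and returns a status code; it is injectable so the
    polling logic can be tested without a running container.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.status
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return attempt
        except OSError:
            pass  # connection refused, reset, or HTTP error: container still starting
        if attempt < attempts:
            time.sleep(delay)
    raise TimeoutError(f"{url} not ready after {attempts} attempts")
```

Call wait_until_ready() right after docker-compose up -d; it returns the attempt number that succeeded, or raises TimeoutError after roughly a minute with the defaults.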
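Pro tip: If Ollama runs on the same machine as Docker, host.docker.internal:11434 works out of the box on macOS/Windows. On Linux, that name only resolves if you map it with extra_hosts: "host.docker.internal:host-gateway"; otherwise use 172.17.0.1:11434 (the default Docker bridge) or the host’s actual IP. Ollama listens on 127.0.0.1 by default, so on Linux set OLLAMA_HOST=0.0.0.0 for the Ollama service or containers won’t be able to reach it.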
Creating Your First Workspace
Once you’re logged in:
- Create a workspace (click the workspace selector, “New workspace”)
- Give it a name (WorkDocs, PrivateNotes, whatever) and save it
- In the workspace’s settings, configure the chat LLM: Ollama, then pick your model (Mistral, Llama 3, whatever you’re running)
- Confirm the instance-wide AI provider settings: embeddings set to Ollama + nomic-embed-text (assuming you pulled it), and the vector DB set to LanceDB (default, zero setup) or Qdrant if you added it. These apply to every workspace.
Now you’ve got an empty workspace. Time to feed it documents.
Ingesting Documents: PDFs and Beyond
Click “Upload documents” or “Add files” (the UI varies slightly per version).
Easy path:
- Drag and drop a PDF, markdown file, or text doc
- AnythingLLM parses it and adds it to your document list
- Move it into the workspace and click “Save and Embed”
- Wait for embedding to finish (depends on doc size and your embedding model’s speed)
Folder import:
- Many versions support pointing AnythingLLM at a folder on disk
- It’ll ingest all supported documents in that folder
- Great for bulk uploads (a whole library of docs, for example)
Web scraping:
- Paste a URL, configure depth, let it crawl and ingest
- Useful for documentation sites or long articles
Real-world example: You’ve got 50 PDFs of company policies, engineering docs, and FAQs. Dump them in a folder, point AnythingLLM at it, walk away for 10 minutes. When it finishes, you’ve got a fully searchable, queryable knowledge base.
Chunking Config: The Gotcha That Bites Everyone
Here’s where most people stumble: chunk size and overlap.
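The default is usually a sensible starting point (exact chunk size and overlap vary by version and embedder, so check the text-splitter settings on your instance), but here’s what goes wrong: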
- Chunks too small (250 tokens): Retrieval gets noisy. You pull snippets that are too granular; context is fractured.
- Chunks too large (3000+ tokens): You lose specificity. A question about one detail on page 3 drags in pages of surrounding text.
- No overlap: The LLM misses context that spans chunk boundaries.
Recommendation: Stick with the default. If you’re noticing bad retrieval (questions go unanswered or answers feel disconnected), then tweak:
- Reduce chunk size if context feels bloated
- Increase overlap if you’re missing boundary-spanning info
Don’t obsess over this on day one. Get it running, ask questions, and iterate.
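The chunk-size trade-off is easy to put numbers on. For a fixed-size window of size S with overlap O, a document of length L <= S is one chunk, and a longer one produces ceil((L - O) / (S - O)) chunks; retrieving the top k chunks injects up to k * S of context. The sketch below applies that to a hypothetical 100,000-character document with a hypothetical top-4 retrieval setting (units are characters; the same arithmetic holds for tokens).

```python
import math


def chunk_count(doc_len: int, size: int, overlap: int) -> int:
    """Chunks produced by a fixed-size sliding window with the given overlap."""
    if doc_len <= 0:
        return 0
    if doc_len <= size:
        return 1
    return math.ceil((doc_len - overlap) / (size - overlap))


def retrieved_context_chars(top_k: int, size: int) -> int:
    """Upper bound on context characters injected when top_k chunks are retrieved."""
    return top_k * size


for size, overlap in [(500, 100), (1000, 0), (1000, 200), (3000, 300)]:
    print(size, overlap, chunk_count(100_000, size, overlap), retrieved_context_chars(4, size))
```

It reports 250, 100, 125, and 37 chunks for the four settings, with 2,000, 4,000, 4,000, and 12,000 characters of retrieved context: overlap costs storage, and big chunks cost context.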
Chatting with Your Documents
Once documents are ingested, click into the workspace and start chatting.
In the chat:
- Type your question
- AnythingLLM retrieves relevant chunks from your documents
- The LLM sees both the question and the retrieved context
- Generates an answer citing your documents
The interface shows you which documents were pulled for context (usually as a sidebar or expandable section). You can see exactly what the model was working with.
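Conceptually, the retrieved chunks and your question get stitched into one prompt before the LLM sees them. Here’s a hypothetical Python template showing the idea; AnythingLLM uses its own system prompt (editable per workspace), so treat the wording below as an assumption. Numbering each chunk with its source is what makes citations possible.

```python
def build_rag_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Combine retrieved (source, text) chunks and the user question into one prompt.

    Numbered source tags let the model cite which document each claim came from.
    """
    context = "\n\n".join(
        f"[{i}] ({source})\n{text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below. Cite sources by their [number].\n"
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_rag_prompt(
    "What is our disaster recovery policy for database failures?",
    [("ops-handbook.pdf", "<retrieved chunk text>"),
     ("dr-procedures.md", "<retrieved chunk text>")],
))
```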
Example conversation:
- You: “What’s our disaster recovery policy for database failures?”
- AnythingLLM: [Retrieves your ops handbook, DR procedures document, incident reports]
- Answer: “According to your disaster recovery policy, database failures trigger a 15-minute recovery window using automated snapshots. Your last incident report shows this was tested on 2026-05-15 with a 12-minute actual recovery time.”
That’s RAG working. Your documents are the ground truth, not the model’s training data.
Common Pitfalls and How to Avoid Them
Context bloat: You ask a simple question and get 10 pages of irrelevant context. Usually means:
- Chunk size is too big
- Embedding model is weak (using the built-in default embedder instead of nomic-embed-text)
- Document corpus is poorly organized (mixing unrelated topics in one workspace)
Embedding model mismatches: You ingest documents with one embedding model, then switch to another. The vectors become useless. Solution: Don’t switch models mid-workspace. Decide upfront, rebuild if you need to change.
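A quick way to see why mismatched embedders break retrieval: nomic-embed-text produces 768-dimensional vectors and mxbai-embed-large produces 1024-dimensional ones, so their vectors can’t even be compared. This Python sketch guards cosine similarity against that; Python’s zip would otherwise silently truncate the longer vector. The worse case is two different models that happen to share a dimension: no error, just meaningless scores, which is why re-embedding is the only real fix.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity that refuses to compare vectors of different lengths."""
    if len(a) != len(b):
        # zip() would silently truncate the longer vector, so check explicitly.
        raise ValueError(
            f"dimension mismatch: {len(a)} vs {len(b)}; "
            "query and stored chunks were embedded with different models"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


stored_chunk = [0.1] * 768   # shape of a nomic-embed-text vector
new_query = [0.1] * 1024     # shape of an mxbai-embed-large vector
try:
    cosine(new_query, stored_chunk)
except ValueError as err:
    print(err)
```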
No documents actually ingested: The upload UI says “success” but chat finds nothing. Usually means:
- File format isn’t supported (try converting unsupported formats to PDF or markdown)
- Embedding is still processing (wait longer)
- The vector DB failed silently (check the logs with docker logs anythingllm)
Ollama not reachable: AnythingLLM can’t talk to Ollama. Solution: Test the connection from the docker container: docker exec anythingllm curl http://host.docker.internal:11434/api/tags (or use your actual Ollama IP if not local).
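You can also check from the host with a short Python script that queries Ollama’s /api/tags endpoint and reports which required models are missing. It assumes Ollama’s default port and a REQUIRED list you edit yourself; the names and helpers here are hypothetical. Keep in mind that a successful check from the host only proves Ollama is up; the docker exec test above is what proves the container can reach it.

```python
import json
import urllib.request

# Assumption: edit this list to match your setup, e.g. add your chat model ("llama3").
REQUIRED = ["nomic-embed-text"]


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return entries of `required` not found in an /api/tags response.

    A bare name such as "nomic-embed-text" matches any tag ("nomic-embed-text:latest").
    """
    installed = {m["name"] for m in tags.get("models", [])}
    bare = {name.split(":", 1)[0] for name in installed}
    return [r for r in required if r not in installed and r not in bare]


def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Fetch /api/tags from Ollama and report missing models (raises if unreachable)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return missing_models(json.load(resp), REQUIRED)
```

check_ollama() returns an empty list when everything is in place; a connection error means Ollama isn’t listening at that address.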
Open WebUI vs. AnythingLLM: When to Use Each
Use Open WebUI if:
- You just want a nice chat interface for Ollama
- You’re not focusing on documents
- You want maximum flexibility and community mods
Use AnythingLLM if:
- You’re building a real knowledge base
- Document management and organization matter
- You need workspaces, RAG, or agents
- Privacy and document ownership are non-negotiable
Honestly, if you’re planning to chat with documents regularly, AnythingLLM is the obvious choice. Open WebUI is great, but it’s not built for this use case. AnythingLLM is.
Security and Privacy Wins
Running AnythingLLM locally means:
- No document exfiltration. Your PDFs never touch OpenAI’s servers.
- No data mining. Your documents aren’t used to train anyone else’s model.
- No API costs. Ollama is free; embedding is free if you use Ollama’s models.
- Full audit trail. Everything lives on your hardware. You own the logs.
This is the real payoff of self-hosting. You’re not paying per token or per document. You’re paying in upfront hardware cost and your own labor. That math wins if you’ve got a lot of documents and questions.
The 20-Minute Setup Checklist
- Have Ollama running with at least one LLM model and nomic-embed-text pulled
- Save the docker-compose.yml above to a folder
- Edit OLLAMA_BASE_PATH if Ollama is on a different machine
- Run docker-compose up -d
- Wait 30 seconds
- Visit http://localhost:3001
- Create an admin account
- Create a workspace
- Upload a test document (a PDF, markdown file, whatever)
- Ask it a question
Done. You now have a private, self-hosted knowledge base that understands your documents. No cloud vendor involved. No license fees. No data exfiltration.
Next Steps
Once the basics are working:
- Organize by workspace. Create separate workspaces for different projects or knowledge domains.
- Experiment with embedding models. Try different Ollama embeddings; see what gives you the best retrieval.
- Set up agents. If you want proactive behavior, enable web browsing or custom tools.
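- Back it up. The anythingllm_storage volume holds everything (and qdrant_storage holds your vectors if you use Qdrant). Compose prefixes volume names with the project name, so check docker volume ls for the exact name. Back up regularly.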
AnythingLLM is still maturing, and the self-hosted version gets updated regularly. Check the docs occasionally for new features.
Final Thoughts
This is what self-hosting should feel like. You wanted to use local LLMs without sending your data to Anthropic, OpenAI, or Google. AnythingLLM is the missing piece. It’s the knowledge base layer that makes local models actually useful for your own documents.
Your 2 AM self—the one who’s been meaning to organize all those PDFs for the past year—will appreciate having a tool that actually works and doesn’t phone home.
Get it running. Ingest your documents. Ask it something. You’ll understand immediately why this matters.