
AnythingLLM as Knowledge Base

By SumGuy 13 min read

Replace Cloud AI with Your Own Knowledge Fortress

You’ve got a local Ollama instance. You’re running Mistral or Llama 3 locally. But every time you want an AI to actually understand your PDFs, your docs, your proprietary stuff, you ship it to ChatGPT, Claude, or Gemini. The irony is brutal: you’re self-hosting the model but exiling your knowledge to the cloud.

Here’s the thing: AnythingLLM is the closest thing to a real, self-hosted NotebookLM you can build right now. Not a chat wrapper. Not a UI for OpenAI. A full-stack RAG system with workspaces, agents, document chunking, vector storage, and all the privacy guarantees that come with running it in your own server room.

This article walks you through what AnythingLLM is, how it differs from Open WebUI and LibreChat, and how to get it running with local Ollama in about 20 minutes. By the end, you’ll have a private knowledge base that actually talks back—and keeps your documents locked down tight.


What Is AnythingLLM?

AnythingLLM (built by Mintplex Labs, MIT-ish open-source license) is a document-first AI platform, not a chat UI that happens to support documents. The distinction matters.

Open WebUI is excellent at what it does: a web interface for LLMs. You can upload files, and it’ll chunk and embed them so it can chat about them. But it’s fundamentally a chat tool with document support added on: organizing and managing documents is secondary to the conversation.

LibreChat is similar: a multi-model chat hub. Powerful, flexible, but not built around understanding documents.

AnythingLLM starts with the assumption that your actual value is in documents: PDFs, markdown files, text docs, even web pages you import. Everything else—the LLM, the chat, the agents—flows from that. It chunks documents, embeds them, stores vectors, and retrieves relevant snippets when you ask questions. That’s RAG-first thinking.

The licensing is generous (MIT-ish, with some commercial server features locked behind a license), and the self-hosted Docker version gives you everything you need for a private knowledge base. It’s not free-as-in-speech for the cloud version, but the Docker self-hosted version is effectively open.


Workspaces: Knowledge Silos That Actually Work

Here’s what sold me on AnythingLLM: workspaces.

In Open WebUI, you upload a file and chat. But if your company’s employee handbook, your personal gaming reference docs, and your medical records all end up in the same pool, they can all leak into the same context. You’re polluting the context your model sees with unrelated stuff.

AnythingLLM workspaces solve this. Each workspace is a completely isolated knowledge silo:

You can have a DigitalGarden workspace for your personal notes, a WorkDocs workspace for company stuff, a RecipeBooks workspace for your food experimentation. Each one is totally separate. Each one pulls only from its own documents when building context.

This is how you actually use RAG in the real world—not one giant blob of everything, but organized silos that respect boundaries.
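Here is a toy model in Python that makes the isolation concrete. It is not AnythingLLM’s code. The `Workspace` class is hypothetical, and a simple word-overlap score stands in for real vector similarity. The point is only that a query is scored against the chunks of the workspace it’s asked in, never against the others.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical stand-in for one workspace: a name plus its own chunks."""
    name: str
    chunks: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.chunks.append(text)

    def search(self, query: str, k: int = 2) -> list[str]:
        # Toy relevance score: count of lowercase words shared with the query.
        # A real RAG system compares embedding vectors instead.
        q = set(query.lower().split())
        scored = [(len(q & set(c.lower().split())), c) for c in self.chunks]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [c for _, c in scored[:k]]


workspaces = {name: Workspace(name) for name in ("WorkDocs", "RecipeBooks")}
workspaces["WorkDocs"].add("vacation policy: employees get 20 days per year")
workspaces["RecipeBooks"].add("sourdough starter needs feeding every day")

# The same question only ever sees the chunks of the workspace it is asked in.
print(workspaces["WorkDocs"].search("how many vacation days per year"))
# → ['vacation policy: employees get 20 days per year']
print(workspaces["RecipeBooks"].search("how many vacation days per year"))
# → []
```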


Document Types and the Ingest Pipeline

AnythingLLM supports a wide range of document types out of the box:

When you upload a document, AnythingLLM doesn’t just dump it raw into the vector DB. It:

  1. Chunks the document intelligently (default chunk size of 1,000 characters, with configurable overlap)
  2. Embeds each chunk with a vector model (we’ll talk about choices in a second)
  3. Stores the vectors in a vector database
  4. Indexes metadata so you can search and filter later
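Steps 1 and 4 can be sketched as follows, assuming a plain fixed-stride splitter that counts characters. AnythingLLM’s own splitter is smarter about breaking on paragraph and sentence boundaries, so real chunk counts will differ. `chunk_text`, `to_records`, and the metadata field names are illustrative, not AnythingLLM internals.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if len(text) <= size:
        return [text] if text else []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + size] for i in range(0, len(text) - overlap, step)]


def to_records(doc_name: str, text: str, size: int = 1000, overlap: int = 20) -> list[dict]:
    """Attach the metadata a vector store would keep next to each chunk (step 4)."""
    return [
        {"source": doc_name, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]


records = to_records("handbook.md", "x" * 2500)
print(len(records))  # 3 windows: [0:1000], [980:1980], [1960:2500]
```

Step 2 would then run each record’s `text` through the embedding model, and step 3 would store the resulting vector alongside the record.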

The UI shows you the chunk count, estimated tokens, and lets you tweak chunking before confirming. You can see exactly how many documents are in your workspace and delete specific files without nuking everything.


Embeddings: Local vs. Cloud, Fast vs. Accurate

The embedding model is the translator between “human question” and “relevant documents”. Pick wrong and your RAG retrieval stinks.
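Retrieval boils down to comparing vectors. Here is a small sketch that ranks stored chunks by cosine similarity to a query vector. It uses made-up three-dimensional vectors; real embedding models output hundreds of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored chunks most similar to the query."""
    ranked = sorted(store, key=lambda cid: cosine(query_vec, store[cid]), reverse=True)
    return ranked[:k]


# Toy vectors standing in for the embeddings of three chunks.
store = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.2, 0.9, 0.1],
    "office-wifi": [0.0, 0.1, 0.9],
}
print(top_k([0.8, 0.3, 0.0], store))  # → ['refund-policy', 'shipping-times']
```

A better embedding model places a question and the chunk that answers it closer together in this space, which is why the choice matters.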

AnythingLLM gives you multiple choices:

Built-in native embedder (the default; runs inside AnythingLLM with no external dependency):

Ollama embeddings (if you’re running Ollama locally):

OpenAI embeddings:

Other options: Mistral, Cohere, HuggingFace, LM Studio — all pluggable if you’re running those services.

Honest take: Run nomic-embed-text on Ollama if you can. It’s a small model (under 300MB), fast on a GPU, and gives you a meaningful improvement over the default. If you’re on CPU only, the built-in native embedder is fine: it’s not fancy, but it works.
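You can sanity-check the embedding model before pointing AnythingLLM at it. The sketch below uses only the standard library. It assumes Ollama’s `/api/embed` endpoint on the default port 11434. Older Ollama releases only offer `/api/embeddings`, which takes `prompt` instead of `input`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama on this host, default port


def build_embed_request(texts: list[str], model: str = "nomic-embed-text",
                        base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST to Ollama's /api/embed; `input` accepts a string or a list of strings."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    """Send the request; the response JSON holds one vector per input under "embeddings"."""
    with urllib.request.urlopen(build_embed_request(texts, model), timeout=60) as resp:
        return json.load(resp)["embeddings"]
```

With Ollama running and the model pulled, `len(embed(["hello"])[0])` should print the vector dimension, which is 768 for nomic-embed-text.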


Vector Databases: Beyond the Default

By default, AnythingLLM uses LanceDB, an embedded vector database. It’s fast, requires zero extra setup, and works great for knowledge bases up to a few hundred thousand chunks.

But AnythingLLM also supports plugging in other vector stores:

Unless you’re building something that scales to millions of documents, stick with LanceDB. It’s the right default. If you do want the flexibility of a separate vector DB (for easier backups or multi-instance access), Qdrant is the self-hosted move.


Agents: When Your Knowledge Base Gets Proactive

AnythingLLM has an agent system that lets your knowledge base do more than answer questions. Agents can:

This is where AnythingLLM starts looking like NotebookLM: you ask a complex question that requires synthesis, and the agent breaks it down, retrieves from your docs, fetches external context, and builds a coherent answer.

Example: “Summarize our Q3 sales report and compare it to industry benchmarks.” An agent could pull your Q3 report from the workspace, fetch some industry data, and write a comparison.

For most self-hosters, basic RAG (document chat) is enough. But the agent plumbing is there if you want to get fancy later.


Docker Deployment: Up in 20 Minutes

Here’s the docker-compose setup for a fully working AnythingLLM instance pointing at Ollama:

docker-compose.yml
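services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN  # also present in the project's own docker run instructions
    extra_hosts:
      # Makes host.docker.internal resolve on Linux too (Docker 20.10+)
      - "host.docker.internal:host-gateway"
    environment:
      STORAGE_DIR: /app/server/storage
      LLM_PROVIDER: ollama
      OLLAMA_BASE_PATH: http://host.docker.internal:11434
      EMBEDDING_ENGINE: ollama
      EMBEDDING_MODEL_PREF: nomic-embed-text
      EMBEDDING_BASE_PATH: http://host.docker.internal:11434
      JWT_SECRET: your-random-secret-here-change-this
      DISABLE_TELEMETRY: "true"
      # Optional: uncomment to use the qdrant service below instead of built-in LanceDB
      # VECTOR_DB: qdrant
      # QDRANT_ENDPOINT: http://qdrant:6333
      # QDRANT_API_KEY: your-qdrant-key
    volumes:
      - anythingllm_storage:/app/server/storage
    networks:
      - llm-network
    restart: unless-stopped

  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    environment:
      # Qdrant reads its API key from QDRANT__SERVICE__API_KEY
      QDRANT__SERVICE__API_KEY: "your-qdrant-key"
    volumes:
      - qdrant_storage:/qdrant/storage
    networks:
      - llm-network
    restart: unless-stopped

volumes:
  anythingllm_storage:
  qdrant_storage:

networks:
  llm-network:
    driver: bridge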

Key environment variables:

To start:

docker compose up -d

Wait 30 seconds for startup, then visit http://localhost:3001. You’ll land on a setup wizard. Create an admin account, and you’re in.
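If you’d rather not guess at the 30 seconds, a small stdlib-only poller can wait until the web UI answers. The URL and timeouts are assumptions; adjust them if you changed the port mapping.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str = "http://localhost:3001", timeout: float = 120.0,
                  interval: float = 2.0) -> bool:
    """Poll `url` until the server answers, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied, even if with an error status, so it is up.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet, try again.
            time.sleep(interval)
    return False
```

For example, `print("ready" if wait_until_up() else "not up yet; check docker compose logs anythingllm")`.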

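Pro tip: If Ollama runs on the same machine as Docker, host.docker.internal:11434 works out of the box on macOS/Windows. On Linux it only resolves if the container maps it with extra_hosts: "host.docker.internal:host-gateway"; otherwise use 172.17.0.1:11434 or the host’s actual IP. Either way, on Linux Ollama must listen beyond loopback (for example OLLAMA_HOST=0.0.0.0), because by default it binds to 127.0.0.1 only.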


Creating Your First Workspace

Once you’re logged in:

  1. Create a workspace (click the workspace selector, “New workspace”)
  2. Give it a name (WorkDocs, PrivateNotes, whatever)
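  3. Configure the LLM: Ollama, pick your model (Mistral, Llama 3, whatever you’re running); a workspace can override the instance-wide default
  4. Check embeddings: Ollama + nomic-embed-text (assuming you pulled it). The embedder is an instance-wide setting, so every workspace shares it
  5. Check the vector DB: LanceDB (default, zero setup) or Qdrant if you added it. This is also an instance-wide setting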
  6. Save and create
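You can also script this step. AnythingLLM exposes a developer API: you generate a key in the admin settings, and your instance should serve Swagger docs at `/api/docs` with the exact schema. The sketch below is an assumption-based example. It assumes a `POST /api/v1/workspace/new` endpoint that takes a JSON `name` field and returns the new workspace, slug included, under a `workspace` key. `API_KEY` is a placeholder.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3001/api/v1"  # assumption: default port from the compose file
API_KEY = "YOUR-ANYTHINGLLM-API-KEY"       # placeholder: generate one in the admin settings


def new_workspace_request(name: str, base_url: str = BASE_URL,
                          api_key: str = API_KEY) -> urllib.request.Request:
    """Build a POST to the developer API's workspace/new endpoint (assumed path)."""
    return urllib.request.Request(
        f"{base_url}/workspace/new",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def create_workspace(name: str) -> str:
    """Create the workspace and return its slug (assumed response shape)."""
    with urllib.request.urlopen(new_workspace_request(name), timeout=30) as resp:
        return json.load(resp)["workspace"]["slug"]
```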

Now you’ve got an empty workspace. Time to feed it documents.


Ingesting Documents: PDFs and Beyond

Click “Upload documents” or “Add files” (the UI varies slightly per version).

Easy path:

Folder import:

Web scraping:

Real-world example: You’ve got 50 PDFs of company policies, engineering docs, and FAQs. Dump them in a folder, point AnythingLLM at it, walk away for 10 minutes. When it finishes, you’ve got a fully searchable, queryable knowledge base.
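The same bulk import can be scripted against the developer API. This sketch assumes a `POST /api/v1/document/upload` endpoint that takes a multipart form with a single `file` field. Uploaded files land in the document library, and you still add them to a workspace afterwards, either from the UI or via whichever workspace endpoint your instance’s `/api/docs` lists. The extension set is an illustrative subset, and `API_KEY` is a placeholder.

```python
import mimetypes
import pathlib
import urllib.request
import uuid

BASE_URL = "http://localhost:3001/api/v1"  # assumption: default port from the compose file
API_KEY = "YOUR-ANYTHINGLLM-API-KEY"       # placeholder: generate one in the admin settings
EXTENSIONS = {".pdf", ".md", ".txt", ".docx"}  # illustrative subset, not the full supported list


def multipart_body(path: pathlib.Path, boundary: str) -> bytes:
    """Encode one file as a multipart/form-data body with a single field named "file"."""
    ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    return head + path.read_bytes() + f"\r\n--{boundary}--\r\n".encode("utf-8")


def upload_request(path: pathlib.Path) -> urllib.request.Request:
    """Build the upload POST for one file (assumed endpoint path)."""
    boundary = uuid.uuid4().hex
    return urllib.request.Request(
        f"{BASE_URL}/document/upload",
        data=multipart_body(path, boundary),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def upload_folder(folder: str) -> None:
    """Upload every matching file under `folder`, one request per file."""
    for path in sorted(pathlib.Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            with urllib.request.urlopen(upload_request(path), timeout=300) as resp:
                print(path.name, resp.status)
```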


Chunking Config: The Gotcha That Bites Everyone

Here’s where most people stumble: chunk size and overlap.

The default is usually fine (1,000-character chunks with a 20-character overlap), but here’s what goes wrong:

Recommendation: Stick with the default. If you’re noticing bad retrieval (questions go unanswered or answers feel disconnected), then tweak:

Don’t obsess over this on day one. Get it running, ask questions, and iterate.
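For intuition on how size and overlap interact, assume a simple fixed-stride splitter. AnythingLLM’s real splitter breaks on natural boundaries, so treat this as an estimate. With document length $L$, chunk size $s$, and overlap $o$ (all in characters, $L > s > o$), the chunk count is:

```latex
n = \left\lceil \frac{L - o}{s - o} \right\rceil
\qquad \text{e.g. } L = 50\,000,\; s = 1000,\; o = 20:\quad
n = \left\lceil \frac{49\,980}{980} \right\rceil = 51
```

Dropping $s$ to 500 gives $\lceil 49\,980 / 480 \rceil = 105$ chunks. That means roughly twice as many smaller snippets competing for the same context window, each carrying less surrounding text.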


Chatting with Your Documents

Once documents are ingested, click into the workspace and start chatting.

In the chat:

The interface shows you which documents were pulled for context (usually as a sidebar or expandable section). You can see exactly what the model was working with.

Example conversation:

That’s RAG working. Your documents are the ground truth, not the model’s training data.
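The same conversation can run over the developer API. This sketch assumes a `POST /api/v1/workspace/{slug}/chat` endpoint that takes a `message` and a `mode`. In "query" mode, answers stick to retrieved documents. It also assumes a reply carrying `textResponse` and a `sources` list; confirm both against your instance’s `/api/docs`. The slug `workdocs` and `API_KEY` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3001/api/v1"  # assumption: default port from the compose file
API_KEY = "YOUR-ANYTHINGLLM-API-KEY"       # placeholder: generate one in the admin settings


def ask(slug: str, question: str, mode: str = "query") -> dict:
    """POST a question to a workspace's chat endpoint (assumed path and body)."""
    req = urllib.request.Request(
        f"{BASE_URL}/workspace/{slug}/chat",
        data=json.dumps({"message": question, "mode": mode}).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def summarize(reply: dict) -> str:
    """Format the answer plus the deduplicated titles of the chunks it cited."""
    titles = sorted({s.get("title", "?") for s in reply.get("sources", [])})
    return reply.get("textResponse", "") + "\nSources: " + (", ".join(titles) or "none")
```

For example, `print(summarize(ask("workdocs", "How many vacation days do we get?")))` prints the answer followed by the documents it drew on.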


Common Pitfalls and How to Avoid Them

Context bloat: You ask a simple question and get 10 pages of irrelevant context. Usually means:

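Embedding model mismatches: You ingest documents with one embedding model, then switch to another. The old vectors become useless: each model maps text into its own vector space, often with a different number of dimensions, so a query embedded by the new model can’t be meaningfully compared with chunks embedded by the old one. Solution: Don’t switch models mid-workspace. Decide upfront, and re-embed everything if you need to change.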
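To make the failure loud instead of silent, here is a hypothetical guard (not an AnythingLLM feature). It records which model built a store and rejects vectors from any other model. The 768 dimensions match nomic-embed-text. The 1536-dimension query stands in for a different model such as OpenAI’s text-embedding-3-small.

```python
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    """Hypothetical store that remembers which embedding model produced its vectors."""
    model: str
    dim: int
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, vec: list[float], model: str) -> None:
        self._check(vec, model)
        self.vectors.append(vec)

    def check_query(self, vec: list[float], model: str) -> None:
        self._check(vec, model)

    def _check(self, vec: list[float], model: str) -> None:
        # Same model name is required first; the dimension check catches corrupt input.
        if model != self.model:
            raise ValueError(f"store built with {self.model!r}, got {model!r}: re-embed first")
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")


store = VectorStore(model="nomic-embed-text", dim=768)
store.add([0.0] * 768, model="nomic-embed-text")
try:
    store.check_query([0.0] * 1536, model="text-embedding-3-small")
except ValueError as err:
    print(err)
```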

No documents actually ingested: The upload UI says “success” but chat finds nothing. Usually means:

Ollama not reachable: AnythingLLM can’t talk to Ollama. Solution: Test the connection from the docker container: docker exec anythingllm curl http://host.docker.internal:11434/api/tags (or use your actual Ollama IP if not local).
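If `curl` isn’t available in the container, or you want a check you can rerun after every change, here is a stdlib-only sketch. It asks Ollama’s `/api/tags` endpoint which models are pulled and reports the ones you still need. Run it from any machine that can reach Ollama. The base URL and model names in the usage line are assumptions.

```python
import json
import urllib.request


def installed_models(tags: dict) -> set[str]:
    """Model names from an /api/tags response, with the default ":latest" tag stripped."""
    names = set()
    for m in tags.get("models", []):
        names.add(m.get("name", "").removesuffix(":latest"))
    return names


def missing_models(base_url: str, required: list[str]) -> list[str]:
    """Fetch /api/tags from Ollama and list the required models that are not pulled yet."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        have = installed_models(json.load(resp))
    return [r for r in required if r not in have]
```

For example, `print(missing_models("http://localhost:11434", ["nomic-embed-text", "llama3"]))`. An empty list means both are present; anything listed needs an `ollama pull`.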


Open WebUI vs. AnythingLLM: When to Use Each

Use Open WebUI if:

Use AnythingLLM if:

Honestly, if you’re planning to chat with documents regularly, AnythingLLM is the obvious choice. Open WebUI is great, but it’s not built for this use case. AnythingLLM is.


Security and Privacy Wins

Running AnythingLLM locally means:

This is the real payoff of self-hosting. You’re not paying per token or per document. You’re paying in upfront hardware cost and your own labor. That math wins if you’ve got a lot of documents and questions.


The 20-Minute Setup Checklist

  1. Have Ollama running with at least one LLM model and nomic-embed-text pulled
  2. Save the docker-compose.yml above to a folder
  3. Edit OLLAMA_BASE_PATH and EMBEDDING_BASE_PATH if Ollama is on a different machine, and replace JWT_SECRET with a long random string
  4. Run docker compose up -d
  5. Wait 30 seconds
  6. Visit http://localhost:3001
  7. Create an admin account
  8. Create a workspace
  9. Upload a test document (a PDF, markdown file, whatever)
  10. Ask it a question

Done. You now have a private, self-hosted knowledge base that understands your documents. No cloud vendor involved. No license fees. No data exfiltration.


Next Steps

Once the basics are working:

AnythingLLM is still maturing, and the self-hosted version gets updated regularly. Check the docs occasionally for new features.


Final Thoughts

This is what self-hosting should feel like. You wanted to use local LLMs without sending your data to Anthropic, OpenAI, or Google. AnythingLLM is the missing piece. It’s the knowledge base layer that makes local models actually useful for your own documents.

Your 2 AM self—the one who’s been meaning to organize all those PDFs for the past year—will appreciate having a tool that actually works and doesn’t phone home.

Get it running. Ingest your documents. Ask it something. You’ll understand immediately why this matters.

