AnythingLLM as Knowledge Base

Replace Cloud AI with Your Own Knowledge Fortress

You’ve got a local Ollama instance. You’re running Mistral or Llama 3 locally. But every time you want an AI to actually understand your PDFs, your docs, your proprietary stuff—you ship it to OpenAI, Claude, or Gemini. The irony is brutal: you’re self-hosting the model but exiling your knowledge to the cloud.

Here’s the thing: AnythingLLM is the closest thing to a real, self-hosted NotebookLM you can build right now. Not a chat wrapper. Not a UI for OpenAI. A full-stack RAG system with workspaces, agents, document chunking, vector storage, and all the privacy guarantees that come with running it in your own server room.

This article walks you through what AnythingLLM is, how it differs from Open WebUI and LibreChat, and how to get it running with local Ollama in about 20 minutes. By the end, you’ll have a private knowledge base that actually talks back—and keeps your documents locked down tight.

What Is AnythingLLM?

AnythingLLM (built by Mintplex Labs, MIT-licensed) is a document-first AI platform, not a chat UI that happens to support documents. The distinction matters.

Open WebUI is excellent at what it does: a web interface for LLMs. You can upload files, and it’ll chat about them. But it’s fundamentally a chat tool with document support added on top. Retrieval is a feature there rather than the organizing principle, and you get less control over how documents are grouped and fed into context.

LibreChat is similar: a multi-model chat hub. Powerful, flexible, but not built around understanding documents.

AnythingLLM starts with the assumption that your actual value is in documents: PDFs, markdown files, text docs, even web pages you import. Everything else—the LLM, the chat, the agents—flows from that. It chunks documents, embeds them, stores vectors, and retrieves relevant snippets when you ask questions. That’s RAG-first thinking.

The licensing is generous: the code is MIT-licensed, and the self-hosted Docker version gives you everything you need for a private knowledge base. Mintplex Labs also offers a hosted cloud version, but nothing in this article depends on it.

Workspaces: Knowledge Silos That Actually Work

Here’s what sold me on AnythingLLM: workspaces.

In Open WebUI, you upload a file and chat. But if you upload your company’s employee handbook and your personal gaming reference docs and your medical records all to the same instance, they all leak into context. You’re polluting your model’s knowledge with unrelated stuff.

AnythingLLM workspaces solve this. Each workspace is a completely isolated knowledge silo:

Its own document set and its own embedded vectors
Separate chat history
Different LLM choices per workspace (the embedding model and vector database are configured once for the whole instance)
No cross-contamination between projects

You can have a DigitalGarden workspace for your personal notes, a WorkDocs workspace for company stuff, a RecipeBooks workspace for your food experimentation. Each one is totally separate. Each one pulls only from its own documents when building context.

This is how you actually use RAG in the real world—not one giant blob of everything, but organized silos that respect boundaries.
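As a mental model, the isolation described above can be sketched as a store keyed by workspace name, where a query only ever searches the chunks of the workspace it was asked in. This is a toy illustration, not AnythingLLM's storage code: the `WorkspaceStore` class and its methods are hypothetical names, and simple keyword overlap stands in for vector search.

```python
from collections import defaultdict


class WorkspaceStore:
    """Toy model of per-workspace isolation (hypothetical, not AnythingLLM's API)."""

    def __init__(self):
        # workspace name -> list of (source file, chunk text)
        self._chunks = defaultdict(list)

    def add(self, workspace, source, text):
        self._chunks[workspace].append((source, text))

    def search(self, workspace, query):
        """Return sources in ONE workspace whose text shares a word with the query."""
        words = set(query.lower().split())
        return [
            source
            for source, text in self._chunks.get(workspace, [])
            if words & set(text.lower().split())
        ]


store = WorkspaceStore()
store.add("WorkDocs", "handbook.pdf", "vacation policy for employees")
store.add("RecipeBooks", "bread.md", "sourdough starter feeding policy")

# Both documents mention "policy", but each query only sees its own workspace.
print(store.search("WorkDocs", "policy"))
print(store.search("RecipeBooks", "policy"))
```

The point of the sketch is that the workspace name is part of every lookup, so a question asked in WorkDocs can never pull a chunk that was ingested into RecipeBooks.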

Document Types and the Ingest Pipeline

AnythingLLM supports a wide range of document types out of the box:

PDFs (the main event)
Markdown, plaintext, JSON, CSV
Word docs (.docx)
Spreadsheets (.xlsx, .ods)
Single web pages (via URL scraping)
YouTube transcripts (paste a URL, it fetches the captions)
Whole websites (crawled to a limited, configurable depth)

When you upload a document, AnythingLLM doesn’t just dump it raw into the vector DB. It:

Chunks the document intelligently (default 1000 tokens per chunk, configurable overlap)
Embeds each chunk with a vector model (we’ll talk about choices in a second)
Stores the vectors in a vector database
Indexes metadata so you can search and filter later
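The first step in that list, chunking with overlap, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `chunk_words` is a hypothetical helper, it counts whitespace-separated words as a rough stand-in for tokens, and AnythingLLM's own splitter will differ in the details.

```python
def chunk_words(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks, counting whitespace-separated words.

    Words are a rough stand-in for tokens; a real splitter uses a tokenizer.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Ten words, windows of four, one word shared between neighbours.
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk starts with the last word of the previous one, which is what lets a sentence that straddles a boundary still show up whole in at least one chunk.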

The UI shows you the chunk count, estimated tokens, and lets you tweak chunking before confirming. You can see exactly how many documents are in your workspace and delete specific files without nuking everything.

Embeddings: Local vs. Cloud, Fast vs. Accurate

The embedding model is the translator between “human question” and “relevant documents”. Pick wrong and your RAG retrieval stinks.

AnythingLLM gives you multiple choices:

Built-in embedder (the default, no external dependency):

Fast, zero setup, works offline
Good enough for basic retrieval
Not as accurate as specialized models but honest about its limits

Ollama embeddings (if you’re running Ollama locally):

nomic-embed-text or mxbai-embed-large — both solid
Zero cost, runs on your GPU
Slower than the built-in embedder but more accurate
Recommended if you’ve got GPU headroom

OpenAI embeddings:

Overkill and costs money, but undeniably accurate
Defeats the purpose if your goal is privacy
Skip this

Other options: Mistral, Cohere, HuggingFace, LM Studio — all pluggable if you’re running those services.

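Honest take: Run nomic-embed-text on Ollama if you’ve got the VRAM. It’s a download of roughly 275MB, fast enough on a GPU, and gives you meaningful improvement over the default. If you’re on CPU only, the built-in embedder is fine. It’s not fancy, but it works. (LanceDB, which comes up next, is the vector database that stores the embeddings, not the model that produces them.)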
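To make the "translator" idea concrete: an embedding model turns both the question and every chunk into a vector, and retrieval picks the chunks whose vectors point in the most similar direction. Here is a minimal sketch using cosine similarity. The vectors and chunk names are made up for illustration, and `top_k` is a hypothetical helper, not something AnythingLLM exposes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the k chunk ids whose vectors are most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional vectors; real embedding models produce hundreds of dimensions.
chunks = {
    "dr-policy": [0.9, 0.1, 0.0],
    "lunch-menu": [0.0, 0.2, 0.9],
    "backup-runbook": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], chunks))
```

A better embedding model does not change this arithmetic. It changes how well the vectors capture meaning, which is why the model choice matters more than anything else in the retrieval step.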

Vector Databases: Beyond the Default

By default, AnythingLLM uses LanceDB, an embedded vector database. It’s fast, requires zero extra setup, and works great for knowledge bases up to a few hundred thousand chunks.

But AnythingLLM also supports plugging in other vector stores:

Qdrant (open-source, standalone server, excellent query performance)
Weaviate (more complex, good for enterprise scale)
Pinecone (cloud-hosted, requires API key)
Milvus (self-hosted, heavyweight)

Unless you’re building something that scales to millions of documents, stick with LanceDB. It’s the right default. If you do want the flexibility of a separate vector DB (for easier backups or multi-instance access), Qdrant is the self-hosted move.

Agents: When Your Knowledge Base Gets Proactive

AnythingLLM has an agent system that lets your knowledge base do more than answer questions. Agents can:

Browse the web and fetch real-time information (useful when your local docs are stale)
Run custom tools (hooks into external APIs, databases, scripts)
Chain reasoning across multiple retrieval steps

This is where AnythingLLM starts looking like NotebookLM: you ask a complex question that requires synthesis, and the agent breaks it down, retrieves from your docs, fetches external context, and builds a coherent answer.

Example: “Summarize our Q3 sales report and compare it to industry benchmarks.” An agent could pull your Q3 report from the workspace, fetch some industry data, and write a comparison.

For most self-hosters, basic RAG (document chat) is enough. But the agent plumbing is there if you want to get fancy later.

Docker Deployment: Up in 20 Minutes

Here’s the docker-compose setup for a fully working AnythingLLM instance pointing at Ollama:

version: '3.8'

services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    ports:
      - "3001:3001"
    environment:
      STORAGE_DIR: /app/server/storage
      LLM_PROVIDER: ollama
      OLLAMA_BASE_PATH: http://host.docker.internal:11434
      EMBEDDING_ENGINE: ollama
      EMBEDDING_MODEL_PREF: nomic-embed-text
      EMBEDDING_BASE_PATH: http://host.docker.internal:11434
      JWT_SECRET: your-random-secret-here-change-this
      DISABLE_TELEMETRY: "true"
    volumes:
      - anythingllm_storage:/app/server/storage
    networks:
      - llm-network
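    restart: unless-stopped
    extra_hosts:
      # Lets host.docker.internal resolve on Linux too (built in on macOS/Windows)
      - "host.docker.internal:host-gateway"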

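  # Optional: AnythingLLM keeps using its default LanceDB store unless you
  # switch the vector database to Qdrant in its settings.
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
    networks:
      - llm-network
    restart: unless-stopped
    environment:
      # Qdrant reads its API key from this double-underscore variable name
      QDRANT__SERVICE__API_KEY: "your-qdrant-key"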

volumes:
  anythingllm_storage:
  qdrant_storage:

networks:
  llm-network:
    driver: bridge

Key environment variables:

OLLAMA_BASE_PATH: Points to your Ollama instance. If Ollama is on another machine, use http://192.168.1.100:11434 instead of host.docker.internal
EMBEDDING_MODEL_PREF: The model to use for embeddings (make sure it’s pulled in Ollama: ollama pull nomic-embed-text)
JWT_SECRET: Change this to something random. It’s for session tokens.
DISABLE_TELEMETRY: Turns off anonymous usage telemetry. Set it to true.
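For JWT_SECRET, any long random string will do. One way to generate one is Python's standard `secrets` module; the helper name `make_jwt_secret` and the 32-byte length are just illustrative choices, not an AnythingLLM requirement.

```python
import secrets


def make_jwt_secret(num_bytes=32):
    """Return a random hex string suitable for pasting into JWT_SECRET."""
    return secrets.token_hex(num_bytes)


# Prints 64 hex characters; the value is different on every run.
print(make_jwt_secret())
```

Paste the printed value into the compose file in place of the placeholder, and keep it out of version control.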

To start:

docker-compose up -d

Wait 30 seconds for startup, then visit http://localhost:3001. You’ll land on a setup wizard. Create an admin account, and you’re in.

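Pro tip: If Ollama runs on the same machine as Docker, host.docker.internal:11434 works out of the box on macOS/Windows. On Linux, use the Docker bridge address (usually 172.17.0.1:11434) or the host’s actual IP, and make sure Ollama listens on more than loopback (for example by setting OLLAMA_HOST=0.0.0.0), since by default it only binds to 127.0.0.1.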

Creating Your First Workspace

Once you’re logged in:

Create a workspace (click the workspace selector, “New workspace”)
Give it a name (WorkDocs, PrivateNotes, whatever)
Configure the LLM: Ollama, pick your model (Mistral, Llama 3, whatever you’re running)
Check the embedder: Ollama + nomic-embed-text (assuming you pulled it). This is an instance-wide setting, already set by the compose file above
Check the vector DB: LanceDB (default, zero setup) or Qdrant if you added it. This is also instance-wide
Save and create

Now you’ve got an empty workspace. Time to feed it documents.

Ingesting Documents: PDFs and Beyond

Click “Upload documents” or “Add files” (the UI varies slightly per version).

Easy path:

Drag and drop a PDF, markdown file, or text doc
AnythingLLM chunks it, shows you the chunk count
Click “Save and ingest”
Wait for embedding to finish (depends on doc size and your embedding model’s speed)

Folder import:

Many versions support pointing AnythingLLM at a folder on disk
It’ll ingest all supported documents in that folder
Great for bulk uploads (a whole library of docs, for example)

Web scraping:

Paste a URL, configure depth, let it crawl and ingest
Useful for documentation sites or long articles

Real-world example: You’ve got 50 PDFs of company policies, engineering docs, and FAQs. Dump them in a folder, point AnythingLLM at it, walk away for 10 minutes. When it finishes, you’ve got a fully searchable, queryable knowledge base.

Chunking Config: The Gotcha That Bites Everyone

Here’s where most people stumble: chunk size and overlap.

The default is usually fine (1000 tokens, ~20% overlap), but here’s what goes wrong:

Chunks too small (250 tokens): Retrieval gets noisy. You pull snippets that are too granular; context is fractured.
Chunks too large (3000+ tokens): You lose specificity. A question about page 3 pulls all 20 pages of context.
No overlap: The LLM misses context that spans chunk boundaries.

Recommendation: Stick with the default. If you’re noticing bad retrieval (questions go unanswered or answers feel disconnected), then tweak:

Reduce chunk size if context feels bloated
Increase overlap if you’re missing boundary-spanning info

Don’t obsess over this on day one. Get it running, ask questions, and iterate.
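To get a feel for the trade-off, it helps to see how chunk size and overlap change the number of chunks a document turns into. The sketch below is plain sliding-window arithmetic; `estimate_chunks` is a hypothetical helper and the 50,000-token document is an invented example, so treat the numbers as rough estimates rather than what AnythingLLM will report.

```python
import math


def estimate_chunks(doc_tokens, chunk_size=1000, overlap=200):
    """Estimate how many chunks a sliding window with overlap produces."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap  # new tokens covered by each extra chunk
    return 1 + math.ceil((doc_tokens - chunk_size) / step)


# A hypothetical 50,000-token document at three settings, each with 20% overlap.
for size, overlap in [(250, 50), (1000, 200), (3000, 600)]:
    print(size, overlap, estimate_chunks(50_000, size, overlap))
```

Small chunks mean many more vectors to store and rank, while large chunks mean each retrieved hit drags a lot of unrelated text into the prompt. That is the tension the recommendations above are balancing.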

Chatting with Your Documents

Once documents are ingested, click into the workspace and start chatting.

In the chat:

Type your question
AnythingLLM retrieves relevant chunks from your documents
The LLM sees both the question and the retrieved context
Generates an answer citing your documents

The interface shows you which documents were pulled for context (usually as a sidebar or expandable section). You can see exactly what the model was working with.
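Under the hood, "the LLM sees both the question and the retrieved context" boils down to building one prompt that contains both. A minimal sketch of that assembly step follows. The wording of the instruction and the `build_prompt` helper are assumptions for illustration; AnythingLLM's actual prompt template is different and configurable.

```python
def build_prompt(question, retrieved):
    """Assemble a RAG prompt from a question and (source, text) snippets."""
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


snippets = [
    ("dr-procedures.pdf", "Database failures trigger a 15-minute recovery window."),
    ("ops-handbook.pdf", "Recovery uses automated snapshots."),
]
print(build_prompt("What is our recovery window for database failures?", snippets))
```

Numbering the snippets and keeping the source name next to each one is what makes it possible for the answer to point back at specific documents.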

Example conversation:

You: “What’s our disaster recovery policy for database failures?”
AnythingLLM: [Retrieves your ops handbook, DR procedures document, incident reports]
Answer: “According to your disaster recovery policy, database failures trigger a 15-minute recovery window using automated snapshots. Your last incident report shows this was tested on 2026-05-15 with a 12-minute actual recovery time.”

That’s RAG working. Your documents are the ground truth, not the model’s training data.

Common Pitfalls and How to Avoid Them

Context bloat: You ask a simple question and get 10 pages of irrelevant context. Usually means:

Chunk size is too big
Embedding model is weak (using the built-in embedder instead of nomic-embed-text)
Document corpus is poorly organized (mixing unrelated topics in one workspace)

Embedding model mismatches: You ingest documents with one embedding model, then switch to another. The stored vectors no longer line up with the vectors produced for new questions, so retrieval breaks. Solution: Don’t switch embedding models after ingesting. Decide upfront, and re-embed your documents if you need to change.
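One reason a mismatch is fatal: different models often produce vectors of different lengths (nomic-embed-text outputs 768 dimensions, mxbai-embed-large outputs 1024), and even when the lengths happen to match, the numbers mean different things. The guard below sketches the kind of check that catches this early. The `check_embedder` function and the `index_meta` dictionary are hypothetical, not part of AnythingLLM.

```python
def check_embedder(index_meta, embedder_name, query_vec):
    """Raise ValueError if a query vector doesn't match how the index was built."""
    if index_meta["embedder"] != embedder_name:
        raise ValueError(
            f"index built with {index_meta['embedder']!r}, "
            f"query embedded with {embedder_name!r}: re-embed the documents"
        )
    if index_meta["dimensions"] != len(query_vec):
        raise ValueError(
            f"expected {index_meta['dimensions']} dimensions, got {len(query_vec)}"
        )


# Hypothetical metadata recorded when the documents were first ingested.
index_meta = {"embedder": "nomic-embed-text", "dimensions": 768}

check_embedder(index_meta, "nomic-embed-text", [0.0] * 768)  # passes silently
try:
    check_embedder(index_meta, "mxbai-embed-large", [0.0] * 1024)
except ValueError as err:
    print(err)
```

The name check comes first on purpose: two models with the same dimension count would pass a length check and still return nonsense matches.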

No documents actually ingested: The upload UI says “success” but chat finds nothing. Usually means:

File format isn’t supported (try converting unsupported formats to PDF or markdown)
Embedding is still processing (wait longer)
The vector DB failed silently (check the logs with docker logs anythingllm)

Ollama not reachable: AnythingLLM can’t talk to Ollama. Solution: Test the connection from the docker container: docker exec anythingllm curl http://host.docker.internal:11434/api/tags (or use your actual Ollama IP if not local).
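Once the connection works, the same /api/tags endpoint also tells you whether the models you configured are actually pulled. Here is a small sketch that parses that response. The helper names are hypothetical, the sample payload is hand-written in the shape Ollama returns (a `models` list whose entries have a `name`), and `fetch_tags` needs a reachable Ollama instance, so the example only calls the offline parts.

```python
import json
import urllib.request


def installed_models(tags_json):
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def missing_models(tags_json, required):
    """Return required models that are not pulled, ignoring the ':tag' suffix."""
    have = {name.split(":")[0] for name in installed_models(tags_json)}
    return [r for r in required if r.split(":")[0] not in have]


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Fetch /api/tags from a running Ollama instance (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Offline example with a hand-written payload in the shape /api/tags returns.
sample = '{"models": [{"name": "llama3:latest"}, {"name": "mistral:7b"}]}'
print(missing_models(sample, ["llama3", "nomic-embed-text"]))
```

Against a live instance you would pass `fetch_tags()` in place of `sample`. Anything the function reports as missing needs an `ollama pull` before AnythingLLM can use it.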

Open WebUI vs. AnythingLLM: When to Use Each

Use Open WebUI if:

You just want a nice chat interface for Ollama
You’re not focusing on documents
You want maximum flexibility and community mods

Use AnythingLLM if:

You’re building a real knowledge base
Document management and organization matter
You need workspaces, RAG, or agents
Privacy and document ownership are non-negotiable

Honestly, if you’re planning to chat with documents regularly, AnythingLLM is the obvious choice. Open WebUI is great, but it’s not built for this use case. AnythingLLM is.

Security and Privacy Wins

Running AnythingLLM locally means:

No document exfiltration. Your PDFs never touch OpenAI’s servers.
No data mining. Your documents aren’t used to train anyone else’s model.
No API costs. Ollama is free; embedding is free if you use Ollama’s models.
Full audit trail. Everything lives on your hardware. You own the logs.

This is the real payoff of self-hosting. You’re not paying per token or per document. You’re paying in upfront hardware cost and your own labor. That math wins if you’ve got a lot of documents and questions.

The 20-Minute Setup Checklist

Have Ollama running with at least one LLM model and nomic-embed-text pulled
Save the docker-compose.yml above to a folder
Edit OLLAMA_BASE_PATH if Ollama is on a different machine
Run docker-compose up -d
Wait 30 seconds
Visit http://localhost:3001
Create an admin account
Create a workspace
Upload a test document (a PDF, markdown file, whatever)
Ask it a question

Done. You now have a private, self-hosted knowledge base that understands your documents. No cloud vendor involved. No license fees. No data exfiltration.

Next Steps

Once the basics are working:

Organize by workspace. Create separate workspaces for different projects or knowledge domains.
Experiment with embedding models. Try different Ollama embeddings; see what gives you the best retrieval.
Set up agents. If you want proactive behavior, enable web browsing or custom tools.
Back it up. The anythingllm_storage volume holds everything, so snapshot it regularly.
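For the backup step, a common pattern is to mount the volume read-only into a throwaway container and tar its contents to a host folder. The sketch below only builds that command so you can inspect it before running it. The `backup_command` helper, the `alpine` image, and the archive naming are illustrative assumptions. Also note that Docker Compose usually prefixes volume names with the project name, so check `docker volume ls` for the real name first.

```python
from datetime import date


def backup_command(volume, backup_dir, today=None):
    """Build a docker command that archives a named volume into backup_dir.

    backup_dir should be an absolute path on the host.
    """
    today = today or date.today()
    archive = f"{volume}-{today.isoformat()}.tar.gz"
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",       # the volume, mounted read-only
        "-v", f"{backup_dir}:/backup",    # where the archive lands on the host
        "alpine", "tar", "czf", f"/backup/{archive}", "-C", "/data", ".",
    ]


cmd = backup_command("anythingllm_storage", "/srv/backups", date(2025, 1, 31))
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. Stopping the container first avoids archiving files mid-write.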

AnythingLLM is still maturing, and the self-hosted version gets updated regularly. Check the docs occasionally for new features.

Final Thoughts

This is what self-hosting should feel like. You wanted to use local LLMs without sending your data to Anthropic, OpenAI, or Google. AnythingLLM is the missing piece. It’s the knowledge base layer that makes local models actually useful for your own documents.

Your 2 AM self—the one who’s been meaning to organize all those PDFs for the past year—will appreciate having a tool that actually works and doesn’t phone home.

Get it running. Ingest your documents. Ask it something. You’ll understand immediately why this matters.