DIY Perplexity: SearXNG + Local LLM = Private Web Search

By SumGuy 9 min read
Your Search History Is a Product — Here’s How to Stop That

Perplexity is genuinely useful. Type a question, get a cited answer, move on. The problem? Every query goes to their servers, gets logged, gets used to train models, and eventually shows up in some ad targeting pipeline you never agreed to. That’s the deal.

Here’s the thing: the underlying architecture isn’t magic. It’s a metasearch engine + a web scraper + an LLM doing summarization with inline citations. You can build that yourself in an afternoon with SearXNG and Ollama — and everything stays local except the outbound search queries (which SearXNG aggressively anonymizes anyway).

This is the stack. Let’s get it running.

Full example: Clone the working files at github.com/KingPin/sumguy-examples/llm/local-perplexity-searxng-llm/


The Architecture (In Plain English)

Three moving parts:

  1. SearXNG — A metasearch frontend that fans out your query to Google, Bing, DuckDuckGo, Brave, etc. simultaneously. No account. No tracking. Results get deduplicated and ranked.
  2. A scraper layer — Fetches the top N result pages and extracts readable text (strips ads, nav, garbage). This is your RAG context.
  3. A local LLM — Reads the scraped context, generates a cited answer. Ollama with Mistral or Llama 3 handles this fine on a mid-range GPU (or even CPU if you’re patient).
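The scraper layer (part 2 above) is the least glamorous piece, and a rough version fits in the standard library. This is a minimal sketch, assuming a crude rule that drops anything inside `script`, `style`, `nav`, `header`, `footer`, and `aside`; the `ReadableText` class and the tag list are illustrative names of mine, not what Perplexica or Morphic actually ship.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-content (an assumption, not a standard list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ReadableText(HTMLParser):
    """Collects visible text while ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the readable text of an HTML document as a single string."""
    parser = ReadableText()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><nav>Home | About</nav>"
    "<p>SearXNG is a metasearch engine.</p>"
    "<script>track()</script></html>"
)
print(extract_text(sample))  # prints: SearXNG is a metasearch engine.
```

Real pages are messier than this, so a dedicated readability library is the usual choice once you move past experiments.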

The open-source projects that wire these together, each covered below: Perplexica, Morphic, and Open WebUI’s built-in web search.

Pick one. I’ll walk through Perplexica and Morphic deploys — they cover 90% of use cases.


Why Not Just Use the Bing/Google Search API?

Good question. Short answer: money and terms.

Bing Search API: $3–$7 per 1,000 queries after the free tier. Google Programmable Search: similar pricing, terrible for news. Both have ToS restrictions on automated scraping and reselling results.

SearXNG sidesteps this entirely by acting like a user browser. It rotates user agents, respects rate limits across engines, and distributes load. The tradeoff is that aggressive queries can trigger CAPTCHAs on Google — but with multiple engines configured, you degrade gracefully. Real-world: for a personal instance with a few users, this never becomes an issue.
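To make "degrade gracefully" concrete, here is a toy sketch of the fan-out idea: query several engines in parallel, treat a failing engine as "no results", and merge what is left by URL. It is an illustration of the pattern only, not SearXNG's actual code; `fan_out` and the fake engine functions are hypothetical, and ranking by "how many engines returned this URL" is a stand-in for real scoring.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, engines):
    """Query every engine in parallel; skip engines that raise, merge the rest by URL."""

    def safe_call(item):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            return name, None  # e.g. CAPTCHA or timeout: treat as "no results"

    merged = {}  # url -> result dict plus the list of engines that returned it
    failed = []
    with ThreadPoolExecutor(max_workers=len(engines) or 1) as pool:
        for name, results in pool.map(safe_call, engines.items()):
            if results is None:
                failed.append(name)
                continue
            for r in results:
                entry = merged.setdefault(r["url"], {**r, "engines": []})
                entry["engines"].append(name)
    # URLs seen by more engines rank first (a simple stand-in for real scoring).
    ranked = sorted(merged.values(), key=lambda r: -len(r["engines"]))
    return ranked, failed


# Fake engines so the sketch runs without touching the network.
def fake_google(q):
    raise RuntimeError("CAPTCHA")


def fake_bing(q):
    return [{"url": "https://a.example", "title": "A"},
            {"url": "https://b.example", "title": "B"}]


def fake_ddg(q):
    return [{"url": "https://b.example", "title": "B"}]


results, failed = fan_out(
    "test", {"google": fake_google, "bing": fake_bing, "duckduckgo": fake_ddg}
)
print([r["url"] for r in results], failed)
```

With one engine down you still get a merged, deduplicated list, which is the behavior you care about on a personal instance.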


Step 1 — SearXNG

SearXNG is the foundation. Everything else builds on top of it.

docker-compose.yml
services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080
    restart: unless-stopped
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID

First run generates a default config. You want to enable JSON output format — Perplexica and Morphic query SearXNG via its JSON API:

searxng/settings.yml
search:
  formats:
    - html
    - json # required for API consumers
engines:
  - name: google
    engine: google
    shortcut: g
  - name: bing
    engine: bing
    shortcut: b
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
  - name: brave
    engine: brave
    shortcut: br

Start it:

Terminal window
docker compose up -d
curl "http://localhost:8080/search?q=searxng+test&format=json" | python3 -m json.tool | head -30

If you get back JSON with a results array, you’re good.
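If you want to consume that JSON from your own script, the curl call above translates to a couple of small helpers. A minimal sketch: the `title`, `url`, and `content` field names are what I'd expect in each entry of the `results` array, but treat them as an assumption and check your own instance's response. `build_search_url`, `top_results`, and the sample payload are illustrative.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return base_url.rstrip("/") + "/search?" + params


def top_results(payload: dict, n: int = 5) -> list:
    """Keep the first n results that have a URL, reduced to the fields used downstream."""
    picked = []
    for item in payload.get("results", []):
        if not item.get("url"):
            continue
        picked.append({
            "title": item.get("title", ""),
            "url": item["url"],
            "snippet": item.get("content", ""),
        })
        if len(picked) == n:
            break
    return picked


# Hand-written sample standing in for a real response (assumed shape).
sample_payload = {
    "query": "searxng test",
    "results": [
        {"title": "SearXNG docs", "url": "https://docs.example/searxng", "content": "Metasearch."},
        {"title": "No link here"},
        {"title": "Another hit", "url": "https://blog.example/post", "content": "Notes."},
    ],
}

print(build_search_url("http://localhost:8080", "searxng test"))
print(top_results(sample_payload, n=2))
```

Fetching the URL with `urllib.request.urlopen` and passing the decoded JSON to `top_results` gives you the "top N" list the scraper step needs.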


Step 2a — Perplexica

Perplexica is a full Perplexity-style UI with source cards, follow-up questions, and multiple search modes (web, academic, Reddit, YouTube). It connects to SearXNG for results and to any OpenAI-compatible API for the LLM step.

docker-compose.yml
services:
  perplexica-backend:
    image: itzcrazykns1337/perplexica-backend:main
    container_name: perplexica-backend
    ports:
      - "3001:3001"
    volumes:
      - ./perplexica/config.toml:/home/perplexica/config.toml
    depends_on:
      - searxng
    restart: unless-stopped
  perplexica-frontend:
    image: itzcrazykns1337/perplexica-frontend:main
    container_name: perplexica-frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_URL=http://localhost:3001
      - NEXT_PUBLIC_WS_URL=ws://localhost:3001
    depends_on:
      - perplexica-backend
    restart: unless-stopped

The config file is where you wire it to your LLM and SearXNG:

perplexica/config.toml
[GENERAL]
PORT = 3001
SIMILARITY_MEASURE = "cosine"
[API_KEYS]
OPENAI = ""
GROQ = ""
[API_ENDPOINTS]
SEARXNG = "http://searxng:8080"
OLLAMA = "http://host.docker.internal:11434"

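For Ollama on the same host, host.docker.internal resolves out of the box with Docker Desktop. On plain Docker Engine on Linux it doesn’t exist by default: either add extra_hosts: ["host.docker.internal:host-gateway"] to the service or use your host’s LAN IP instead. Either way, Ollama has to listen on an address the container can reach (for example OLLAMA_HOST=0.0.0.0), not just 127.0.0.1.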

Pull a model that’s good at summarization and citation:

Terminal window
ollama pull mistral:7b-instruct
# or if you have the VRAM:
ollama pull llama3.1:8b-instruct

Spin everything up, hit http://localhost:3000, and you’ve got Perplexity at home. Source cards link to the actual pages. Citations are inline. The whole thing runs on your hardware.
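If the UI loads but answers never arrive, it helps to confirm that Ollama itself responds before blaming Perplexica. A minimal sketch using Ollama's `/api/generate` endpoint with streaming turned off; the helper names `build_generate_request` and `ask` are mine, and the base URL assumes Ollama's default port on the same machine.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, prompt, timeout=120):
    """Send the prompt and return the generated text (needs a running Ollama)."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


req = build_generate_request("http://localhost:11434", "mistral:7b-instruct", "Say hi")
print(req.get_method(), req.full_url)

# Uncomment once Ollama is running and the model is pulled:
# print(ask("http://localhost:11434", "mistral:7b-instruct", "Say hi in five words."))
```

Run it from inside the backend container with the `OLLAMA` URL from config.toml and you test the exact network path Perplexica uses.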


Step 2b — Morphic (Better If You Want to Hack on the Frontend)

Morphic is a Next.js app, which means if you know React you can customize the UI freely. The search logic is more transparent — easier to swap in different LLMs or tweak the prompt.

docker-compose.yml
services:
  morphic:
    image: ghcr.io/miurla/morphic:latest
    container_name: morphic
    ports:
      - "3000:3000"
    environment:
      - SEARXNG_API_URL=http://searxng:8080
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - OLLAMA_MODEL=llama3.1:8b-instruct
    depends_on:
      - searxng
    restart: unless-stopped

Morphic’s env vars are simpler than Perplexica’s TOML. It supports Ollama natively without shimming through OpenAI-compatible endpoints. The default UI is cleaner too — fewer modes, less visual noise.

One difference worth knowing: Morphic streams the answer token-by-token with citations appearing inline as they resolve, while Perplexica front-loads citations. Neither is objectively better; it depends on whether you prefer seeing citations appear as you read or upfront.
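Token-by-token streaming is less exotic than it sounds. With streaming enabled, Ollama's `/api/generate` sends one JSON object per line, each carrying a `response` fragment and a `done` flag. The sketch below parses that format; it says nothing about how Morphic handles the stream internally, and `iter_tokens` plus the sample lines are illustrative.

```python
import json


def iter_tokens(lines):
    """Yield text fragments from an Ollama-style stream (one JSON object per line)."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        chunk = json.loads(line)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


# Hand-written sample standing in for a real streamed response.
sample_stream = [
    '{"response": "Sear", "done": false}',
    '{"response": "XNG", "done": false}',
    '{"response": "", "done": true}',
]

print("".join(iter_tokens(sample_stream)))  # prints: SearXNG
```

A UI that renders each fragment as it arrives gets the "typing" effect; one that joins them first gets the all-at-once answer.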


Step 3 — Open WebUI (If You’re Already Running It)

If you’ve read the rag-on-a-budget post or the open-webui-tools-functions-pipelines walkthrough, you’re probably already running Open WebUI. Good news: it has a native Web Search tool that does exactly what Perplexica/Morphic do, just integrated into chat.

Go to Admin Panel → Settings → Web Search, toggle it on, choose SearXNG as the search engine, and point the query URL at your SearXNG instance.

Enable it per-chat with the globe icon in the message bar. The model fetches results, reads them, and answers with sources. Latency is higher than Perplexica but you get the full chat UX with memory, tools, and everything else Open WebUI gives you.


The Privacy Story (Actually Good Here)

Let’s be honest about what “private” means in this stack:

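Fully local: the LLM inference, the scraped context it reads, and the answers it generates.

Not local: the outbound search queries SearXNG sends to the upstream engines, plus the fetches of the result pages themselves.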

SearXNG mitigates the outbound part in a few ways: it rotates between configured engines so no single provider sees all your queries, it can use Tor for certain engines, it strips identifying headers, and it fires off requests in parallel without referrer chains. You’re not invisible to search engines, but you’re dramatically harder to profile than if you were querying Google directly from your browser.

If you’re on a Tailscale/WireGuard network and you run SearXNG on a VPS in a different jurisdiction, you can push the privacy envelope further. But for a home lab use case, the default setup is a massive improvement over logged accounts.


Latency, Accuracy, and Recency Tradeoffs

Real talk on where this stack falls short vs. Perplexity’s paid tier:

Latency: Expect 10–30 seconds per query. SearXNG fires off 4–6 engines simultaneously (~1–3s), scraping the top 5 results adds another 2–5s, and LLM inference on 7–8B models locally takes 5–15s depending on your hardware. Perplexity returns in 3–5 seconds on a good day. If you’re on an RTX 3080 or better, you close this gap significantly.
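The latency figure is just the stage estimates added up, which is worth doing once so you know which stage to attack first. A quick sketch using the ranges quoted above; the stage labels are mine.

```python
# (low, high) seconds per stage, taken from the estimates in the paragraph above.
STAGES = {
    "search (SearXNG fan-out)": (1, 3),
    "scrape top 5 pages": (2, 5),
    "LLM inference (7-8B local)": (5, 15),
}


def total_range(stages):
    """Sum the low and high bounds across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_range(STAGES)
print(f"stages sum to {low}-{high}s")  # prints: stages sum to 8-23s

# The slowest stage at the high end is the one worth optimizing first.
bottleneck = max(STAGES, key=lambda name: STAGES[name][1])
print(f"biggest contributor: {bottleneck}")
```

The itemized stages come to 8 to 23 seconds; the gap to the quoted 10 to 30 is presumably overhead that isn't broken out here. Either way, inference dominates, which is why the GPU matters more than anything else in the stack.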

Accuracy: Comparable for factual queries with stable answers. The LLM is reading the same pages Perplexity reads. Where it falls down is synthesis quality — bigger commercial models (GPT-4o, Claude Sonnet) do a better job of reconciling conflicting sources. Mistral 7B and Llama 3.1 8B are respectable but not magical.

Recency: SearXNG + scraping is inherently real-time — it’s hitting live pages. This is actually better than some commercial AI search products that cache results. Breaking news within the last hour shows up if it’s indexed by the engines you’ve configured.

Citation quality: Perplexica and Morphic both cite sources, but they cite at the document level. They don’t do passage-level attribution (bold inline highlights pointing to exact sentences). If that matters to you, you’re building something more custom.
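Document-level citation usually comes down to how the prompt is assembled: each scraped page gets a number, and the model is asked to cite by number. A minimal sketch of that idea; the prompt wording and `build_cited_prompt` are illustrative and not the prompt Perplexica or Morphic actually use.

```python
def build_cited_prompt(question, sources):
    """Number each source and ask the model to cite them inline as [n]."""
    blocks = []
    for i, src in enumerate(sources, start=1):
        blocks.append(f"[{i}] {src['title']} ({src['url']})\n{src['text']}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical scraped sources for demonstration.
sources = [
    {"title": "SearXNG docs", "url": "https://docs.example/searxng",
     "text": "SearXNG is a metasearch engine."},
    {"title": "Ollama notes", "url": "https://blog.example/ollama",
     "text": "Ollama runs models locally."},
]

prompt = build_cited_prompt("What is SearXNG?", sources)
print(prompt)
```

Because `[1]` maps to a whole page, the best the UI can do is link the page. Passage-level attribution would mean splitting each page into numbered chunks and citing those instead.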


Hardware Minimums Worth Knowing

| Model | Min VRAM | Tokens/sec (approx) | Query time |
| --- | --- | --- | --- |
| Mistral 7B Q4 | 6 GB | 25–40 | 15–25s |
| Llama 3.1 8B Q4 | 6 GB | 20–35 | 15–30s |
| Llama 3.1 8B fp16 | 16 GB | 40–60 | 8–15s |
| Llama 3.1 70B Q4 | 40 GB | 10–15 | 30–60s |

CPU-only on Mistral 7B Q4 is ~3–5 tokens/sec. Usable for low-volume personal use. Not great.
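To see what those tokens/sec numbers mean for one answer, divide the answer length by the generation rate. A rough sketch: the 300-token answer length is my assumption, and this covers generation only, not search, scraping, or prompt processing.

```python
def generation_seconds(answer_tokens, tokens_per_sec_range):
    """Return (best, worst) generation time in seconds for a given answer length."""
    slow, fast = tokens_per_sec_range
    return answer_tokens / fast, answer_tokens / slow


ANSWER_TOKENS = 300  # assumed length of a typical cited answer

RATES = {
    "Mistral 7B Q4 (GPU)": (25, 40),      # from the table above
    "Mistral 7B Q4 (CPU-only)": (3, 5),   # from the CPU-only estimate above
}

for label, rate in RATES.items():
    best, worst = generation_seconds(ANSWER_TOKENS, rate)
    print(f"{label}: {best:.1f}-{worst:.1f}s")
```

Under that assumption the GPU case generates in roughly 7.5 to 12 seconds, while CPU-only lands between one and two minutes, which is what "usable but not great" looks like in practice.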

The sweet spot for home lab use is a 7–8B Q4 model on an 8 GB GPU. You get acceptable latency and good answer quality without needing to remortgage your house for a 3090.


Putting It All Together

Full compose stack in one file:

docker-compose.yml
services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080
    restart: unless-stopped
  perplexica-backend:
    image: itzcrazykns1337/perplexica-backend:main
    container_name: perplexica-backend
    ports:
      - "3001:3001"
    volumes:
      - ./perplexica/config.toml:/home/perplexica/config.toml
    depends_on:
      - searxng
    restart: unless-stopped
  perplexica-frontend:
    image: itzcrazykns1337/perplexica-frontend:main
    container_name: perplexica-frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_URL=http://localhost:3001
      - NEXT_PUBLIC_WS_URL=ws://localhost:3001
    depends_on:
      - perplexica-backend
    restart: unless-stopped

Terminal window
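# Create the config directories
mkdir -p perplexica searxng
# Create perplexica/config.toml (shown earlier) before starting;
# otherwise Docker creates a directory at the mount path
# Generate SearXNG secret key
openssl rand -hex 32
# Paste into searxng/settings.yml server.secret_key
# Then:
docker compose up -d
# Pull your model
ollama pull mistral:7b-instruct
# Hit it (macOS; use xdg-open on Linux)
open http://localhost:3000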
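If openssl isn't on the box, Python's standard library produces an equivalent key. A small sketch: `secrets.token_hex(32)` returns 64 hex characters, the same shape as `openssl rand -hex 32`. The `make_secret_key` name is mine.

```python
import secrets


def make_secret_key(num_bytes=32):
    """Return a random hex string suitable for server.secret_key."""
    return secrets.token_hex(num_bytes)


key = make_secret_key()
# Paste the printed value under server: in searxng/settings.yml
print(f'secret_key: "{key}"')
```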

First query might be slow as models warm up and Docker layers settle. After that it’s consistent. Your 2 AM self researching weird kernel errors will appreciate not having those queries logged anywhere.


If this got your RAG brain going, check out rag-on-a-budget for building document RAG on commodity hardware, and open-webui-tools-functions-pipelines for extending Open WebUI with custom tools — the web search pipeline is one of the cleaner examples of how the function system works in practice.

The privacy angle here isn’t paranoia — it’s just good hygiene. You’re already hosting your own services. Your search queries shouldn’t be someone else’s training data.

