DIY Perplexity: SearXNG + Local LLM = Private Web Search

Your Search History Is a Product: Here’s How to Stop That

Perplexity is genuinely useful. Type a question, get a cited answer, move on. The problem? Every query goes to their servers, where it can be logged, used to train models, and fed into ad targeting you never knowingly agreed to. That’s the deal.

The underlying architecture isn’t magic. It’s a metasearch engine + a web scraper + an LLM doing summarization with inline citations. You can build that yourself in an afternoon with SearXNG and Ollama. Everything stays local except the outbound search queries (which SearXNG aggressively anonymizes anyway).

This is the stack. Let’s get it running.

Full example: Clone the working files at github.com/KingPin/sumguy-examples/llm/local-perplexity-searxng-llm/

The Architecture (In Plain English)

Three moving parts:

SearXNG: A metasearch frontend that fans out your query to Google, Bing, DuckDuckGo, Brave, etc. simultaneously. No account. No tracking. Results get deduplicated and ranked.
A scraper layer: Fetches the top N result pages and extracts readable text (strips ads, nav, garbage). This is your RAG context.
A local LLM: Reads the scraped context, generates a cited answer. Ollama with Mistral or Llama 3 handles this fine on a mid-range GPU (or even CPU if you’re patient).
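The scraper layer is the least familiar of the three, so here is a minimal sketch of the "extract readable text" step using only Python’s standard library. This is an illustration, not the extraction code Perplexica or Morphic actually use; the SKIP_TAGS list and the max_chars cutoff are assumptions picked for the example.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-article noise (an assumed, minimal list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class ReadableText(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html, max_chars=4000):
    """Return the visible text of a page, truncated to keep the LLM context small."""
    parser = ReadableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]
```

Real pages are messier than this handles (cookie banners, comment threads, text rendered by JavaScript), which is why the projects below lean on purpose-built extraction instead.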

The open-source projects that wire these together:

Perplexica: The most production-ready. Has a proper UI, source cards, follow-up questions. Think Perplexity clone, self-hosted.
Morphic: Next.js-based, cleaner frontend, slightly easier to modify if you want to customize the prompt engineering.
Open WebUI: You’re probably already running this. It has a Web Search tool pipeline that does the same thing with less setup friction if you’re already Ollama-native.

Pick one. I’ll walk through Perplexica and Morphic deploys, which cover 90% of use cases, then the Open WebUI settings for anyone already running it.

Why Not Just Use the Bing/Google Search API?

Good question. Short answer: money and terms.

Bing Search API: $3 to $7 per 1,000 queries after the free tier. Google Programmable Search: similar pricing, terrible for news. Both have ToS restrictions on automated scraping and reselling results.

SearXNG sidesteps this entirely by acting like a regular browser. It rotates user agents, respects rate limits across engines, and distributes load. The tradeoff is that aggressive querying can trigger CAPTCHAs on Google, but with multiple engines configured, you degrade gracefully. In practice, for a personal instance with a few users, this rarely becomes an issue.

Step 1: SearXNG

SearXNG is the foundation. Everything else builds on top of it.

services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080
    restart: unless-stopped
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID

First run generates a default config at ./searxng/settings.yml. Enable the JSON output format there, because Perplexica and Morphic query SearXNG via its JSON API:

search:
  formats:
    - html
    - json          # required for API consumers

engines:
  - name: google
    engine: google
    shortcut: g
  - name: bing
    engine: bing
    shortcut: b
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
  - name: brave
    engine: brave
    shortcut: br

Start it:

docker compose up -d
curl -s "http://localhost:8080/search?q=searxng+test&format=json" | python3 -m json.tool | head -30

If you get back JSON with a results array, you’re good.
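The same check from Python, as a rough sketch of what an API consumer does with that response. It assumes the instance from the compose file above at http://localhost:8080 and reads the url, title, and content fields of each entry in results; build_search_url, top_results, and search are names made up for this example.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query):
    """Build the same /search?q=...&format=json URL used in the curl test."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def top_results(payload, limit=5):
    """Keep the first `limit` results with distinct URLs from a SearXNG JSON payload."""
    seen = set()
    picked = []
    for item in payload.get("results", []):
        url = item.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        picked.append(
            {
                "title": item.get("title", ""),
                "url": url,
                "snippet": item.get("content", ""),
            }
        )
        if len(picked) == limit:
            break
    return picked


def search(base_url, query, limit=5):
    """Query a running SearXNG instance (needs the json format enabled)."""
    url = build_search_url(base_url, query)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return top_results(json.load(resp), limit)
```

Calling search("http://localhost:8080", "searxng test") should return a short list of dicts. If the json format is not enabled in settings.yml, expect an HTTP error instead of a payload.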

Step 2a: Perplexica (Recommended for Most People)

Perplexica is a full Perplexity-style UI with source cards, follow-up questions, and multiple search modes (web, academic, Reddit, YouTube). It connects to SearXNG for results and to any OpenAI-compatible API for the LLM step.

Heads up: Perplexica used to ship as separate perplexica-backend and perplexica-frontend images with a pile of NEXT_PUBLIC_API_URL/NEXT_PUBLIC_WS_URL env vars to glue them together. That split is gone: it’s now a single image (itzcrazykns1337/perplexica) that runs the Next.js app and API in one container, and newer builds even bundle their own SearXNG internally. We’re pointing it at the standalone SearXNG from Step 1 so Morphic and Open WebUI can share the same instance.

services:
  perplexica:
    image: itzcrazykns1337/perplexica:latest
    container_name: perplexica
    ports:
      - "3000:3000"
    volumes:
      - ./perplexica/config.toml:/home/perplexica/config.toml
    depends_on:
      - searxng
    restart: unless-stopped

The config file is where you wire it to your LLM and SearXNG:

[GENERAL]
PORT = 3001
SIMILARITY_MEASURE = "cosine"

[API_KEYS]
OPENAI = ""
GROQ = ""

[API_ENDPOINTS]
SEARXNG = "http://searxng:8080"
OLLAMA = "http://host.docker.internal:11434"

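For Ollama on the same host, host.docker.internal resolves out of the box with Docker Desktop. On plain Docker Engine on Linux, either add an extra_hosts entry of "host.docker.internal:host-gateway" to the service or use your host’s LAN IP instead. In that case Ollama also has to listen beyond 127.0.0.1 (set OLLAMA_HOST=0.0.0.0), since its default bind is loopback only.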

Pull a model that’s good at summarization and citation:

ollama pull mistral:7b-instruct
# or if you have the VRAM:
ollama pull llama3.1:8b

Spin everything up, hit http://localhost:3000, and you’ve got Perplexity at home. Source cards link to the actual pages. Citations are inline. The whole thing runs on your hardware.
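If the UI loads but answers never arrive, a quick thing to rule out is whether Ollama is reachable and actually has the model. The sketch below reads Ollama’s /api/tags endpoint, which lists installed models. The base URL is an assumption (Ollama’s default port on the same machine), and installed_models, has_model, and fetch_tags are helper names invented here.

```python
import json
import urllib.request


def installed_models(tags_payload):
    """Return the model names from an Ollama /api/tags response."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def has_model(tags_payload, wanted):
    """True if `wanted` is installed; a bare name is treated as the :latest tag."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in installed_models(tags_payload)


def fetch_tags(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Run has_model(fetch_tags(), "mistral:7b-instruct") on the host first. If that works but the container still can’t talk to Ollama, the problem is the Docker-to-host networking, not the model.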

Step 2b: Morphic (Better If You Want to Hack on the Frontend)

Morphic is a Next.js app, which means if you know React you can customize the UI freely. The search logic is more transparent: easier to swap in different LLMs or tweak the prompt.

services:
  morphic:
    image: ghcr.io/miurla/morphic:latest
    container_name: morphic
    ports:
      - "3000:3000"
    environment:
      - SEARXNG_API_URL=http://searxng:8080
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - OLLAMA_MODEL=llama3.1:8b
    depends_on:
      - searxng
    restart: unless-stopped

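Morphic’s env vars are simpler than Perplexica’s TOML. It supports Ollama natively without shimming through OpenAI-compatible endpoints. The default UI is cleaner too: fewer modes, less visual noise. One catch: this service publishes host port 3000, the same as Perplexica. To run both side by side, change one mapping, for example to "3001:3000".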

One place they differ: Morphic streams the answer token-by-token with citations appearing inline as they resolve, while Perplexica front-loads citations. Neither is objectively better; it depends on whether you prefer seeing citations appear as you read or upfront.

Step 3: Open WebUI (If You’re Already Running It)

If you’ve read the rag-on-a-budget post or the open-webui-tools-functions-pipelines walkthrough, you’re probably already running Open WebUI. Good news: it has a native Web Search tool that does exactly what Perplexica/Morphic do, just integrated into chat.

Go to Admin Panel → Settings → Web Search, toggle it on, set:

Web Search Engine: SearXNG
SearXNG Query URL: http://searxng:8080/search?q=<query>&format=json
Result Count: 5 to 10 (more = longer context = slower)
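To sanity-check the Query URL before pasting it in, it helps to see what the <query> placeholder stands for. This is a sketch of the substitution as a concept, not Open WebUI’s internal code; fill_query_url is a made-up helper.

```python
from urllib.parse import quote_plus


def fill_query_url(template, query):
    """Replace the <query> placeholder with a URL-encoded search string."""
    return template.replace("<query>", quote_plus(query))


if __name__ == "__main__":
    template = "http://searxng:8080/search?q=<query>&format=json"
    print(fill_query_url(template, "kernel panic: not syncing"))
```

Swap searxng:8080 for localhost:8080 and the printed URL can go straight into curl from the host. If that returns JSON, the setting is right.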

Enable it per-chat with the globe icon in the message bar. The model fetches results, reads them, and answers with sources. Latency is higher than Perplexica but you get the full chat UX with memory, tools, and everything else Open WebUI gives you.

The Privacy Story (Actually Good Here)

Let’s be honest about what “private” means in this stack:

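Fully local:

Your query history: it lives on your own box, not in someone else’s account
Text extraction from the fetched pages
LLM inference
Conversation history

Not local:

The outbound search queries to Google/Bing/DDG: these leave your network
The page fetches for scraping: each result site sees a request from your IP, though not the query that led you there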

SearXNG mitigates the outbound part in a few ways: the engines see requests from your SearXNG instance rather than your browser, so no cookies, logged-in account, or browser fingerprint come along; it can route requests through Tor or another proxy; and it fires off requests in parallel without referrer chains. Each enabled engine still sees the query text and your instance’s IP address. You’re not invisible to search engines, but you’re dramatically harder to profile than if you were querying Google directly from your browser.

If you’re on a Tailscale/WireGuard network and you run SearXNG on a VPS in a different jurisdiction, you can push the privacy envelope further. But for a home lab use case, the default setup is a massive improvement over logged accounts.

Latency, Accuracy, and Recency Tradeoffs

Real talk on where this stack falls short versus Perplexity’s paid tier:

Latency: Expect 10 to 30 seconds per query. SearXNG fires off 4 to 6 engines simultaneously (~1 to 3s), scraping the top 5 results adds another 2 to 5s, and LLM inference on 7 to 8B models locally takes 5 to 15s depending on your hardware. Perplexity returns in 3 to 5 seconds on a good day. If you’re on an RTX 3080 or better, you close this gap significantly.
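The stage numbers above can be added up to see where the time goes. A small sketch using only the ranges quoted in that paragraph; the 300-token answer length and 30 tokens/sec figure in the usage note are assumptions for illustration.

```python
# (low, high) seconds per stage, taken from the latency paragraph above.
STAGES = {
    "search": (1, 3),      # SearXNG fanning out to 4 to 6 engines
    "scrape": (2, 5),      # fetching the top 5 results
    "inference": (5, 15),  # 7 to 8B model running locally
}


def total_range(stages):
    """Sum the best-case and worst-case time across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def generation_seconds(answer_tokens, tokens_per_sec):
    """Time spent generating the answer, ignoring prompt processing."""
    return answer_tokens / tokens_per_sec


if __name__ == "__main__":
    print(total_range(STAGES))          # (8, 23)
    print(generation_seconds(300, 30))  # 10.0
```

The stages sum to 8 to 23 seconds, so the rest of the 10 to 30 second estimate is presumably prompt processing and general overhead. Inference is the biggest slice, which is why a faster GPU moves the total more than anything else does.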

Accuracy: Comparable for factual queries with stable answers. The LLM is reading the same pages Perplexity reads. Where it falls down is synthesis quality: bigger commercial models (GPT-4o, Claude Sonnet) do a better job of reconciling conflicting sources. Mistral 7B and Llama 3.1 8B are respectable but not magical.

Recency: SearXNG + scraping is inherently real-time: it’s hitting live pages. This is actually better than some commercial AI search products that cache results. Breaking news within the last hour shows up if it’s indexed by the engines you’ve configured.

Citation quality: Perplexica and Morphic both cite sources, but they cite at the document level. They don’t do passage-level attribution (bold inline highlights pointing to exact sentences). If that matters to you, you’re building something more custom.
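Document-level citation mostly comes down to how the prompt is assembled: number each source, then ask the model to cite by number. Here is a hedged sketch of that idea. The prompt wording, the build_prompt name, and the title/url/text keys are all assumptions for this example, not what Perplexica or Morphic send to the model.

```python
def build_prompt(question, sources, max_chars=1500):
    """Assemble a prompt that numbers each source so the model can cite [n].

    `sources` is a list of dicts with "title", "url", and "text" keys.
    """
    blocks = []
    for number, source in enumerate(sources, start=1):
        excerpt = source["text"][:max_chars]  # trim each page to bound context size
        blocks.append(f"[{number}] {source['title']} ({source['url']})\n{excerpt}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the numbered sources below. "
        "Cite each claim with its source number in square brackets, like [1].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Because the model only ever sees whole-page excerpts tagged [1], [2], and so on, the best it can do is point at a document. Passage-level attribution would need the context split into smaller numbered chunks, which is the "more custom" work mentioned above.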

Hardware Minimums Worth Knowing

Model	Min VRAM	Tokens/sec (approx)	Query time
Mistral 7B Q4	6 GB	25 to 40	15 to 25s
Llama 3.1 8B Q4	6 GB	20 to 35	15 to 30s
Llama 3.1 8B fp16	16 GB	40 to 60	8 to 15s
Llama 3.1 70B Q4	40 GB	10 to 15	30 to 60s

CPU-only on Mistral 7B Q4 is ~3 to 5 tokens/sec. Usable for low-volume personal use. Not great.

The sweet spot for home lab use is a 7 to 8B Q4 model on an 8 GB GPU. You get acceptable latency and good answer quality without needing to remortgage your house for a 3090.
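For a rough feel of where VRAM minimums come from, the weights alone take about parameters times bits per weight divided by 8. This is back-of-envelope arithmetic, not a benchmark: real Q4 formats use slightly more than 4 bits per weight, and the 1.5 GB overhead for KV cache and runtime is an assumed placeholder.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(vram_gb, params_billion, bits_per_weight, overhead_gb=1.5):
    """Rough check: weights plus an assumed overhead for KV cache and runtime."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    print(weights_gb(7, 4))        # 3.5  -> 7B at 4-bit
    print(weights_gb(8, 16))       # 16.0 -> 8B at fp16
    print(weights_gb(70, 4))       # 35.0 -> 70B at 4-bit
    print(fits_in_vram(8, 8, 4))   # True
    print(fits_in_vram(8, 8, 16))  # False
```

That is why an 8B model at 4-bit sits comfortably on an 8 GB card while the same model at fp16 does not come close.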

Putting It All Together

Full compose stack in one file:

services:
  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    ports:
      - "8080:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080
    restart: unless-stopped

  perplexica:
    image: itzcrazykns1337/perplexica:latest
    container_name: perplexica
    ports:
      - "3000:3000"
    volumes:
      - ./perplexica/config.toml:/home/perplexica/config.toml
    depends_on:
      - searxng
    restart: unless-stopped

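# Create the config directories
mkdir -p perplexica searxng

# Create perplexica/config.toml (the file from Step 2a) before starting.
# If it is missing, Docker creates a directory at the bind-mount path instead.

# Generate SearXNG secret key
openssl rand -hex 32

# Paste into searxng/settings.yml server.secret_key
# Then:
docker compose up -d

# Pull your model
ollama pull mistral:7b-instruct

# Hit it (macOS; on Linux use xdg-open)
open http://localhost:3000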

First query might be slow while Ollama loads the model into memory. After that it’s consistent. Your 2 AM self researching weird kernel errors will appreciate not having those queries logged anywhere.

If this got your RAG brain going, check out rag-on-a-budget for building document RAG on commodity hardware, and open-webui-tools-functions-pipelines for extending Open WebUI with custom tools: the web search pipeline is one of the cleaner examples of how the function system works in practice.

The privacy angle here isn’t paranoia: it’s just good hygiene. You’re already hosting your own services. Your search queries shouldn’t be someone else’s training data.
