Google Knows Too Much — So Host Your Own Search
Here’s the thing: every search query you fire at Google is a little data point. Stack enough of them up and you’ve handed someone a pretty detailed map of your life: your health questions, your financial anxieties, your 2 AM “is this normal?” spirals. DuckDuckGo is better, but it’s still a third party you’re trusting. Brave Search is decent. Kagi costs money.
Or — hear me out — you run your own search frontend on a $5 VPS and stop worrying about it.
Two projects dominate this space: SearXNG and Whoogle Search. They solve the same problem from different angles, and depending on what you actually want from a private search setup, one is clearly better for you. Let’s work through both.
What We’re Comparing
SearXNG is an actively maintained fork of the long-dead Searx project. It’s a meta-search engine: it fans your query out to 100+ different backends (Google, Bing, Brave, Mojeek, DuckDuckGo, Wikipedia, GitHub, Reddit, and a lot more you’ll never use), aggregates the results, deduplicates them, and presents a combined view. Python/Flask under the hood, with Redis backing its built-in rate limiter.
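To make “aggregates, deduplicates, and presents a combined view” concrete, here’s a minimal Python sketch of the idea. It is not SearXNG’s code: the URL normalization is simplified, every engine gets equal weight, and the scoring rule (results that several engines agree on, and that rank near the top, float upward) is only in the spirit of how SearXNG ranks merged results.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class MergedResult:
    url: str
    title: str
    engines: list[str] = field(default_factory=list)
    positions: list[int] = field(default_factory=list)  # 1-based rank per engine

    def score(self) -> float:
        # Each appearance counts more the higher it ranked, and the sum is
        # multiplied by how many engines returned the result at all.
        occurrences = len(self.positions)
        return sum(occurrences / pos for pos in self.positions)


def url_key(url: str) -> tuple[str, str, str]:
    """Normalize a URL so trivially different links collapse into one key."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return host, path, parts.query


def aggregate(per_engine: dict[str, list[tuple[str, str]]]) -> list[MergedResult]:
    """Merge ranked (url, title) lists from several engines into one ranking."""
    merged: dict[tuple[str, str, str], MergedResult] = {}
    for engine, results in per_engine.items():
        for position, (url, title) in enumerate(results, start=1):
            entry = merged.setdefault(url_key(url), MergedResult(url, title))
            entry.engines.append(engine)
            entry.positions.append(position)
    # sorted() is stable, so tied scores keep first-seen order.
    return sorted(merged.values(), key=MergedResult.score, reverse=True)


if __name__ == "__main__":
    ranking = aggregate({
        "brave": [("https://example.org/a", "A"), ("https://b.example/", "B")],
        "mojeek": [("https://www.example.org/a/", "A"), ("https://c.example/", "C")],
    })
    for r in ranking:
        print(f"{r.score():.2f}  {r.url}  via {', '.join(r.engines)}")
```

In the demo, the two spellings of the example.org link merge into one result that both engines vouch for, so it outranks the single-engine hits. That agreement bonus is the core of why a meta-search engine can beat any one of its backends.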
Whoogle Search is simpler by design. It’s a pure Google proxy. It strips tracking params, removes ads, kills JavaScript bloat, and gives you clean Google results without handing Google your IP or browser fingerprint. One backend, zero aggregation overhead.
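As an illustration of the “strips tracking params” part, here’s a minimal Python sketch of that kind of link cleanup. The parameter list is an assumption chosen for the example, not Whoogle’s actual filter list. The /url?q= unwrapping reflects how Google has long wrapped outbound links in its no-JavaScript result pages. Whoogle’s real implementation does considerably more.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative list only; Whoogle maintains its own rules.
TRACKING_PARAMS = {"gclid", "fbclid", "ved", "ei", "usg", "sa"}
TRACKING_PREFIXES = ("utm_",)


def unwrap_google_redirect(href: str) -> str:
    """Turn a '/url?q=<target>&sa=...' redirect link into the bare target."""
    parts = urlsplit(href)
    if parts.path == "/url":
        target = dict(parse_qsl(parts.query)).get("q")
        if target:
            return target
    return href


def strip_tracking(url: str) -> str:
    """Drop query parameters whose only job is click tracking."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_result_link(href: str) -> str:
    """Unwrap first, then strip, so trackers on the target URL go too."""
    return strip_tracking(unwrap_google_redirect(href))
```

The order matters: unwrapping first exposes the destination URL, which often carries its own utm_* parameters that a single pass over the redirect link would miss.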
Both are self-hosted, both are actively maintained, both run fine in Docker. The differences are in the details.
SearXNG: The Multi-Engine Approach
Why You’d Want It
If you search for niche technical stuff, Wikipedia articles, GitHub repos, or Reddit threads, SearXNG is genuinely useful. You can configure which engines fire per category: for example, general web searches pull from Brave + Mojeek + DuckDuckGo, image searches hit Google Images + Bing Images, and code searches tap GitHub and GitLab. Result diversity is the whole pitch.
It also has a reasonably polished UI, light/dark themes, result categories (General, Images, News, Videos, Maps, Science, Social), and browser autocomplete support out of the box.
Setting It Up
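```yaml
services:
  searxng:
    image: searxng/searxng:2024.12.1-1
    container_name: searxng
    restart: unless-stopped
    ports:
      # Localhost only; Caddy or Traefik handles public traffic
      - "127.0.0.1:8080:8080"
    volumes:
      - ./searxng:/etc/searxng:rw
    environment:
      - SEARXNG_BASE_URL=https://search.yourdomain.com/
      - SEARXNG_SECRET_KEY=changeme_use_openssl_rand_hex_32
    depends_on:
      - redis
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID

  redis:
    image: redis:7.2-alpine
    container_name: searxng-redis
    restart: unless-stopped
    # No persistence: the limiter's data is disposable
    command: redis-server --save "" --appendonly no
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```

First run, generate the config. The image ships its own entrypoint script, so override it with --entrypoint sh; otherwise the command after the image name is handed to that script instead of being run as a shell command:

```bash
mkdir -p searxng
docker run --rm \
  --entrypoint sh \
  -v "$(pwd)/searxng:/etc/searxng" \
  searxng/searxng:2024.12.1-1 \
  -c "cp /usr/local/searxng/searx/settings.yml /etc/searxng/settings.yml"
```

Then edit searxng/settings.yml to taste. The important bits are below. They’re excerpts, so change these keys in place in the copied file rather than appending duplicate sections: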
```yaml
general:
  instance_name: "My Search"
  contact_url: false
  enable_metrics: false

search:
  safe_search: 0
  autocomplete: "duckduckgo"
  default_lang: "auto"
  ban_time_on_fail: 5
  max_ban_time_on_fail: 120

server:
  base_url: https://search.yourdomain.com/
  port: 8080
  bind_address: "0.0.0.0"
  # Replace with a random value, e.g. the output of: openssl rand -hex 32
  secret_key: "your_secret_key_here"
  limiter: true
  image_proxy: true

# The limiter needs Redis; this points at the redis service in the compose file
redis:
  url: redis://redis:6379/0

ui:
  static_use_hash: true
  default_theme: simple
  theme_args:
    simple_style: dark

# Enable/disable engines selectively
engines:
  - name: google
    engine: google
    shortcut: g
    disabled: false
  - name: brave
    engine: brave
    shortcut: brave
    disabled: false
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
    disabled: false
  - name: bing
    engine: bing
    shortcut: b
    disabled: true  # disable what you don't want
  - name: mojeek
    engine: mojeek
    shortcut: moj
    disabled: false
  - name: github
    engine: github
    shortcut: gh
    disabled: false
  - name: reddit
    engine: reddit
    shortcut: re
    disabled: false
```

Fire it up:

```bash
docker compose -f searxng-compose.yml up -d
```
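Once it’s running, those shortcut: values double as per-query overrides: start a search with ! plus a shortcut (say, !gh serde) and SearXNG queries just that engine. Here’s a simplified Python sketch of that parsing, assuming the shortcut table from the engines block above. SearXNG’s real query parser handles more syntax than this, and the sketch ignores each engine’s disabled: flag, so read it as an illustration of the idea rather than SearXNG’s behavior.

```python
# Shortcut -> engine name, copied from the engines block in settings.yml.
SHORTCUTS = {
    "g": "google",
    "brave": "brave",
    "ddg": "duckduckgo",
    "b": "bing",
    "moj": "mojeek",
    "gh": "github",
    "re": "reddit",
}


def parse_bangs(raw_query: str, shortcuts: dict[str, str]) -> tuple[list[str], str]:
    """Split '!gh rust serde' into (['github'], 'rust serde').

    Unknown '!words' stay in the query as ordinary terms. An empty engine
    list means "use the default engines for the category".
    """
    engines: list[str] = []
    terms: list[str] = []
    for token in raw_query.split():
        name = shortcuts.get(token[1:]) if token.startswith("!") else None
        if name is None:
            terms.append(token)        # plain word or unknown bang
        elif name not in engines:
            engines.append(name)       # known bang; repeats are dropped
    return engines, " ".join(terms)
```

This is why short, memorable shortcuts are worth a minute of thought: they become the keyboard interface to your instance.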
The Rate-Limit Reality
Here’s where SearXNG gets annoying. When you fan out to Google and Bing from the same IP every time someone searches, those engines notice. You’ll start hitting CAPTCHAs and getting soft-blocked, especially from Google. Your instance’s exit IP is the problem.
Options:
- Disable Google entirely: lean on Brave, Mojeek, and DDG, which tend to be friendlier to programmatic access
- Configure a proxy: SearXNG supports per-engine proxy settings in settings.yml
- Tor sidecar: route specific engines through Tor (adds latency, but anonymizes your exit IP)
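```yaml
engines:
  - name: google
    engine: google
    shortcut: g
    proxies:
      all://:
        - socks5://tor:9050
```

With a Tor container in your Compose stack:

```yaml
services:
  # ...searxng and redis as above...
  tor:
    image: peterdavehello/tor-socks-proxy:latest
    container_name: searxng-tor
    restart: unless-stopped
```

Make sure the port in the proxy URL matches the SOCKS port your Tor image actually listens on; check the image’s README rather than assuming 9050.

Honestly, disabling Google is the easier path. Google is quick to throw CAPTCHAs at Tor exit nodes too, so Tor alone may not fix it. Brave’s results are solid and they don’t block automated queries the same way.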
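SearXNG also backs off on its own. When an engine keeps erroring, it gets suspended for a while, and ban_time_on_fail and max_ban_time_on_fail from settings.yml set the step and the cap. As far as the setting names suggest, the suspension grows with consecutive errors and tops out at the maximum. CAPTCHA and rate-limit responses are handled by separate, longer suspension settings in the default settings.yml. The sketch below models the ordinary-error case with a hypothetical EngineHealth class. It is not SearXNG’s code, and exact behavior can vary between releases.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineHealth:
    """Hypothetical tracker: linear suspension per consecutive error, capped."""

    ban_time_on_fail: float = 5.0        # seconds added per consecutive error
    max_ban_time_on_fail: float = 120.0  # ceiling for a single suspension
    continuous_errors: int = 0
    suspended_until: float = 0.0

    def record_error(self, now: Optional[float] = None) -> float:
        """Count a failure, suspend the engine, and return the suspension length."""
        now = time.monotonic() if now is None else now
        self.continuous_errors += 1
        suspension = min(
            self.max_ban_time_on_fail,
            self.continuous_errors * self.ban_time_on_fail,
        )
        self.suspended_until = now + suspension
        return suspension

    def record_success(self) -> None:
        """A good response clears the error streak."""
        self.continuous_errors = 0
        self.suspended_until = 0.0

    def is_suspended(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now < self.suspended_until
```

Under this model, with the values from the config above (5 and 120), a failing engine sits out 5 s, then 10 s, then 15 s, and so on, reaching the 2-minute cap after 24 consecutive errors.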
Whoogle: Google Without the Surveillance
Why You’d Want It
If you just want Google results without Google tracking you — that’s it, that’s Whoogle. No multi-engine complexity, no aggregation delays, no engine config to manage. It’s fast because it’s hitting one backend. Setup is dead simple.
The tradeoff: you’re still dependent on Google’s index. If Google decides your query results should be SEO slop, Whoogle faithfully delivers that slop. It doesn’t diversify sources.
Setting It Up
```yaml
services:
  whoogle:
    image: benbusby/whoogle-search:0.9.1
    container_name: whoogle
    restart: unless-stopped
    ports:
      # Localhost only; Caddy or Traefik handles public traffic
      - "127.0.0.1:5000:5000"
    environment:
      # Appearance
      - WHOOGLE_CONFIG_THEME=system
      - WHOOGLE_CONFIG_SEARCH_LANGUAGE=lang_en
      - WHOOGLE_CONFIG_COUNTRY=US
      # Privacy
      - WHOOGLE_CONFIG_BLOCK=
      - WHOOGLE_RESULTS_PER_PAGE=10
      # Proxy (optional: point at Tor or your VPN's SOCKS proxy)
      # - WHOOGLE_PROXY_USER=
      # - WHOOGLE_PROXY_PASS=
      # - WHOOGLE_PROXY_TYPE=socks5
      # - WHOOGLE_PROXY_LOC=tor:9050
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /config
      - /var/lib/tor/
      - /run/tor/
      - /tmp
```

With an optional Tor sidecar to anonymize your server’s exit IP as well:
```yaml
services:
  whoogle:
    image: benbusby/whoogle-search:0.9.1
    container_name: whoogle
    restart: unless-stopped
    ports:
      - "127.0.0.1:5000:5000"
    environment:
      - WHOOGLE_CONFIG_THEME=dark
      - WHOOGLE_PROXY_TYPE=socks5
      - WHOOGLE_PROXY_LOC=tor:9050
    depends_on:
      - tor
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /config
      - /var/lib/tor/
      - /run/tor/
      - /tmp

  tor:
    image: peterdavehello/tor-socks-proxy:latest
    container_name: whoogle-tor
    restart: unless-stopped
```

That’s it. Seriously. There’s no settings.yml to manage, no engine list, nothing. Set the env vars and you’re done.
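The one subtlety in the proxy variables is that WHOOGLE_PROXY_LOC is just host:port, with the scheme supplied separately by WHOOGLE_PROXY_TYPE. The Python sketch below shows one way the four WHOOGLE_PROXY_* values combine into a standard proxy URL. It is an illustration, not Whoogle’s source. Raising an error on a missing type and percent-encoding the credentials are choices made for this sketch.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def proxy_url_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Assemble 'type://[user:pass@]host:port' from WHOOGLE_PROXY_* values.

    Returns None when WHOOGLE_PROXY_LOC is unset, meaning a direct connection.
    """
    location = env.get("WHOOGLE_PROXY_LOC", "").strip()
    if not location:
        return None
    scheme = env.get("WHOOGLE_PROXY_TYPE", "").strip()
    if not scheme:
        raise ValueError("set WHOOGLE_PROXY_TYPE (e.g. socks5) alongside WHOOGLE_PROXY_LOC")
    user = env.get("WHOOGLE_PROXY_USER", "")
    password = env.get("WHOOGLE_PROXY_PASS", "")
    # Percent-encode credentials so '@' or ':' in a password can't break the URL.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}@" if user else ""
    return f"{scheme}://{auth}{location}"
```

With the Tor sidecar settings above, this yields socks5://tor:9050, where tor is the Compose service name resolving on the shared network.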
Setting Either as Your Default Browser Search Engine
Both support OpenSearch, so you can add them as a browser search engine and set them as default.
For Firefox: go to your instance URL, click the address bar, look for the engine icon, add it. Or manually via about:preferences#search.
For Chromium-based browsers, add a custom search engine in Settings → Search engine → Manage search engines:
- SearXNG: https://search.yourdomain.com/search?q=%s
- Whoogle: https://whoogle.yourdomain.com/search?q=%s
Set it as default and your address bar queries route through your instance. This is the point where self-hosting search goes from a cool experiment to something you actually use daily.
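Those templates are plain string substitution: the browser swaps %s for your URL-encoded query. Here’s a small Python sketch of the same substitution, handy for checking a template before pasting it into browser settings. Browsers’ exact encoding can differ slightly from Python’s quote_plus (for instance, %20 versus + for spaces). The q parameter name simply mirrors the URLs above.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit


def fill_search_template(template: str, query: str) -> str:
    """Substitute the URL-encoded query for the %s placeholder."""
    if "%s" not in template:
        raise ValueError("search URL template needs a %s placeholder")
    return template.replace("%s", quote_plus(query))


def query_from_url(url: str, param: str = "q") -> str:
    """Decode the query back out of a filled URL (round-trip check)."""
    return parse_qs(urlsplit(url).query)[param][0]
```

Paste a filled URL into the address bar once to confirm your instance answers before making it the default; characters like ?, &, and + are the usual suspects if results look truncated.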
Putting Them Behind a Reverse Proxy
Don’t expose these directly on a port. Wrap them in Caddy or Traefik.
Caddy config for SearXNG:
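```
search.yourdomain.com {
    reverse_proxy searxng:8080
}
```

Caddy config for Whoogle:

```
whoogle.yourdomain.com {
    reverse_proxy whoogle:5000
}
```

The searxng and whoogle hostnames only resolve if Caddy runs as a container on the same Docker network as those services; if Caddy runs directly on the host, point it at localhost:8080 and localhost:5000 instead. Caddy handles TLS automatically via Let’s Encrypt. If you’re running Traefik, you know the drill: add the standard labels to the compose service.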
The Honest Privacy Talk
Both tools improve your privacy situation. Neither makes you invisible to Google and Bing.
When SearXNG or Whoogle queries Google, Google sees the request; it just sees it coming from your server’s IP, not your laptop. If you’re self-hosting on a VPS with a static IP, Google can still build a profile of “someone at this IP searches for X.” It just can’t tie that to your browser fingerprint, cookies, or Google account.
Adding Tor as a proxy changes this. Your queries leave through a Tor exit node, and Tor rotates circuits periodically, so Google sees searches arriving from different IPs with no stable address linking them. The tradeoff is latency: expect Tor to add roughly 1-3 seconds to every search.
The real privacy win from self-hosting is eliminating the third-party tracker problem. With a shared public instance (there’s a list at searx.space), you’re trusting whoever runs that instance. They see all your queries in plaintext. Self-hosted means only you do.
If you use a shared public SearXNG instance, you’re trading Google knowing your queries for some random VPS operator knowing them. That might be an acceptable tradeoff, or it might not be. Self-hosting sidesteps the question entirely.
Head-to-Head: When to Pick Which
| | SearXNG | Whoogle |
|---|---|---|
| Result sources | 100+ engines aggregated | Google only |
| Result quality | Diverse, sometimes inconsistent | Consistent Google results |
| Speed | Slower (parallel engine requests) | Fast (single backend) |
| Rate-limit risk | High with Google/Bing enabled | Medium (single IP, still Google) |
| Setup complexity | Medium (settings.yml, Redis) | Low (env vars only) |
| Niche searches | Excellent (GitHub, Reddit, Sci-Hub) | Limited to Google index |
| Maintenance | More config to manage | Fire and forget |
| RAM footprint | ~300MB (app + Redis) | ~150MB |
Should You Bother?
Yes, with some caveats.
If you want Google results without handing Google your identity, Whoogle is the right tool. It’s low-maintenance, fast, and the operational burden is basically zero. Set it up once, point your browser at it, forget about it. You’re not going to get better Google results — you’re getting the same results, just anonymized. That’s the whole pitch.
If you want result diversity — different perspectives from Brave, Mojeek, DDG, and specialist engines like GitHub — and you’re willing to manage a slightly more complex setup, SearXNG is worth it. Disable Google and Bing engines to avoid rate-limit headaches, lean on the engines that handle programmatic access gracefully, and you’ll end up with something genuinely more useful than any single search engine.
Either way, self-hosting beats relying on a public shared instance. The whole point of running your own search frontend is that you control the exit IP, you see the query logs (or don’t — configure accordingly), and you’re not dependent on a stranger’s VPS staying online.
The barrier to entry is a VPS, a domain, and about 20 minutes. Your 2 AM “is this mole normal” query can stay between you and your server.
Full example: Clone working Compose configs at github.com/KingPin/sumguy-examples/privacy/searxng-vs-whoogle-private-search